lucidluckylee / pace_2024

Pace Challenge 2024

License: MIT License

CMake 0.85%, C++ 80.57%, Python 8.21%, Shell 0.06%, Jupyter Notebook 10.31%

pace_2024's Introduction

PACE 2024 - Arcee - Student Submission

This is our submission to the PACE Challenge 2024.

This year’s challenge is about the one-sided crossing minimization problem (OCM). This problem involves arranging the nodes of a bipartite graph on two layers (typically horizontal), with one of the layers fixed, aiming to minimize the number of edge crossings. OCM is one of the basic building blocks used for drawing hierarchical graphs. It is NP-hard, even for trees, but admits good heuristics, can be constant-factor approximated and solved in FPT time. For an extended overview, see Chapter 13.5 of the Handbook of Graph Drawing. [Pace]

Contributors

Kimon Boehmer
Lukas Lee George
Fanny Hauser
Jesse Palarus

Requirements

CMake 3.12 or higher
a C++17 compiler (we used gcc/g++ 13.3.0)

Set a time limit for the heuristic solver

You may want to set a time limit for the heuristic solver. You can do so by editing the following lines in the src/heuristic_solver/heuristic_solver.hpp file:

 public:
    explicit HeuristicSolver(std::chrono::milliseconds limit =
                                 std::chrono::milliseconds(1000 * 60 * 5 -
                                                           1000 * 15))
        : SolutionSolver(limit) {}

By default, the time limit is 4 minutes and 45 seconds (5 minutes minus 15 seconds). You can replace 1000 * 60 * 5 - 1000 * 15 with any value you want, given in milliseconds.
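Since the limit is a std::chrono::milliseconds, the value can also be written with chrono duration arithmetic instead of raw millisecond products. A small sketch; kDefaultLimit and kQuickLimit are hypothetical names and not part of the repository:

```cpp
#include <chrono>

// Hypothetical constant: 5 minutes minus 15 seconds, i.e. the same value as
// 1000 * 60 * 5 - 1000 * 15 milliseconds. The seconds-based result converts
// implicitly (and losslessly) to milliseconds.
constexpr std::chrono::milliseconds kDefaultLimit =
    std::chrono::minutes(5) - std::chrono::seconds(15);

// Hypothetical shorter limit, e.g. for quick local experiments.
constexpr std::chrono::milliseconds kQuickLimit = std::chrono::seconds(30);
```

Either constant could be used wherever the constructor shown above expects a std::chrono::milliseconds.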

Build

Build Heuristic Solver

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
./heuristic_solver < path/to/your/gr.file

Build the Exact Solver and the Parameterized Solver

Requirements

Google OR-Tools (Highly recommended)

Make sure Google OR-Tools is installed on your system, either from the official binaries or built and installed from source. We have tested the code with Google OR-Tools version 9.10.

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_ILP_SOLVER=ON .. 
make
./feedback_edge_set_solver < path/to/your/gr.file

If everything is set up correctly, the CMake output should contain the following line:

...
-- Check ILP Solver: ON
...

If you cannot install Google OR-Tools on your system, you can build the exact solver and the parameterized solver without it using the following commands (the resulting solvers are much slower). Make sure to remove the build directory before rebuilding the project with OR-Tools.

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release  .. 
make
./feedback_edge_set_solver < path/to/your/gr.file

Submitting to Optil.io

Since Optil.io does not seem to build in Release mode, uncomment the following line in the CMakeLists.txt file before submitting to Optil.io:

# Set the build type to Release (needed for optil.io)
#set(CMAKE_BUILD_TYPE Release)

Rebuilding the project

Remove the build directory before rebuilding the project, for example when activating or deactivating the ILP solver:

rm -rf build


pace_2024's Issues

A few suggestions for the code

I've spotted a few minor efficiency issues in the code, hopefully those can scrape a little bit of performance still :)
There is no obligation to implement any of them, they are just suggestions, and they might even be wrong because I don't fully understand all the code. If anything is unclear, I'm up for discussion :)
Here's the list:

RR1 will produce a ton of cache-misses because the two accesses to the crossing matrix are [a][b] and [b][a], which are unlikely to be in the same cache-line; it should be possible to do this more cache-friendly. Likewise, in RR3 and RRLarge, accesses to [a][b] will be WAY more frequent than accesses into [b][a], so make sure these accesses are cache-friendly.

pace_2024/src/data_reduction/data_reduction_rules.cpp

Line 18 in 72cab00

} else if (graph.crossing.matrix[b][a] == 0) {
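The RR1 suggestion can be illustrated with a small sketch. It assumes a dense n × n matrix of pairwise crossing counts and stores c(a, b) and c(b, a) next to each other in one cell, so a rule that reads both directions touches a single cache line. PairedCrossingMatrix and its methods are hypothetical and not how the repository stores the matrix; for brevity, only the upper triangle of the buffer is used.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical layout: for every pair {a, b}, c(a, b) and c(b, a) live in the
// same Cell, so reading both directions is a single memory access.
class PairedCrossingMatrix {
  public:
    explicit PairedCrossingMatrix(std::size_t n) : n_(n), cells_(n * n) {}

    // Returns {c(a, b), c(b, a)}.
    std::pair<int, int> both(std::size_t a, std::size_t b) const {
        const Cell &cell = cells_[index(a, b)];
        return a < b ? std::pair{cell.low_high, cell.high_low}
                     : std::pair{cell.high_low, cell.low_high};
    }

    // Stores c(a, b).
    void set(std::size_t a, std::size_t b, int value) {
        Cell &cell = cells_[index(a, b)];
        (a < b ? cell.low_high : cell.high_low) = value;
    }

  private:
    struct Cell {
        int low_high = 0; // c(min, max)
        int high_low = 0; // c(max, min)
    };

    // Maps both (a, b) and (b, a) to the same upper-triangle cell.
    std::size_t index(std::size_t a, std::size_t b) const {
        return a < b ? a * n_ + b : b * n_ + a;
    }

    std::size_t n_;
    std::vector<Cell> cells_;
};
```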

On the subject of cache-misses, CrossingMatrix::comparable(a,b) checks in matrix[b][a] so be aware that chaining calls to comparable(a,b) will be more efficient when grouped by b, not by a.

pace_2024/src/pace_graph/crossing_matrix.cpp

Line 36 in 72cab00

return lt(a, b) || lt(b, a) || a == b;

Speaking of crossing matrices, it seems that you only use the matrix when there are 10,000 or fewer nodes, in which case it would be sufficient to store uint16_t instead of int in the matrix, saving a factor of 2 in space.
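A sketch of the 16-bit variant, assuming every entry really fits into uint16_t. Because the crossing count of a pair (a, b) can be as large as deg(a) * deg(b), the hypothetical CompactCrossingMatrix below checks the range on every write instead of letting values wrap around.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical compact crossing matrix: one contiguous buffer of 16-bit
// counters (half the size of a 32-bit int).
class CompactCrossingMatrix {
  public:
    explicit CompactCrossingMatrix(std::size_t n) : n_(n), data_(n * n, 0) {}

    std::uint16_t get(std::size_t a, std::size_t b) const {
        return data_[a * n_ + b];
    }

    // Rejects values that do not fit instead of silently truncating them.
    void set(std::size_t a, std::size_t b, long value) {
        if (value < 0 || value > std::numeric_limits<std::uint16_t>::max()) {
            throw std::out_of_range("crossing count does not fit in 16 bits");
        }
        data_[a * n_ + b] = static_cast<std::uint16_t>(value);
    }

  private:
    std::size_t n_;
    std::vector<std::uint16_t> data_;
};
```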
vector<bool> is inefficiently implemented in the STL because of iterator requirements (every bool occupies 32 or 64 bits because iterators must point to valid (aligned) memory). I recommend std::bitset if the size is known, or use boost's, or search the internet, or implement your own, or use mine (no guarantees; it also needs 2-3 header dependencies from the same repo and C++20 for concepts).

pace_2024/src/data_reduction/data_reduction_rules.cpp

Line 116 in 72cab00

std::vector<bool> already_deleted(graph.size_free, false);
In C++, a class with only public data members is conventionally written as a struct (class and struct differ only in their default access):

pace_2024/src/exact/feedback_edge_set_heuristic.hpp

Line 10 in 72cab00

class FeedbackEdgeHeuristicParameter {
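For illustration, a parameter bundle written as a struct with default member initializers. The fields are hypothetical and not the actual members of FeedbackEdgeHeuristicParameter; the type remains an aggregate, so it can still be brace-initialized.

```cpp
// Hypothetical parameter bundle: field names are illustrative only.
struct HeuristicParameters {
    int iterations = 100;       // default member initializers replace
    double randomization = 0.2; // a hand-written constructor
    bool useLocalSearch = true;
};
```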
It is recommended to use std::accumulate for this; in general, loops should be avoided if there is an STL algorithm that already does the job, since those are highly engineered.

pace_2024/src/exact/feedback_edge_set_heuristic.cpp

Line 5 in 72cab00

long cost = 0;
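A minimal sketch of the std::accumulate version, assuming the loop sums a per-edge cost into a long; WeightedEdge and totalCost are hypothetical stand-ins for the real types.

```cpp
#include <numeric>
#include <vector>

// Hypothetical edge type; the real code's fields may differ.
struct WeightedEdge {
    int weight;
};

// Sums all edge weights. The initial value 0L makes the accumulator a long,
// matching the original `long cost = 0;`.
long totalCost(const std::vector<WeightedEdge> &edges) {
    return std::accumulate(edges.begin(), edges.end(), 0L,
                           [](long sum, const WeightedEdge &e) {
                               return sum + e.weight;
                           });
}
```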
There is no need to declare a default constructor; C++ will generate one for you (rule of zero).

pace_2024/src/exact/feedback_edge_set_solver.hpp

Line 16 in 72cab00

Circle() = default;
std::shared_ptr is inefficient compared to raw pointers. If you get your ownership right, you won't need shared pointers. Here, the owner of the Circles and the Edges is clearly the FeedbackEdgeInstance, and ownership is not shared between the FeedbackEdgeInstance and its Edges. As long as no Edge or Circle outlives its FeedbackEdgeInstance (and none really should), you'll be fine with FeedbackEdgeInstance storing a vector (or an unordered_set if you need to check containment) of unique_ptr<Edge> and replacing all other shared_ptr<Edge> with Edge*, which has WAY less overhead (same with Circles). Note that I didn't check for side effects (do you extract a list of shared_ptr<Edge> from an instance that you then destruct? In that case, keep the shared_ptr).

pace_2024/src/exact/feedback_edge_set_solver.hpp

Line 41 in 72cab00

std::vector<std::shared_ptr<Circle>> circles;
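A heavily simplified sketch of the ownership model described above: the instance owns Edges and Circles through std::unique_ptr, and everything else uses non-owning raw pointers. All type and member names are hypothetical, and the real classes carry more state.

```cpp
#include <memory>
#include <vector>

struct Circle;

// Hypothetical simplified edge: non-owning pointers to its circles.
struct Edge {
    int start;
    int end;
    std::vector<Circle *> circles;
};

// Hypothetical simplified circle: non-owning pointers to its edges.
struct Circle {
    std::vector<Edge *> edges;
};

// The instance is the single owner of all Edges and Circles.
struct FeedbackEdgeInstanceSketch {
    std::vector<std::unique_ptr<Edge>> edges;
    std::vector<std::unique_ptr<Circle>> circles;

    Edge *addEdge(int start, int end) {
        edges.push_back(std::make_unique<Edge>(Edge{start, end, {}}));
        return edges.back().get();
    }

    // Creates a circle and registers it with each of its member edges.
    Circle *addCircle(const std::vector<Edge *> &members) {
        circles.push_back(std::make_unique<Circle>(Circle{members}));
        Circle *circle = circles.back().get();
        for (Edge *e : members) {
            e->circles.push_back(circle);
        }
        return circle;
    }
};
```

The raw pointers stay valid when the owning vectors grow, because reallocation moves the unique_ptr objects, not the Edges and Circles they point to. With this layout, the number of circles of an edge is simply circles.size().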
Did I get this right that Edge has a numberOfCircles member? Could this not be returned by circles.size() on the Edge?

pace_2024/src/exact/feedback_edge_set_solver.hpp

Line 26 in 72cab00

int numberOfCircles = 0;
You're already using range-based for loops (for(auto& e: edges)); if you move to C++20 (the README currently requires C++17), you can also use std::ranges::sort for sorting.

pace_2024/src/exact/feedback_edge_set_heuristic.cpp

Line 30 in 72cab00

std::sort(
You have a few functions that are way too long and should be split into smaller functions with descriptive names if possible

pace_2024/src/exact/feedback_edge_set_heuristic.cpp

Line 58 in 72cab00

metaRapsConstruction(FeedbackEdgeInstance &instance,
std::remove may take linear time in the vector length, are you sure you're not better off with an unordered_set from which you can remove in constant time?

pace_2024/src/exact/feedback_edge_set_heuristic.cpp

Line 159 in 72cab00

edges.erase(std::remove(edges.begin(), edges.end(), usedEdge),
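A sketch contrasting the two options, using int ids as hypothetical stand-ins for the edge pointers. Switching to std::unordered_set is only safe if nothing depends on the order of edges.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Vector version: std::remove scans the whole vector, O(n) per removal.
void removeFromVector(std::vector<int> &edges, int usedEdge) {
    edges.erase(std::remove(edges.begin(), edges.end(), usedEdge), edges.end());
}

// Set version: average O(1) per removal, but iteration order is unspecified.
void removeFromSet(std::unordered_set<int> &edges, int usedEdge) {
    edges.erase(usedEdge);
}
```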
I will open a separate issue for const-correctness, but here's one problem I saw:

pace_2024/src/heuristic_solver/greedy_insert_solver.cpp

Line 15 in 72cab00

for (int i = 1; i <= current_order.size(); ++i) {

for (int i = 1; i <= current_order.size(); ++i) -- if you don't tell the compiler that current_order is constant, it might not be able to optimize away the call to size(), which always returns the same value. If the compiler knew that the vector does not change, it could call size() once before the loop and reuse that value instead of calling size() on every iteration.
This should not be done with vector::insert, not even with vector::push_back, but with std::iota.
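A sketch of the same pattern with the size read once into a const local; the loop body is a hypothetical stand-in for the real insertion-cost computation. Using std::size_t for the index also avoids comparing a signed int with the unsigned result of size().

```cpp
#include <cstddef>
#include <vector>

// `current_order` is taken by const reference and its size is read once,
// so the loop bound is visibly invariant. The body (finding the first
// position whose element exceeds `value`) is only a placeholder.
std::size_t firstPositionAfter(const std::vector<int> &current_order, int value) {
    const std::size_t n = current_order.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (current_order[i] > value) {
            return i;
        }
    }
    return n; // insert at the end
}
```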

pace_2024/src/heuristic_solver/greedy_insert_solver.cpp

Line 40 in 72cab00

for (int i = 0; i < graph.size_free; ++i) {
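A minimal sketch of the std::iota version, assuming the loop builds the identity order 0, 1, ..., size_free - 1 and that size_free is non-negative; identityOrder is a hypothetical name.

```cpp
#include <numeric>
#include <vector>

// Builds the identity order 0, 1, ..., size_free - 1 with a single
// allocation (size_free is assumed to be non-negative).
std::vector<int> identityOrder(int size_free) {
    std::vector<int> order(size_free);
    std::iota(order.begin(), order.end(), 0);
    return order;
}
```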
vector::insert may take linear time in the vector length: it may have to re-allocate, and it always shifts everything after the insertion point. Is a std::vector even the right data structure here, or should this rather be a linked list into which we can insert in constant time? Another possibility is a std::deque, because it has better memory management when it comes to inserting items at random positions. At the very least, reserve the correct size at the beginning of the function with vector::reserve to avoid re-allocations.

pace_2024/src/heuristic_solver/greedy_insert_solver.cpp

Line 63 in 72cab00

current_order.insert(current_order.begin() + best_position,
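A sketch of the reserve-then-insert pattern; insertAll and its inputs are hypothetical. reserve removes the re-allocations, but each insert still shifts the elements behind the insertion point, so the total cost remains quadratic in the worst case.

```cpp
#include <cstddef>
#include <vector>

// Inserts nodes[k] at positions[k], one after another. Each positions[k]
// must be <= the current size at the time of insertion (assumed input).
std::vector<int> insertAll(const std::vector<int> &nodes,
                           const std::vector<std::size_t> &positions) {
    std::vector<int> current_order;
    current_order.reserve(nodes.size()); // single allocation up front
    for (std::size_t k = 0; k < nodes.size(); ++k) {
        current_order.insert(
            current_order.begin() + static_cast<std::ptrdiff_t>(positions[k]),
            nodes[k]);
    }
    return current_order;
}
```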
calculatingCrossingNumber is a function that will be called extensively, so make sure that the check whether the matrix is initialized is not performed on every call.
https://github.com/lucidLuckylee/pace_2024/blob/72cab00613e21c8a4372f41262458df7824560f5/src/pace_graph/pace_graph.cpp#L391C33-L391C58
Personally, I would template the class on whether or not the number of nodes is too big, so the check is done only once at runtime. Also, you'd never have to call initilize_if_possible, because the constructor would initialize the matrix if possible.
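A sketch of the templated approach, assuming the matrix/no-matrix decision is known when the object is constructed. CrossingCounter and compute are hypothetical, and compute returns a placeholder value instead of a real crossing count.

```cpp
#include <cstddef>
#include <vector>

// The matrix/no-matrix choice is a template parameter, so `if constexpr`
// removes the run-time "is the matrix initialized?" check entirely.
template <bool UseMatrix>
class CrossingCounter {
  public:
    // The constructor fills the matrix when it is used, so no separate
    // "initialize if possible" call is needed.
    explicit CrossingCounter(std::size_t n) : n_(n) {
        if constexpr (UseMatrix) {
            matrix_.resize(n * n);
            for (std::size_t a = 0; a < n; ++a) {
                for (std::size_t b = 0; b < n; ++b) {
                    matrix_[a * n + b] = compute(a, b);
                }
            }
        }
    }

    long crossings(std::size_t a, std::size_t b) const {
        if constexpr (UseMatrix) {
            return matrix_[a * n_ + b];
        } else {
            return compute(a, b);
        }
    }

  private:
    // Placeholder for the real pairwise crossing computation.
    static long compute(std::size_t a, std::size_t b) {
        return static_cast<long>(a * 10 + b);
    }

    std::size_t n_;
    std::vector<long> matrix_;
};
```

The caller would then decide once, e.g. by instantiating CrossingCounter<true> for small instances and CrossingCounter<false> otherwise.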
Here, the vector initialization is a little strange; std::vector<double> nodeOffset(graph.size_free); would be more direct.

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 12 in 72cab00

std::vector<double> nodeOffset = std::vector<double>(graph.size_free);
There are checks here that don't depend on the loop variable, so they should be done outside the loop.

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 15 in 72cab00

if (meanPositionParameter.useJittering && iteration != 0) {
Calling the vector constructor in the for-loop causes a memory allocation on each iteration; it's better to re-use the same vector and clear it via vector::clear which is much faster than destruction + construction.

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 25 in 72cab00

std::vector<double> newNodeOffset = std::vector<double>(nodeOffset);
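A sketch of buffer reuse across iterations, with a placeholder update rule; smoothOffsets is hypothetical. It relies on copy-assignment reusing existing capacity, which common standard library implementations do, although the standard does not strictly guarantee it.

```cpp
#include <cstddef>
#include <vector>

// One buffer declared outside the loop instead of a new vector per iteration.
std::vector<double> smoothOffsets(std::vector<double> nodeOffset, int iterations) {
    std::vector<double> newNodeOffset;
    for (int it = 0; it < iterations; ++it) {
        newNodeOffset = nodeOffset; // copies values into the existing buffer
        for (std::size_t i = 0; i < newNodeOffset.size(); ++i) {
            newNodeOffset[i] *= 0.5; // placeholder update
        }
        nodeOffset.swap(newNodeOffset); // O(1): exchanges the two buffers
    }
    return nodeOffset;
}
```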
If you're not changing your parameters for the local search, it would be better to replace the run-time if-checks with compile-time if constexpr checks.

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 34 in 72cab00

switch (meanPositionParameter.meanType) {
In the time-critical part of jittering, you're actually copying the neighborhood-vector of the node i instead of just getting a reference to it (actually, you should get a const-reference to it, since you're not going to modify the neighbors).

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 30 in 72cab00

auto neighbors_of_node = graph.neighbors_free[i];

same issue here

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 50 in 72cab00

auto crossing_matrix_for_i = graph.crossing.matrix[i];

and probably in several more places.
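A minimal sketch of the difference: plain `auto` deduces a value type and copies the whole neighbor list, while `const auto &` binds to the existing one. GraphSketch is a hypothetical stand-in for the graph class.

```cpp
#include <vector>

// Hypothetical stand-in for the graph's adjacency lists.
struct GraphSketch {
    std::vector<std::vector<int>> neighbors_free;
};

int sumOfNeighbors(const GraphSketch &graph, int i) {
    // `auto neighbors_of_node = ...` would copy the vector;
    // `const auto &` only binds a read-only reference to it.
    const auto &neighbors_of_node = graph.neighbors_free[i];
    int sum = 0;
    for (int v : neighbors_of_node) {
        sum += v;
    }
    return sum;
}
```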
It might be worth considering whether the crossing matrix should cache the sum of each row separately, so you wouldn't have to re-compute it every time here.

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 50 in 72cab00

auto crossing_matrix_for_i = graph.crossing.matrix[i];
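A sketch of caching row sums, assuming the matrix is stored as a vector of rows; rowSums is hypothetical, and the cache would have to be rebuilt or updated whenever the matrix changes.

```cpp
#include <numeric>
#include <vector>

// Computes every row sum once, so later lookups are O(1) instead of O(n).
std::vector<long> rowSums(const std::vector<std::vector<int>> &matrix) {
    std::vector<long> sums;
    sums.reserve(matrix.size());
    for (const auto &row : matrix) {
        sums.push_back(std::accumulate(row.begin(), row.end(), 0L));
    }
    return sums;
}
```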
maybe use structured bindings here, as you have before (auto [x, y] = ...):

pace_2024/src/heuristic_solver/mean_position_heuristic.cpp

Line 102 in 72cab00

std::tie(crossing_matrix_u_v, crossing_matrix_v_u) =
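A minimal sketch with a hypothetical crossingPair helper. One caveat: structured bindings always declare new variables, so if the two variables already exist and are reassigned (e.g. inside a loop), std::tie is still the right tool.

```cpp
#include <utility>

// Hypothetical helper returning both directions of a crossing count.
std::pair<int, int> crossingPair(int u, int v) {
    return {u + v, u * v}; // placeholder values
}

int crossingDifference() {
    // Instead of: int a, b; std::tie(a, b) = crossingPair(2, 3);
    auto [crossing_matrix_u_v, crossing_matrix_v_u] = crossingPair(2, 3);
    return crossing_matrix_u_v - crossing_matrix_v_u;
}
```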
Maybe the class Order does not need to translate from vertices to positions.
The translation is used here:

pace_2024/src/heuristic_solver/local_search.cpp

Line 16 in 72cab00

int posOfV = order.get_position(v);

and

pace_2024/src/heuristic_solver/heuristic_solver.cpp

Line 54 in 72cab00

int posOfV = bestOrder.get_position(v);

where I think it can be made obsolete by iterating through order instead of the free vertices of the graph
However, I'm not so sure that it can be avoided here:

pace_2024/src/exact/feedback_edge_set_heuristic.cpp

Line 173 in 72cab00

bool usedInGlobalUB = instance.globalUBOrder.get_position(e->end) <

I don't think there are other calls to the translation.
Why not just break here if i == 0 ?

pace_2024/src/heuristic_solver/local_search.cpp

Line 151 in 72cab00

if (i == 0) {

const-correctness

Again, just a few hints for increasing performance and correctness of code, there is no obligation whatsoever to do this.

I noticed that your code is very parsimonious with the keyword const, which is a pity because, by expressing that something is constant, you not only help the compiler optimize things (see the other issue), but you also help your future self understand an intent behind a variable or a method (in case you didn't know: methods can be annotated as const stating that they will not change the member variables; this allows calling const-methods of objects that were declared const).

There are other pitfalls: if you call operator[] on a std::unordered_map with a key that is not in the map, an entry for that key is created (this is called "unintentional use of brackets-operator"). operator[] is not available on a const std::unordered_map, so with a const map the compiler forces you to use at(), which throws std::out_of_range if the key is not found, or find(). That is way better if you didn't intend to create a new entry in the map.
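A small sketch of the three lookup styles on a std::unordered_map<std::string, int>; the function names are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// operator[] on a non-const map inserts a default value for a missing key.
int lookupWithBrackets(std::unordered_map<std::string, int> &m,
                       const std::string &key) {
    return m[key]; // may silently grow the map
}

// On a const map, operator[] does not compile; at() throws instead.
int lookupWithAt(const std::unordered_map<std::string, int> &m,
                 const std::string &key) {
    return m.at(key); // throws std::out_of_range if the key is missing
}

// find() lets the caller decide what happens for a missing key.
int lookupOr(const std::unordered_map<std::string, int> &m,
             const std::string &key, int fallback) {
    const auto it = m.find(key);
    return it != m.end() ? it->second : fallback;
}
```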

Here's a great talk by Jason Turner on how using const helps avoid "Code Smells" in C++: https://www.youtube.com/watch?v=f_tLQl0wLUM
