pratikvn / schwarz-lib

Repository for testing asynchronous schwarz methods.

Home Page: https://pratikvn.github.io/schwarz-lib/

License: BSD 3-Clause "New" or "Revised" License

CMake 6.99% C++ 87.65% Cuda 2.48% Shell 2.32% MATLAB 0.56%
asynchronous cuda schwarz ginkgo domain-decomposition

schwarz-lib's Introduction

Schwarz Library


Performance results

  1. Paper in IJHPCA; Alternative arXiv version
  2. Two stage update

Required components

The required components include:

  1. Ginkgo: The Ginkgo library needs to be installed, preferably with its installation path provided in the Ginkgo_DIR environment variable.
  2. MPI: As multiple nodes and a domain decomposition are used, an MPI implementation is necessary.

Quick Install

Building Schwarz-Lib

To build Schwarz-Lib, you can use the standard CMake procedure.

mkdir build; cd build
cmake -G "Unix Makefiles" .. && make

By default, SCHWARZ_BUILD_BENCHMARKING is enabled, which lets you quickly run an example and obtain timings. For a detailed list of the available options, see the Benchmarking page.

For more CMake options, please refer to the Installation page.

Currently implemented features

  1. Executor paradigm:
     • GPU.
     • OpenMP.
     • Single rank per node and threading in one node.
  2. Factorization paradigm:
     • CHOLMOD.
     • UMFPACK.
  3. Solving paradigm:
     • Direct:
       • Ginkgo.
       • CHOLMOD.
       • UMFPACK.
     • Iterative:
       • Ginkgo.
       • deal.ii.
  4. Partitioning paradigm:
     • METIS.
     • Regular, 1D.
     • Regular, 2D.
     • Zoltan.
  5. Convergence check:
     • Centralized, tree-based convergence (Yamazaki 2019).
     • Decentralized, leader-election based (Bahi 2005).
  6. Communication paradigm:
     • One-sided.
     • Two-sided.
     • Event-based.
  7. Communication strategies (a sketch of the one-sided exchange is shown after this list):
     • Remote communication strategies:
       • MPI_Put, gathered.
       • MPI_Put, one by one.
       • MPI_Get, gathered.
       • MPI_Get, one by one.
     • Lock strategies: MPI_Win_lock / MPI_Win_lock_all.
       • Lock all and unlock all.
       • Lock local and unlock local.
     • Flush strategies: MPI_Win_flush / MPI_Win_flush_local.
       • Flush all.
       • Flush local.
  8. Schwarz problem type:
     • RAS.
     • O-RAS.

Any of the implemented features can be permuted and tested.
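
As an illustration of the one-sided communication strategies listed above, below is a minimal, self-contained sketch of a gathered MPI_Put combined with the lock-all and flush-local strategies. This is not schwarz-lib's actual implementation; all variable and function names are hypothetical.

// Sketch only: gathered MPI_Put under a lock_all epoch, completed with a
// local flush. Buffer names and sizes are hypothetical.
#include <mpi.h>
#include <vector>

void put_boundary_gathered(MPI_Win win, int neighbor_rank,
                           const std::vector<double> &send_buf,
                           MPI_Aint remote_offset)
{
    // Put the whole gathered boundary buffer into the neighbor's window.
    MPI_Put(send_buf.data(), static_cast<int>(send_buf.size()), MPI_DOUBLE,
            neighbor_rank, remote_offset, static_cast<int>(send_buf.size()),
            MPI_DOUBLE, win);
    // "Flush local" strategy: complete the operation at the origin only.
    MPI_Win_flush_local(neighbor_rank, win);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> recv_buf(128, 0.0), send_buf(128, double(rank));
    MPI_Win win;
    MPI_Win_create(recv_buf.data(), recv_buf.size() * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    // "Lock all and unlock all" strategy: one passive-target epoch for the run.
    MPI_Win_lock_all(0, win);
    put_boundary_gathered(win, (rank + 1) % size, send_buf, 0);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
}

The "one by one" variants would issue one MPI_Put per boundary element instead of a single call for the gathered buffer, and the MPI_Get variants read from the neighbor's window instead of writing into it.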

Known Issues

  1. On Summit, Spectrum MPI seems to have a bug when using MPI_Put with GPU buffers. MPI_Get works as expected. This bug has also been confirmed with an external micro-benchmarking suite, the OSU Micro-Benchmarks.

For installing and building, please check the Installation page.

Credits: This code (written in C++, with additions and improvements) was inspired by the code from Ichitaro Yamazaki, ICL, UTK.

schwarz-lib's People

Contributors

pratikvn

schwarz-lib's Issues

Unable to commit due to git-cmake

I compiled the entire library again after switching off SCHWARZ_DEVEL_TOOLS. Now I get the following error when I try to commit:
can't open file '/afs/crc.nd.edu/user/s/sghosh2/Public/schwarz-lib/build/third_party/git-cmake-format/src/git-cmake-format.py': [Errno 2] No such file or directory

Plot the residual over iterations for every PE

This would be useful to study convergence. There can be a struct similar to comm_data_struct which stores the residuals over iterations for every PE. After the solver converges, it can be printed to a file. A python script can then be used to plot data.
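
A minimal sketch of what such a per-PE residual log could look like; the struct and file names here are hypothetical and only illustrate the idea:

// Hypothetical per-PE residual log, similar in spirit to comm_data_struct:
// one residual entry per outer iteration, dumped to one CSV file per rank
// after convergence so that a plotting script can read it.
#include <fstream>
#include <string>
#include <vector>

struct ResidualLog {
    std::vector<double> residuals;

    void record(double local_residual_norm)
    {
        residuals.push_back(local_residual_norm);
    }

    void write(int rank) const
    {
        std::ofstream out("residual_rank_" + std::to_string(rank) + ".csv");
        out << "iteration,residual\n";
        for (std::size_t it = 0; it < residuals.size(); ++it) {
            out << it << "," << residuals[it] << "\n";
        }
    }
};

Each rank would call record() once per iteration and write() after the solver converges; the per-rank CSV files can then be plotted (e.g. with matplotlib) as suggested above.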

Oscillation of solution values

The average of the boundary values oscillates with the iterative solver and one-sided communication. A few things to check:

  • What happens if direct solver is used?
  • What happens for a surface plot with the entire boundary?
  • What happens when two-sided is used?

Does not converge with -N flag in mpirun

The following run converges:

mpirun -np 8 ./benchmarking/bench_ras --executor=reference --num_iters=2000 --explicit_laplacian --set_1d_laplacian_size=128 --set_tol=1e-6 --local_tol=1e-12 --partition=regular --local_solver=iterative-ginkgo --enable_onesided --enable_flush=flush-local --write_comm_data --timings_file=subd

But the following doesn't:

mpirun -N 1 -np 8 ./benchmarking/bench_ras --executor=reference --num_iters=2000 --explicit_laplacian --set_1d_laplacian_size=128 --set_tol=1e-6 --local_tol=1e-12 --partition=regular --local_solver=iterative-ginkgo --enable_onesided --enable_flush=flush-local --write_comm_data --timings_file=subd

I am reserving the appropriate number of PEs in the second case, i.e., num_pe_pernode * 8. I am using Open MPI and Boost compiled with gcc/8.3.0.

Installation

@pratikvn Few queries about installation:

  1. How do I get Ginkgo? Do I just clone the Ginkgo repo? Also, in which file do I set the Ginkgo_DIR variable?
  2. For MPI, I usually do a module load $mpi version$ on our cluster. Will that work or do I need to set a path somewhere?
  3. For Boost, same question as above.
  4. For METIS, same question as above.

Add the Optimized Restricted Additive Schwarz solver.

The optimized Schwarz solver accelerates convergence compared to the RAS method.

Challenges:

  • Choosing alpha, the Robin boundary condition scaling parameter (an optimal value exists for the synchronous version).
  • Extension to generic problems (open problem).
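
For context, the Robin-type transmission condition that optimized Schwarz methods typically use on the artificial interface has the form (stated here only as a standard reference, not as the library's exact formulation):

\frac{\partial u_i}{\partial n_i} + \alpha\, u_i = \frac{\partial u_j}{\partial n_i} + \alpha\, u_j \quad \text{on } \Gamma_i

where Gamma_i is the artificial boundary of subdomain i and n_i its outward normal; classical RAS corresponds to the pure Dirichlet matching u_i = u_j on Gamma_i.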

Understanding the code

To start understanding the code, can you give a rough idea of the sequence in which I should look at the files in /source? A brief description of what each file does would also be nice.

Also, is there a brief tutorial for Ginkgo somewhere? I found one, but it looks incomplete.

Threshold selection for Event-Triggered Communication

Until now, I have been using a threshold of the form alpha * beta^k, where alpha is a constant, beta is a constant between 0 and 1, and k is the iteration number. This threshold decreases with iterations and goes to zero asymptotically. When an event for communication is not triggered at the sender, the receiver keeps using the last communicated values for its own local solve. I think there are two problems with this scheme (seen through experiments):

  • The threshold depends on time (iterations) but not on space. In other words, it decreases with iterations, but every PE uses the same threshold during a particular iteration regardless of its location in the domain. This is not desirable, since the rate of change of boundary values differs from PE to PE depending on the initial conditions. Therefore the threshold has to be made space dependent.

  • When the receiver does not receive a new value because an event for communication was not triggered at the corresponding sender, using the last communicated value for its own local solve leads the receiver to the same local solution again (its "local" boundary conditions stay the same, and hence it converges to the same solution). As a result, the receiver's boundary values do not change, and an event for communication is not triggered unless fresh values arrive from its neighbors. This situation often leads to a "communication deadlock" (a processor does not trigger communication unless it receives new values, the same holds for its neighbors, and so on). A possible solution is to extrapolate the ghost cells at the receiver using the last received values, so that the local solve yields a different local solution, leading to a change in its boundary values, which may trigger a communication.
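
For concreteness, a minimal sketch of the time-dependent trigger check described above; the function and variable names are hypothetical and only illustrate the scheme:

// Hypothetical event-trigger check with the threshold alpha * beta^k:
// communicate only if the local boundary values have changed by more than
// the current threshold since the last send.
#include <algorithm>
#include <cmath>
#include <vector>

bool should_communicate(const std::vector<double> &boundary,
                        const std::vector<double> &last_sent_boundary,
                        double alpha, double beta, int iteration)
{
    const double threshold = alpha * std::pow(beta, iteration);
    double max_change = 0.0;
    for (std::size_t i = 0; i < boundary.size(); ++i) {
        max_change = std::max(max_change,
                              std::abs(boundary[i] - last_sent_boundary[i]));
    }
    return max_change > threshold;
}

A space-dependent variant, as suggested in the first point, would replace the single alpha (or beta) with per-PE or per-boundary values instead of one global constant.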

Please comment what you think.

Add a testing framework.

As the project gets larger, I think it is necessary to add a unit/integration test framework.

There are a lot of choices, but the Catch2 framework looks very promising.

Print the final solution

A way to visualize the final solution in the domain would be helpful. My suggestion would be to write the final solution to a file and then use matplotlib in a Python script to plot it as a heatmap.

Event-based communication

For doing event-based communication, we need a variable called boundary_solution in addition to local_solution, as defined in schwarz_solver.hpp. I am wondering what the datatype of that variable should be. For a 2D problem, each PE has 4 boundaries, each of which is a vector, so is making it a gko::matrix::Dense okay?
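
As a sketch of the gko::matrix::Dense idea, one possible layout (this is only an illustration, not the library's actual data structure) is a single dense matrix with one column per boundary face:

// Hypothetical layout for boundary_solution: one gko::matrix::Dense with
// num_boundary_dofs rows and one column per boundary face of the 2D subdomain.
#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    const gko::size_type num_boundary_dofs = 64;
    const gko::size_type num_faces = 4;  // left, right, bottom, top

    auto boundary_solution = gko::matrix::Dense<double>::create(
        exec, gko::dim<2>{num_boundary_dofs, num_faces});

    // Fill, e.g., the left face (column 0).
    for (gko::size_type i = 0; i < num_boundary_dofs; ++i) {
        boundary_solution->at(i, 0) = 0.0;
    }
}

An alternative would be one Dense vector per face; the single-matrix layout just keeps all four boundaries in one allocation.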

Confusion with the solution variables

In the run function here, there is a solution that is taken as an argument. Then why is another variable, global_solution, being created inside the function?

Also, the variables local_solution and global_solution are sometimes interchanged, leading to confusion. For example, in exchange_boundary here, exchange_boundary_onesided is called with global_solution, but in the function definition here, local_solution is used. There might be other places where this has occurred too.

dealii installation

In order to install deal.II, do I simply follow the steps here? Do I have to add any options for configuring with Ginkgo or anything else? Also, the docs specify compiling deal.II in parallel; is that necessary?

Add flags for setting event parameters

I am planning to choose a threshold of the form alpha * beta^k for event-based communication, where k is the iteration number. To try out different values of alpha and beta, it would be helpful to set them at runtime through flags.

Multi-threading and core binding.

Use pthreads/std::thread to run one MPI rank per node, multiple threads per rank, and one subdomain per thread.

Use hwloc functions to bind each thread to a specific CPU core/GPU.
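
A minimal sketch of that idea, assuming hwloc's C API and std::thread; the one-core-per-subdomain policy and all names are illustrative, not the planned implementation:

// Sketch: one std::thread per subdomain, each bound to its own CPU core
// through hwloc before running its local solve.
#include <hwloc.h>
#include <thread>
#include <vector>

void solve_subdomain(int subdomain_id)
{
    // The local subdomain solve would go here.
    (void)subdomain_id;
}

int main()
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    const int num_cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_cores; ++i) {
        workers.emplace_back([&topo, i] {
            // Bind this thread to core i, then run its subdomain solve.
            hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
            hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);
            solve_subdomain(i);
        });
    }
    for (auto &t : workers) {
        t.join();
    }
    hwloc_topology_destroy(topo);
}

Binding to a GPU would go through a different mechanism (e.g. selecting a device per thread), which hwloc can inform but does not perform itself.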

UMFPACK installation

Compiling now asks for the UMFPACK directory. Is there a way to prevent that, or do I have to install SuiteSparse?

Regular2D partition crashes on more than 4 PEs

Command executed on develop branch (from Feb 25):

mpirun -np 8 ./benchmarking/bench_ras --num_iters=1000 --explicit_laplacian --set_1d_laplacian_size=64 --set_tol=1e-6 --local_tol=1e-12 --partition=regular2d --local_solver=iterative_ginkgo --enable_onesided --enable_flush=flush_local

Error: Segmentation fault

Two sided communication fails to converge

The following run, after compiling on the develop branch, does not converge in 500 iterations:

mpirun -np 4 ./benchmarking/bench_ras --num_iters=500 --explicit_laplacian --set_1d_laplacian_size=64 --set_tol=1e-6 --local_tol=1e-12 --partition=regular2d --local_solver=iterative_ginkgo --enable_twosided

However, when two-sided is changed to one-sided, it converges!

enable_twosided can invoke the functionality of enable_global_check by default

Following up on the issue in #27, I think it would be better if the user did not need to provide the --enable_global_check flag when --enable_twosided is provided. I believe this flag does not need to be specified for one-sided, so it is inconvenient to write generic job-submission scripts that cover both one-sided and two-sided.

Remove dependence on Boost

Since only one header file is used from Boost, is it possible to copy it into the project and remove the dependence on Boost? We have an old version of Boost compiled with gcc/4.8.5, and I am not sure whether it produces correct results.

Add reordering for the local direct solvers.

Currently, the local direct solvers on the GPU do not reorder the local system matrices before computing the local factorization.

Expected benefits

  • Accelerated solution.
  • Can possibly solve much larger problems.

Error with METIS linking

I am getting the following error while compiling: cannot find -lmetis.

I think the error is here. That condition should be the opposite.
