pratikvn / schwarz-lib

Repository for testing asynchronous schwarz methods.

Home Page: https://pratikvn.github.io/schwarz-lib/

License: BSD 3-Clause "New" or "Revised" License

CMake 6.99% C++ 87.65% Cuda 2.48% Shell 2.32% MATLAB 0.56%
asynchronous cuda schwarz ginkgo domain-decomposition

schwarz-lib's Introduction

Schwarz Library


Performance results

  1. Paper in IJHPCA; Alternative arXiv version
  2. Two stage update

Required components

The required components include:

  1. Ginkgo: The Ginkgo library needs to be installed, preferably with its installation path provided in the Ginkgo_DIR environment variable.
  2. MPI: As multiple nodes and a domain decomposition are used, an MPI implementation is necessary.

Quick Install

Building Schwarz-Lib

To build Schwarz-Lib, you can use the standard CMake procedure.

mkdir build; cd build
cmake -G "Unix Makefiles" .. && make

By default, SCHWARZ_BUILD_BENCHMARKING is enabled, which lets you quickly run an example and obtain timings. For a detailed list of the available options, see the Benchmarking page.

For more CMake options, please refer to the Installation page.

Currently implemented features

  1. Executor paradigm:
     • GPU.
     • OpenMP.
     • Single rank per node and threading in one node.
  2. Factorization paradigm:
     • CHOLMOD.
     • UMFPACK.
  3. Solving paradigm:
     • Direct:
       • Ginkgo.
       • CHOLMOD.
       • UMFPACK.
     • Iterative:
       • Ginkgo.
       • deal.ii.
  4. Partitioning paradigm:
     • METIS.
     • Regular, 1D.
     • Regular, 2D.
     • Zoltan.
  5. Convergence check:
     • Centralized, tree-based convergence (Yamazaki 2019).
     • Decentralized, leader-election based (Bahi 2005).
  6. Communication paradigm:
     • One-sided.
     • Two-sided.
     • Event-based.
  7. Communication strategies (a sketch of the one-sided exchange is shown after this list):
     • Remote communication strategies:
       • MPI_Put, gathered.
       • MPI_Put, one by one.
       • MPI_Get, gathered.
       • MPI_Get, one by one.
     • Lock strategies: MPI_Win_lock / MPI_Win_lock_all.
       • Lock all and unlock all.
       • Lock local and unlock local.
     • Flush strategies: MPI_Win_flush / MPI_Win_flush_local.
       • Flush all.
       • Flush local.
  8. Schwarz problem type:
     • RAS.
     • O-RAS.

Any of the implemented features can be permuted and tested.
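
As an illustration of the one-sided communication strategies listed above, below is a minimal, self-contained sketch of a gathered MPI_Put combined with the lock-all and flush-local strategies. This is not schwarz-lib's actual implementation; all variable and function names are hypothetical.

// Sketch only: gathered MPI_Put under a lock_all epoch, completed with a
// local flush. Buffer names and sizes are hypothetical.
#include <mpi.h>
#include <vector>

void put_boundary_gathered(MPI_Win win, int neighbor_rank,
                           const std::vector<double> &send_buf,
                           MPI_Aint remote_offset)
{
    // Put the whole gathered boundary buffer into the neighbor's window.
    MPI_Put(send_buf.data(), static_cast<int>(send_buf.size()), MPI_DOUBLE,
            neighbor_rank, remote_offset, static_cast<int>(send_buf.size()),
            MPI_DOUBLE, win);
    // "Flush local" strategy: complete the operation at the origin only.
    MPI_Win_flush_local(neighbor_rank, win);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> recv_buf(128, 0.0), send_buf(128, double(rank));
    MPI_Win win;
    MPI_Win_create(recv_buf.data(), recv_buf.size() * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    // "Lock all and unlock all" strategy: one passive-target epoch for the run.
    MPI_Win_lock_all(0, win);
    put_boundary_gathered(win, (rank + 1) % size, send_buf, 0);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
}

The "one by one" variants would issue one MPI_Put per boundary element instead of a single call for the gathered buffer, and the MPI_Get variants read from the neighbor's window instead of writing into it.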

Known Issues

  1. On Summit, Spectrum MPI seems to have a bug when using MPI_Put with GPU buffers. MPI_Get works as expected. This bug has also been confirmed with an external micro-benchmarking suite, the OSU Micro-Benchmarks.

For installing and building, please check the Installation page.

Credits: This code (written in C++, with additions and improvements) was inspired by the code from Ichitaro Yamazaki, ICL, UTK.

schwarz-lib's People

Contributors

pratikvn

schwarz-lib's Issues

Unable to commit due to git-cmake

I compiled the entire library again after switching off SCHWARZ_DEVEL_TOOLS. Now I get the following error when I try to commit:
can't open file '/afs/crc.nd.edu/user/s/sghosh2/Public/schwarz-lib/build/third_party/git-cmake-format/src/git-cmake-format.py': [Errno 2] No such file or directory

Plot the residual over iterations for every PE

This would be useful to study convergence. There can be a struct similar to comm_data_struct which stores the residuals over iterations for every PE. After the solver converges, it can be printed to a file. A python script can then be used to plot data.
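
A minimal sketch of what such a per-PE residual log could look like; the struct and file names here are hypothetical and only illustrate the idea:

// Hypothetical per-PE residual log, similar in spirit to comm_data_struct:
// one residual entry per outer iteration, dumped to one CSV file per rank
// after convergence so that a plotting script can read it.
#include <fstream>
#include <string>
#include <vector>

struct ResidualLog {
    std::vector<double> residuals;

    void record(double local_residual_norm)
    {
        residuals.push_back(local_residual_norm);
    }

    void write(int rank) const
    {
        std::ofstream out("residual_rank_" + std::to_string(rank) + ".csv");
        out << "iteration,residual\n";
        for (std::size_t it = 0; it < residuals.size(); ++it) {
            out << it << "," << residuals[it] << "\n";
        }
    }
};

Each rank would call record() once per iteration and write() after the solver converges; the per-rank CSV files can then be plotted (e.g. with matplotlib) as suggested above.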

Oscillation of solution values

The average of the boundary values oscillates with the iterative solver and one-sided communication. A few things to check:

  • What happens if direct solver is used?
  • What happens for a surface plot with the entire boundary?
  • What happens when two-sided is used?

Does not converge with -N flag in mpirun

The following run converges:

mpirun -np 8 ./benchmarking/bench_ras --executor=reference --num_iters=2000 --explicit_laplacian --set_1d_laplacian_size=128 --set_tol=1e-6 --local_tol=1e-12 --partition=regular --local_solver=iterative-ginkgo --enable_onesided --enable_flush=flush-local --write_comm_data --timings_file=subd

But the following doesn't:

mpirun -N 1 -np 8 ./benchmarking/bench_ras --executor=reference --num_iters=2000 --explicit_laplacian --set_1d_laplacian_size=128 --set_tol=1e-6 --local_tol=1e-12 --partition=regular --local_solver=iterative-ginkgo --enable_onesided --enable_flush=flush-local --write_comm_data --timings_file=subd

I am reserving the appropriate number of PEs in the second case, i.e., num_pe_pernode * 8. I am using Open MPI and Boost compiled with gcc/8.3.0.

Installation

@pratikvn Few queries about installation:

  1. How do I get Ginkgo? Do I just clone the Ginkgo repo? Also, in which file do I set the Ginkgo_DIR variable?
  2. For MPI, I usually do a module load $mpi version$ on our cluster. Will that work or do I need to set a path somewhere?
  3. For Boost, same question as above.
  4. For METIS, same question as above.

Add the Optimized Restricted Additive Schwarz solver.

The optimized Schwarz solver accelerates convergence compared to the RAS method.

Challenges:

  • Choosing alpha, the Robin boundary condition scaling parameter (an optimal value exists for the synchronous version).
  • Extension to generic problems (open problem).
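
For context, the Robin-type transmission condition that optimized Schwarz methods typically use on the artificial interface has the form (stated here only as a standard reference, not as the library's exact formulation):

\frac{\partial u_i}{\partial n_i} + \alpha\, u_i = \frac{\partial u_j}{\partial n_i} + \alpha\, u_j \quad \text{on } \Gamma_i

where Gamma_i is the artificial boundary of subdomain i and n_i its outward normal; classical RAS corresponds to the pure Dirichlet matching u_i = u_j on Gamma_i.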

Understanding the code

To start understanding the code, can you give a rough idea of the sequence in which I should look at the files in /source? A brief description of what each file does would also be nice.

Also, is there a brief tutorial for Ginkgo somewhere? I found one, but it looks incomplete.

Threshold selection for Event-Triggered Communication

Until now, I have been using a threshold of the form alpha * beta^k, where alpha is a constant, beta is a constant between 0 and 1, and k is the iteration number. This threshold decreases with iterations and goes to zero asymptotically. When an event for communication is not triggered at the sender, the receiver keeps using the last communicated values for its own local solve. I think there are two problems with this scheme (seen through experiments):

  • The threshold depends on time (iterations) but not on space. In other words, it decreases with iterations, but every PE uses the same threshold during a particular iteration regardless of its location in the domain. This is not desirable, since the rate of change of boundary values differs from PE to PE depending on the initial conditions. Therefore the threshold has to be made space dependent.

  • When the receiver does not receive a new value because an event for communication was not triggered at the corresponding sender, using the last communicated value for its own local solve leads the receiver to the same local solution again (its "local" boundary conditions stay the same, and hence it converges to the same solution). As a result, the receiver's boundary values do not change, and an event for communication is not triggered unless fresh values arrive from its neighbors. This situation often leads to a "communication deadlock" (a processor does not trigger communication unless it receives new values, the same holds for its neighbors, and so on). A possible solution is to extrapolate the ghost cells at the receiver using the last received values, so that the local solve yields a different local solution, leading to a change in its boundary values, which may trigger a communication.
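
For concreteness, a minimal sketch of the time-dependent trigger check described above; the function and variable names are hypothetical and only illustrate the scheme:

// Hypothetical event-trigger check with the threshold alpha * beta^k:
// communicate only if the local boundary values have changed by more than
// the current threshold since the last send.
#include <algorithm>
#include <cmath>
#include <vector>

bool should_communicate(const std::vector<double> &boundary,
                        const std::vector<double> &last_sent_boundary,
                        double alpha, double beta, int iteration)
{
    const double threshold = alpha * std::pow(beta, iteration);
    double max_change = 0.0;
    for (std::size_t i = 0; i < boundary.size(); ++i) {
        max_change = std::max(max_change,
                              std::abs(boundary[i] - last_sent_boundary[i]));
    }
    return max_change > threshold;
}

A space-dependent variant, as suggested in the first point, would replace the single alpha (or beta) with per-PE or per-boundary values instead of one global constant.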

Please comment what you think.

Add a testing framework.

As the project gets larger, I think it is necessary to add a unit/integration test framework.

There are a lot of choices, but the Catch2 framework looks very promising.

Print the final solution

A way to visualize the final solution in the domain would be helpful. My suggestion would be to write the final solution to a file and then use matplotlib in a Python script to plot it as a heatmap.

Event-based communication

For doing event-based communication, we need a variable called boundary_solution in addition to local_solution, as defined in schwarz_solver.hpp. I am wondering what the datatype of that variable should be. For a 2D problem, each PE has 4 boundaries, each of which is a vector, so is making it a gko::matrix::Dense okay?
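
As a sketch of the gko::matrix::Dense idea, one possible layout (this is only an illustration, not the library's actual data structure) is a single dense matrix with one column per boundary face:

// Hypothetical layout for boundary_solution: one gko::matrix::Dense with
// num_boundary_dofs rows and one column per boundary face of the 2D subdomain.
#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    const gko::size_type num_boundary_dofs = 64;
    const gko::size_type num_faces = 4;  // left, right, bottom, top

    auto boundary_solution = gko::matrix::Dense<double>::create(
        exec, gko::dim<2>{num_boundary_dofs, num_faces});

    // Fill, e.g., the left face (column 0).
    for (gko::size_type i = 0; i < num_boundary_dofs; ++i) {
        boundary_solution->at(i, 0) = 0.0;
    }
}

An alternative would be one Dense vector per face; the single-matrix layout just keeps all four boundaries in one allocation.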

Confusion with the solution variables

In the run function here, there is a solution that is taken as an argument. Then why is another variable, global_solution, being created inside the function?

Also, the variables local_solution and global_solution are sometimes interchanged, leading to confusion. For example, in exchange_boundary here, exchange_boundary_onesided is called with global_solution, but in the function definition here, local_solution is used. There might be other places where this has occurred too.

dealii installation

In order to install deal.II, do I simply follow the steps here? Do I have to add any options for configuring with Ginkgo or anything else? Also, the docs specify compiling deal.II in parallel; is that necessary?

Add flags for setting event parameters

I am planning to choose a threshold of the form alpha * beta^k for event-based communication, where k is the iteration number. To try out different values of alpha and beta, it would be helpful to set them at runtime through flags.

Multi-threading and core binding.

Use pthreads/std::thread to run one MPI rank per node, multiple threads per rank, and one subdomain per thread.

Use hwloc functions to bind each thread to a specific CPU core/GPU.
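
A minimal sketch of that idea, assuming hwloc's C API and std::thread; the one-core-per-subdomain policy and all names are illustrative, not the planned implementation:

// Sketch: one std::thread per subdomain, each bound to its own CPU core
// through hwloc before running its local solve.
#include <hwloc.h>
#include <thread>
#include <vector>

void solve_subdomain(int subdomain_id)
{
    // The local subdomain solve would go here.
    (void)subdomain_id;
}

int main()
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    const int num_cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    std::vector<std::thread> workers;
    for (int i = 0; i < num_cores; ++i) {
        workers.emplace_back([&topo, i] {
            // Bind this thread to core i, then run its subdomain solve.
            hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
            hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);
            solve_subdomain(i);
        });
    }
    for (auto &t : workers) {
        t.join();
    }
    hwloc_topology_destroy(topo);
}

Binding to a GPU would go through a different mechanism (e.g. selecting a device per thread), which hwloc can inform but does not perform itself.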

UMFPACK installation

Compiling now asks for the UMFPACK directory. Is there a way to prevent that, or do I have to install SuiteSparse?

Regular2D partition crashes on more than 4 PEs

Command executed on develop branch (from Feb 25):

mpirun -np 8 ./benchmarking/bench_ras --num_iters=1000 --explicit_laplacian --set_1d_laplacian_size=64 --set_tol=1e-6 --local_tol=1e-12 --partition=regular2d --local_solver=iterative_ginkgo --enable_onesided --enable_flush=flush_local

Error: Segmentation fault

Two sided communication fails to converge

The following run, after compiling on the develop branch, does not converge in 500 iterations:

mpirun -np 4 ./benchmarking/bench_ras --num_iters=500 --explicit_laplacian --set_1d_laplacian_size=64 --set_tol=1e-6 --local_tol=1e-12 --partition=regular2d --local_solver=iterative_ginkgo --enable_twosided

However, when two-sided is changed to one-sided, it converges!

enable_twosided can invoke the functionality of enable_global_check by default

Following up on the issue in #27, I think it would be better if the user did not need to provide the --enable_global_check flag when --enable_twosided is provided. I believe this flag does not need to be specified for one-sided, so it is inconvenient to write generic job-submission scripts that cover both one-sided and two-sided.

Remove dependence on Boost

Since only one header file is used from Boost, is it possible to copy it into the project and remove the dependence on Boost? We have an old version of Boost compiled with gcc/4.8.5, and I am not sure whether it produces correct results.

Add reordering for the local direct solvers.

Currently, the local direct solvers on the GPU do not reorder the local system matrices before computing the local factorization.

Expected benefits

  • Accelerated solution.
  • Can possibly solve much larger problems.

Error with METIS linking

I am getting the following error while compiling: cannot find -lmetis.

I think the error is here. That condition should be the opposite.
