GithubHelp home page GithubHelp logo

cuda-multigrid's Introduction

CUDA Multigrid

The code in this repository solves Poisson's equation in 2D subject to Dirichlet boundary condition using the Multigrid method with a Gauss-Seidel smoother. The PDE is discretized using second order finite differences and conservative prolongation and restriction operators. There are two implementations of the solver, a single-threaded CPU solver and a GPU solver (custom CUDA kernels). To expose parallelism, the GPU solver uses a Red-Black Gauss-Seidel smoother.

Usage

Look at the test program test/test_poisson.cu to learn how to use the code.

// Select problem to solve
using CUDAProblem = CUDAPoisson<L1NORM, Number>;
// Select smoother
using CUDASmoother = CUDAGaussSeidelRedBlack;
// Select multigrid solver
using CUDAMG = CUDAMultigrid<CUDASmoother, CUDAProblem, Number>;

CUDAProblem problem(l, h, modes);
CUDAMG solver(problem);
auto out = solve(solver, problem, opts);
printf("Iterations: %d, Residual: %g \n", out.iterations, out.residual);
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
GPU: NVIDIA RTX 2080 Ti

test/test_poisson 
Solver: Gauss-Seidel (red-black) 
Iteration 	 Residual
10      	 6.674289 
20      	 1.369973 
30      	 0.2812023 
40      	 0.05771992 
50      	 0.01184766 
60      	 0.002431866 
70      	 0.0004991677 
80      	 0.0001024598 
90      	 2.103102e-05 
100     	 4.316853e-06 
110     	 8.860825e-07 
120     	 1.818784e-07 
130     	 3.733259e-08 
Iterations: 139, Residual: 8.97769e-09 
Solver: Multi-Grid<Gauss-Seidel (red-black)> 
Iteration 	 Residual
10      	 4.959894e-09 
Iterations: 10, Residual: 4.95989e-09 
Solver: CUDA Gauss-Seidel (red-black) 
Iteration 	 Residual
10      	 6.674289 
20      	 1.369973 
30      	 0.2812023 
40      	 0.05771992 
50      	 0.01184766 
60      	 0.002431866 
70      	 0.0004991677 
80      	 0.0001024598 
90      	 2.103102e-05 
100     	 4.316853e-06 
110     	 8.860825e-07 
120     	 1.818784e-07 
130     	 3.733259e-08 
Iterations: 139, Residual: 8.97769e-09 
Solver: CUDA Multi-Grid<CUDA Gauss-Seidel (red-black)> 
Iteration 	 Residual
10      	 4.959894e-09 
Iterations: 10, Residual: 4.95989e-09 
MMS convergence test
Solver: Multi-Grid<Gauss-Seidel (red-black)> 
Grid Size 	 Iterations 	 Time (ms) 	 Residual 	 Error 		 Rate 
   6 x 6    	 1       	 0.00406 	 2.4718e-16 	 0.9348 	 -inf  
  10 x 10   	 8       	 0.01162 	 8.3216e-09 	 0.30908 	 1.59669 
  18 x 18   	 10      	 0.03814 	 4.9599e-09 	 0.08183 	 1.91727 
  34 x 34   	 11      	 0.14534 	 1.6851e-09 	 0.02074 	 1.98024 
  66 x 66   	 11      	 0.55350 	 2.2692e-09 	 0.0052025 	 1.99512 
 130 x 130  	 11      	 2.19837 	 2.4601e-09 	 0.0013017 	 1.99878 
 258 x 258  	 11      	 8.77101 	 2.5173e-09 	 0.0003255 	 1.99970 
 514 x 514  	 11      	 40.58800 	 2.5417e-09 	 8.1378e-05 	 1.99993 
1026 x 1026 	 11      	 179.36189 	 2.5836e-09 	 2.0344e-05 	 2.00001 
2050 x 2050 	 11      	 746.79437 	 2.735e-09 	 5.0858e-06 	 2.00010 
4098 x 4098 	 11      	 3020.28027 	 3.38e-09 	 1.2711e-06 	 2.00041 
8194 x 8194 	 11      	 12569.50879 	 7.1411e-09 	 3.174e-07 	 2.00166 
MMS convergence test
Solver: CUDA Multi-Grid<CUDA Gauss-Seidel (red-black)> 
Grid Size 	 Iterations 	 Time (ms) 	 Residual 	 Error 		 Rate 
   6 x 6    	 1       	 0.08147 	 2.4718e-16 	 0.9348 	 -inf  
  10 x 10   	 8       	 0.46346 	 8.3216e-09 	 0.30908 	 1.59669 
  18 x 18   	 10      	 0.76461 	 4.9599e-09 	 0.08183 	 1.91727 
  34 x 34   	 11      	 1.04093 	 1.6851e-09 	 0.02074 	 1.98024 
  66 x 66   	 11      	 1.23590 	 2.2692e-09 	 0.0052025 	 1.99512 
 130 x 130  	 11      	 1.52275 	 2.4601e-09 	 0.0013017 	 1.99878 
 258 x 258  	 11      	 2.05760 	 2.5173e-09 	 0.0003255 	 1.99970 
 514 x 514  	 11      	 3.58195 	 2.5417e-09 	 8.1378e-05 	 1.99993 
1026 x 1026 	 11      	 10.08339 	 2.5836e-09 	 2.0344e-05 	 2.00001 
2050 x 2050 	 11      	 29.20931 	 2.735e-09 	 5.0858e-06 	 2.00010 
4098 x 4098 	 11      	 109.10480 	 3.38e-09 	 1.2711e-06 	 2.00041 
8194 x 8194 	 11      	 397.84283 	 7.1411e-09 	 3.174e-07 	 2.00166 

TODO

Currently, the GPU solver is more or less a copy-and-paste-version of the CPU solver. It would be nice to remove much of the code duplication. There's also room for improvement in terms of optimization. First, I should perform a baseline profiling to identify the current hotspots. Nonetheless, I already have some ideas on what can be improved. The red-black Gauss-Seidel implementation is currently terrible. It takes two trips to DRAM to load the data. First, once for the red points, and then once more for the black points. At each time, only half of the threads are active. It should possible to keep all threads active by loading the data into shared memory using a float2/double2 instruction. Some kernels like the restriction and residual kernel can also probably merged into a single one to further reduce memory traffic.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.