
ALPAKA support (issue on kernels, open)

jeffhammond commented on August 17, 2024
ALPAKA support

from kernels.

Comments (8)

jeffhammond commented on August 17, 2024

The context here is that we support a wide range of modern C++ parallel models, including Kokkos, TBB, OpenMP and C++17 Parallel STL, so adding an Alpaka port means people can compare a lot of things at once using tests that were created by people who are relatively objective. https://youtu.be/bXeDfA21-VA shows some examples of things that have been done with them before.

I also suspect that the total porting time is less than a day, since the total amount of code that needs porting is very small (nstream is ~3 lines, transpose is ~4 lines, stencil is ~4 lines plus code generation, etc).


jyoung3131 commented on August 17, 2024

Hi @jeffhammond - I'm helping with some benchmarks for PIConGPU and Alpaka, so I'll take a look at these kernel ports in more detail.


jeffhammond commented on August 17, 2024

btw @jyoung3131 if you want to be a PIC boss, you'll see there is a PIC PRK with a limited number of implementations. @hattom added SOA and AOS versions in Fortran that would be great targets to study with the C++ stuff.

PRK % find . -name "pic*" | grep -v dep
./AMPI/PIC/pic.c
./Cxx11/pic-sycl.cc
./Cxx11/pic.cc
./MPI1/PIC-static/pic.c
./FORTRAN/pic_soa.F90
./FORTRAN/pic.F90
./FORTRAN/pic-openmp.F90
./FORTRAN/pic_soa-openmp.F90
./FG_MPI/PIC-static/pic.c
./SERIAL/PIC/pic.c
./OPENMP/PIC/pic.c


jeffhammond commented on August 17, 2024

Porting guide

Do it in this order:

  1. nstream (1D parallelism)
  2. transpose (2D parallelism)
  3. stencil (2D parallelism)
  4. dgemm (3D parallelism) optional
  5. p2p (complicated parallelism)

Look at the Python implementations if you want the easiest-to-read code as a reference. Or look at whatever language you like best. The simplest implementation will be named "kernel.suffix".
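As a starting point for the first item in that list, here is a minimal serial C++ sketch of an nstream-style update. It assumes the kernel is the triad `A[i] += B[i] + scalar * C[i]`; the function name `nstream` and its signature are made up for illustration and are not taken from the repository. Every index is independent, which is what makes this the 1D-parallelism case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PRK sources): triad-style update.
// Assumption: the kernel computes A[i] += B[i] + scalar * C[i].
// Each iteration touches only index i, so the loop can be mapped
// directly onto a 1D parallel index space.
void nstream(std::vector<double>& A,
             const std::vector<double>& B,
             const std::vector<double>& C,
             double scalar)
{
    assert(A.size() == B.size() && A.size() == C.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        A[i] += B[i] + scalar * C[i];
    }
}
```

A port to a parallel model would replace the `for` loop with that model's 1D parallel-for and keep the loop body unchanged.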

Detail

transpose

It must use a standard row or column major storage. In distributed memory, you must decompose in only one dimension so the communication is all-to-all.

Blocking for cache/TLB is useful on CPUs. GPU optimizations are tricky. The CUDA implementation is not optimal. It will be fixed eventually.
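To illustrate what blocking for cache/TLB means on a CPU, here is a rough sketch of a tiled out-of-place transpose in plain C++. It assumes a square, row-major matrix stored in a flat `std::vector`; the name `transpose_tiled` and the `tile` parameter are hypothetical, and the actual PRK kernel may differ in what it accumulates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: B = A^T for an n x n row-major matrix.
// The two outer loops walk over tiles; the two inner loops stay inside
// one tile so that reads of A and writes to B both hit a small
// working set before moving on.
void transpose_tiled(const std::vector<double>& A,
                     std::vector<double>& B,
                     std::size_t n,
                     std::size_t tile)
{
    assert(tile > 0 && A.size() == n * n && B.size() == n * n);
    for (std::size_t it = 0; it < n; it += tile) {
        for (std::size_t jt = 0; jt < n; jt += tile) {
            const std::size_t iend = std::min(it + tile, n);
            const std::size_t jend = std::min(jt + tile, n);
            for (std::size_t i = it; i < iend; ++i) {
                for (std::size_t j = jt; j < jend; ++j) {
                    B[j * n + i] = A[i * n + j];
                }
            }
        }
    }
}
```

The tile size is a tuning parameter; a good value depends on the cache sizes of the target machine, so it is left as an argument here.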

stencil

Figure out one pattern (e.g. star with radius=2) and then tell me so I can roll it into the code generator.
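For reference, a hand-written version of one such pattern might look like the sketch below: a star stencil with radius 2 applied to the interior of a square row-major grid. The weights `1 / (2 * ii * r)` (positive on the +x/+y arms, negative on the -x/-y arms) are an assumption about what the generated code uses, and the function name `star_stencil_r2` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: star-shaped stencil, radius 2, on an n x n grid.
// Assumption: weight at distance ii is +1/(2*ii*r) on the positive arms
// and -1/(2*ii*r) on the negative arms. Only interior points, where the
// whole star fits inside the grid, are updated.
void star_stencil_r2(const std::vector<double>& in,
                     std::vector<double>& out,
                     std::size_t n)
{
    constexpr std::size_t r = 2;
    assert(n > 2 * r && in.size() == n * n && out.size() == n * n);
    for (std::size_t i = r; i < n - r; ++i) {
        for (std::size_t j = r; j < n - r; ++j) {
            double acc = 0.0;
            for (std::size_t ii = 1; ii <= r; ++ii) {
                const double w = 1.0 / (2.0 * static_cast<double>(ii * r));
                acc += w * (in[i * n + (j + ii)] - in[i * n + (j - ii)]);
                acc += w * (in[(i + ii) * n + j] - in[(i - ii) * n + j]);
            }
            out[i * n + j] += acc;
        }
    }
}
```

Both loops over `i` and `j` are independent, so this maps onto the same 2D parallelism as transpose.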

dgemm

Read https://www.cs.utexas.edu/users/flame/pubs/blis3_ipdps14.pdf and implement that if you can, but I've never done this and won't judge you at all for just writing triple loops and calling it good.
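The "just write triple loops" option could look roughly like this in C++. This is a naive sketch, not the BLIS approach from the paper; it assumes square row-major matrices in flat vectors and accumulates `C += A * B`. The name `dgemm_naive` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C += A * B for n x n row-major matrices.
// The i-k-j loop order keeps the innermost accesses to B and C
// contiguous in memory. The i and j loops are independent; the k loop
// is a reduction into C[i][j].
void dgemm_naive(const std::vector<double>& A,
                 const std::vector<double>& B,
                 std::vector<double>& C,
                 std::size_t n)
{
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}
```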

p2p

Look at slides 30-37 of https://drive.google.com/file/d/1yNQiG-wjBI4Iu6yDPV6WcQL-r8Yt9RSV/view if it helps to understand the design space. Hyperplane method is probably best on GPU unless you use cooperative groups or do other tricky stuff.
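To make the hyperplane idea concrete, here is a serial C++ sketch. It assumes the p2p recurrence is `grid(i,j) = grid(i-1,j) + grid(i,j-1) - grid(i-1,j-1)` on an m x n row-major grid with the first row and column already initialized; the function name `p2p_hyperplane` is hypothetical. The outer loop walks over anti-diagonals `d = i + j`, and all points on one anti-diagonal depend only on earlier diagonals, so the inner loop is the part that could run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one wavefront sweep using the hyperplane method.
// Assumption: grid(i,j) = grid(i-1,j) + grid(i,j-1) - grid(i-1,j-1)
// for i >= 1, j >= 1, with row 0 and column 0 set by the caller.
void p2p_hyperplane(std::vector<double>& grid, std::size_t m, std::size_t n)
{
    assert(m >= 2 && n >= 2 && grid.size() == m * n);
    // Interior points satisfy 2 <= i + j <= (m - 1) + (n - 1).
    for (std::size_t d = 2; d <= m + n - 2; ++d) {
        // Keep j = d - i within [1, n - 1] and i within [1, m - 1].
        const std::size_t ilo = (d > n - 1) ? d - (n - 1) : 1;
        const std::size_t ihi = std::min(m - 1, d - 1);
        // Points on the same anti-diagonal are independent of each other.
        for (std::size_t i = ilo; i <= ihi; ++i) {
            const std::size_t j = d - i;
            grid[i * n + j] = grid[(i - 1) * n + j]
                            + grid[i * n + (j - 1)]
                            - grid[(i - 1) * n + (j - 1)];
        }
    }
}
```

The amount of parallel work per step grows and then shrinks with the diagonal length, which is one reason this kernel is harder to run efficiently than the others.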


ax3l commented on August 17, 2024

Fantastic idea! Let's ping @psychocoderHPC @BenjaminW3 @j-stephan @bernhardmgruber @sbastrakov @bussmann.

Background: The Parallel Research Kernels (ParRes Kernels) are a set of simple programs that can be used to explore the features of a parallel platform: https://github.com/ParRes/Kernels


bussmann commented on August 17, 2024

Great idea. Will do our best to support this. Happy Easter holidays from Germany!


bussmann commented on August 17, 2024

LOVE the PIC PRK stuff, @jeffhammond!

If one dares to use some more 'experimental' work, I recommend looking into combining Alpaka with Llama to tackle SoA/AoS and other data layout decisions with a single source code.
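For readers unfamiliar with the SoA/AoS distinction, here is a small plain C++ sketch of the two layouts for a particle push. It does not use the Llama API; the types `ParticleAoS` and `ParticlesSoA` and the fields `x` and `v` are hypothetical. The point is that the same update has to be written twice when the layout is hard-coded, which is the duplication a layout abstraction aims to remove.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: all fields of one particle are adjacent in memory.
struct ParticleAoS {
    double x;  // position (hypothetical field)
    double v;  // velocity (hypothetical field)
};

// Structure of arrays: each field is stored in its own contiguous array.
struct ParticlesSoA {
    std::vector<double> x;
    std::vector<double> v;
};

// Same update, written against the AoS layout.
void push_aos(std::vector<ParticleAoS>& p, double dt)
{
    for (auto& e : p) {
        e.x += e.v * dt;
    }
}

// Same update, written against the SoA layout.
void push_soa(ParticlesSoA& p, double dt)
{
    assert(p.x.size() == p.v.size());
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.v[i] * dt;
    }
}
```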

Thanks for looking into this, @jyoung3131 , please coordinate with the Alpaka team, we'll be glad to support this.


psychocoderHPC commented on August 17, 2024

@bussmann you forgot to link llama documentation + llama github

