OMP threading in CICE (closed)

apcraig avatar apcraig commented on July 18, 2024
OMP threading in CICE

from cice.

Comments (12)

eclare108213 avatar eclare108213 commented on July 18, 2024

Let's look to see whether @mhrib addressed the ice_dyn_* loops in his refactorization

eclare108213 avatar eclare108213 commented on July 18, 2024

see also #128

mhrib avatar mhrib commented on July 18, 2024

I did find issues with the same OMP directives (and a few more), but no solution other than commenting them out, as here. See also #252

TillRasmussen avatar TillRasmussen commented on July 18, 2024

In addition to the ones that Mads (MHRI) found, I found OMP issues in ice_history and ice_grid. I commented out all OMP directives in these two files, which stopped the model from crashing when running with the Intel and GNU compilers. I have not found solutions or the specific locations of these bugs within the files.

eclare108213 avatar eclare108213 commented on July 18, 2024

I am uploading a set of slides here from a LANL training course on OpenMP profiling and debugging that I attended last week. Most of it is old news, but the profiling and debugging info at the end might be useful as we move forward with this task.
Workshop 6 Basic OpenMP and Profiling-2.pdf

apcraig avatar apcraig commented on July 18, 2024

I have created a perf_suite that will be PR'ed soon. This runs a fixed suite of tests that attempt to assess CICE performance at different task and thread counts. It basically does three things.

  • It runs a few cases on 1 PE with no threading with different block sizes to assess the impact of block size on model performance.
  • It uses a fixed 16x16 block size and runs a series of scaling tests on 1 to 128 MPI tasks.
  • It uses the same 16x16 block size and runs a series of timing tests on 64 PEs with 64 to 4 MPI tasks and 1 to 16 threads (i.e. 64x1, 32x2, 16x4, 8x8, 4x16).

This is all done with the gx1 grid, roundrobin decomp, 2 day runs, basic out of the box configuration. The idea is not to optimize the performance of CICE but to compare the performance of CICE on different hardware, different compilers, and different tasks/threads for a very fixed problem. This is, in part, a starting point for further OMP tuning.
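The task/thread layouts in the third bullet can be enumerated with a few lines of Python. This is only a sketch to make the arithmetic explicit; `thread_layouts` is a hypothetical helper and is not part of perf_suite.

```python
def thread_layouts(total_pes, max_threads):
    """Return (mpi_tasks, threads) pairs whose product equals total_pes.

    Thread counts are the powers of two from 1 up to max_threads.
    Hypothetical helper for illustration, not part of perf_suite.
    """
    layouts = []
    threads = 1
    while threads <= max_threads:
        # Keep only thread counts that divide the PE count evenly.
        if total_pes % threads == 0:
            layouts.append((total_pes // threads, threads))
        threads *= 2
    return layouts


if __name__ == "__main__":
    # Layouts for 64 PEs with 1 to 16 threads per task.
    for tasks, threads in thread_layouts(64, 16):
        print(f"{tasks}x{threads}")
```

For 64 PEs and up to 16 threads this lists 64x1, 32x2, 16x4, 8x8 and 4x16, matching the layouts named above.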

I attach an Excel spreadsheet, CICE_OMP_perf.xlsx, that shows the results from testing on Narwhal with 4 compilers and Cheyenne with 3 compilers in table and graph form. This is for hash 9fb518e of CICE dated Dec 21, 2021, but also includes the Narwhal port and the perf_suite (which will be PR'ed soon).

There are lots of interesting insights. But with regard to OMP, we see that in this version of CICE (which has lots of OMP loops turned off that still need debugging), OMP still provides some benefit. In these tests, OMP is never faster than using all MPI for the same total PE count. But for a given MPI task count, a threaded run is faster than a single-threaded run with the same MPI task count (i.e. 16x4 vs 16x1), at least on Narwhal. Cheyenne shows less benefit from threading. This establishes a performance baseline and provides a starting point to improve OMP performance, probably using Narwhal gnu or cray to continue OMP tuning efforts.
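One way to quantify the 16x4 vs 16x1 comparison is as a threading speedup and a per-thread efficiency. The sketch below assumes wall-clock times as input; the function names are hypothetical, and the timings in the usage lines are placeholders, not values from CICE_OMP_perf.xlsx.

```python
def threading_speedup(t_single, t_threaded):
    """Speedup of a threaded run over a single-threaded run
    at the same MPI task count (e.g. 16x1 vs 16x4)."""
    return t_single / t_threaded


def threading_efficiency(t_single, t_threaded, threads):
    """Speedup divided by thread count; 1.0 means perfect thread scaling."""
    return threading_speedup(t_single, t_threaded) / threads


if __name__ == "__main__":
    # Placeholder timings in seconds, NOT measured CICE results.
    t_16x1 = 100.0
    t_16x4 = 40.0
    print(threading_speedup(t_16x1, t_16x4))        # 2.5
    print(threading_efficiency(t_16x1, t_16x4, 4))  # 0.625
```

An efficiency well below 1.0 would be consistent with only some sections of the model threading effectively.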

apcraig avatar apcraig commented on July 18, 2024

Note that CICE_OMP_perf.xlsx has an error: the run labeled 4x16 is actually 8x16. I've fixed the error in perf_suite in my sandbox for future use. Ignore the 4x16 results for now.
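A simple consistency check on each layout would flag this kind of mismatch, since 8x16 uses 128 PEs rather than 64. The following is a sketch only; `check_layout` is a hypothetical name and does not reflect how the fix was made in perf_suite.

```python
def check_layout(tasks, threads, total_pes):
    """Return the PE count used by a tasks x threads layout.

    Raises ValueError if it does not match the intended total_pes.
    Hypothetical helper for illustration, not part of perf_suite.
    """
    used = tasks * threads
    if used != total_pes:
        raise ValueError(
            f"{tasks}x{threads} uses {used} PEs, expected {total_pes}"
        )
    return used


if __name__ == "__main__":
    check_layout(4, 16, 64)  # passes: 4 * 16 == 64
    try:
        check_layout(8, 16, 64)  # 8 * 16 == 128, so this raises
    except ValueError as err:
        print(err)
```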

apcraig avatar apcraig commented on July 18, 2024

I attach an updated OMP results table and graphs, CICE_OMP_perf.xlsx. This also has a second sheet that shows all timing info for the threaded and unthreaded tests. If you look closely, you can see that Advection is just about the only section that threads reasonably. Column and Dynamics do not thread well, and maybe not at all. I'll try to understand this better.

TillRasmussen avatar TillRasmussen commented on July 18, 2024

In the dynamics, most of the OMP directives have been commented out, including the one in the subcycling iteration.

apcraig avatar apcraig commented on July 18, 2024

I believe #680 largely addresses this issue. Will close this issue when #680 is merged. We'll need to remain diligent with respect to OpenMP validation and performance.

apcraig avatar apcraig commented on July 18, 2024

This has largely been addressed in #680 and apcraig#64. There are still some known issues in VP and 1d EVP.

apcraig avatar apcraig commented on July 18, 2024

I will close this; VP and 1d EVP have their own issues. FYI, I added omp_suite and perf_suite to check OpenMP and evaluate performance.
