Comments (12)
Let's check whether @mhrib addressed the ice_dyn_* loops in his refactoring.
from cice.
See also #128.
I did find issues with the same OMPs (and a few more), but found no solution other than commenting them out, as here. See also #252.
In addition to the ones that Mads (MHRI) found, I found OMP issues in ice_history and ice_grid. I commented out all OMP directives in these two files, which stopped the model from crashing when running with the Intel and GNU compilers. I have not found solutions, nor the specific locations of these bugs within the files.
I am uploading a set of slides here from a LANL training course on OpenMP profiling and debugging that I attended last week. Most of it is old news, but the profiling and debugging info at the end might be useful as we move forward with this task.
Workshop 6 Basic OpenMP and Profiling-2.pdf
I have created a perf_suite that will be PR'ed soon. This runs a fixed suite of tests that attempt to assess CICE performance at different task and thread counts. It basically does three things.
- It runs a few cases on 1 PE with no threading and different block sizes to assess the impact of block size on model performance.
- It uses a fixed 16x16 block size and runs a series of scaling tests on 1 to 128 MPI tasks.
- It uses the same 16x16 block size and runs a series of timing tests on 64 PEs with 64 to 4 MPI tasks and 1 to 16 threads (i.e. 64x1, 32x2, 16x4, 8x8, 4x16).
This is all done with the gx1 grid, roundrobin decomposition, 2-day runs, and the basic out-of-the-box configuration. The idea is not to optimize the performance of CICE but to compare its performance on different hardware, with different compilers, and at different task/thread counts for a fixed problem. This is, in part, a starting point for further OMP tuning.
I attach an Excel spreadsheet, CICE_OMP_perf.xlsx, that shows the results from testing on Narwhal with 4 compilers and Cheyenne with 3 compilers, in table and graph form. The results are for CICE hash 9fb518e, dated Dec 21, 2021, plus the Narwhal port and the perf_suite (which will be PR'ed soon).
There are lots of interesting insights. With regard to OMP, we see that in this version of CICE (which has many OMP loops turned off that still need debugging), OMP is still doing something. In these tests, OMP is never faster than using all MPI for the same total PE count. But for a given MPI task count, the threaded run is faster than the single-threaded run (e.g. 16x4 vs 16x1), at least on Narwhal. Cheyenne shows less benefit from threading. This establishes a performance baseline and provides a starting point for improving OMP performance, probably using Narwhal gnu or cray to continue the OMP tuning efforts.
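As a rough illustration of the third item above, the task/thread layouts for a fixed PE count can be enumerated as below. This is only a sketch: `task_thread_layouts` is a hypothetical helper written for this comment, not part of perf_suite or the CICE scripts.

```python
def task_thread_layouts(total_pes, max_threads):
    """Return (mpi_tasks, omp_threads) pairs whose product equals total_pes.

    Thread counts are doubled from 1 up to max_threads, mirroring the
    64x1 ... 4x16 series described above. Hypothetical helper, not CICE code.
    """
    layouts = []
    threads = 1
    while threads <= max_threads:
        # Keep only thread counts that divide the PE count evenly.
        if total_pes % threads == 0:
            layouts.append((total_pes // threads, threads))
        threads *= 2
    return layouts


# Layouts for 64 PEs with up to 16 threads per task.
layouts = task_thread_layouts(64, 16)
print(" ".join(f"{tasks}x{threads}" for tasks, threads in layouts))
# prints: 64x1 32x2 16x4 8x8 4x16
```

Every layout uses the same total PE count, so timing differences between them reflect the MPI/OpenMP split rather than the amount of hardware.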
Note that CICE_OMP_perf.xlsx has an error: the 4x16 run is actually 8x16. I've fixed the error in perf_suite in my sandbox for future use. Ignore the 4x16 results for now.
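A simple consistency check would flag this kind of mislabelled case, because 8x16 uses 128 PEs rather than 64. The sketch below assumes layouts are given as (tasks, threads) pairs; `mismatched_layouts` is a hypothetical name and is not something perf_suite provides.

```python
def mismatched_layouts(layouts, total_pes):
    """Return the (mpi_tasks, omp_threads) pairs whose product is not total_pes."""
    return [(tasks, threads) for tasks, threads in layouts
            if tasks * threads != total_pes]


# The intended 64-PE series, with the last case run as 8x16 by mistake.
as_run = [(64, 1), (32, 2), (16, 4), (8, 8), (8, 16)]
print(mismatched_layouts(as_run, 64))
# prints: [(8, 16)]
```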
I attach an updated OMP results table and graphs, CICE_OMP_perf.xlsx. This also has a second sheet that shows all timing info for the threaded and unthreaded tests. If you look closely, you can see that Advection is just about the only section that threads reasonably. Column and Dynamics do not thread well, and maybe not at all. I'll try to understand this better.
In the dynamics, most of the OMP directives have been commented out, including the one in the subcycling iteration.
I believe #680 largely addresses this issue. I will close it when #680 is merged. We'll need to remain diligent with respect to OpenMP validation and performance.
This has largely been addressed in #680 and apcraig#64. There are still some known issues in VP and 1d EVP.
I will close this; VP and 1d EVP have their own issues. FYI, I added omp_suite and perf_suite to check OpenMP and evaluate performance.