cice-consortium / cice
Development repository for the CICE sea-ice model
License: Other
A few problematic OMP loops were unthreaded due to reproducibility problems found during testing; grep for TCXOMP to find them. These are in ice_dyn_eap, ice_dyn_evp, and ice_transport_remap. One issue may be thread safety in icepack_ice_strength, but this requires additional debugging.
More generally, we need to review and validate that threading is working properly in CICE and Icepack.
I attempted to run a test on Conrad (Cray XC40) with the following configuration:
The modules that are loaded for the compile:
The error that I am encountering is:
Rank 0 [Mon Aug 7 15:24:19 2017] [c2-0c1s2n2] Fatal error in PMPI_Bcast: Invalid root, error stack:
PMPI_Bcast(1614): MPI_Bcast(buf=0x7fffffff4640, count=1, dtype=0x4c000829, root=-999, comm=0x84000004) failed
PMPI_Bcast(1576): Invalid root (value given was -999)
forrtl: error (76): Abort trap signal
Image PC Routine Line Source
cice 000000000079DC32 Unknown Unknown Unknown
cice 0000000000787250 Unknown Unknown Unknown
cice 0000000000787396 Unknown Unknown Unknown
cice 0000000000769629 Unknown Unknown Unknown
cice 000000000075F03A Unknown Unknown Unknown
cice 0000000000456F96 ice_broadcast_mp_ 81 ice_broadcast.F90
cice 00000000004670DF ice_diagnostics_m 759 ice_diagnostics.F90
cice 0000000000402DBD cice_runmod_mp_ci 63 CICE_RunMod.F90
cice 000000000040064B MAIN__ 50 CICE.F90
cice 00000000004005DE Unknown Unknown Unknown
cice 0000000000DA7991 Unknown Unknown Unknown
cice 00000000004004B9 Unknown Unknown Unknown
The error appears to be a result of the model not being able to find the first diagnostic point:
Find indices of diagnostic points
found point 1
lat lon TLAT TLON i j block task
90.0 0.0 -999.0 -999.0 0 0 0 -999
found point 2
lat lon TLAT TLON i j block task
-65.0 -45.0 -64.2 -45.0 25 10 2 1
I should note that the model runs successfully for hybrid MPI+OpenMP configurations (conrad_smoke_gx3_8x2_diag1_run5day) and pure-MPI configurations (conrad_smoke_gx3_4x1_debug_diag1_run5day). The error only seems to show up when running with CICE_THREADED enabled.
PASS conrad_smoke_gx3_8x2_diag1_run5day build
PASS conrad_smoke_gx3_8x2_diag1_run5day run
PASS conrad_smoke_gx3_8x2_diag24_run1year_medium build
PASS conrad_smoke_gx3_4x1_debug_diag1_run5day build
PASS conrad_smoke_gx3_4x1_debug_diag1_run5day run
PASS conrad_smoke_gx3_8x2_debug_diag1_run5day build
PASS conrad_smoke_gx3_8x2_debug_diag1_run5day run
PASS conrad_smoke_gx3_4x2_diag1_run5day build
PASS conrad_smoke_gx3_4x2_diag1_run5day run
PASS conrad_smoke_gx3_4x2_diag1_run5day bfbcomp conrad_smoke_gx3_8x2_diag1_run5day.t00
PASS conrad_smoke_gx3_4x1_diag1_run5day_thread build
FAIL conrad_smoke_gx3_4x1_diag1_run5day_thread run
Log file from the run: cice.runlog.txt
Log of the compile: cice.buildlog.txt
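The traceback shows MPI_Bcast being called with root=-999, which is the "not found" task value printed for diagnostic point 1. The following is a minimal Python sketch of one possible guard, not CICE's actual Fortran/MPI logic. It picks the owning task by a max-reduction over per-rank candidates and signals the caller to skip the broadcast when no rank owns the point. The function name and the reduction strategy are illustrative assumptions.

```python
NOT_FOUND = -999  # task value printed in the diagnostics when no task owns the point


def diag_point_root(owner_flags):
    """Choose a broadcast root for one diagnostic point.

    owner_flags[rank] is True if that rank found the point in one of its blocks.
    This mimics a max-reduction of (rank if found else NOT_FOUND) across ranks.
    A NOT_FOUND result means the caller should skip the broadcast instead of
    passing root=-999 to MPI_Bcast.
    """
    candidates = [rank if found else NOT_FOUND for rank, found in enumerate(owner_flags)]
    return max(candidates)
```

For example, `diag_point_root([False, True, False])` returns 1, and `diag_point_root([False, False])` returns -999, so the broadcast for that point would be skipped.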
We are getting errors in a limited number of test cases and machines. The debug tests
conrad_pgi_smoke_gx3_1x2_debug_diag1_run5day
conrad_pgi_smoke_gx3_2x1_debug_diag1_run5day
throw this error:
0: Null pointer for a2d (/p/home/apcraig/cice-consortium/cice.travis/cicecore/cicedynB/analysis/ice_history.F90: 1344)
but the following non-debug runs pass:
PASS conrad_pgi_smoke_gx3_1x2_diag1_run5day run
PASS conrad_pgi_smoke_gx3_1x1_diag1_run5day_thread run
PASS conrad_pgi_smoke_gx3_2x1_diag1_run5day_thread run
PASS conrad_pgi_restart_gx3_2x1_diag1 exact-restart
PASS conrad_pgi_restart_gx3_1x2_diag1 exact-restart
PASS conrad_pgi_restart_gx3_2x1_diag1_pondcesm exact-restart
PASS conrad_pgi_restart_gx3_2x1_diag1_pondtopo exact-restart
We are not seeing this problem with other compilers. It could be a PGI issue, but we should look into it.
From: Frederic Dupont [email protected]
Subject: small inconsistency in CICE512
Date: October 14, 2016 at 2:14:04 PM MDT
To: Elizabeth Hunke [email protected]
Hi Elizabeth,
I think I found a small inconsistency in CICE 5.1.2 for shift_ice (in ice_itd.F90):
for each receiving category, at each point:
do n = 1, ncat-1
[ some checks done ...]
[ prepare ishift pointers for cat n ]
do ij = 1, ishift
i = indxii(ij)
j = indxjj(ij)
m = indxij(ij)
nd = donor(m,n)
!echmod worka(i,j) = dvice(m,n) / vicen(i,j,nd)
worka(i,j) = daice(m,n) / aicen(i,j,nd)
if (nd == n) then
nr = nd+1
else ! nd = n+1
nr = n
endif
aicen(i,j,nd) = aicen(i,j,nd) - daice(m,n)
aicen(i,j,nr) = aicen(i,j,nr) + daice(m,n)
vicen(i,j,nd) = vicen(i,j,nd) - dvice(m,n)
vicen(i,j,nr) = vicen(i,j,nr) + dvice(m,n)
dvsnow = vsnon(i,j,nd) * worka(i,j)
vsnon(i,j,nd) = vsnon(i,j,nd) - dvsnow
vsnon(i,j,nr) = vsnon(i,j,nr) + dvsnow
workb(i,j) = dvsnow
enddo ! ij
[ followed by the actual transport in thickness space for all tracers, including enthalpies, etc...]
enddo ! boundaries, 1 to ncat-1
The issue is for snow, because the worka/workb arrays depend on aicen(i,j,nd) and vsnon(i,j,nd). In the case of growth (a shift towards a thicker category, i.e. increasing n), aicen(i,j,nd) and vsnon(i,j,nd) might already have been updated. It would be more in line with the rest of the transport equation in thickness space to compute worka with aicen and vsnon from the previous timestep (Euler-forward approach), which means saving aicen/vsnon at the beginning of the routine in separate arrays. It is not a serious inconsistency, but I don't see any reason why the cases of growth and melt should be treated differently. (Correction attached, with some goodies from bigger ITDs.)
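To make the growth/melt asymmetry concrete, here is a minimal Python sketch of the snow part of the boundary loop at a single grid point. It uses 0-based category indices, and the function and argument names are hypothetical, not CICE's. With `euler_forward=False` it follows the quoted loop, where the transfer fraction uses aicen/vsnon as already modified by the previous boundary. With `euler_forward=True` it uses values saved at the start, as suggested above.

```python
def shift_snow(aicen, vsnon, daice, donor, euler_forward=False):
    """Move snow across category boundaries in proportion to the ice area moved.

    aicen, vsnon : per-category ice area and snow volume at one grid point
    daice[n]     : ice area moved across boundary n (between categories n and n+1)
    donor[n]     : donor category for boundary n, either n (growth) or n+1 (melt)
    """
    aicen = list(aicen)
    vsnon = list(vsnon)
    # Start-of-routine copies, used only by the Euler-forward variant.
    aicen0 = list(aicen)
    vsnon0 = list(vsnon)
    for n, da in enumerate(daice):
        nd = donor[n]
        nr = nd + 1 if nd == n else n  # receiving category
        if euler_forward:
            dvsnow = vsnon0[nd] * da / aicen0[nd]
        else:
            # As in the quoted loop: aicen/vsnon may already include boundary n-1.
            dvsnow = vsnon[nd] * da / aicen[nd]
        aicen[nd] -= da
        aicen[nr] += da
        vsnon[nd] -= dvsnow
        vsnon[nr] += dvsnow
    return aicen, vsnon
```

With growth at both boundaries (`donor=[0, 1]`), the two variants move different amounts of snow out of the middle category, although both conserve total snow. When every boundary is a melt shift (`donor=[1, 2]`), they give identical results, because each donor category has not yet been touched when its boundary is processed.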
Anyway, the code is already a nice improvement to what was done in CICE4!
I would have an additional question regarding the formulation of worka since I noted that it changes from a fraction in volume (CICE 4.1) to a fraction in area (5.1.2). Is it because it was found that there was not much correlation between volume of snow and ice being advected to the next category (i.e. volume of snow is not necessarily proportional to volume of ice)?
Fred.
Fix rEVP as suggested by Martin Losch.
CICE needs to be ported and tested in the CESM and RASM frameworks. The current cesm driver will not work due to the changes to the Icepack implementation and the use statements.
CDash or github wiki? Let's decide. Examples:
http://my.cdash.org/index.php?project=myCICE
http://my.cdash.org/index.php?project=myIcepack
https://github.com/CICE-Consortium/Icepack/wiki/ff9ef4a8957a79620f96600f3e70212c0f6ad0ee
See also @apcraig PR CICE-Consortium/Icepack#94
We need to write down requirements, how they can be addressed, pros and cons, etc. This is essentially a design document for posting test results. Some issues that have already come up in discussions (random order):
gfortran with debug flags is trapping several underflows in CICE. Some are known (ice_dyn_evp/eap) and others may not be (construct_fields in ice_transport_remap.F90). gfortran does not have an underflow-to-zero flag, and I'm not sure what happens when an underflow arises without trapping. I assume it underflows to zero gracefully, but I'm not sure. There are several things we could try. We could
We have coded underflow to zero in icepack where needed, without much regard to cost. But maybe that approach should be re-evaluated too, so that we bring some consistency to this issue. Mostly, we should continue to use a compiler's underflow-to-zero flag, and in those cases we don't want to introduce a lot of cost or complexity. See also CICE-Consortium/Icepack#170
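Without a flush-to-zero flag, IEEE 754 arithmetic handles underflow gradually by default. Results smaller than the smallest normal number become subnormal values rather than zero. The minimal Python sketch below shows the distinction and the kind of explicit check that replaces a compiler flag. Using the smallest normal double as the threshold is an assumption. Explicit checks in icepack may use different thresholds.

```python
import sys

# Smallest positive normal double (about 2.2e-308); anything smaller is subnormal.
TINY = sys.float_info.min


def flush_to_zero(x, tiny=TINY):
    """Return 0.0 if |x| is below the threshold (subnormal by default), else x unchanged."""
    return 0.0 if abs(x) < tiny else x
```

For example, `1.0e-300 * 1.0e-18` evaluates to about 1e-318, a nonzero subnormal, and `flush_to_zero` maps it to 0.0. Applying such a check per value adds a comparison and branch, which is the cost concern raised above.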
@eclare108213 @dabail10 @mattdturner any thoughts?
Testing how this works. Assigned to @duvivier. Need @eclare108213's input.
Refactor subroutine stress for improved computational efficiency, as suggested by DMI collaborators.
Store shortwave radiation for all blocks in the restart files, not just the active blocks, if possible. This should not change the answers when restarting from identical configurations, but could fix inexact restart problems when restarting with different block configurations.
(Issue noted by DMI collaborators)
Have the code read the current version number from version.txt (or whatever is most appropriate) and use it in the netcdf metadata, e.g.
:source = "Los Alamos Sea Ice Model (CICE) Version 5" ;
Check other output formats to see if it can be used there also.
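The following is a minimal Python sketch of the idea, for illustration only; CICE itself would do this in Fortran. It assumes a hypothetical version.txt layout in which the first non-blank line is the bare version string, and the function names are also hypothetical.

```python
from pathlib import Path


def version_from_text(text):
    """Return the first non-blank line of a version file's contents, stripped."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("no version string found")


def source_attribute(version_file):
    """Build the netCDF 'source' global attribute from a version file (assumed layout)."""
    version = version_from_text(Path(version_file).read_text())
    return f"Los Alamos Sea Ice Model (CICE) Version {version}"
```

Keeping the parsing separate from the formatting makes it easy to reuse the same version string for other output formats.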
Do we want to change the current version number in the CICE code and docs to v6-alpha? That is how it is referred to in Icepack and my recent presentations.
clean up unused variables in CICE
remove svn headers from all modules
For vector history variables, create options for the vector averages (e.g. speed and direction) rather than simply averaging the vector components.
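The following minimal Python sketch shows why averaging components is not the same as averaging speed. The function name is hypothetical. Directions here are relative to the grid's x/y axes, which is an assumption, since CICE grids may be rotated relative to east/north.

```python
import math


def vector_averages(u, v):
    """Return (mean speed, speed of the mean vector, direction of the mean vector in degrees).

    The direction uses atan2(vbar, ubar), i.e. counterclockwise from the grid x-axis;
    it is meaningless when the mean vector is zero.
    """
    n = len(u)
    mean_speed = sum(math.hypot(a, b) for a, b in zip(u, v)) / n
    ubar = sum(u) / n
    vbar = sum(v) / n
    speed_of_mean = math.hypot(ubar, vbar)
    direction = math.degrees(math.atan2(vbar, ubar))
    return mean_speed, speed_of_mean, direction
```

Two opposite unit velocities, `u=[1.0, -1.0]` and `v=[0.0, 0.0]`, give a mean speed of 1.0 but a component-averaged speed of 0.0.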
Record timing results as part of the testing procedure, or at least flag them if changes are substantial in comparison tests. Preferably include at least these:
Timer 2: TimeLoop
Timer 3: Dynamics
Timer 5: Column
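A minimal Python sketch of flagging substantial timing changes between a baseline and a comparison run follows. The helper name and the 20% default threshold are arbitrary assumptions. The timer names come from the list above, and extracting the values from CICE's timer output is not shown.

```python
def flag_timing_changes(baseline, current, rel_threshold=0.2):
    """Return {timer: relative change} for timers whose time changed by more than rel_threshold.

    baseline, current: dicts mapping timer name to elapsed seconds.
    Timers missing from current, or with non-positive baseline time, are skipped.
    """
    flagged = {}
    for name, t0 in baseline.items():
        t1 = current.get(name)
        if t1 is None or t0 <= 0:
            continue
        change = (t1 - t0) / t0
        if abs(change) > rel_threshold:
            flagged[name] = change
    return flagged
```

For example, a baseline of `{"TimeLoop": 100.0, "Dynamics": 40.0, "Column": 30.0}` compared against `{"TimeLoop": 105.0, "Dynamics": 60.0, "Column": 20.0}` flags Dynamics (+50%) and Column (about -33%) but not TimeLoop (+5%).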
The Cray compiler complains about num_avail_hist_fields in ice_history_shared.f90, having to do with an inconsistency in the order of evaluations. Issue reported by DMI collaborators, who will need to provide details.
We are running into reproducibility problems on travisCI as well as a problem with a 1x1 test. The test output from a broader suite is below. We are turning off failed tests for now so travis will pass. But this needs to be sorted out.
The main issue is that the answers are changing according to the testing when they shouldn't. It would be good to get output from the tests; maybe we can have Travis FTP the data out so we can look at it. The other problem is the 1x1 test failure, which fails with:
Current forcing data year = 1997
Finished writing ./history/iceh_ic.1998-01-01-00000.nc
mpirun noticed that process rank 0 with PID 13802 on node travis-job-anders-dc-cice-360441545.travisci.net exited on signal 11 (Segmentation fault).
PASS travisCI_gnu_smoke_gx3_2x1_debug_diag1_run5day build
PASS travisCI_gnu_smoke_gx3_2x1_debug_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_1x2_debug_diag1_run5day build
PASS travisCI_gnu_smoke_gx3_1x2_debug_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_1x1_diag1_run5day build
PASS travisCI_gnu_smoke_gx3_1x1_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_2x1_diag1_run5day build
PASS travisCI_gnu_smoke_gx3_2x1_diag1_run5day run
FAIL travisCI_gnu_smoke_gx3_2x1_diag1_run5day bfbcomp travisCI_gnu_smoke_gx3_1x1_diag1_run5day.travisCItest different-data
#---
PASS travisCI_gnu_smoke_gx3_1x2_diag1_run5day build
PASS travisCI_gnu_smoke_gx3_1x2_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_1x1_diag1_run5day_thread build
FAIL travisCI_gnu_smoke_gx3_1x1_diag1_run5day_thread run
#---
PASS travisCI_gnu_smoke_gx3_2x1_diag1_run5day_thread build
PASS travisCI_gnu_smoke_gx3_2x1_diag1_run5day_thread run
FAIL travisCI_gnu_smoke_gx3_2x1_diag1_run5day_thread bfbcomp travisCI_gnu_smoke_gx3_1x2_diag1_run5day.travisCItest different-data
#---
PASS travisCI_gnu_restart_gx3_2x1_diag1 build
PASS travisCI_gnu_restart_gx3_2x1_diag1 run-initial
PASS travisCI_gnu_restart_gx3_2x1_diag1 run-restart
PASS travisCI_gnu_restart_gx3_2x1_diag1 exact-restart
#---
PASS travisCI_gnu_restart_gx3_1x2_diag1 build
PASS travisCI_gnu_restart_gx3_1x2_diag1 run-initial
PASS travisCI_gnu_restart_gx3_1x2_diag1 run-restart
PASS travisCI_gnu_restart_gx3_1x2_diag1 exact-restart
#---
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm build
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm run-initial
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm run-restart
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm exact-restart
#---
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo build
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo run-initial
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo run-restart
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo exact-restart
I will do this.
The current CICE implementation controls run length by setting npt, the number of timesteps, so the run length in time depends on the model timestep. It would be nice to be able to specify a number of hours, days, months, or years to run the model rather than the number of timesteps.
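The following is a minimal Python sketch of converting a run length into npt for the fixed-length units. The function name and the unit keys are hypothetical. Months and years are deliberately left out because their length depends on the calendar.

```python
SECONDS_PER = {"hours": 3600, "days": 86400}


def npt_from_runlength(length, units, dt):
    """Number of timesteps for a run of `length` `units` with timestep dt (seconds).

    Raises ValueError if the run length is not a whole number of timesteps.
    """
    total = length * SECONDS_PER[units]
    npt, remainder = divmod(total, dt)
    if remainder != 0:
        raise ValueError("run length is not an integer number of timesteps")
    return int(npt)
```

For example, a 5-day run with a 3600 s timestep gives npt = 120. Rejecting lengths that are not a whole number of timesteps avoids silently shortening the run.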
especially ICE_RUNLENGTH
Implement a namelist option for freezing temperature. This has already been done for CESM/RASM, but I don't think it made it into the main CICE repo.
The report_results test reporting needs to be reviewed. Pushing results back to the wiki has been disabled in CICE because of problems. CICE and Icepack need to be synchronized and made consistent with the wiki, with stability and backward compatibility required going forward.
We need an overview description of the fast-ice scheme, referencing publication(s), and including full documentation of the new namelist variables. Also describe hwater, bathymetry, and normalization of principal stresses. Remove prs_sig from the namelist index, and add sigP.
@rallard77 and I have tested the most recent CICE code on 2 Cray machines using the CCE environment. On both machines, every run fails to complete successfully. The issue appears to be related to reading the input files.
Here is a snippet from the log file:
max_blocks = 4
Number of ghost cells: 1
read_global 11, 1, -1.47038251591136854E+308, 7.17269300631787991E+307, -1.72150506148753856E+308
read_global 12, 1, 0., 419430400., 2820535222272.
Processors (X x Y) = 2 x 2
For reference, here is the same snippet section from a run with the Intel compiler:
max_blocks = 4
Number of ghost cells: 1
read_global 11 1 -1.36148077740934 1.56905100613449 916.469935499633
read_global 12 1 0.000000000000000E+000 25.0000000000000 168117.000000000
Processors (X x Y) = 2 x 2
The first read_global output line is information regarding the grid file. I have read the grid file in Python, using both big-endian and little-endian byte order:
Big-Endian: Min = -1.36148077741
Big-Endian: Max = 1.56905100613
Little-Endian: Min = -1.47038251591e+308
Little-Endian: Max = 7.17269300632e+307
The CCE Macros.* files all have -h byteswapio included in the FFLAGS variable, which should correctly read big-endian files. I have tested compiles both with and without -h byteswapio, and it does not have any effect. So, for whatever reason, it seems that the CCE-compiled executable is always reading the file as little-endian.
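Since the byte-order check was done in Python, here is a minimal sketch of that kind of check using only the standard library `struct` module. It assumes the data is a raw stream of 8-byte reals with no Fortran record markers, which is an assumption about the grid file layout. The function name is hypothetical.

```python
import struct


def minmax_doubles(data, byteorder):
    """Min and max of a buffer of 8-byte reals read with byteorder '>' (big) or '<' (little)."""
    count = len(data) // 8
    values = struct.unpack(f"{byteorder}{count}d", data[: 8 * count])
    return min(values), max(values)
```

Reading big-endian bytes with `'>'` recovers the original values, while reading them with `'<'` yields garbage of the kind in the CCE log. With NumPy, `numpy.fromfile(path, dtype='>f8')` performs the big-endian read for a whole file.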
The default setting for CICE and Icepack is NTRAERO=1 with tr_aero=false. This leads to some problems in the indexing, specifically in the call to icepack_step_radiation, where this array section
trcrn(i,j,nt_aero:nt_aero+4*n_aero-1,:,iblk), &
is passed in as an argument. When tr_aero is false, this array is not used, but because NTRAERO=1, it's passing a section of the array that is not allocated. The fix for this is to introduce this logic in ice_init.F90,
nt_aero = max_ntrcr - 4*n_aero
if (tr_aero) then
nt_aero = ntrcr + 1
ntrcr = ntrcr + 4*n_aero ! 4 dEdd layers, n_aero species
endif
This kludges the nt_aero indexing so that a valid array section is passed, even though it is not the section associated with aerosols.
There are several things that should happen. First, when tr_aero=false, n_aero=NTRAERO should be 0. That would eliminate the need for the kludgy setting of nt_aero. Second, it might be worth looking into icepack_step_radiation and thinking about whether we really want to send array sections for arrays that aren't needed. This might also be part of a broader review of how tracers are managed in CICE and Icepack.
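Below is a small Python transcription of the quoted ice_init.F90 index logic, for checking the section bounds by hand. It uses Fortran's 1-based, inclusive convention, so trcrn(first:last) is zero-size when last < first. The function name is hypothetical. The n_aero=0 case corresponds to the suggestion that n_aero=NTRAERO be 0 when tr_aero=false.

```python
def aero_indices(ntrcr, max_ntrcr, n_aero, tr_aero):
    """Return (first, last, ntrcr) for the aerosol section trcrn(first:last), 1-based inclusive.

    Mirrors the quoted logic: when tr_aero is false, nt_aero is pushed toward the
    top of the tracer array so that the (unused) section stays in bounds.
    """
    nt_aero = max_ntrcr - 4 * n_aero
    if tr_aero:
        nt_aero = ntrcr + 1
        ntrcr = ntrcr + 4 * n_aero  # 4 dEdd layers, n_aero species
    last = nt_aero + 4 * n_aero - 1
    return nt_aero, last, ntrcr
```

With ntrcr=10, max_ntrcr=20, n_aero=1, and tr_aero false, the section is trcrn(16:19). This is in bounds but belongs to other tracers. With n_aero=0 it becomes trcrn(20:19), a zero-size section, so no kludge is needed.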
Evaluate and possibly switch documentation to readthedocs, for better automation
the NAG compiler reports lonu_bounds is uninitialized for icorner=1, iblk=1, at line 1912 of ice_grid.f90.
(issue reported by DMI collaborators)
Include bathymetric information (vertical grid) as part of standard grid input, for use by fast-ice parameterization and icebergs.
What is the overhead for doing the things needed for the quality control comparisons ("-s qc" to generate and position necessary files) for all simulations? I've done a bunch of simulations without it, and now it would be nice to go back and check the qc output without having to re-run the simulations.
Add a subname string to each interface, and replace the literal string "subname" with the subname variable (e.g. with sed) so that routine names are used in error messages.
Some (but not all) fluxes are divided by aice before being sent to the cesm coupler. This is highly confusing because normally, multiplying a non-flux value (e.g. thickness) specific to the ice-covered area by aice (or similarly area-averaging over categories) produces a grid-cell-mean value. The subroutine scale_fluxes divides by aice, so multiplying by aice then just brings it back to the ice-covered area value. It would be better to save the coupling fluxes separately so that the physical interpretations of the primary variables aren't changing. E.g. from a user's question on how to compute net longwave from the history output:
flwdn in history is flw elsewhere in the code, and that is the value for any point in the grid cell, whether it’s ice or ocean. It’s not multiplied by aice, but it is still a grid-cell average because it’s the same over ice and ocean alike.
flwup in history is flwout elsewhere in the code, and in the code calculations this is the value only over sea ice. However it’s later divided by aice for the cesm coupler, and that happens before it’s sent to history. flwup_ai in history is flwout*aice, so it’s back to the only-over-ice value.
So the net longwave is
over ice: flwnet = flwup_ai-flwdn
over the grid cell: flwnet(ice) + flwnet(ocn) = (flwup_ai - flwdn)*aice + (sigma*SST^4 - flwdn)*(1 - aice)
Is that right?
Setting f_hi = 'md' did not produce monthly output. (Need to check if this is really a bug or if I just didn't have the various history flags set correctly.)
Implement testing options to maximize how much of the code is exercised. We can use the Icepack tests (though some namelist options are different) and will need additional tests for dynamics and infrastructure. Consider using codecov.io, as suggested by @anders-dc:
Some automated tools can be used to test for "code coverage", a metric for measuring what parts of the code are being covered by the applied test suite. I use the tool codecov.io to visualize the results and you can see an example here:
This is the status page for a repository of mine:
https://codecov.io/github/anders-dc/Granular.jl?branch=master
Here is an overview of the source folder:
https://codecov.io/gh/anders-dc/Granular.jl/tree/master/src
And this is what a single source file looks like:
https://codecov.io/gh/anders-dc/Granular.jl/src/master/src/temporal.jl
Green lines are hit by the test suite and red lines are never invoked. You can see that my tests hit most of the lines of code in this file, while not being perfect. This overview can help me design tests for parts that are untouched, and also get rid of abandoned code that was never used anyway. The algorithm is clever enough to not count comments and other irrelevant lines.
I think it is excessive to strive for 100% code coverage, and a high code coverage does (obviously) not guarantee a bug free code. However, I have found that the metric helps build confidence in the testing, and keeps me on my toes for writing tests for newly developed code.
Check for potential bug. This might be easier (or already fixed) as part of the CMIP6/SIMIP history output (#93). Alex West's original email:
Hello Elizabeth,
Hope that you’re well. I’ve been having a go at adding two new diagnostics to CICE to report ice and snow evaporation separately – apart from being in the SIMIP data request, I think they may be useful for my PhD on the Arctic energy budget. However, I’ve been having some unexpected problems, and wondered if there are aspects of the evaporation I’m still not aware of (and if I’ve come across a possible reason why this hasn’t been done before!).
Basically, I’ve
defined the new diagnostics, evap_ice_ai and evap_snow_ai, in the history routines in the normal way;
defined new aggregate and category fields evap_ice, evapn_ice etc in ice_flux;
added code to thickness_changes in ice_therm_vertical to calculate the fields evapn_ice, evapn_snow in the same way as evapn, and divide them by timestep length;
added the new evaporation fields to the routine merge_fluxes in ice_flux.
It looks to my naive eye that the new fields are then calculated in much the same way as the existing evapn field. However, when I look at the resulting diagnostic files, the new snow and ice evaporation fields are of a completely different order of magnitude to the total evaporation field (about 1.e3 bigger). Moreover, the sum of the two new fields actually has a different structure to the total evaporation field.
Do you have any idea what I might be missing here? I attach the copies of ice_flux, ice_step_mod, ice_therm_vertical, ice_history and ice_history_shared from my branch; unfortunately the output file with the three evaporation fields is too large to send by email.
Because this branch is a general ‘new diagnostics for SIMIP’ branch it also contains Dave Bailey’s new diagnostic changes, so a diff with the 5.1.2 trunk will show these differences also.
replace 2017 with 2018 in copyright statements everywhere
When we have a "general DOI" for the repo, we should add that information to the README.md as well as the documentation so that we can point users to that and ask them to please cite that when using this code.
Current Info:
https://zenodo.org/record/1205675
We may also want to create a CICE-Consortium community to tag these releases and point folks to that.
Hi all,
I have tried running the CICE test suite on Travis-CI, but am encountering "Division by 0" errors. Is this a well-known problem? Please see the build log here:
https://travis-ci.org/anders-dc/CICE/jobs/343922056#L764
I am invoking Travis-CI from my own fork, with the following Macros and env configuration. I used these settings because they were successful with Icepack.
Cheers, Anders
the NAG compiler reports a missing initialization in the uvm halo, line 1399 of ice_grid.f90
(issue noted by DMI collaborators)
Let's make the templates (both CICE and Icepack) a little less cryptic and hopefully a little more useful.
Timing values are incorrect
incorporate and test computational performance improvements from U. Reading
Consolidate ice_constants files as needed. There are only a few differences between cice, cesm, and hadgem. Do we want to keep them separate, un-parameter them and add init/query methods, or what?
Also, we need to make sure the icepack_constants are set in hadgem and cesm mode during CICE initialization via the init methods.
CICE currently uses MPI_WTIME for calculating timings. CESM uses GPTL. Should we change this?
(revision 1121 in svn repository, cesm_cice branch)
Track and post accumulated test results, automated if possible.
Have the scripts set up all of the elements of the run directory before beginning to compile the code.
From: Pedro Duarte [email protected]
Sent: Monday, May 22, 2017 8:13 AM
I have just noticed that in ice_forcing.F90, in the subroutine compute_shortwave there is the following equation to calculate solar time:
solar_time = mod(real(sec,kind=dbl_kind),secday)/c3600 &
           + c12*sin(p5*TLON(i,j))
As I understand it, the second term corrects solar time as a function of longitude. However, I think there may be an error here, since c12*sin(p5*TLON(i,j)) should return 6 hours when TLON = 90 but it returns ~8.5 hours. Why not use a simple linear function here, since solar time varies by one hour per 15 degrees? Or am I missing something?
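Below is a quick Python check of the numbers in the question. It assumes TLON is in radians and that c12 = 12 and p5 = 0.5, as the constant names suggest. The function names are hypothetical.

```python
import math


def solar_time_offset_current(tlon_rad):
    """Longitude correction in hours as written in compute_shortwave: 12*sin(0.5*TLON)."""
    return 12.0 * math.sin(0.5 * tlon_rad)


def solar_time_offset_linear(tlon_rad):
    """Linear alternative from the question: one hour per 15 degrees of longitude."""
    return math.degrees(tlon_rad) / 15.0
```

At 90 degrees the current form gives about 8.49 hours and the linear form gives 6 hours, matching the figures in the question. The two agree at 0 and 180 degrees and differ in between.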