esmvalgroup / esmvalcore Goto Github PK

View Code? Open in Web Editor NEW

38.0 38.0 36.0 137.13 MB

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.

Home Page: https://www.esmvaltool.org

License: Apache License 2.0

Dockerfile 0.01% Python 94.14% HTML 2.14% R 0.03% TeX 0.01% Jinja 0.30% Jupyter Notebook 3.24% JavaScript 0.12%

esmvalcore's People

Contributors

Stargazers

Watchers

esmvalcore's Issues

Handling ACCESS datasets and daily fields in piControl scenarios

@jvegasbsc

I open this issue because I am having two different problems when handling datasets from CMIP5 piControl scenario.

The preprocessor does not handle ACCESS (ACCESS1-0 and ACCESS1-3) monthly averaged fields, because of the following problem with the time coordinate: ValueError: day is out of range for month
. I wrote a few lines of code to be included in the fix_file esmvaltool/cmor/_fixes/CMIP5/ACCESS1_0.py:

import cf_units
from iris.coords import DimCoord
from iris.exceptions import CoordinateNotFoundError

from ..fix import Fix

class time(Fix):
    """Fixes for time"""

    def fix_metadata(self, cube):
        """
        Fix metadata
        Fixes cube units
        Parameters
        ----------
        cube: iris.cube.Cube
        Returns
        -------
        iris.cube.Cube
        """
        cube.units = Unit("days")
        return cube

but it does not seem to help. Moreover, this would cause problems, when handling both daily and monthly fields, as in my case.

It could be that it does not mean at all, but the problem does not arise when processing the same fields from the same model in the historical scenario (this makes me wondering if the issue is related to the absence of the fix_file).

When handling both daily fields of HadGEM2-ES in piControl scenario, I got this error: ValueError: The shape of the bounds array should be points.shape + (n_bounds,). In this case, a fix_file for HadGEM2-ES is present, but it does not seem to fulfill the task of fixing the daily fields. Is it possible to update it accordingly?

Synda download improvements

esmvaltool can automatically download data using synda (if it is installed and configured with a user account) by specifying the --synda-download flag. The currently implemented functionality is sufficient for getting some data if you're a new developer on the project and want to be able to run the tool, but various improvements can be made. The following improvements have been suggested by @valeriupredoi

a check for the synda executable should be performed (this is needed e.g in case the code will be run in parallel and we don't want the thing to break);
whereas sufficiently robust as it's now, there are corner cases that should be accounted for: files may exist locally that partially span the data request, and synda will find these plus a few more that complete the data request but in a partial manner and not FULLY -- in that case I believe the best practice is NOT to download anything IF the diagnostic can not be run on partial data sets;
I would switch from synda get to synda install, the latter is performing a number of checks that get doesn't -- versioning and file integrity checks; in this case the install is done inside the synda DRS (I don't think we can tell synda to install in a custom path);
careful with checks of file existence and writing to paths -- files may exist on data archives (BADC, DKRZ) but writing to disk is forbidden there;
Have a look at the cache file creator tool I wrote and works pretty well for what I reckon are all the corner cases -- it's a large code but you can mix and match quite a few things that are already there and put them in _download.py :)

https://github.com/valeriupredoi/cmip5datafinder/blob/master/cmip5datafinder_v2.py

Availability of CMOR tables

In PR #195 the option to define custom CMOR tables in the config-developer.yml file was introduced. This raises the question of how to retrieve such tables and what happens if you try to run the tool but do not have the CMOR tables matching your data. It would be best to install such tables as dependencies, or include them with the tool, or add some other automatic mechanism for retrieving them.

Fix iris (>2) warnings

Since we will switch to iris 2 very soon now (at least I hope so 😁), it would be nice to address warnings which still appear in the new version. I will list some of them which regularly appear in my runs, but this is definitely not a complete list. Anyone is invited to add further items.

/iris/coords.py:1355: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'year'.
This warning appears when the time coordinate is collapsed. The reason are the temporal AuxCoords year, day_of_year, etc. which do not have bounds (which makes sense), so it would be best to disable this warning for those certain coordinates. It also appears when any other coordinate without bounds is collapsed.
/numpy/ma/core.py:3174: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
This appears somewhere in the iris code (I do not know exactly where) and should probably be fixed by iris itself. edit This one will be fixed in the next Iris release: SciTools/iris#3212 (comment)
iris/fileformats/cf.py:798: UserWarning: Missing CF-netCDF measure variable 'areacella', referenced by netCDF variable 'tas'
Warnings like this appear very often in the preprocessor. Most likely it's because of the cell_measures attribute in the raw cubes (e.g. tas:cell_measures = "area: areacella"). Would it break something if we removed this attribute?
iris/__init__.py:237: IrisDeprecation: setting the 'Future' property 'cell_datetime_objects' is deprecated and will be removed in a future release. Please remove code that sets this property.
This should be solved by removing the appearances of this property in the code (in esmvaltool/cmor/check.py and esmvaltool/diag_scripts/autoassess/stratosphere/strat_metrics_1.py).
iris/analysis/cartography.py:394: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
This appears when grid cell areas are calculated by iris.analysis.cartography.area_weights. The warning does not appear if the Earth radius is specified in the cube. Alternatively, we could use the fx_files to calculate areas.

Not iris warnings, but still appear very often:

WARNING There were warnings in variable nbp: nbp: attribute positive not present
This appears for almost every CMIP5 model with this attribute, so in my opinion we should either disable this warning completely or demote it to a debug message. Another possibility would be to add it automatically during the CMOR check.

I think these are the most frequent ones and it should not be too hard to get rid of them.

Preprocessor generality

Some preprocessors like zonal_mean and area_average have recently (#825) been expanded to allow more generic operation beyond just a simple mean. This means that while they're called mean or average, they can also do medians, standard deviations, variance, etc.

It would be great to add this functionality to the other spatial and temporal preprocessors:

average_volume
seasonal_mean
time_average

While we're there, we should also change zonal_mean's argument from mean_type to operation to match average_region.

We wouldn't even need to change our recipes if we kept the current preprocessors names like zonal_mean or average_volume as special cases of the more generic functions.

CMIP6 fx files stored in different directories, depending on realm

Not all fx files for CMIP6 models are located in the fx/ directory. For example, 'areacello' is stored under Ofx/, 'sandfrac' under Efx/. See also example below.

Therefore, config-developer.yml needs to be extended to also check the other *fx/ directories when searching for the fx files specified in the recipe.

Example:
/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/fx/
/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Efx/
/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/Ofx/

Add preprocessor settings to recipe schema

At the moment there is no check that the values provided as preprocessor function arguments are within range (for numerical values) or allowed strings. This could easily be implemented by adding a preprocessor definition in esmvaltool/namelist_schema.yml.

Preprocessor feature request: Add a way to specify Dataset start_month and end_month

Hi,

I'm attempting to look at the South Hemisphere Summer seasonal mean, (DJF), so I've used the extract_season preprocessor. If I run the analysis from January to December, then the first seasonal_year includes only January and February, and the final seasonal_year only includes December. These data are not comparable, and shows a huge bias at the first and last points:

. To remove the erroneous months from the analysis, I've added the start_month and end_month fields to the dataset dictionary.

I'm using the following dataset:

{dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 1989, end_year: 2004, start_month: 3, end_month: 10}

However, the start_month and end_month fields are not taken into account by the preprocessor, and the preprocessed data runs from January 1889 to December 2004.

Is this a bug? Can someone reproduce it? what is the problem? Cheers!

Can't read models with hybrid sigma pressure coordinate

I need to use models with hybrid sigma pressure coordinate:
e.g.: Model: CMIP5 MPI-ESM-LR historical, Variable: cli

Who could extend the preprocessor to recognize these files and convert them to pressure coordinate?

plot_file in diagnostic_provenance.yml cannot be a list

More than one plot file could be produced from the same diagnostic file, so it makes sense for 'plot_file' to be a list but currently this fails in esmvaltool/_provenance.py

Add YAML schema for diagnostic_provenance.yml and improve error messages/checks

At the moment diagnostic developers either get no error message at all, or various obscure error messages if they make a mistake in their diagnostic_provenance.yml file. It would be nicer if we could get a more clear error message by using a Yamale schema to check it.

Extract odd-shaped (irregular) regions

I would like to extract regions in the preprocessor of the ESMValTool which are not rectangular but as they are defined in the IPCC AR5 or AR6. For example to create the panels for a figure like this:

Is there a possibility to include this in the preprocessor functions?

At the moment I do not have the specific region definitions but I could share them when I received them.

Multi-model after time average

Hey,

I'm hoping that someone can help me figure out whats going wrong here. I'm trying produce a multi-model mean of a 2D (x-z dimensional) field. It's a fairly complex preprocessor, several of the stages can be quite slow, and I'll need to run it over lots (dozens?) of model datasets. With that in mind, I'm trying to keep it lightweight:

  prep_transect: # For extracting a transect
    custom_order: true
    time_average:
    regrid:
      target_grid: 1x1
      scheme: linear
    zonal_means:
      coordinate: longitude
      mean_type: mean
    extract_levels:
      levels: [0.1, 0.5, 1, 10, 20, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680, 720, 760, 800, 840, 880, 920, 960, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800]
      scheme: linear
    multi_model_statistics:
      span: full
      statistics: [mean, ]

(The extract_levels field is a bit silly, please don't worry about it too much.)

The problem that I'm seeing now is that the multi_model_statistics part doesn't produce any results. I think that this is because it can't find a time overlap between the files:

2019-03-12 15:56:35,921 UTC [29013] DEBUG   esmvaltool.preprocessor._multimodel:304 Multimodel statistics: computing: ['mean']
2019-03-12 15:56:35,923 UTC [29013] INFO    esmvaltool.preprocessor._multimodel:313 Time overlap between cubes is none or a single point.check datasets: will not compute statistics.

The first step of the preprocessor is to take a time average, as this reduces the workload of the function by an order of magnitude or more. However, I suspect that this is the reason why it can't find any overlap in the time range between the models.

Perhaps people can suggest a better way to do this - or perhaps a way to get the multi-model mean function to ignore the time overlap?

Cheers!

Write diagnostic output to stdout when running tasks sequentially

At the moment, diagnostic output is always written to file only. It would be nice to also write it to stdout when running tasks sequentially, so developers and users can see what is going on without having to open the log file.

Problem preprocessing ACCESS1-3 and IPSL-CM5A-LR piControl data

This issue is related to porting the diagnostic tool for thermodynamics into version 2.0 of ESMValTool. Progress is tracked in this branch.

EDIT: the recipe has been ported, but there are remaining issues with preprocessing piControl data for experiments ACCESS1-3 and IPSL-CM5A-LR.

Problem with native grids in `_check_dim_names` for CMIP6 data

The cmor checker complains as follows about some CMIP6 data:

Coordinate longitude has var name longitude instead of lon
 Coordinate latitude has var name latitude instead of lat

I think this is because it checks whether the var_name agrees with the out_name of the respective cmip6 table. Unfortunately, this seems not to be in line with the intentions of CMIP6, see this discussion at the cmip6 tables.

So requiring presence and var_name for coordinates is too much. I don't know how to solve this generally. Maybe a special exception for spatial coordinates is the best approach?

additional_datasets - Obs4MIPS `day of year` failure

Hi,

I'm evaluating the CMIP5 sea surface temperature field, tos, using the Obs4MIPs dataset, tos_ATSR_L3_ARC-v1.1.1_199701-201112.nc. This dataset can be obtained here.

For an unknown reason, the preprocessor fails to calculate the day_of_year value and instead sets all values to 1. Ie:

In [2]: nc01 = ncdfView('01_fix_metadata.nc')
In [3]: nc01('day_of_year')
Out[3]: 
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

This occurs in the early stages of the preprocessor, between 00_load_cubes.nc (which does not include the day_of_year field) and 01_fix_metadata.nc, where it is broken.

I can assume that might be fixable by setting a fix_metadata function for the additional dataset. How do I do that?

Thanks,

Lee

Specify directory stucture per path instead of per project in config-user.yml

At the moment all drs need to be the same per project, while it would make much more sense to be able to specify the drs per rootpath.

To make this more clear, now we have this:

rootpath:
  # Rootpath to CMIP5 data
  CMIP5:
    - /badc/cmip5/data/cmip5/output1
    - ~/some_other_path

# Directory structure for input data: [default]/BADC/DKRZ/ETHZ/etc
# See config-developer.yml for definitions.
drs:
  CMIP5: BADC

while I think this would be much more useful:

# Directory structure for input data in the format
# input_data:
#   project_a:
#     /path/to/dir_1: drs_1
#     /path/to/dir_2: drs_2
#   project_b:
#     /path/to/dir_3: drs_2
# See config-developer.yml for valid drs definitions per project.
input_data:
  CMIP5:
    /badc/cmip5/data/cmip5/output1: BADC
    ~/some_other_path: default

cmor check fails for cmip6 variable residualFrac

When trying to ingest variable residualFrac (an area fraction of all that is not bare, crop, trees, grass o pastures) from cmip6 data (for example the CNRM-CM6-1 dataset on jasmin) esmvaltool fails complaining:

esmvaltool.cmor.check.CMORCheckError: There were errors in variable residualFrac:
type: standard_name should be , not None

The problem is that in "esmvaltool/cmor/tables/cmip6/Tables/CMIP6_coordinate.json" we find

"typeresidual": {
            "standard_name": "",

While this is verbatim what is written in the official cmor3 tables, it is also quite absurd because the types for other similar variables like grassFrac have a "typenatgr": { "standard_name": "area_type"}.
Anyway the problem comes from the fact that the CNRM-CM6-1 cmor3 files for variable residualFrac do not have any attribute type:standard_name, hence the error "type: standard_name should be , not None".

I tried to signal this issue upstream (in my opinion it seems a real inconsistency in the cmor3 tables), see PCMDI/cmip6-cmor-tables#232 and cmip6dr/CMIP6_DataRequest_VariableDefinitions#381 but the discussion ended in leaving things as they are for now.

The problem can be reproduced on jasmin by adding variable residualFrac to any recipe, for the cmip6 dataset CNRM-CM6-1 .
I was able to make temporarily things work by adding the attribute "type:standard_name=area_type" to residualFrac_Lmon_CNRM-CM6-1_amip_r1i1p1f2_gr_197901-201412.nc and changing esmvaltool/cmor/tables/cmip6/Tables/CMIP6_coordinate.json to say

"typeresidual": {
            "standard_name": "area_type",

but this is only a temporary patch.

So how do we make sure that esmvaltool accepts these files anyway leaving the original cmor3 json files and the input files untouched?

Regridding to reference dataset fails sometimes

Not an urgent issue, but something I noticed:

Regridding to a reference dataset, e.g.

regrid_to_ref:
  regrid:
    target_grid: GFDL-CM3
    scheme: linear

invokes a simple iris.load_cube

https://github.com/ESMValGroup/ESMValTool/blob/43598ee48a327fc3ef2a6e53768324d02ad61863/esmvaltool/preprocessor/_regrid.py#L169

on the raw nc file, which may fail in rare cases if the file contains multiple variables (e.g rsdt of GFDL-CM3), which leads to errors like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/anaconda3/envs/esmvaltool/lib/python3.6/site-packages/iris/__init__.py", line 376, in load_cube
    raise iris.exceptions.ConstraintMismatchError(str(e))
iris.exceptions.ConstraintMismatchError: failed to merge into a single cube.
  cube.long_name differs: 'Length of average period' != 'End time for average period'
  cube.var_name differs: 'average_DT' != 'average_T2'
  cube.units differs: Unit('days') != Unit('days since 1860-01-01 00:00:00', calendar='gregorian')

I think we need an advanced loading function here.

Add CMORizer for the EMAC model

The tool has the capability to convert raw model output of specific models to the CF/CMOR standard.

A NCL-based reformatter is available in v1 (for EMAC, GFDL, EC-EARTH and GO).

It can be used as a template to rewrite it in v2 (using Iris?) based on the functionalities already available in _reformat.py.

Testing new preprocessor

We should add tests for the new backend.

A lot of work has already been done, most recently @jvegasbsc added to the tests here:

https://github.com/ESMValGroup/ESMValTool/tree/REFACTORING_fixes/tests

Collapsing cubes with weights kwarg very inefficient (full realization of data, twice)

two used cases from running a recipe and profiling it via debug and resource file:

First instance example
-----------------------
2019-02-26 14:03:40.811397      7250.1  7576.5  108     2.7     0       0.606   0.006
2019-02-26 14:03:41.897243      7251.2  7577.7  108     3.8     0       0.606   0.006
2019-02-26 14:03:42.984943      7252.3  7578.9  108     5.8     0       0.606   0.006
2019-02-26 14:03:44.071390      7253.4  7580.0  108     6.4     0       0.606   0.006
2019-02-26 14:03:45.156149      7254.5  7581.2  109     7.9     0       0.606   0.006
2019-02-26 14:03:46.243115      7255.5  7582.4  108     9.4     0       0.606   0.006
2019-02-26 14:03:47.341183      7256.6  7583.6  109     11.1    1       0.606   0.006
2019-02-26 14:03:48.465922      7257.8  7584.8  111     11.7    1       0.606   0.006
2019-02-26 14:03:49.567116      7258.9  7586.0  110     13.2    1       0.606   0.006
2019-02-26 14:03:50.656928      7260.0  7587.1  96      0.6     0       0.606   0.007

2019-02-26 14:03:41,277 UTC [16489] DEBUG   esmvaltool.preprocessor:197 Running preprocessor step average_region
2019-02-26 14:03:41,280 UTC [16489] DEBUG   esmvaltool.preprocessor:187 Running area_average(precipitation_flux / (kg m-2 s-1)   (time: 1740; latitude: 360; longitude: 720)
2019-02-26 14:03:41,284 UTC [16489] INFO    esmvaltool.preprocessor._area_pp:179 Calculated grid area:(1740, 360, 720)
2019-02-26 14:03:50,494 UTC [16489] DEBUG   esmvaltool.preprocessor:197 Running preprocessor step cmor_check_data


Second instance example
-----------------------
2019-02-26 14:36:54.558816      9243.9  9778.3  108     12.4    1       0.708   0.007
2019-02-26 14:36:55.667920      9245.0  9779.5  111     13.5    1       0.708   0.007
2019-02-26 14:36:58.053553      9247.4  9781.9  100     0.5     0       0.708   0.007

2019-02-26 14:17:33,671 UTC [16489] INFO    esmvaltool.preprocessor._area_pp:179 Calculated grid area:(1740, 360, 720)
2019-02-26 14:36:57,901 UTC [16489] DEBUG   esmvaltool.preprocessor:197 Running preprocessor step cmor_check_data

I am assigning this to me since if I can't optimize it myself then I can talk to Bill next week.

Make user configuration file more user friendly

Do not use the template as a default because it never works
If no config file is suppllied, look if a file ~/.esmvaltool/config-user.yml exists and use that
If it doesn't exist, create it and tell the user to customize it (at least the paths section). If the user is on a known cluster, e.g. Jasmin, Mistral, etc, automatically use the right path

Missing frequency using CMIP6 and obs4mips datasets

Executing a recipe with variables from CMIP6 Omon table (e.g., chl) generates an error in fix_metadata of model dataset ..

Here below the error message

2019-03-18 17:14:48,358 UTC [17891] ERROR   Failed to run fix_metadata([<iris 'Cube' of mass_concentration_of_phytoplankton_expressed_as_chlorophyll_in_sea_water / (kg m-3) (time: 180; Vertical T levels: 75; -- : 294; -- : 362)>], {'project': 'CMIP6', 'dataset': 'CNRM-ESM2-1', 'short_name': 'chl', 'cmor_table': 'CMIP6', 'mip': 'Omon', 'frequency': ''})
2019-03-18 17:14:48,991 UTC [17891] ERROR   Program terminated abnormally, see stack trace below for more information
Traceback (most recent call last):
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_main.py", line 228, in run
    conf = main(args)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_main.py", line 156, in main
    process_recipe(recipe_file=recipe, config_user=cfg)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_main.py", line 206, in process_recipe
    recipe.run()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_recipe.py", line 1050, in run
    self.tasks, max_parallel_tasks=self._cfg['max_parallel_tasks'])
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 581, in run_tasks
    _run_tasks_sequential(tasks)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 592, in _run_tasks_sequential
    task.run()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 223, in run
    input_files.extend(task.run())
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 226, in run
    self.output_files = self._run(input_files)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 392, in _run
    product.apply(step, self.debug)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 259, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 201, in preprocess
    result.append(_run_preproc_function(function, items, settings))
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 187, in _run_preproc_function
    return function(items, **kwargs)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/cmor/fix.py", line 116, in fix_metadata
    checker(cube).check_metadata()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/cmor/check.py", line 104, in check_metadata
    self.report_errors()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/cmor/check.py", line 121, in report_errors
    raise CMORCheckError(msg)
esmvaltool.cmor.check.CMORCheckError: There were errors in variable chl:
time: Frequency  not supported by checker
in cube:

Note that in fix_metadata the frequency field is empty.

For some reason, the code is not picking up the frequency field from the variable of the CMIP6 table, but if I hack the CMIP6_Omon.json table by adding the frequency in the file header (as it was before the last CMIP6 tables update) the code goes through the step.

As a first guess, something maybe wrong in getting frequency from variables for CMIP6 table here around
https://github.com/ESMValGroup/ESMValTool/blob/f5eb5a3657325cc5d7c153a13856114c1ff29be0/esmvaltool/cmor/table.py#L104-L116

Do not realize data and use dask in derivation functions

As reported by @bouweandela, many derivation functions still realize the data and use numpy instead of dask. This is detrimental for the performance and should be changed.

Affected variables:

amoc
gtfgco2 should not be needed anymore, there is a preprocessor function for this
sm
toz

Improved debug statements needed in preprocessor (please)

I'd really like some quality of life improvements with regards to improved debug statements.

In complex recipes, I'm finding it very hard to figure out exactly where a debug statement comes from.
ie:

2019-02-20 11:37:37,841 UTC [25393] DEBUG   Running preprocessor step extract_levels
2019-02-20 11:37:37,842 UTC [25393] DEBUG   Running extract_levels(sea_water_potential_temperature / (K) (time: 1; depth: 102; latitude: 180; longitude: 360)
     Dimension coordinates:
          time                             x         -              -               -
          depth                            -         x              -               -
          latitude                         -         -              x               -
          longitude                        -         -              -               x
     Auxiliary coordinates:
          day_of_month                     x         -              -               -
          day_of_year                      x         -              -               -
          month_number                     x         -              -               -
          year                             x         -              -               -
     Attributes:
          conventions: CF/CMOR
          host: pmpc1564.npm.ac.uk
          lonFlip: longitude coordinate variable has been reordered via lonFlip
          reference: Locarnini et al., World Ocean Atlas 2013, Vol. 1: Temperature, 2013
          source: https://data.nodc.noaa.gov/woa/WOA13/DATAv2/
          tier: 2
          title: WOA data reformatted for the ESMValTool v2.0
          user: ledm, {'levels': [0.0], 'scheme': 'linear_horizontal_extrapolate_vertical'})

Tells me which preprocessor is applied to which cube, but can't say anything about which diagnostic is calling it or which preprocessor chain is being applied. In a simple recipe, this isn't a problem, but when recipes start to contain multiple preprocessor chains and diagnostics it's very confusing.

This information will become incresingly crucial for creating and debugging more complex recipes.

In my own local branch, I've tried to print out as much as possible in PreprocessorFile.apply() and _run_preproc_function but neither of these methods appear to be aware of the overarching diagnostic or preprocessor chain name.

It would be really nice to link these in, so that debug messages are more useful. Thanks!

How to regrid fx files

I would like to regrid an fx file - sftlf - onto a 5x5 grid. I need this as input to an existing python package which is expecting tas, tos, siconc and sftlf in NetCDF files on a 5x5 grid, to produce masked and blended GMST. I can regrid tas, tos and siconc onto a 5x5 grid using the regrid preprocessor, by adding this to the recipe like this:

preprocessors:

  regrid_5_5:
    regrid:
      target_grid: 5x5
      scheme: linear

diagnostics:

  fig_test_attribute:
    description: Test of masked and blended surface temperature.
    variables:
      tas:
        preprocessor: regrid_5_5   
        mip: Amon
        field: T2Ms
        project: CMIP6
        exp: historical
        grid: gr
        start_year: 1850
        end_year: 2014
    additional_datasets:
      - {dataset: CNRM-CM6-1, ensemble: r1i1p1f2}

This works OK. But if I try to do the same for sftlf by adding this:

      sftlf:
        preprocessor: regrid_5_5
        mip: fx
        field: F2Ms
        project: CMIP6
        exp: historical
        grid: gr

Then I get this:
esmvaltool._recipe_checks.RecipeError: Missing keys {'end_year', 'start_year'} from variable sftlf in diagnostic fig_test_attribute
and the recipe does not run.

If I add a dummy start year and end year, then I get:

  raise RecipeError("No input files found for variable {}".format(var))
esmvaltool._recipe_checks.RecipeError: No input files found for variable {'preprocessor': 'regrid_5_5', 'mip': 'fx', 'field': 'F2Ms', 'project': 'CMIP6', 'exp': 'historical', 'grid': 'gr', 'start_year': 1850, 'end_year': 2014, 'variable_group': 'sftlf', 'short_name': 'sftlf', 'diagnostic': 'fig_test_attribute', 'dataset': 'CNRM-CM6-1', 'ensemble': 'r1i1p1f2', 'recipe_dataset_index': 0, 'cmor_table': 'CMIP6', 'institute': ['CNRM-CERFACS'], 'standard_name': 'land_area_fraction', 'long_name': 'Land Area Fraction', 'units': '%', 'modeling_realm': ['atmos'], 'frequency': 'fx', 'filename': '/scratch/b/b380746/esmvaltool_output/test_attribute_20190307_231334/preproc/fig_test_attribute/sftlf/CMIP6_CNRM-CM6-1_fx_historical_r1i1p1f2_F2Ms_sftlf_1850-2014.nc'}

and again the recipe doesn't run.

I thought that specifying the field F2Ms should mean that years are not required, but this doesn't seem to be implemented. I don't actually want to mask the tas, tos, or siconc with sftlf at this point, I just want the regridded sftlf file. (The masking a blending code for example uses tas over sea ice, so masking tas with sftlf would not be helpful).

Thanks very much!

Add tests for recipe.py and task.py

Once most functionality is supported.

Data finder: add test cases for observations, SMHi, BSC, CMIP6

If anyone has access to one of the SMHI/BSC filesystems or CMIP6/OBS/ana4mips/obs4mips/CCMVal1/CCMVal2/EMAC datasets, can you add test cases in the file tests/integration/data_finder.yml?

Irregular grid preprocessors (beyond regriding)

Two preprocessors deploy the area_weights function from iris.analysis.cartograhy.py:

area_average from _area_pp.py
-volume_average from _volume_pp.py

Unfortunately, the area_weights function is not able to work with irregular grids. Even the recent iris version 2.2.0 contains the following code for irregular latitude/longitude grids.

    if lat.ndim > 1:
        raise iris.exceptions.CoordinateMultiDimError(lat)
    if lon.ndim > 1:
        raise iris.exceptions.CoordinateMultiDimError(lon)

To resolve these issues, we could:

Raise an issue with Iris, and wait for iris to add irregular grid support.
Add a regrid preprocessor before using these preprocessors.
Write a new area calculation function.
Load the area fields from somewhere else.

I'm leaning towards option 4, but cell area/volume doesn't seem to be built into the cmip5 data at the moment. However, it should be somewhere, right? I would have thought that cell area/volume would be available from an iris cube, without having to recalculate.

Make preprocessor dictionary available to the diagnostic script

The preprocessor settings (levels, target_grid, etc.) used by a given diagnostic shall be made available to the diagnostic script itself via the temporary file (ncl.interface for NCL diagnostics).

Feature request - ESMValTool html reports

For the past two years, we've been using BGC-val to evaluate the marine component of the spin up and CMIP6 runs of the UK Earth System Model (UKESM). One of the main selling points of BGC-val, was that it can summarise the results of its metrics in a simple, clear, shareable and mobile-friendly html website. Here's an older example report that we showed at EGU2017.

The report maker tool was crucial for the uptake in use of the tool and it allowed several users to share monitoring and evaluating the model. Similarly, the ILAMB interface is very popular among land surface modellers for this reason, among others. I'd like to implement a similar feature for ESMValTool. I've discussed this with @valeriupredoi a few times, and he recommended starting a discussion issue on github.

This is how I envisage the interface with ESMValTool:

A MakeReport Boolean will need to be added to the global config.
The html will be added to a new directory, report, alongside plots, preproc, run and logs. This new directory will contain an index.html file, an images folder and an assets directory. The index.html file will contain the bulk of the report. The report directory would be fully portable, and would contain everything needed to show the report.
The report maker would have to run run after the diagnostics have completed, and search recursively through the plots folder of the output directory, adding every figure to the index.html page.
Each image would be shown alongside some metadata in a table. The metadata would include information like original dataset, scenario, ensemble member, preprocessors, diagnostic scripts, etc.... In some cases, metadata can be added directly into the image file metadata. However, this might not be compatible with all supported image formats. In either case, we'd need to make a wrapper for pyplot.savefig/iris.savefig which adds metadata into the image.
The report maker would likely not be compatible with tools that already do this, like autoassess, but would instead function as a "native" ESMValTool version of such tools.

I'm happy to reuse the code that we had in BGC-val. It would need a bit of tweaking, but wouldn't be a major issue. The HTML was initially taken from html5up.net and was under a creative commons licence. This was a html5 ready, mobile friendly, scalable website template. However, I'm not 100% confident that this would be the wisest option. It may be more sensible to built a new report template from scratch. Do we have any html experience in ESMValTool?

I'm not the best html coder, nevertheless, I'm happy to take a crack at this. However, I'd like to have the discussion first, to determine whether there is support for the idea, but also whether it would be feasible and worthwhile.

Do ESMValTool developers think this would be worthwhile? Would people use it? Is there something like this already implemented?
Would there be licensing issues with the html5up licence?
Is it sensible or possible to run a python tool after the diagnostics have completed?
How can we associate image metadata with individual figures? What metadata would we want to include?
Can we add a ESMValTool wrapper for pyplot.savefig?
Is the format that we showed at EGU2017 appropriate or do you think we could make some improvements?

Incidentally, point 5 here may be a good way to enforce a specific ESMValTool image format and style. This would be a great way to build an ESMValTool standard image style, which would help with brand recognition. Or it would be a way to make sure our images are appropriate for AR6 or specific journals with style requirements.

Validate diagnostic settings and provide default values for settings not specified in recipe

At the moment diagnostic settings are validated inside the diagnostic script and default values are also provided there. This means that every diagnostic script needs to provide it's own validation code, leading to duplicated effort. Also, default values are not visible for the user. It would be better if esmvaltool could provide the diagnostic scripts it runs with already validated settings, which it derives from the namelist and augments with default values if required. See also the discussion here: ESMValGroup/ESMValTool#189.

Validation could be done using Yamale schema against a schema provided by the diagnostic script developer, but that does not provide a way to specify default values for arguments not specified in the namelist. Another option could be to use the python argparse module. Or maybe there are even better alternatives available?

Add command line option to display available recipes

Create an overview table of custom CMOR entries.

When more observational datasets are added to ESMValTool, the number of custom CMOR entries will grow. The .dat files have consistent but short names. Therefore, it can be hard to find if a certain ('flavour' of a) variable is listed already. An overview table of all the inputs in the folder
'~/ESMValTool/esmvaltool/cmor/tables/custom' would be helpful.

Would you be able to help out?
Yes, from early May onwards.

Standard variable names in OBS

For cmorizing the observational data we adopt the CMOR standard from the CMIP5 Tables.

The problem is that the metadata defined in the CMIP6 Tables are sometimes different even for the same variable.

For example "Sea Ice Area Fraction" is sic in CMIP5 and siconc in CMIP6, and the mip is also different (OImon vs. SImon). In some cases even the units are different: "Mole Fraction of Ozone" is tro3 with units of 1.e-9 (ppb) in CMIP5 and o3 with units mol/mol in CMIP6.

Since we want our OBS to be usable across the various CMIP phase and we do not want to duplicate the data just changing the name or the units, we would need to find a way to map the CMIP5 metadata
to CMIP6 (or other projects if needed).

The idea would be to have a yml file containing this mapping, using the CMIP5 Tables as reference.

Would that work? @jvegasbsc @bouweandela @valeriupredoi

Outdated CMOR CMIP6 Tables

In the current version2_development branch (up to commit d72f6c3) CMIP6 tables are quite old (about 11 months) and several changes occurred in the data request definition.

I guess it would be good to have a fresh version in the repository and possibly add also a note about the reference Dreq version (likely in cmip6 table README.md).

Add YAML schema for config-user.yml

At the moment users get various obscure error messages if they make a mistake in their config-user.yml file. It would be nicer if we could get a more clear error message by using a Yamale schema to check the input.

Output pdf report with all plots and metadata for recipe

For the MAGIC project we would like to offer users (of our web portal) a way to download information on a namelist. For this we would like ESMValTool to be able to create a multi-page report for a namelist, with all plots, references, perhaps provenance info, etc, for the namelist run.

Preprocessor memory usage is excessive and error messages unclear.

This issue was raised in pr #763 and was also raised by Rob Parker, from the UKESM project.

The recipe recipe_ocean_bgc.yml memory explodes when the time range is changed to a more realistic range. The time range is currently 2001-2004, but we may be interested in, for instance, the range 1950-2010.

The second problem is that the warning messages and error messages related to Memory problems in ESMValTool are unclear. It's not obvious which part of this recipe causes the memory problems.

Apply cmor checks (and fixes) to fx-files

Going through the fx data in CMIP5, I've noticed that there are several issues with the mask variables sftlf and sftof which needs to be fixed (wrong units, wrong coordinates, wrong fillvalues, etc.), but this is currently not implemented, although some fixes are available (see for example here).

This affects both, the use of fx variables in the diagnostics (via fx-files key in the variable dict) and the use of fx variables for the masking (via mask_landsea key in the preprocessor dict).

In principle, whenever a fx-variable is needed, it should be read, cmor-checked, (fixed) and saved in the work_dir / preproc_dir, as for any other variable. The corresponding path should be then made available to the diagnostics if the fx-files key is specified in the recipe (as it happens now with the original input fx-file).

Check if all current fix_file are needed

After #638 is merged, some of the fix_file that we have can probably be converted to fix_metadata. If we made the change, models affected will be processed much faster than now

Extensions to PR #763

Comments from @bouweandela about PR ESMValGroup/ESMValTool#763.

There are still several things to be done, but if you're really a in hurry to merge, @ledm can put them in an issue and do it later:

Preprocessor

I realised this only late, but the new preprocessor functionality is not covered by unit tests, this reduces the reliability and maintainability of the preprocessor and should be addressed.
There new fx_files argument for those two preprocessor functions is still strange: why do you have a dictionary in which you will only ever expect one item. I think it should just be fx_file and the argument should be the name of the file or None if you don't have a file.
There are Codacy issues remaining, the complexity/too many local variables issues are serious because they make the volume_average function difficult to understand and maintain

esmvaltool/preprocessor/_area_pp.py
Use % formatting in logging functions and pass the % parameters as arguments
179
logger.info('Calculated grid area:{}'.format(grid_areas.shape))

esmvaltool/preprocessor/_volume_pp.py
Too many local variables (24/15)
162

def volume_average(

volume_average is too complex (11)
162

def volume_average(

Ocean Diagnostics package

A lot of duplication i.e. copy/paste style of coding is still reported by Codacy, ESMValGroup/ESMValTool#763 (comment). This code should be moved into functions to keep it maintainable. If you later want to change something about this code, you will have to search all over the place for copies.

Determine `activity` from `exp`

In CMIP6 the activity_id that is part of the path is determined by the experiment_id. Hence, I suggest we determine it from that information in a similar way as we already find institutes from source_ids, thus making it optional to specify them explicitly in the recipe.

Preprocessing overlapping data not implemented yet

Hi,
I am having an issue with the preprocessing when trying to run a diagnostic which uses two variables from a single model:
https://github.com/ESMValGroup/ESMValTool/blob/MAGIC_BSC/esmvaltool/recipes/recipe_diurnal_temperature_index_wp7.yml

An unexpected problem prevented concatenation.
Expected only a single cube, found 2.

I guess this is just a minor error in our .yml file. Do you have an example of any other diagnostics which load two (or more) variables from a single model for use as input for a single R script (or Python script).

CESM1-BGC co2 fix outdated -> Deletion or include version-specific checks?

The co2 fix for CESM1-BGC for fixing the units seems to have originated in v1 and was only restricted to the "esmFixClim1" experiment there. Checking the CMIP5 errata it seems as if this was fixed in 2013 and the current files no longer require this fix.

While for this case it is probably the best to simply remove this fix, I think there is a deeper issue here which will probably also come up with CMIP6 - do we expect people to always have the latest versions available or include version-specific fixes for older data wherever possible? Using this case as an example there would be 3 choices:

Implicitly expect people to have newest data. -> Lazy way
Return a warning/error when encountering versions before a known fix, pointing the user to update his data.
For simple fixes: apply fix to older versions, but also give out a warning to update data.

Comparing several single model ensembles

Hi all,

two very common things that users will want to do in ESMValTool are:

Compare single model ensemble means.
For instance, I have a HadGEM2-ES 4 member ensemble and UKESM 12 member ensemble. I want to make some time series plots showing the time development of the two single-model ensemble means.
How do I do that? Is there a way to use the multi_model_statistics preprocessor so that it only compares single models? Or perhaps we need a single_model_ensemble_statistics preprocessor?
Calculate the differences between two specific time periods in the preprocessor stage.
For instance, I want to look at average surface temperature in the years 1975-2000 minus the average surface temperature between 1875-1900.
I don't know of a way to do this in the preprocessor, it seems to be such a common job that it would be great to get it as a preprocessor instead of in the diagnostic stage.

Any ideas?

Lee

Better integration between synda and data repositories

There will be two improvements that will make synda downloads really useful:

1.(Optionally) Put synda downloads on the data repository.

Be able to complete the data in the repository with synda downloads. For example, I downloaded 5 years of data for testing and then tried to run a test for 50 years. It will fail because of the missing data even if I put the synda download option

It is not urgent nor I have time for that but wanted to keep this ideas documented

Changing unit dimensionality in depth integration pre-processor

Depth integration is a common methods used to evalaute ocean behaviour, notably in the "integrated primary production" marine biogeochemistry metric. Depth integration typically converts concentration in volume into concentration in area. This means that the units go from mol m-3 to mol m-2, and so the units need to be changed.

This function already exists in the volume preprocessor, but does not currently change the units. Basically, I want to include this command in the preprocessor:

result.units = Unit('m') * result.units

but this does not work for me.

@bouweandela suggested in issue #604: "This would require a minor modification to the code so the unit is read from the cube instead of from the cmor table of the input data when extracting the metadata, but should be possible. Please make an issue if you would like this functionality."

So here is the issue. Fingers crossed it is actually easy to resolve!

esmvalgroup / esmvalcore Goto Github PK

esmvalcore's People

Contributors

Stargazers

Watchers

Forkers

esmvalcore's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs