Comments (22)
- A module for ensemble statistics would indeed be very useful. The ensemble statistics can be calculated at an earlier stage in the preprocessor chain, since they do not require regridding. I would suggest creating a dedicated preprocessor `ensemble_statistics` with similar functionality to `multi_model_statistics` (but maybe without the `span`/`overlap` options?). We also need to discuss how to specify the ensemble members in the recipe: I would prefer an explicit approach, e.g. each ensemble member gets a `dataset` dictionary:
  - {project: CMIP5, dataset: ACCESS1-0, ensemble: r1i1p1, ...}
  - {project: CMIP5, dataset: ACCESS1-0, ensemble: r1i1p2, ...}
  - {project: CMIP5, dataset: ACCESS1-0, ensemble: r1i1p3, ...}

  That will probably result in a long `datasets` dictionary, but it's more transparent than, e.g., automatically retrieving all available ensemble members given the model name. We could also have an option to exclude the single members from the preprocessing and consider only the ensemble mean / median in the chain.
- That sounds a bit more complicated, especially how to store this in a `preproc/` NetCDF file with a proper time coordinate and how to name this file. It could be a significant effort; we should really make sure it is going to be convenient to have this in the preprocessor chain.
from esmvalcore.
> Is there a way to use the multi_model_statistics preprocessor so that it only compares single models?

Yes, just define your variables as in `examples/recipe_variable_groups.yml`, with one group per model. The preprocessor will process each group separately, so the multi-model mean function will compute the ensemble mean.
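For example, a minimal sketch of this group-per-model idea (variable names, models, and members invented for illustration):

```yaml
diagnostics:
  diagnostic1:
    variables:
      # One variable group per model: multi_model_statistics then
      # runs separately on each group, i.e. per ensemble.
      ta_CanESM2: &var
        short_name: ta
        preprocessor: statistics
        additional_datasets:
          - {dataset: CanESM2, ensemble: r1i1p1}
          - {dataset: CanESM2, ensemble: r2i1p1}
      ta_MPI-ESM-LR:
        <<: *var
        additional_datasets:
          - {dataset: MPI-ESM-LR, ensemble: r1i1p1}
          - {dataset: MPI-ESM-LR, ensemble: r2i1p1}
```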
> Or perhaps we need a single_model_ensemble_statistics preprocessor?

Probably not, I think this can be done with the existing multi-model preprocessor.
But you also need to change the order of operations, since the ensemble statistics should not be performed after regridding (while the multi-model statistics are). So, in case you need both ensemble and multi-model stats, I do not know whether the same module doing both would work.
> in case you need both ensemble and multi model stats

That would indeed be a bit difficult with the current setup; in that case a dedicated ensemble statistics function would probably be a good solution. The implementation could re-use the current multi-model function, with a little bit of extra code to group the input cubes by model.
a la:

```python
def _ensemble_multi_model_statistics(cfg, products, output_products, statistics):
    """Inter-ensemble multi-model statistic (rough sketch)."""
    # group the input data by dataset, one group per model
    input_data = cfg['input_data'].values()
    grouped = group_metadata(input_data, 'dataset')
    # apply the multi-model statistics to each group of ensemble members
    return {
        dataset: _multimodel.multi_model_statistics(
            group, 'overlap', output_products, statistics)
        for dataset, group in grouped.items()
    }
```

eh?
@jvegasbsc and I have been looking into creating a preprocessor function to be able to compute ensemble means. I have a draft in a branch that groups the input products and then calls the multi_model_statistics function per grouped dataset. But then I found this issue, so maybe it's better to continue the discussion here.
Any new insights on this? @sloosvel have you abandoned that branch or do you think it's still a viable option?
It must be quite outdated, but I think that with #637 it will be easier to implement.
So you'd suggest starting from scratch, or is there anything that can be re-used from your branch?

If starting from scratch, and considering that

> the ensemble statistics should not be performed after regridding (while the multi-model statistics are)

would it make sense to write a new function, rather than calling multi-model stats under the hood? I suppose doing multi-model stats on consistent (same model/grid/etc.) datasets could be more straightforward and maybe even lazy?
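To illustrate why the consistent-grid case is simpler: when all members share a grid, the ensemble statistic is just an elementwise reduction over a new "member" axis, with no regridding or overlap handling needed. A minimal numpy sketch with made-up data (with dask arrays in place of numpy arrays, the same stack + reduce would stay lazy):

```python
import numpy as np

def ensemble_mean(members):
    """Elementwise mean over ensemble members on an identical grid.

    Because all members share the same shape, we can simply stack
    them along a new leading "member" axis and reduce over it.
    """
    return np.stack(members, axis=0).mean(axis=0)

# Three fake members on the same 2x2 grid, with values 1, 2 and 3.
members = [np.full((2, 2), value) for value in (1.0, 2.0, 3.0)]
mean = ensemble_mean(members)  # every grid point equals 2.0
```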
I think it may work as a starting point, because what that branch does is create an ensemble mean product and then group the input in an `ensemble_statistics` function. So I can open a draft.

However, I did minimal testing, so I may have overlooked issues. And with the changes in #637 I think the function can be more flexible in terms of how the input is grouped, so that could be improved as well.
Cool, thanks! I'll have a look
To expand a bit on this comment: #52 (comment), the way to do ensemble statistics with the current functionality would be to create a recipe that looks e.g. like this:
```yaml
preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]

diagnostics:
  diagnostic1:
    variables:
      ta_CanESM2: &var
        short_name: ta
        mip: Amon
        project: CMIP5
        preprocessor: statistics
        start_year: 2000
        end_year: 2002
        exp: historical
        additional_datasets:
          - {dataset: CanESM2, ensemble: r(1:5)i(1:2)p1}
      ta_MPI-ESM-LR:
        <<: *var
        additional_datasets:
          - {dataset: MPI-ESM-LR, ensemble: r(1:5)i1p1}
      ta_bcc-csm1-1:
        <<: *var
        additional_datasets:
          - {dataset: bcc-csm1-1, ensemble: r(1:5)i1p1}
      ta_GFDL-ESM2G:
        <<: *var
        additional_datasets:
          - {dataset: GFDL-ESM2G, ensemble: r1i1p(1:5)}
```
Is this something that could be useful?
Another idea: would it be more useful if the groups could be created automatically? So the recipe above would be shortened to:

```yaml
preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]
      groupby: [dataset]

diagnostics:
  diagnostic1:
    variables:
      ta:
        mip: Amon
        project: CMIP5
        preprocessor: statistics
        start_year: 2000
        end_year: 2002
        exp: historical
        additional_datasets:
          - {dataset: CanESM2, ensemble: r(1:5)i(1:2)p1}
          - {dataset: MPI-ESM-LR, ensemble: r(1:5)i1p1}
          - {dataset: bcc-csm1-1, ensemble: r(1:5)i1p1}
          - {dataset: GFDL-ESM2G, ensemble: r1i1p(1:5)}
```
@bettina-gier I believe you also mentioned you were interested in computing ensemble statistics at one of our monthly meetings, could you comment on what kind of features you would need?
> So, in case you need both ensemble and multi model stats, I do not know whether the same module doing both would work.

This can be achieved with the current functionality by doing the multi-model stats in a Python diagnostic.
> But you also need to change the order of operation, since the ensemble statistics should not be performed after regridding

There is the `custom_order: true` option for preprocessors to change the default order.
Having a dedicated `ensemble_statistics` preprocessor would probably be nicer than the current solution, but it would be good to have a clearer idea of the use cases, especially those that cannot be done, or are inconvenient, with the current functionality.
My idea for a feature was more about comparing different projects, e.g. CMIP ensembles. With the CMIP6 data, a lot of people will be looking at what has improved, and currently there's no easy way to do the "groupby: [CMIP5, CMIP6]", as you called it above, to get the multi-model mean for all CMIP5 or CMIP6 models and compare those.

I suppose with the preprocessor now accepting cubes you can do it in the diagnostic, but only when using Python, while a lot of the old IPCC figures are written in NCL.
Thanks for all the info Bouwe! My use case would be to combine both ensemble means and multi-model statistics, to prevent models with many ensemble members from getting excessive weight in the multi-model stats.

Like you say, this can be achieved with the current functionality by doing the multi-model stats in a Python diagnostic, but a separate preprocessor function would be nicer. An alternative solution for this use case would be for the multi-model stats to support weights.
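To make the weighting idea concrete: giving each member a weight of 1/(number of members of its model) is equivalent to first taking ensemble means and then an unweighted multi-model mean. A small numpy sketch with invented model names and numbers:

```python
import numpy as np

def equal_model_weight_mean(members_by_model):
    """Mean over all members, with every model weighted equally.

    Each member gets weight 1 / (number of members of its model),
    so a model with a large ensemble does not dominate the result.
    """
    values, weights = [], []
    for members in members_by_model.values():
        for value in members:
            values.append(value)
            weights.append(1.0 / len(members))
    return np.average(values, weights=weights)

members_by_model = {
    "MODEL-A": [1.0, 1.0, 1.0],  # three members
    "MODEL-B": [4.0],            # a single member
}
naive = np.mean([1.0, 1.0, 1.0, 4.0])                 # 1.75: MODEL-A dominates
weighted = equal_model_weight_mean(members_by_model)  # 2.5: models count equally
```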
> Having a dedicated ensemble_statistics preprocessor would probably be nicer than the current solution

I think that this, combined with the groupby option, is the most convenient solution, because at some point I tried to extend the multi_model function and it ended up being very messy.
Just adding onto this discussion: I talked with members from the 4C project yesterday, who were asking about averaging observational datasets in ESMValTool. Considering that there are currently different "projects" for observations, like obs4mips, OBS, and OBS6, it could be necessary to add some kind of optional tag or group to datasets in the recipe, which could then also be the input for the "groupby" option Bouwe proposed above, e.g. hijacking the recipe idea above:
```yaml
preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]
      groupby: [obs_avg, CMIP5]

diagnostics:
  diagnostic1:
    variables:
      ta:
        mip: Amon
        project: CMIP5
        preprocessor: statistics
        start_year: 2000
        end_year: 2002
        exp: historical
        additional_datasets:
          - {dataset: CRU, project: OBS, tag: obs_avg, ...}
          - {dataset: ERA-Interim, project: OBS6, tag: obs_avg, ...}
          - {dataset: CanESM2, ensemble: r(1:5)i(1:2)p1}
          - {dataset: MPI-ESM-LR, ensemble: r(1:5)i1p1}
          - {dataset: bcc-csm1-1, ensemble: r(1:5)i1p1}
          - {dataset: GFDL-ESM2G, ensemble: r1i1p(1:5)}
```
Additionally, they were asking if there is a possibility of giving weights to observations when this mean is computed, say 70% weight for dataset 1 and only 30% for dataset 2. Is this currently possible with the preprocessor, and if not, would we want it to be? Or should this rather be given as parameters to diagnostics, potentially introducing a class of "preprocessor diagnostics" between the preprocessor and diagnostics for more involved computations, whose results can then be forwarded to "normal" diagnostics using the ancestors tag?
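Whatever layer it ends up in, the 70/30 case itself is just a weighted average per grid point. A numpy sketch with invented values, purely to pin down the arithmetic:

```python
import numpy as np

# Two fake observational fields on the same 2x2 grid.
obs1 = np.full((2, 2), 10.0)
obs2 = np.full((2, 2), 20.0)

# Weighted mean: 70% weight for dataset 1, 30% for dataset 2.
blended = np.average(np.stack([obs1, obs2]), axis=0, weights=[0.7, 0.3])
# every grid point: 0.7 * 10 + 0.3 * 20 = 13.0
```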
> potentially introducing a class of "preprocessor diagnostics"

This is something I've been thinking about as well, in the context of ESMValGroup/ESMValTool#1640. Several different weighting methods are being developed in EUCP, and we're considering adding several more of them to ESMValTool. I guess it's natural to start implementing them as diagnostics and maybe, once that converges to some sort of common interface, we could consider porting them to the preprocessor. Does that make sense?
> Just adding onto this discussion, I talked with members from the 4C project yesterday who were asking about averaging observational datasets in the esmvaltool. Considering there are currently different "projects" for observations like obs4mips, OBS, OBS6, it could be needed to add some kind of optional tag or group to datasets in the recipe, which could then also be the input for the "groupby" option Bouwe proposed above
In #673 we made some progress to make this possible. Internally we added support for grouping data to do ensemble statistics, but in principle, anything can be grouped, as long as it is defined in the dataset attributes. This functionality can be exposed quite easily.
What you would then do would be something like:
```yaml
preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]
      groupby: [tag, dataset]
```
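Under the hood, grouping by arbitrary dataset attributes amounts to keying each input product by a tuple of attribute values, as in this hypothetical sketch (the dicts stand in for product metadata; all names are invented):

```python
from collections import defaultdict

def group_products(products, groupby):
    """Group products (dicts of dataset attributes) by the given keys.

    Any attribute defined in the recipe's dataset entries can be
    used, e.g. groupby=("tag",) or groupby=("tag", "dataset").
    """
    groups = defaultdict(list)
    for product in products:
        # products lacking an attribute get None for that key
        key = tuple(product.get(attr) for attr in groupby)
        groups[key].append(product)
    return dict(groups)

products = [
    {"dataset": "CRU", "project": "OBS", "tag": "obs_avg"},
    {"dataset": "ERA-Interim", "project": "OBS6", "tag": "obs_avg"},
    {"dataset": "CanESM2", "project": "CMIP5", "ensemble": "r1i1p1"},
    {"dataset": "CanESM2", "project": "CMIP5", "ensemble": "r2i1p1"},
]
groups = group_products(products, groupby=("tag", "dataset"))
# The two obs datasets land in separate single-member groups, the two
# CanESM2 members in one group; statistics are then applied per group.
```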