
Comments (22)

mattiarighi commented on July 24, 2024
  1. A module for ensemble statistics would indeed be very useful. The ensemble statistics can be calculated at an earlier stage in the preprocessor chain, since they do not require regridding. I would suggest creating a dedicated preprocessor ensemble_statistics with similar functionality to multi_model_statistics (but maybe without the span / overlap options?). We also need to discuss how to specify the ensemble members in the recipe: I would prefer an explicit approach, e.g. each ensemble member gets its own dataset dictionary:
  - {project: CMIP5, dataset: ACCESS1-0, ensemble: r1i1p1, ...}
  - {project: CMIP5, dataset: ACCESS1-0, ensemble: r1i1p2, ...}
  - {project: CMIP5, dataset: ACCESS1-0, ensemble: r1i1p3, ...}

That will probably result in a long datasets dictionary, but it's more transparent than, e.g., automatically retrieving all available ensembles given the model name. We can also have an option to exclude the single members from the preprocessing and consider only the ensemble mean / median in the chain.

  2. That sounds a bit more complicated, especially regarding how to store the result in a preproc/ nc file with a proper time coordinate and how to name that file. It could be a significant effort, so we should really make sure it is convenient to have this in the preprocessor chain.

from esmvalcore.

bouweandela commented on July 24, 2024

Is there a way to use the multi_model_statistics preprocessor so that it only compares single models?

Yes, just define your variables as in examples/recipe_variable_groups.yml, with one group per model. The preprocessor will process each group separately, so the multi-model mean function will compute the ensemble mean.

Or perhaps we need a single_model_ensemble_statistics preprocessor?

Probably not, I think this can be done with the existing multi model preprocessor.

mattiarighi commented on July 24, 2024

But you also need to change the order of operations, since the ensemble statistics should not be performed after regridding (while the multi-model statistics are). So, in case you need both ensemble and multi-model stats, I do not know whether a single module doing both would work.

bouweandela commented on July 24, 2024

in case you need both ensemble and multi model stats

That would indeed be a bit difficult with the current setup; in that case a dedicated ensemble statistics function would probably be a good solution. The implementation could re-use the current multi-model function, with a little extra code to group the input cubes by model.

valeriupredoi commented on July 24, 2024

a la:

import iris
from esmvalcore.preprocessor import _multimodel
from esmvaltool.diag_scripts.shared import group_metadata

def _ensemble_multi_model_statistics(cfg, products, output_products, statistics):
    """Inter-ensemble multi-model statistic."""
    # Group the input metadata by dataset name.
    input_data = cfg['input_data'].values()
    grouped = group_metadata(input_data, 'dataset')
    # iris.load expects filenames, not metadata dicts.
    products.cubes = iris.load([info['filename'] for group in grouped.values() for info in group])
    # Apply the existing multi-model statistics to the loaded cubes.
    return _multimodel.multi_model_statistics(products, 'overlap', output_products, statistics)

eh?

sloosvel commented on July 24, 2024

@jvegasbsc and I have been looking into creating a preprocessing function to be able to compute ensemble means. I have a draft in a branch that groups the input products and then calls the multi_model_statistics function per grouped dataset. But then I found this issue, so maybe it's better to continue the discussion here.

Peter9192 commented on July 24, 2024

Any new insights on this? @sloosvel have you abandoned that branch or do you think it's still a viable option?

sloosvel commented on July 24, 2024

It must be quite outdated, but I think that with #637 it will be easier to implement.

Peter9192 commented on July 24, 2024

So you'd suggest starting from scratch, or is there anything that can be re-used from your branch?

If starting from scratch, and considering that

the ensemble statistics should not be performed after regridding (while the multimodel statistics do)

would it make sense to write a new function, rather than calling multi-model stats under the hood? I suppose doing multi-model stats on consistent (same model/grid/etc.) datasets could be more straightforward and maybe even lazy?
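In pure NumPy terms (ignoring the iris cube metadata handling a real implementation would need; the function name and array layout here are illustrative), averaging members that already share a grid is a single stacked reduction:

```python
import numpy as np

def ensemble_mean(members):
    """Average ensemble members defined on an identical grid.

    `members` is a list of equally-shaped arrays, e.g. (time, lat, lon).
    Because the grids match, no regridding or overlap handling is needed.
    """
    stacked = np.stack(members)  # new leading 'realization' axis
    return stacked.mean(axis=0)
```

With dask arrays instead of NumPy arrays the same reduction stays lazy, which is what makes this case simpler than the general multi-model one.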

sloosvel commented on July 24, 2024

I think it may work as a starting point, because what that branch does is create an ensemble mean product and then group the input in an ensemble_statistics function. So I can open a draft.

However, I did minimal testing, so I may have overlooked issues. And with the changes in #637 I think the function can be made more flexible in terms of how the input is grouped, so that could be improved as well.

Peter9192 commented on July 24, 2024

Cool, thanks! I'll have a look

bouweandela commented on July 24, 2024

To expand a bit on this comment: #52 (comment), the way to do ensemble statistics with the current functionality would be to create a recipe that looks e.g. like this:

preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]

diagnostics:
  diagnostic1:
    variables:
      ta_CanESM2: &var
        short_name: ta
        mip: Amon
        project: CMIP5
        preprocessor: statistics
        start_year: 2000
        end_year: 2002
        exp: historical
        additional_datasets:
          - {dataset: CanESM2, ensemble: r(1:5)i(1:2)p1}
      ta_MPI-ESM-LR:
        <<: *var
        additional_datasets:
          - {dataset: MPI-ESM-LR, ensemble: r(1:5)i1p1}
      ta_bcc-csm1-1:
        <<: *var
        additional_datasets:
          - {dataset: bcc-csm1-1, ensemble: r(1:5)i1p1}
      ta_GFDL-ESM2G:
        <<: *var
        additional_datasets:
          - {dataset: GFDL-ESM2G, ensemble: r1i1p(1:5)}      

Is this something that could be useful?

Another idea: would it be more useful if the groups could be created automatically? So the recipe above would be shortened to

preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]
      groupby: [dataset]

diagnostics:
  diagnostic1:
    variables:
      ta:
        mip: Amon
        project: CMIP5
        preprocessor: statistics
        start_year: 2000
        end_year: 2002
        exp: historical
        additional_datasets:
          - {dataset: CanESM2, ensemble: r(1:5)i(1:2)p1}
          - {dataset: MPI-ESM-LR, ensemble: r(1:5)i1p1}
          - {dataset: bcc-csm1-1, ensemble: r(1:5)i1p1}
          - {dataset: GFDL-ESM2G, ensemble: r1i1p(1:5)}      

bouweandela commented on July 24, 2024

@bettina-gier I believe you also mentioned you were interested in computing ensemble statistics at one of our monthly meetings, could you comment on what kind of features you would need?

bouweandela commented on July 24, 2024

So, in case you need both ensemble and multi model stats, I do not know whether the same module doing both would work.

This can be achieved with the current functionality by doing the multi model stats in a Python diagnostic.

bouweandela commented on July 24, 2024

But you also need to change the order of operation, since the ensemble statistics should not be performed after regridding

There is the custom_order: true option for preprocessors to change the default order.
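A sketch of how that might look in a recipe (assuming the option is spelled custom_order and that the listed step order is then respected), so the statistics run before regridding:

```yaml
preprocessors:
  ensemble_then_regrid:
    custom_order: true
    multi_model_statistics:
      span: overlap
      statistics: [mean]
    regrid:
      target_grid: 1x1
      scheme: linear
```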

bouweandela commented on July 24, 2024

Having a dedicated ensemble_statistics preprocessor would probably be nicer than the current solution, but it would be good to have a clearer idea of the use cases, especially those that cannot be done, or are inconvenient, with the current functionality.

bettina-gier commented on July 24, 2024

My idea for a feature was more about comparing different projects, e.g. CMIP ensembles. With the CMIP6 data a lot of people will be looking at what has improved, and currently there is no easy way to do the "groupby: [CMIP5, CMIP6]" you described above to get the multi-model mean for all CMIP5 or all CMIP6 models and compare those.
I suppose that with accepting cubes you can now do it in the diagnostic, but only when using Python, while a lot of the old IPCC figures are written in NCL.

Peter9192 commented on July 24, 2024

Thanks for all the info, Bouwe! My use case would be to combine both ensemble means and multi-model statistics, to prevent models with many ensemble members from getting excessive weight in the MM stats.
Like you say, this can be achieved with the current functionality by doing the multi-model stats in a Python diagnostic, but a separate preprocessor function would be nicer. An alternative solution for this use case would be for multi-model stats to support weights.

sloosvel commented on July 24, 2024

Having a dedicated ensemble_statistics preprocessor would probably be nicer than the current solution

I think that this, combined with the groupby option, is the most convenient solution, because at some point I tried to extend the multi_model function and it ended up being very messy.

bettina-gier commented on July 24, 2024

Just adding onto this discussion: I talked with members from the 4C project yesterday, who were asking about averaging observational datasets in ESMValTool. Considering that there are currently different "projects" for observations, like obs4mips, OBS and OBS6, it could be necessary to add some kind of optional tag or group to datasets in the recipe, which could then also be the input for the "groupby" option Bouwe proposed above, e.g. hijacking the recipe idea above:

preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]
      groupby: [obs_avg, CMIP5]

diagnostics:
  diagnostic1:
    variables:
      ta:
        mip: Amon
        project: CMIP5
        preprocessor: statistics
        start_year: 2000
        end_year: 2002
        exp: historical
        additional_datasets:
          - {dataset: CRU, project: OBS, tag: obs_avg, ...}
          - {dataset: ERA-Interim, project: OBS6, tag: obs_avg, ...}
          - {dataset: CanESM2, ensemble: r(1:5)i(1:2)p1}
          - {dataset: MPI-ESM-LR, ensemble: r(1:5)i1p1}
          - {dataset: bcc-csm1-1, ensemble: r(1:5)i1p1}
          - {dataset: GFDL-ESM2G, ensemble: r1i1p(1:5)}      

Additionally, they were asking if there is a possibility of giving weights to observations when computing this mean, say 70% weight for dataset 1 and only 30% for dataset 2. Is this currently possible with the preprocessor, and if not, would we want it to be? Or should this rather be given as parameters to diagnostics, potentially introducing a class of "preprocessor diagnostics" between the preprocessor and the diagnostics for more involved computations, whose output could then be forwarded to "normal" diagnostics using the ancestors tag?
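The weighted average itself is simple once the datasets are on a common grid; a minimal NumPy sketch of the 70/30 case (names illustrative, cube metadata handling ignored):

```python
import numpy as np

def weighted_obs_mean(datasets, weights):
    """Weighted average of observational datasets on a common grid."""
    stacked = np.stack(datasets)  # shape: (n_datasets, ...)
    # np.average normalises the weights, so [70, 30] works as well as [0.7, 0.3].
    return np.average(stacked, axis=0, weights=weights)
```

The harder part of the question is where this belongs (preprocessor option vs diagnostic parameter), not the arithmetic.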

Peter9192 commented on July 24, 2024

potentially introducing a class of "preprocessor diagnostics"

This is something I've been thinking about as well, in the context of ESMValGroup/ESMValTool#1640. Several different weighting methods are being developed in EUCP, and we're considering adding several more of them to ESMValTool. I guess it's natural to start implementing them as diagnostics and maybe, once that converges to some sort of common interface, we could consider porting them to the preprocessor. Does that make sense?

stefsmeets commented on July 24, 2024

Just adding onto this discussion, I talked with members from the 4C project yesterday who were asking about averaging observational datasets in the esmvaltool. Considering there are currently different "projects" for observations like obs4mips, OBS, OBS6, it could be needed to add some kind of optional tag or group to datasets in the recipe, which could then also be the input for the "groupby" option Bouwe proposed above, e.g. hijacking the recipe idea above

In #673 we made some progress towards making this possible. Internally we added support for grouping data to do ensemble statistics, but in principle anything can be grouped, as long as it is defined in the dataset attributes. This functionality can be exposed quite easily.

What you would then do would be something like:

preprocessors:
  statistics:
    multi_model_statistics:
      span: overlap
      statistics: [mean, median]
      groupby: [tag, dataset]
