roocs / clisops
Climate Simulation Operations
Home Page: https://clisops.readthedocs.io/en/latest/
License: Other
I noticed that the C-library requirements now include libudunits2-dev or udunits (for macOS). This is a breaking change that needs to be communicated upstream and to users. Part of the issue is that documentation is not updated alongside changes to code, to ensure that "surprises" don't occur in production.
I can see a few steps needed to address this:
I'm sorry if I seem frustrated, but clisops will be going into Ouranos's production instance of xclim on our servers next month, so we need to follow better protocols.
The dataset c3s-cmip5.output1.NCC.NorESM1-ME.rcp60.mon.seaIce.OImon.r1i1p1.tsice.v20120614 has an irregular grid with dims i and j.
The test clisops/tests/ops/test_subset.py (Lines 391 to 423 in d9a7ef0) shows that longitude is not subsetted correctly for this example, but latitude is.
The core.subset module is missing the implementation for "crossing the 0 degree meridian":
clisops/clisops/core/subset.py (Line 114 in c493bc5)
[This issue is a migration of Ouranosinc/xclim#422]
The goal would be to compute spatial averages over a polygon. It would need to account for non-uniform grid-cell areas, partial overlap, and holes in polygons.
The best approach would be to compute a weight mask for the array, representing the area of each gridcell covered by the polygon (i.e. the fractional area).
As mentioned on the original thread, a first way to do this would be to generate a grid of higher resolution and use the existing create_mask.
Or, we could iterate over all gridcells and generate Polygons for them, either using provided lat_bnds and lon_bnds or inferring them. Then, shapely's methods could be used to compute the intersection of each grid polygon and the target polygon.
From some tests I made of both methods, the second can be quite fast and relatively easy to implement; a rough sketch follows.
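A rough sketch of the second method, assuming 1D lat_bnds/lon_bnds variables are available and treating cells as planar boxes (a real implementation would need spherical cell areas and bounds inference; the function name is illustrative):

import numpy as np
import xarray as xr
from shapely.geometry import box


def polygon_weight_mask(ds, poly):
    """Fraction of each grid cell covered by `poly` (planar approximation)."""
    lat_bnds = ds["lat_bnds"].values  # shape (nlat, 2)
    lon_bnds = ds["lon_bnds"].values  # shape (nlon, 2)
    weights = np.zeros((lat_bnds.shape[0], lon_bnds.shape[0]))
    for i, (lat0, lat1) in enumerate(lat_bnds):
        for j, (lon0, lon1) in enumerate(lon_bnds):
            cell = box(lon0, lat0, lon1, lat1)
            # Intersection area handles partial overlap and polygon holes.
            weights[i, j] = cell.intersection(poly).area / cell.area
    return xr.DataArray(weights, dims=("lat", "lon"))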
Issue in ESMValTool: some coordinate points vary between different files of this dataset (for different time ranges). Their fix removes these inaccuracies by rounding the coordinates; it can be found here: https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/_fixes/cmip5/noresm1_me.py
This seems to have been corrected - I can't find this problem in any of the NCC/NorESM1-ME Amon tas datasets.
@ellesmith88: I was just comparing the documentation on the dask pages with our implementation at:
http://xarray.pydata.org/en/stable/dask.html
Their example is:
delayed_obj = ds.to_netcdf("manipulated-example-data.nc", compute=False)
results = delayed_obj.compute()
Our code is:
with dask.config.set(scheduler="synchronous"):
    chunked_ds.to_netcdf(output_path, compute=False)
    chunked_ds.compute()
I think our code should be:
with dask.config.set(scheduler="synchronous"):
    delayed_obj = chunked_ds.to_netcdf(output_path, compute=False)
    delayed_obj.compute()
Please check which is the correct implementation, remembering that compute=False was writing files but filling them with empty arrays. Thanks.
The Windows build is presently set as an allowed_fail build, but this isn't really the case, as it is generally stable and presently used by Windows users. Since this build uses Anaconda explicitly, to get around the nightmare of installing/compiling C-libraries, it's important to note that changes to the requirements (in setup.py or requirements.txt) need to be mirrored in the environment.yml; otherwise, Windows-installed pip will try to install dependencies, and we'll get failure stack traces that make no sense at all.
If no one disagrees, I'm going to re-align the dependencies and set the Windows build to be a required passing build.
Discuss with Ouranos how we should implement this. Maybe inside clisops.core.subset.
DKRZ are loading CMIP6 into Zarr. Here are some of their experiences with xarray.open_mfdataset:
One problem with the following line:
ds = xarray.open_mfdataset(catvar.df["path"].to_list(), use_cftime=True, combine="by_coords")
Xarray does not interpret the bounds keyword, so the corresponding lat and lon bounds are listed as data variables. That might not cause any problem, but on top of that, xarray adds a time dimension to those variables:
lat_bnds (time, lat, bnds) float64 dask.array<chunksize=(1826, 192, 2), meta=np.ndarray>
lon_bnds (time, lon, bnds) float64 dask.array<chunksize=(1826, 384, 2), meta=np.ndarray>
DKRZ used:
xarray.open_mfdataset(
    catvar.df["path"].to_list(),
    decode_cf=True,
    concat_dim="time",
    data_vars='minimal',
    coords='minimal',
    compat='override',
)
This is from the xarray tutorial, so that there is no longer a time dimension on the bnds variables. They had not included use_cftime, which might cause other problems, as I saw when reconverting to netCDF.
Noticed this error occurring on the ReadTheDocs build. While __init__.py isn't needed for packages in Python 3, it's terribly convenient when it comes to ensuring that the version of the library is available when imported. My suggestion would be to simply add a single __init__.py at the top level of the library, or to add a __version__ object to the main module to set the version.
This would ultimately be managed by bumpversion.
@sol1105: noted this one, might be relevant when we look at masks:
Create a couple of unit tests that check that auxiliary variables and coordinate variables are provided in the output Dataset from clisops:
- coordinate variables, such as latitude and longitude when the original data is on an irregular grid.
- auxiliary variables, such as a related variable that happens to exist in the file but is not a coordinate (a sketch of such a test follows the list).
What we want to understand is:
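Whatever the answers turn out to be, a minimal first test could look like this - a sketch using a tiny synthetic dataset, since clisops' actual behaviour here is exactly what we want to probe (variable names are illustrative):

import numpy as np
import xarray as xr
from clisops.core import subset


def test_aux_and_coord_variables_survive_subset():
    lat = np.arange(-10.0, 15.0, 5.0)
    lon = np.arange(0.0, 25.0, 5.0)
    tas = xr.DataArray(
        np.random.rand(lat.size, lon.size),
        coords={"lat": lat, "lon": lon},
        dims=("lat", "lon"),
    )
    # An auxiliary variable that exists in the file but is not a coordinate.
    aux = xr.DataArray(np.arange(lat.size, dtype=float), dims=("lat",))
    ds = xr.Dataset({"tas": tas, "aux_var": aux})

    result = subset.subset_bbox(ds, lat_bnds=[-5, 5], lon_bnds=[5, 15])
    assert "lat" in result.coords and "lon" in result.coords
    assert "aux_var" in result.data_vars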
The checks in subset_time can raise an exception in certain cases.
I have tested:
However, if start_date or end_date are within the time range BUT are not exactly aligned with a time step, then we get an exception, e.g.:
Traceback (most recent call last):
File "test_start_datetime.py", line 7, in <module>
res = subset_time(ds, start_date=ts[0], end_date=ts[1])
File "/home/users/astephen/roocs/clisops/clisops/core/subset.py", line 78, in func_checker
> da.time.sel(time=kwargs["end_date"]).max()
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/xarray/core/common.py", line 46, in wrapped_func
return self.reduce(func, dim, axis, skipna=skipna, **kwargs)
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/xarray/core/dataarray.py", line 2338, in reduce
var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/xarray/core/variable.py", line 1591, in reduce
data = func(input_data, **kwargs)
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/xarray/core/duck_array_ops.py", line 324, in f
return func(values, axis=axis, **kwargs)
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/xarray/core/nanops.py", line 86, in nanmax
return _nan_minmax_object("max", dtypes.get_neg_infinity(a.dtype), a, axis)
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/xarray/core/nanops.py", line 67, in _nan_minmax_object
data = getattr(np, func)(filled_value, axis=axis, **kwargs)
File "<__array_function__ internals>", line 6, in amax
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2706, in amax
keepdims=keepdims, initial=initial, where=where)
File "/home/users/astephen/roocs/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Here is a suggested rewrite of the check_start_end_dates decorator that uses slicing to nudge start_date and end_date, e.g.:
import warnings
from functools import wraps


def check_start_end_dates(func):
    @wraps(func)
    def func_checker(*args, **kwargs):
        da = args[0]
        kwargs.setdefault("start_date", None)
        kwargs.setdefault("end_date", None)
        # Nudge the start date to the first time step at or after it.
        nudged_start = da.time.sel(time=slice(kwargs["start_date"], None)).values[0].isoformat()
        if nudged_start != kwargs["start_date"]:
            warnings.warn(
                '"start_date" not found within input date time range. Value has been nudged to nearest'
                ' valid time step in xarray object.',
                UserWarning,
                stacklevel=2,
            )
            kwargs["start_date"] = nudged_start
        # Nudge the end date to the last time step at or before it.
        nudged_end = da.time.sel(time=slice(None, kwargs["end_date"])).values[-1].isoformat()
        if nudged_end != kwargs["end_date"]:
            warnings.warn(
                '"end_date" not found within input date time range. Value has been nudged to nearest'
                ' valid time step in xarray object.',
                UserWarning,
                stacklevel=2,
            )
            kwargs["end_date"] = nudged_end
        return func(*args, **kwargs)

    return func_checker
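For context, the decorator would wrap subset_time roughly like this (a simplified sketch, not the real signature in clisops/core/subset.py):

@check_start_end_dates
def subset_time(da, start_date=None, end_date=None):
    # With the decorator above, out-of-grid dates are nudged (with a warning)
    # before this selection runs, so the slice is never empty.
    return da.sel(time=slice(start_date, end_date))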
Check that:
... tests on Windows and macOS are currently disabled in .travis.yml (#8).
The current clisops implementation calculates time chunks without taking Area and Level selections into account.
See:
https://github.com/roocs/clisops/blob/master/clisops/ops/subset.py#L71
However, due to lazy evaluation, the call to clisops.core.subset will not touch the actual data. Hence, we can do the subset on the entire object, then calculate the time slices on that. This will simplify the whole calculation of size and time... probably means we can delete code!
We need something here that finds nbytes and n_time_steps, and uses our config memory limit in "clisops:read" to decide on a sensible chunk_length:

var_id = get_main_variable(ds)
da = ds[var_id].chunk({'time': chunk_length})
da.unify_chunks()
ds.to_netcdf(tmp_file)
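A sketch of what that calculation might look like, assuming the byte limit has already been read from the "clisops:read" config (calculate_chunk_length and memory_limit_bytes are illustrative names; get_main_variable is as used above):

def calculate_chunk_length(ds, memory_limit_bytes):
    """Choose a 'time' chunk length so that one chunk fits in the memory budget."""
    var_id = get_main_variable(ds)
    da = ds[var_id]
    n_time_steps = da.time.size
    bytes_per_time_step = da.nbytes / n_time_steps
    chunk_length = int(memory_limit_bytes // bytes_per_time_step)
    return max(1, min(n_time_steps, chunk_length))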
Also set environment variables in roocs_utils/etc/roocs.ini and set them in clisops/__init__.py:

import os

print('[WARNING] Heeding warning about Dask environment variables: '
      'https://docs.dask.org/en/latest/array-best-practices.html#avoid-oversubscribing-threads')
print("""export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
""")
for key in 'OMP_NUM_THREADS MKL_NUM_THREADS OPENBLAS_NUM_THREADS'.split():
    os.environ[key] = '1'
The docstring examples still refer to calls using xclim. These should be revised to reflect the accepted call behaviour for users (and eventually be tested in Travis CI).
Tag as version 3 and make a pull request to Xclim.
We can see from clisops.ops.subset::subset that most of the workflow will be independent of the actual operation:
Line 53 in ffeb599
The differences are:
clisops.core.*
Would it make sense to create a class that worked through the stages?
The latter, get_outputs, could in turn be a class that brings together:
The clisops.ops.subset could then be a class or function with decorators:
@validate_ds
def subset(ds, .....):
    mapped_args = utils.map_params(...)
    ds = clisops.core.subset(ds, ...)
    output_handler = OutputHandler(ds, ...)
    return output_handler.get_outputs()
NOTE: we also need to simplify, or find a way to remove: clisops.ops.subset::_subset
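As a sketch, the @validate_ds decorator used above might do no more than this (its exact responsibilities are an open question, so this is an assumption, not existing clisops code):

from functools import wraps

import xarray as xr


def validate_ds(func):
    """Reject anything that is not an xarray object before the operation runs."""
    @wraps(func)
    def wrapper(ds, *args, **kwargs):
        if not isinstance(ds, (xr.Dataset, xr.DataArray)):
            raise TypeError(f"Expected an xarray Dataset/DataArray, got {type(ds)}")
        return func(ds, *args, **kwargs)
    return wrapper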
The average function will allow averaging across multiple dimensions. But what do we call them?
Here is a suggestion to be discussed:
time - should match our standard parameter name for subsetting over time
level - should match our standard parameter name for subsetting over level
latitude - allowed (but might not always exist)
longitude - allowed (but might not always exist)
y - if the y-axis is not really latitude
x - if the x-axis is not really longitude
Main issue: the user doesn't know what the options are - so we need to map them (see the sketch below).
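One possible shape for that mapping, resolving user-facing names via coordinate attributes rather than hard-coded dimension names ("x" and "y" would be fallbacks for non-lat/lon grids; the matching rules below are assumptions to be agreed on):

# User-facing name -> (CF "axis" attribute, CF "standard_name").
AXIS_MAP = {
    "time": ("T", "time"),
    "level": ("Z", None),
    "latitude": ("Y", "latitude"),
    "longitude": ("X", "longitude"),
}


def resolve_dim(ds, name):
    """Find the dataset dimension matching a user-facing axis name."""
    axis, std_name = AXIS_MAP[name]
    for dim in ds.dims:
        coord = ds.coords.get(dim)
        if coord is None:
            continue
        if coord.attrs.get("axis") == axis:
            return dim
        if std_name and coord.attrs.get("standard_name") == std_name:
            return dim
    raise KeyError(f"No dimension matching '{name}' found in the dataset.")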
The images in the Subsetting Utilities notebooks in the documentation are not showing. This might be something to do with the path/symlink between the two notebook directories.
The Xarray mean method is:
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.mean.html
It includes two optional arguments:
skipna - skip missing values (or not)
keep_attrs - keep variable attributes (or not)
Create some unit tests to help us understand the behaviour of each argument:
def test_xarray_da_mean_skipna_true():
- create a simple 1D xarray.DataArray with 10 values of [10., 10., 10., 10., 10., nan, nan, nan, nan, nan]
- test that the average is 10. if you use `skipna=True` (the NaN values are skipped)
def test_xarray_da_mean_skipna_false():
- create a simple 1D xarray.DataArray with 10 values of [10., 10., 10., 10., 10., nan, nan, nan, nan, nan]
- test that the average is NaN if you use `skipna=False` (the NaN values propagate)
If the results are not as above, we need to investigate more (see the runnable version below).
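A runnable version of the skipna checks, based on xarray's documented behaviour (skipna=True ignores NaNs; skipna=False propagates them):

import numpy as np
import xarray as xr


def test_xarray_da_mean_skipna():
    da = xr.DataArray([10.0] * 5 + [np.nan] * 5)
    assert float(da.mean(skipna=True)) == 10.0     # NaNs are skipped
    assert np.isnan(float(da.mean(skipna=False)))  # NaNs propagate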
def test_xarray_da_mean_keep_attrs_true():
- read a variable from our mini-esgf-cache
- average it with `mean` method across the time axis, with `keep_attrs=True`
- assert the original attributes match the new attributes
def test_xarray_da_mean_keep_attrs_false():
- read a variable from our mini-esgf-cache
- average it with `mean` method across the time axis, with `keep_attrs=False`
- examine the attributes of the resulting average DataArray
- assert those values when you know them
Discuss with team whether we want to:
1. Keep attrs
2. Lose attrs
3. Modify attrs (which might be: keep some then remove/edit/add others).
Keep these unit tests in our codebase anyway.
We need to provide an average function in clisops, with an associated daops and clisops method. This is a high-level GitHub issue regarding that function; it breaks down into more issues.
Overall plan:
Our average uses xarray directly, as documented here:
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.mean.html
We are not going to support weighted means or complex averages.
This function only supports a simple and complete average across one or more dimensions of a hypercube.
Issues to explore:
Can we make an initial release 0.1.0 of the current clisops? This can be referenced by daops. After that we can merge the initial xclim subset module ... and make it a 0.2.0 release.
I ran a test on daops that used subset but didn't pass any arguments, and got the error: UnboundLocalError: local variable 'result' referenced before assignment
This is from clisops/ops/subset.py: https://github.com/roocs/clisops/blob/master/clisops/ops/subset.py#L20-L46
Do we want to be able to use subset with no arguments? If not, then we should update this so that the error message is more meaningful (see the sketch below).
@agstephens What do you think?
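If we decide subset must receive at least one argument, a minimal guard could look like this (the parameter names are assumptions based on the subset call signature):

def _validate_subset_args(time=None, area=None, level=None):
    """Fail early with a clear message instead of an UnboundLocalError downstream."""
    if time is None and area is None and level is None:
        raise ValueError(
            "subset() requires at least one of 'time', 'area' or 'level'."
        )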
PRs could benefit from a template of questions that the contributor could answer. Some identifiers like "What does this PR change?" and "Are there breaking changes?" are things I've seen across GitHub and xclim has a few as well. I can add that at some point soon.
with Ag and Elle
Travis CI is not active yet. We can adapt the xclim configuration.
Should we move root dirs, basic xarray utils etc. into a new repo (roocs-utils)?
FOR NOW: move those dependencies into clisops - but later we might move them out.
black style check tests are run by Travis. Fix complaints.
Sooner or later a user makes a "larger than memory" request.
We need to implement an appropriate level of Dask chunking in open dataset operations so that we avoid memory errors.
This needs some thought, but this example may be of use:
import xarray as xr
import os


def _setup_env():
    """
    export OMP_NUM_THREADS=1
    export MKL_NUM_THREADS=1
    export OPENBLAS_NUM_THREADS=1
    """
    env = os.environ
    env['OMP_NUM_THREADS'] = '1'
    env['MKL_NUM_THREADS'] = '1'
    env['OPENBLAS_NUM_THREADS'] = '1'


_setup_env()


def main():
    dr = '/badc/cmip6/data/CMIP6/HighResMIP/MOHC/HadGEM3-GC31-HH/control-1950/r1i1p1f1/day/ta/gn/v20180927'
    print(f'[INFO] Working on: {dr}')
    ds = xr.open_mfdataset(f'{dr}/*.nc')  # , parallel=False)

    chunk_rule = {'time': 4}
    chunked_ds = ds.chunk(chunk_rule)
    ds['ta'].unify_chunks()
    print(f'[INFO] Chunk rule: {chunk_rule}')

    OUTPUT_DIR = '/gws/nopw/j04/cedaproc/astephen/ag-zarr-test'
    output_path = f'{OUTPUT_DIR}/test.zarr'
    chunked_ds.to_zarr(output_path)  # Although we won't use Zarr in clisops! - NC should work fine.
    print(f'[INFO] Wrote: {output_path}')


if __name__ == '__main__':
    main()
Things to do:
When the simple file namer is used with split_method = 'time:auto', the output files are all named output_001.nc.
See tests/test_file_namers.py::test_SimpleFileNamer_with_chunking on the implement-split-outputs branch:
clisops/tests/test_file_namers.py (Line 33 in 8d8eea3)
Also make it available on PyPI and conda-forge.
clisops has "heavy" dependencies ... for installation, a conda package is more convenient.
Improve checking inside clisops?
This is an issue to identify a few problems:
pre-commit, a library that has made standardized development of xclim a breeze, is not implemented here. This library automatically catches PEP8 and black errors before they are committed to the shared code base, with almost no effort or extra steps needed from the contributor. For more information: https://pre-commit.com/.
I would like to contribute a fix for these issues.
Hello!
I just wanted to open an issue about some things that would make compatibility much easier to maintain for xclim. As of now, we have clisops installed in our Travis CI load-out for the following versions and builds:
Clisops is built into xclim such that, when importing xclim.subset with clisops in the installation environment, xclim exposes all the processes listed in __all__ of clisops.core.subset. It would be good to ensure that new functions are listed in __all__ and tested. Presently, these are:
__all__ = [
"create_mask",
"create_mask_vectorize",
"distance",
"subset_bbox",
"subset_gridpoint",
"subset_shape",
"subset_time",
]
I understand that there may eventually be decisions that change the way clisops is run, so in order to ensure that these changes don't take us by surprise, it would be good to see some of the following:
DeprecationWarnings and FutureDeprecationWarnings on major functions.
I am always available to help clarify any of the practices we use and can help with implementing standards as needed. Just let me know where I can help out.
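A sketch of how such a warning could be attached to a function being renamed or moved (the decorator itself is an assumption, not existing clisops code):

import warnings
from functools import wraps


def deprecated(message):
    """Emit a DeprecationWarning each time the wrapped function is called."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator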
Use bumpversion for releases. Fix config and update docs for bumpversion. Check usage of version and compare with xclim.
Enable ReadTheDocs build for documentation. Docs can include notebook examples using nbsphinx. See xclim/finch.
Just curious, but I was wondering if there was interest in installing slack integration to clisops (and possibly other repositories)? I'm not sure if the roocs team uses slack, but it's useful in that the slack integrations allows you to watch issues and PRs on slack with build checks updated in real-time.
We presently use it for some Bird-house/PAVICS repositories (xclim included). Seems like a good fit for coordinating efforts/conversing. For more info: https://github.com/integrations/slack
Instead of looking for "lat" or "latitude", clisops should use:
is_longitude(dim)
is_latitude(dim)
etc. (see the sketch below)
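A sketch of what these helpers might check, using CF metadata rather than name matching (the exact rules would need to be agreed):

def is_latitude(coord):
    """True if a coordinate looks like latitude, judged by CF metadata."""
    return (
        coord.attrs.get("standard_name") == "latitude"
        or coord.attrs.get("axis") == "Y"
        or coord.attrs.get("units") in ("degrees_north", "degree_north", "degreeN")
    )


def is_longitude(coord):
    """True if a coordinate looks like longitude, judged by CF metadata."""
    return (
        coord.attrs.get("standard_name") == "longitude"
        or coord.attrs.get("axis") == "X"
        or coord.attrs.get("units") in ("degrees_east", "degree_east", "degreeE")
    )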
Read standardized model output into xarray DataSets (from disk, also via intake-esm?) -> Not required, assume data has been read into an xarray dataset
Hi @huard @Zeitsperre, as part of getting ready for our first formal release of clisops, we would like to:
- move <repo>/docs/notebooks to <repo>/notebooks
- document the clisops.ops.subset module and connect it to the sphinx documentation
Are you happy for us to do this?
I have marked in PR #8 some subset tests with pytest.xfail which are failing in a tox environment but are working with a conda environment.
Packages like rioxarray or hvplot provide an xarray extension so their methods can be called directly on the dataset. Would that be wanted with clisops?
Example: instead of
from clisops import subset
subset.subset_bbox(ds, lat_bnds=[45, 50], lon_bnds=[-60, -55])
one could use:
import clisops.xarray
ds.cso.subset_bbox(lat_bnds=[45, 50], lon_bnds=[-60, -55])
Where "cso" is the xarray extension added by clisops
.
Personally, I like this approach as it looks more elegant and xarray-esque. Moreover, it could allow for dataset-related lookups like crs info in metadata or using something like rioxarray's ds.rio.set_spatial_dims
to solve the problem of #32.
Implementation-wise, it shouldn't be complicated and wouldn't change the rest of the api, simply add another access mechanism.
And, I believe it would make clisops more attractive to xarray users!
As a heavy user of almost-extinct xclim.subset
, I can offer some time on this implementation, it it is wanted.
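This could build on xarray's accessor registration; roughly what a clisops.xarray module could contain (a sketch: the "cso" name comes from the example above, and wrapping only subset_bbox keeps it minimal):

import xarray as xr

from clisops.core import subset


@xr.register_dataset_accessor("cso")
class ClisopsAccessor:
    """Expose clisops.core.subset operations directly on a Dataset."""

    def __init__(self, ds):
        self._ds = ds

    def subset_bbox(self, **kwargs):
        # Delegate to the existing functional API.
        return subset.subset_bbox(self._ds, **kwargs)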
This test demonstrates that, at present, clisops does NOT reverse lat limits - but DOES NOT RAISE AN EXCEPTION:

from clisops.core import subset
import xarray as xr


def test_lat_lon_reversal_empty_ds():
    data = 'CMIP6.ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp126.Amon.gn'  # dataset id, for reference
    coll = '/badc/cmip6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/*.nc'
    ds = xr.open_mfdataset(coll, decode_times=False, combine='by_coords', use_cftime=True)

    # Reversed latitude bounds: silently returns an empty 'lat' dimension.
    ds1 = subset.subset_bbox(ds, lat_bnds=[70, 35])
    # Ascending latitude bounds: works as expected.
    ds2 = subset.subset_bbox(ds, lat_bnds=[35, 70])

    assert ds1.tas.shape == (1032, 0, 384)
    assert ds1.dims['lat'] == 0
    assert ds2.tas.shape == (1032, 38, 384)
    assert ds2.dims['lat'] == 38