ncar / esmlab Goto Github PK

Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️

Home Page: https://esmlab.readthedocs.io

License: Apache License 2.0

Shell 0.95% Python 99.05%

xarray dask pangeo cmip-ap earth-system-informatics

esmlab's Introduction

⚠️⚠️ ESMLab functionality has been moved into https://github.com/NCAR/geocat-comp. ⚠️⚠️

ESMLab: Earth System Model Lab

https://img.shields.io/codecov/c/github/NCAR/esmlab.svg?style=for-the-badge

Tools for working with earth system multi-model analyses with xarray. See documentation for more information.

Installation

ESMLab can be installed from PyPI with pip:

pip install esmlab

It is also available from conda-forge for conda installations:

conda install -c conda-forge esmlab

esmlab's People

Contributors

Stargazers

Watchers

Forkers

mnlevy1981 matt-long alperaltuntas sudharsana-kjl bradyrx bonnland sciencewiki klindsay28

esmlab's Issues

calling compute_ann_mean(ds) can lead to ds.time becoming encoded

If I call compute_ann_mean(ds) on an xarray Dataset ds that has
an unencoded time coordinate, and a time bounds named ds.time.attrs['bounds'],
I'm seeing that the time coordinate in ds gets encoded by the compute_ann_mean call.
Changing aspects of ds in compute_ann_mean(ds) is unexpected behavior (to me).
I'm guessing that this is occurring at
https://github.com/NCAR/esmlab/blob/master/esmlab/climatology.py#L209

Below is some jupyter notebook code that demonstrates this.
I'm using the very latest commit of esmlab, be608ed, and
numpy 1.15.4
xarray 0.11.3

import numpy as np
import xarray as xr
import esmlab

# set up values for Dataset, 2 yrs of analytic monthly values
days_1yr = np.array([31.0, 28.0, 31.0, 30.0, 31.0, 30.0, 31.0, 31.0, 30.0, 31.0, 30.0, 31.0])
time_edges = np.insert(np.cumsum(np.concatenate((days_1yr, days_1yr))), 0, 0)
time_bounds_vals = np.stack((time_edges[:-1], time_edges[1:]), axis=1)
time_vals = np.mean(time_bounds_vals, axis=1)
var_vals = np.sin(np.pi * time_vals / 365.0)

# create Dataset, including time_bounds
time_var = xr.DataArray(time_vals, name='time', dims='time', coords={'time':time_vals},
                        attrs={'units':'days since 0001-01-01', 'calendar':'noleap',
                               'bounds':'time_bounds'})
time_bounds = xr.DataArray(time_bounds_vals, name='time_bounds', dims=('time', 'd2'),
                           coords={'time':time_var})
var = xr.DataArray(var_vals, name='var', dims='time', coords={'time':time_var})
ds = var.to_dataset()
ds = xr.merge((ds, time_bounds))

print('ds.time before compute_ann_mean')
print(ds.time)
ds_ann = esmlab.climatology.compute_ann_mean(ds)
print('')
print('ds.time after compute_ann_mean')
print(ds.time)

ds.time before compute_ann_mean
<xarray.DataArray 'time' (time: 24)>
array([ 15.5,  45. ,  74.5, 105. , 135.5, 166. , 196.5, 227.5, 258. , 288.5,
       319. , 349.5, 380.5, 410. , 439.5, 470. , 500.5, 531. , 561.5, 592.5,
       623. , 653.5, 684. , 714.5])
Coordinates:
  * time     (time) float64 15.5 45.0 74.5 105.0 ... 623.0 653.5 684.0 714.5
Attributes:
    units:     days since 0001-01-01
    calendar:  noleap
    bounds:    time_bounds

ds.time after compute_ann_mean
<xarray.DataArray 'time' (time: 24)>
array([cftime.DatetimeNoLeap(1, 1, 16, 12, 0, 0, 0, 2, 16),
       cftime.DatetimeNoLeap(1, 2, 15, 0, 0, 0, 0, 4, 46),
       cftime.DatetimeNoLeap(1, 3, 16, 12, 0, 0, 0, 5, 75),
       cftime.DatetimeNoLeap(1, 4, 16, 0, 0, 0, 0, 1, 106),
       cftime.DatetimeNoLeap(1, 5, 16, 12, 0, 0, 0, 3, 136),
       cftime.DatetimeNoLeap(1, 6, 16, 0, 0, 0, 0, 6, 167),
       cftime.DatetimeNoLeap(1, 7, 16, 12, 0, 0, 0, 1, 197),
       cftime.DatetimeNoLeap(1, 8, 16, 12, 0, 0, 0, 4, 228),
       cftime.DatetimeNoLeap(1, 9, 16, 0, 0, 0, 0, 0, 259),
       cftime.DatetimeNoLeap(1, 10, 16, 12, 0, 0, 0, 2, 289),
       cftime.DatetimeNoLeap(1, 11, 16, 0, 0, 0, 0, 5, 320),
       cftime.DatetimeNoLeap(1, 12, 16, 12, 0, 0, 0, 0, 350),
       cftime.DatetimeNoLeap(2, 1, 16, 12, 0, 0, 0, 3, 16),
       cftime.DatetimeNoLeap(2, 2, 15, 0, 0, 0, 0, 5, 46),
       cftime.DatetimeNoLeap(2, 3, 16, 12, 0, 0, 0, 6, 75),
       cftime.DatetimeNoLeap(2, 4, 16, 0, 0, 0, 0, 2, 106),
       cftime.DatetimeNoLeap(2, 5, 16, 12, 0, 0, 0, 4, 136),
       cftime.DatetimeNoLeap(2, 6, 16, 0, 0, 0, 0, 0, 167),
       cftime.DatetimeNoLeap(2, 7, 16, 12, 0, 0, 0, 2, 197),
       cftime.DatetimeNoLeap(2, 8, 16, 12, 0, 0, 0, 5, 228),
       cftime.DatetimeNoLeap(2, 9, 16, 0, 0, 0, 0, 1, 259),
       cftime.DatetimeNoLeap(2, 10, 16, 12, 0, 0, 0, 3, 289),
       cftime.DatetimeNoLeap(2, 11, 16, 0, 0, 0, 0, 6, 320),
       cftime.DatetimeNoLeap(2, 12, 16, 12, 0, 0, 0, 1, 350)], dtype=object)
Coordinates:
  * time     (time) object 0001-01-16 12:00:00 ... 0002-12-16 12:00:00
Attributes:
    units:     days since 0001-01-01
    calendar:  noleap
    bounds:    time_bounds

resample returns meaningless dates

Resampling daily to monthly data using resample returns an erroneous time axis and time_bounds.

For instance:

case = 'b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.001'
droot = f'/glade/p/cgd/tss/people/oleson/{case}/cpl/hist.mon'
files = sorted(glob(f'{droot}/{case}.cpl.ha2x1d.2000-??.nc'))
ds = xr.open_mfdataset(files)
dsm = esmlab.resample(ds, freq='mon')

The original dataset was a year of daily data

ds.time

<xarray.DataArray 'time' (time: 365)>
array([cftime.DatetimeNoLeap(2000, 1, 1, 12, 0, 0, 0, 5, 1),
       cftime.DatetimeNoLeap(2000, 1, 2, 12, 0, 0, 0, 6, 2),
       cftime.DatetimeNoLeap(2000, 1, 3, 12, 0, 0, 0, 0, 3), ...,
       cftime.DatetimeNoLeap(2000, 12, 29, 12, 0, 0, 0, 3, 363),
       cftime.DatetimeNoLeap(2000, 12, 30, 12, 0, 0, 0, 4, 364),
       cftime.DatetimeNoLeap(2000, 12, 31, 12, 0, 0, 0, 5, 365)], dtype=object)
Coordinates:
  * time     (time) object 2000-01-01 12:00:00 ... 2000-12-31 12:00:00
Attributes:
    bounds:   time_bnds

The monthly dataset mysteriously begins on Jan 2 1850 and appears to be daily data, but is in fact monthly.

dsm.time

<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeNoLeap(1850, 1, 2, 0, 0, 0, 0, 3, 2),
       cftime.DatetimeNoLeap(1850, 1, 3, 0, 0, 0, 0, 4, 3),
       cftime.DatetimeNoLeap(1850, 1, 4, 0, 0, 0, 0, 5, 4),
       cftime.DatetimeNoLeap(1850, 1, 5, 0, 0, 0, 0, 6, 5),
       cftime.DatetimeNoLeap(1850, 1, 6, 0, 0, 0, 0, 0, 6),
       cftime.DatetimeNoLeap(1850, 1, 7, 0, 0, 0, 0, 1, 7),
       cftime.DatetimeNoLeap(1850, 1, 8, 0, 0, 0, 0, 2, 8),
       cftime.DatetimeNoLeap(1850, 1, 9, 0, 0, 0, 0, 3, 9),
       cftime.DatetimeNoLeap(1850, 1, 10, 0, 0, 0, 0, 4, 10),
       cftime.DatetimeNoLeap(1850, 1, 11, 0, 0, 0, 0, 5, 11),
       cftime.DatetimeNoLeap(1850, 1, 12, 0, 0, 0, 0, 6, 12),
       cftime.DatetimeNoLeap(1850, 1, 13, 0, 0, 0, 0, 0, 13)], dtype=object)
Coordinates:
  * time     (time) object 1850-01-02 00:00:00 ... 1850-01-13 00:00:00
Attributes:
    bounds:   time_bnds

Even worse, time_bnds:

dsm.time_bnds

<xarray.DataArray 'time_bnds' (time: 12, ntb: 2)>
array([[cftime.DatetimeNoLeap(2000, 1, 1, 0, 0, 0, 0, 5, 1),
        cftime.DatetimeNoLeap(2000, 2, 1, 0, 0, 0, 0, 1, 32)],
       [cftime.DatetimeNoLeap(2000, 2, 1, 0, 0, 0, 0, 1, 32),
        cftime.DatetimeNoLeap(2000, 3, 1, 0, 0, 0, 0, 1, 60)],
       [cftime.DatetimeNoLeap(2000, 3, 1, 0, 0, 0, 0, 1, 60),
        cftime.DatetimeNoLeap(2000, 4, 1, 0, 0, 0, 0, 4, 91)],
       [cftime.DatetimeNoLeap(2000, 4, 1, 0, 0, 0, 0, 4, 91),
        cftime.DatetimeNoLeap(2000, 5, 1, 0, 0, 0, 0, 6, 121)],
       [cftime.DatetimeNoLeap(2000, 5, 1, 0, 0, 0, 0, 6, 121),
        cftime.DatetimeNoLeap(2000, 6, 1, 0, 0, 0, 0, 2, 152)],
       [cftime.DatetimeNoLeap(2000, 6, 1, 0, 0, 0, 0, 2, 152),
        cftime.DatetimeNoLeap(2000, 7, 1, 0, 0, 0, 0, 4, 182)],
       [cftime.DatetimeNoLeap(2000, 7, 1, 0, 0, 0, 0, 4, 182),
        cftime.DatetimeNoLeap(2000, 8, 1, 0, 0, 0, 0, 0, 213)],
       [cftime.DatetimeNoLeap(2000, 8, 1, 0, 0, 0, 0, 0, 213),
        cftime.DatetimeNoLeap(2000, 9, 1, 0, 0, 0, 0, 3, 244)],
       [cftime.DatetimeNoLeap(2000, 9, 1, 0, 0, 0, 0, 3, 244),
        cftime.DatetimeNoLeap(2000, 10, 1, 0, 0, 0, 0, 5, 274)],
       [cftime.DatetimeNoLeap(2000, 10, 1, 0, 0, 0, 0, 5, 274),
        cftime.DatetimeNoLeap(2000, 11, 1, 0, 0, 0, 0, 1, 305)],
       [cftime.DatetimeNoLeap(2000, 11, 1, 0, 0, 0, 0, 1, 305),
        cftime.DatetimeNoLeap(2000, 12, 1, 0, 0, 0, 0, 3, 335)],
       [cftime.DatetimeNoLeap(2000, 12, 1, 0, 0, 0, 0, 3, 335),
        cftime.DatetimeNoLeap(2001, 1, 1, 0, 0, 0, 0, 6, 1)]], dtype=object)
Coordinates:
  * time     (time) object 1850-01-02 00:00:00 ... 1850-01-13 00:00:00
Dimensions without coordinates: ntb

Trigger errors when coordinates don't match

I have encountered issues operating on files with lossy compression applied to coordinates. In these cases, xarray can do the wrong thing when applying its logic for coordinate alignment.

If the coordinates don't match, it takes the 'inner' set by default:

>>> da1 = xr.DataArray(np.ones(4), dims=('x'), coords={'x': [1.,2.,3.,4.]})
>>> da2 = xr.DataArray(2*np.ones(4), dims=('x'), coords={'x': [1.,2.,3.,4.01]})
>>> da1 * da2
<xarray.DataArray (x: 3)>
array([2., 2., 2.])
Coordinates:
  * x        (x) float64 1.0 2.0 3.0

This is seldom desired in the applications I am familiar with.

We could therefore wrap all instances where we are relying on automatic alignment in a context manager as follows.

with xr.set_options(arithmetic_join='exact'):
   dao = da1 * da2

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-4bb80efa7a54> in <module>()
      9 with xr.set_options(arithmetic_join='exact'):
---> 10     dao = da1 * da2

/glade/work/mclong/miniconda3/envs/py3/lib/python3.6/site-packages/xarray/core/dataarray.py in func(self, other)
   1965                 align_type = (OPTIONS['arithmetic_join']
   1966                               if join is None else join)
-> 1967                 self, other = align(self, other, join=align_type, copy=False)
   1968             other_variable = getattr(other, 'variable', other)
   1969             other_coords = getattr(other, 'coords', None)

/glade/work/mclong/miniconda3/envs/py3/lib/python3.6/site-packages/xarray/core/alignment.py in align(*objects, **kwargs)
    130                     raise ValueError(
    131                         'indexes along dimension {!r} are not equal'
--> 132                         .format(dim))
    133                 index = joiner(matching_indexes)
    134                 joined_indexes[dim] = index

ValueError: indexes along dimension 'x' are not equal

computing annual mean with resample(ds, freq='ann') yields incorrect results

When I compute annual means with ds_ann = esmlab.resample(ds, freq='ann'), I am getting incorrect results.

If I run the following python lines, using the function xr_ds_ex in xr_ds_ex.py.txt
that generates an example xarray.Dataset object, I see that while ds.var_ex is all ones, ds_ann.var_ex is 1/12. If the example dataset has 2 years of monthly ones, the computed mean is 1/24. So it looks like the result is effectively being incorrectly divided by the number of samples in the input.

I am using xarray 0.12.1 and esmlab as of commit c0903ad.

import esmlab
from xr_ds_ex import xr_ds_ex
ds = xr_ds_ex(decode_times=True, nyrs=1, var_const=True)
ds_ann = esmlab.resample(ds, freq='ann')
print(ds)
print(ds_ann)

<xarray.Dataset>
Dimensions: (d2: 2, time: 12)
Coordinates:

time (time) object 0001-01-16 12:00:00 ... 0001-12-16 12:00:00
Dimensions without coordinates: d2
Data variables:
var_ex (time) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
time_bounds (time, d2) object 0001-01-01 00:00:00 ... 0002-01-01 00:00:00
<xarray.Dataset>
Dimensions: (d2: 2, time: 1)
Coordinates:

time (time) object 0001-07-01 17:00:00
Dimensions without coordinates: d2
Data variables:
var_ex (time) float64 0.08333
time_bounds (time, d2) float64 0.0 365.0
Attributes:
history: \n2019-04-09 06:46:11.072221 esmlab.resample(, freq="a...

interface to esmlab-data

See
https://github.com/pydata/xarray/blob/aabda43c54c6336dc62cc9ec3c2050c9c656cff6/xarray/tutorial.py

Bug: `_get_weights_and_dims` doesn't work properly

Does anybody have a straightforward example of using the dim keyword on weighted_corr? I'm having issues with it. I'm testing the p-value function on, say, a lat/lon grid of data.

import numpy as np
import xarray as xr
import esmlab

base = np.random.rand(100,10,10)
x = xr.DataArray(np.random.rand(100,10,10) + base, dims=['time','lat','lon'])
y = xr.DataArray(np.random.rand(100,10,10) + base, dims=['time','lat','lon'])
# just looking to correlate over the 'time' dimension so a grid of correlation
# coefficients is returned.
r = esmlab.statistics.weighted_corr(x, y, dim=['time'], weights=None)

Originally posted by @bradyrx in #90 (comment)

Implement weighted rmsd, cov, and corr

Now that weighted mean, sum, and std are implemented in the merged PR #2, we can implement:

rmsd
cov
corr

esmlab.resample not found

@brianpm has the following issue.

I'm trying to use esmlab.resample(), but I’m getting the error that the module doesn’t have an attribute resample; same for esmlab.core.resample(). I can do from esmlab import core to get access to resample, but based on the __init__ file, I’m guessing the intention is to have it available as esmlab.core.

Anyway, I can work with it this way, but it wasn’t the behavior I expected.

@andersy005, can you help?

Release v2019.2.1

Update Changelog.rst
Publish on PyPI
Publish on conda-forge

significance tests

For our statistics functions, like weighted_cor it would be nice to enable computing significance metrics (p-value...).

@bradyrx, perhaps you can help with this?

statistics.weighted_mean does not work with single precision weights

esmlab version: 2019.1.0
xarray version: 0.11.0
Python version: 2.7.14
Operating System: this is on cheyenne

Description

I tried to compute the weighted average of a single-precision field, and it failed with the traceback

Traceback (most recent call last):
  File "./driver.py", line 45, in <module>
    AnalysisElement.do_analysis()
  File "/gpfs/fs1/work/mlevy/codes/marbl-diags/marbl_diags/analysis_elements_class.py", line 122, in do_analysis
    func(self, self._config_dict)
  File "/gpfs/fs1/work/mlevy/codes/marbl-diags/marbl_diags/analysis_ops.py", line 26, in plot_ann_climo
    _plot_climo(AnalysisElement, config_dict, valid_time_dims)
  File "/gpfs/fs1/work/mlevy/codes/marbl-diags/marbl_diags/analysis_ops.py", line 137, in _plot_climo
    fmean = esmlab.statistics.weighted_mean(field, TAREA).load().values
  File "/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/esmlab/statistics.py", line 107, in weighted_mean
    weights = _apply_nan_mask(x, weights, avg_over_dims_v)
  File "/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/esmlab/statistics.py", line 21, in _apply_nan_mask
    (weights / weights.sum(avg_over_dims_v)).sum(avg_over_dims_v), 1.0
  File "/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 1396, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array(1., dtype=float32)
 y: array(1.)

What I Did

More details are available in NCAR/marbl-diags#10, but I think the issue is in statistics.py:

def _apply_nan_mask(x, weights, avg_over_dims_v):
    weights = weights.where(x.notnull())
    np.testing.assert_allclose(
        (weights / weights.sum(avg_over_dims_v)).sum(avg_over_dims_v), 1.0
    )
return weights

The 1.0 argument to np.testing.assert_allclose needs to be the same type as weights. In my case, the weights are TAREA from /glade/p/cesm/bgcwg/obgc_diag/data/obs_data/WOA2005/WOA05_ann_avg_gx1v6.nc, which is a float (not a double).

wrong time values in results from compute_mon_anomaly and compute_mon_climatology

The time values returned by compute_mon_anomaly and compute_mon_climatology are wrong.

The python code in the following file, which has a .txt suffix to enable me to attach it, exhibits the problem:
esmlab_compute_mon.py.txt

It uses the function dset from tests/conftest.py to create an example dataset, and then prints time from dset, and the datasets returned by compute_mon_anomaly and compute_mon_climatology.

print(ds.time) returns:

<xarray.DataArray 'time' (time: 24)>
array([ 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 396, 424,
455, 485, 516, 546, 577, 608, 638, 669, 699, 730])
Coordinates:

time (time) int64 31 59 90 120 151 181 212 ... 577 608 638 669 699 730
Attributes:
units: days since 0001-01-01 00:00:00
calendar: noleap
bounds: time_bound

print(compute_mon_anomaly(ds).time) returns:

<xarray.DataArray 'time' (time: 24)>
array([-182.5, -182.5, -182.5, -182.5, -182.5, -182.5, -182.5, -182.5, -182.5,
-182.5, -182.5, -182.5, 182.5, 182.5, 182.5, 182.5, 182.5, 182.5,
182.5, 182.5, 182.5, 182.5, 182.5, 182.5])
Coordinates:

time (time) float64 -182.5 -182.5 -182.5 -182.5 ... 182.5 182.5 182.5
Attributes:
units: days since 0001-01-01 00:00:00
calendar: noleap
bounds: time_bound

I expect this to report the same values as print(ds.time), and it does not.

print(compute_mon_climatology(ds).time) returns:

<xarray.DataArray 'time' (time: 12)>
array([213.5, 241.5, 272.5, 302.5, 333.5, 363.5, 394.5, 425.5, 455.5, 486.5,
516.5, 547.5])
Coordinates:

time (time) float64 213.5 241.5 272.5 302.5 ... 455.5 486.5 516.5 547.5
Attributes:
long_name: time
units: days since 0001-01-01 00:00:00
bounds: time_bound

I expect this to report values corresponding to mid-Jan, mid-Feb, ..., and it does not.
I expect this to have the same calendar attribute that ds has, and it does not.

I am using the trunk of esmlab. conda list returns the following for xarray, numpy, and esmlab:

esmlab 2019.2.0+30.ged53a19 dev_0
numpy 1.15.4 py36h7e9f1db_0
numpy-base 1.15.4 py36hde5b4d6_0
xarray 0.11.3 pypi_0 pypi

Function to compute true monthly mean from daily mean.

Thanks for submitting an issue!

Here's a quick checklist in what to include:

Include a detailed description of the bug or suggestion
conda list of the conda environment you are using
Minimal, self-contained copy-pastable example that generates the issue if possible. Please be concise with code posted. See guidelines below on how to provide a good bug report:
- Minimal Complete Verifiable Examples
- Craft Minimal Bug Reports
Bug reports that follow these guidelines are easier to diagnose,
and so are often handled much more quickly.

Description

I was trying to fix some bad monthly and daily means from the CESM-CICE history files. @jessluo came up with a script to do a group my year and month. We thought this would be a nice addition to esmlab.

What I Did

The path of the data is:

/gpfs/fs1/p/cesm/pcwg/timeseries-cmip6/b.e21.B1850.f09_g17.CMIP6-piControl.001/ice/proc/tseries/day_1

case = "b.e21.B1850.f09_g17.CMIP6-piControl.001"
year1 = "0061"
year2 = "0099"
year1m = "0001"
year2m = "0099"

filename_month = "../month_1/"+case+".cice.h.aicen."+year1m+"01-"+year2m+"12.nc"
filename_day = case+".cice.h1.aicen_d."+year1+"0101-"+year2+"1231.nc"
filename1 = case+".cice.h1.aicen_d_fix."+year1+"0101-"+year2+"1231.nc"
filename2 = case+".cice.h.aicen_fix."+year1m+"01-"+year2m+"12.nc"

ds = xr.open_dataset(filename1)
ds_day = xr.open_dataset(filename_day)
ds_month = xr.open_dataset(filename_month)

aicen_d = ds['aicen_d']

times = ds_day.time_bounds.values
print(times)

leftbounds_yr = [x[0].timetuple()[0] for x in times]
leftbounds_mo = [x[0].timetuple()[1] for x in times]

yrmo = [cftime.datetime(y, m, 15) for y,m in zip(leftbounds_yr,leftbounds_mo)]
yrmo = np.array(yrmo)

print(yrmo.shape)

aicen_d_val = ds.aicen_d.values

aicen_d = xr.DataArray(aicen_d_val, coords={'time':yrmo, 'NCAT':ds.NCAT, 'TLON':ds.TLON, 'TLAT':ds.TLAT},
                       dims=('time','nc','nj','ni'))

aicen2 = aicen_d.groupby('time').mean(dim='time')

aicen = ds_month['aicen']

aicen.values[720:][:][:][:] = np.where(aicen2.values[:][:][:][:]>1.,1.0e30,aicen2.values[:][:][:][:])

print(aicen.values[0][4][370][180])

aicen.to_netcdf(filename2)

Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.

rename compute_ann_climatology

I think a more appropriate name for compute_ann_climatology would be compute_ann_mean.

This is sort of like a resample operation.

groupby("time.month").mean("time") drops time_bound variable

Description

compute_mon_climatology function fails when run on multiple POP2 history files (glade/scratch/altuntas/archive/g.e20.GIAF.T62_g37.test_d2m.001/ocn/hist/g.e20.GIAF.T62_g37.test_d2m.001.pop.h.0001-*.nc)

As far as I can tell, this is because groupby("time.month").mean("time") call drops the "time_bound" variable when instantiating computed_dset. This causes the function to fail at the following line within climatology.py

    computed_dset["month_bounds"] = (
        computed_dset[tb_name] - computed_dset[tb_name][0, 0]
    )

where tb_name=="time_bound"

my environment = ncar_pylib_20190118 on cheyenne

What I Did

# filePaths -> /glade/scratch/altuntas/archive/g.e20.GIAF.T62_g37.test_d2m.001/ocn/hist/g.e20.GIAF.T62_g37.test_d2m.001.pop.h.0001-*.nc
dset = xr.open_mfdataset(filePaths)
compute_mon_climatology(dset)

Interactive Debugging (starting at climatology.py line 53):

(Pdb) 'time_bound' in dset
True
(Pdb) 'time_bound' in dset.drop(grid_vars)
True
(Pdb) 'time_bound' in dset.drop(grid_vars).groupby("time.month").mean("time")
False
(Pdb) 'time_bound' in dset.drop(grid_vars).groupby("time.month").mean("time").rename({"month": "time"})
False

time operations where time_bounds span multiple averaging periods

There is an assumption within the functions in climatology.py that the time_bound of data fit concisely within the averaging period applied; this assumption is violated when computing monthly averages, say, on 5-day data. A more appropriate approach would be to compute averaging weights based on the portion of the time_bound that falls within the target averaging period.

Don't add NaNs for _FillValue

xarray sets a _FillValue = NaN on writing to file if the _FillValue field in encoding is not set. I think this is undesirable; the preferred behavior would be for the absence of _FillValue to be preserved.

This could be addressed in get_original_attrs by including something like the following.

    if '_FillValue' not in encoding:
        encoding['_FillValue'] = None

Scaling Study

@kmpaul, @matt-long, @jukent

I wanted to let you know that I added data collected during xarray/dask scaling preliminary study here.

Feel free to play around with the data. There are some uninteresting plots here :) https://nbviewer.jupyter.org/github/andersy005/esmlab/blob/master/benchmarks/plots.ipynb

I will come back to this task later

time_bound variable requirement when computing monthly anomaly, annual mean, etc

@matt-long, while looking at

def compute_mon_anomaly(dsm) function

tb_name,tb_dim = time_bound_var(dsm)
grid_vars = [v for v in dsm.variables if 'time' not in dsm[v].dims]
variables = [v for v in dsm.variables if 'time' in dsm[v].dims and v not in ['time',tb_name]]

# save attrs
attrs = {v:dsm[v].attrs for v in dsm.variables}
coords = {v:dsm[v].attrs for v in dsm.variables}
encoding = {v:{key:val for key,val in dsm[v].encoding.items()
               if key in ['dtype','_FillValue','missing_value']}
            for v in dsm.variables}

#-- compute time variable
time_values_orig = dsm.time.values
date = cftime.num2date(dsm[tb_name].mean(tb_dim),
                       units = dsm.time.attrs['units'],
                       calendar = dsm.time.attrs['calendar'])
dsm.time.values = date
if len(date)%12 != 0:
    raise ValueError('Time axis not evenly divisible by 12!')

#-- compute anomaly
ds = dsm.drop(grid_vars).groupby('time.month') - 
dsm.drop(grid_vars).groupby('time.month').mean('time')
ds.reset_coords('month',inplace=True)

#-- put grid_vars back
ds = xr.merge((ds,dsm.drop([v for v in dsm.variables if v not in grid_vars])))
ds.time.values = time_values_orig

attrs['month'] = {'long_name':'Month'}

# put the attributes back
for v in ds.variables:
    ds[v].attrs = attrs[v]

# put the encoding back
for v in ds.variables:
    if v in encoding:
        ds[v].encoding = encoding[v]

return ds

I noticed that you are finding the time_bound variable first. I am wondering whether this is a prerequisite condition for computing monthly anomaly? In other words, can we generalize this function? Is time_bound variable earth-system models specific? If not, can we just make it an optional argument? The goal would be to have a somewhat general function that one can use with most datasets.

pip install regrid issue

Building environments using a conda environment file with the following pip directive

git+https://github.com/NCAR/esmlab.git

leads to the following error

Pip subprocess error:
  Could not find a version that satisfies the requirement esmlab-regrid (from esmlab==2019.3.16+140.gd3ce9a5->-r /gpfs/u/home/mclong/codes/NCAR-pangeo-tutorial/environments/condaenv.grstg34t.requirements.txt (line 4)) (from versions: )
No matching distribution found for esmlab-regrid (from esmlab==2019.3.16+140.gd3ce9a5->-r /gpfs/u/home/mclong/codes/NCAR-pangeo-tutorial/environments/condaenv.grstg34t.requirements.txt (line 4))

cc @andersy005

Support flexible `weights` option in statistics.py module

Current functions in esmlab.statistics.py work when weights are in form of xr.DataArray only. The docstring states that functions expect weights to be array_like

esmlab/esmlab/statistics.py

Lines 48 to 78 in 538185f

 def weighted_sum(x, weights, dim=None): 

 """Reduce DataArray by applying `weighted sum` along some dimension(s). 

  Parameters 

  ---------- 

  x : DataArray object 

  xarray object for which to compute `weighted sum`. 

  weights : array_like 

  dim : str or sequence of str, optional 

  Dimension(s) over which to apply mean. By default `weighted sum` 

  is applied over all dimensions. 

  Returns 

  ------- 

  reduced : DataArray 

  New DataArray object with `weighted sum` applied to its data 

  and the indicated dimension(s) removed. 

  """ 

 sum_over_dims_v = _get_op_over_dims(x, weights, dim) 

 if not sum_over_dims_v: 

 raise ValueError("Unexpected dimensions for variable {0}".format(x.name)) 

 x_w_sum = (x * weights).sum(sum_over_dims_v) 

 original_attrs, original_encoding = get_original_attrs(x) 

 return update_attrs(x_w_sum, original_attrs, original_encoding)

It might be useful to support flexible options for this argument.

Reliance on metadata

I think we need a clear philosophy regarding the operation of esmlab in the context of it's reliance on metadata attributes.

I think there are two fundamental concerns.

Are we consistent and compliant with CF conventions in our interpretation and setting of attributes? @kmpaul might have the best perspective on this.
I think we should consider clear fallback strategies to enable functionality in the absence of CF compliant data. The package should either do something sensible, or there should be clear means by users can supply necessary information.

compute_ann_mean does not preserve time:calendar

I'm calling ds_ann = compute_ann_mean(ds) on an xarry Dataset where ds.time.attrs includes 'calendar' and am finding that ds_ann.time.attrs does not include 'calendar'.

The python code in the following file, which has a .txt suffix to enable me to attach it, exhibits the problem:
esmlab_compute_ann_mean_calendar.py.txt

The python creates an example Dataset object, ds, in which ds.time.attrrs includes 'calendar', calls ds_ann = compute_ann_mean(ds), and writes out ds.time.attrs and ds_ann.time.attrs.

Using the trunk of esmlab, I get the output:

OrderedDict([('units', 'days since 0001-01-01'), ('calendar', 'noleap'), ('bounds', 'time_bounds')])
OrderedDict([('long_name', 'time'), ('units', 'days since 0001-01-01 00:00:00'), ('bounds', 'time_bounds')])

I expect the 2nd line to include 'calendar', like the first line does, but it doesn't.

My reading of the code leads me to expect that this should be happening, but it is not working for me. I don't understand the code well enough to understand why it isn't working.

(It also looks like compute_ann_mean has added 'long_nametods_ann.time.attrs`, which I don't expect it to do.)

Release v2019.2.0

PyPI release
Conda-forge release

compute_time_var should accept year offset and should not require multiple of 12

Add new argument to compute_time_var: year_offset=None

Then after computing date, add something like the following

    if year_offset is not None and ~np.isnan(year_offset):
        date += cftime.date2num(datetime(int(year_offset),1,1),
                                       ds.time.attrs['units'],
                                       ds.time.attrs['calendar'])

Also, I think we should remove this check:

if len(date) % 12 != 0:
        raise ValueError("Time axis not evenly divisible by 12!")

Climatology functions

Integrations/interpolations

`xarray_extras`

xref: https://xarray-extras.readthedocs.io/en/latest/index.html

Cumulatives

cummean
compound_sum
compound_prod
compound_mean

Interpolate

splrep
splev

compute_ann_mean does not work if time is encoded

I'm calling ds_ann = compute_ann_mean(ds) on an xarray Dataset with ds.time is encoded, and it fails down in xarray.

The python code in the following file, which has a .txt suffix to enable me to attach it, exhibits the problem:
esmlab_compute_ann_mean_time_encoded.py.txt

The python creates an example Dataset that has time encoded. It accomplishes this by first creating an Dataset with time unencoded, writes that to a netcdf file, and reads it back in with xr.open_mfdataset. The script then calls ds_ann = compute_ann_means(ds) on the result, and I get the following error.

Traceback (most recent call last):
File "esmlab_compute_ann_mean_time_encoded.py", line 33, in
ds_ann = esmlab.climatology.compute_ann_mean(ds)
File "/glade/work/klindsay/miniconda3/envs/analysis/lib/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/gpfs/fs1/work/klindsay/analysis/esmlab/esmlab/climatology.py", line 178, in compute_ann_mean
dset = tm.compute_time()
File "/gpfs/fs1/work/klindsay/analysis/esmlab/esmlab/utils/time.py", line 173, in compute_time
if self.time_bound is not None:
File "/gpfs/fs1/work/klindsay/analysis/esmlab/esmlab/utils/time.py", line 216, in time_bound
return xr.DataArray(tb_value, dims=(self.time_coord_name))
File "/glade/work/klindsay/miniconda3/envs/analysis/lib/python3.6/site-packages/xarray/core/dataarray.py", line 230, in init
variable = Variable(dims, data, attrs, encoding, fastpath=True)
File "/glade/work/klindsay/miniconda3/envs/analysis/lib/python3.6/site-packages/xarray/core/variable.py", line 260, in init
self._dims = self._parse_dimensions(dims)
File "/glade/work/klindsay/miniconda3/envs/analysis/lib/python3.6/site-packages/xarray/core/variable.py", line 424, in _parse_dimensions
% (dims, self.ndim))
ValueError: dimensions ('time',) must have the same length as the number of data dimensions, ndim=2

I am using the trunk of esmlab. conda list returns the following for xarray, numpy, and esmlab:

esmlab 2019.2.0+30.ged53a19 dev_0
numpy 1.15.4 py36h7e9f1db_0
numpy-base 1.15.4 py36hde5b4d6_0
xarray 0.11.3 pypi_0 pypi

This worked with versions of esmlab prior to:

commit a61c79a
Author: mclong [email protected]
Date: Thu Feb 21 20:42:52 2019 -0700

time_manager: returns ds with original time

rename `grid_vars` to `static_vars`

grid_vars is somewhat confusing; these are variables that don't change as a function of time, but they do not necessarily describe aspects of the grid.

A better name might be static_vars.

We are relying on the fact that a coordinate named time is in the dataset. This is obviously a bit fragile. Alternatives would be detect unlimited_dim (also fragile), have a user-specified fallback, or give up and don't do the operations that rely on static_vars. Actually, this reliance on time is pervasive and perhaps a bigger issue.

Issues with testing Python 2.7 on CircleCI

Python 2.7 conda env is not getting properly created. So far, it is defaulting to Python 3.7

Make _time.py, _variable.py modules public

Thoughts on dropping support for Python 2.7

@matt-long, @mnlevy1981, @kmpaul:

I would like to know your thoughts on dropping Python 2.7 suppport in future versions of esmlab.
I am currently working on integrating some functionality from xarray-extras namely interpolations and integrations functions (see #25) in esmlab. However, xarray-extras is only Python 3.5+ compatible. Since it is 2019 already, it is likely that the number of upstream packages that will drop support for Python 2.7 will definitely increase, making Python 2.7 support a painful exercise for us.

@mnlevy1981, I haven't forgotten the fact that cesm_postprocessing still relies on Python2.7 and that you'd like to use some features from esmlab in cesm_postprocessing. I am open to ideas on how to deal with this issue.

establish dask-mpi batch processing functionality

Based on a discussion today, I think we want to have a batch processing capability in esmlab based on dask-mpi.

All the functions in esmlab.climatology accept and return arguments of type xr.Dataset; explicit attention has been paid to ensure that the returned Dataset is suitable to write to file (attributes and dtypes preserved, etc.).

@kmpaul, @andersy005: can you please provide @alperaltuntas with a dask-mpi example?

cc: @mnlevy1981, @jukent

resample fails with datetime

I am having trouble with esmlab resample. I am working with monthly data with standard calendar.

Here are a two failure modes.

Read dataset with decode_times=True

In [1]: ds = xr.open_dataset(file_in, drop_variables='date', decode_times=True)

In [2]: ds_ann = esmlab.resample(ds, freq='ann')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-0640091b3d29> in <module>
----> 1 ds_ann = esmlab.resample(ds, freq='ann')

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in resample(dset, freq, weights, time_coord_name)
    723
    724     else:
--> 725         ds = dset.esmlab.set_time(time_coord_name=time_coord_name).compute_ann_mean(weights=weights)
    726
    727     new_history = f'\n{datetime.now()} esmlab.resample(<DATASET>, freq="{freq}")'

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in set_time(self, time_coord_name, year_offset)
    319                     self._ds[self.tb_name],
    320                     units=self.time_attrs['units'],--> 321                     calendar=self.time_attrs['calendar'],
    322                 )
    323                 self.time_bound.data = tb_data

cftime/_cftime.pyx in cftime._cftime.date2num()
TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('O')

Read dataset with decode_times=False

In [1]: file_in = '/glade/work/mclong/pco2-landschutzer/downloaded/MPI-SOM_FFN_GCB2018.nc'

In [2]: ds = xr.open_dataset(file_in, drop_variables='date', decode_times=False)

In [3]: ds_ann = esmlab.resample(ds, freq='ann')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-0640091b3d29> in <module>
----> 1 ds_ann = esmlab.resample(ds, freq='ann')

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in resample(dset, freq, weights, time_coord_name)
    723
    724     else:
--> 725         ds = dset.esmlab.set_time(time_coord_name=time_coord_name).compute_ann_mean(weights=weights)
    726
    727     new_history = f'\n{datetime.now()} esmlab.resample(<DATASET>, freq="{freq}")'

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in set_time(self, time_coord_name, year_offset)
    322                 )
    323                 self.time_bound.data = tb_data
--> 324         self.setup()
    325         return self
    326

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in setup(self)
    327     def setup(self):
    328         self.get_variables()
--> 329         self.compute_time()
    330         self.get_original_metadata()
    331

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in compute_time(self)
     80
     81         if self.time_bound is not None:
---> 82             groupby_coord = self.get_time_decoded(midpoint=True)
     83
     84         else:

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in get_time_decoded(self, midpoint)
    181         time_out.data = xr.CFTimeIndex(
    182             cftime.num2date(
--> 183                 time_data, units=self.time_attrs['units'], calendar=self.time_attrs['calendar']
    184             )
    185         )

/glade/work/mclong/miniconda3/envs/dev/lib/python3.7/site-packages/xarray/coding/cftimeindex.py in __new__(cls, data, name)
    235         result = object.__new__(cls)
    236         result._data = np.array(data, dtype='O')
--> 237         assert_all_valid_date_type(result._data)
    238         result.name = name
    239         return result

/glade/work/mclong/miniconda3/envs/dev/lib/python3.7/site-packages/xarray/coding/cftimeindex.py in assert_all_valid_date_type(data)
    191             raise TypeError(
    192                 'CFTimeIndex requires cftime.datetime '
--> 193                 'objects. Got object of {}.'.format(date_type))
    194         if not all(isinstance(value, date_type) for value in data):
    195             raise TypeError(

TypeError: CFTimeIndex requires cftime.datetime objects. Got object of <class 'cftime._cftime.real_datetime'>.

Workaround:

In [1]: ds = xr.open_dataset(file_in, drop_variables='date', decode_times=False)

In [2]: ds['time'] = cftime.num2date(ds.time, units=ds.time.units, 
    ...:                              only_use_cftime_datetimes=True)

From cftime documentation:
only_use_cftime_datetimes: if False (default), datetime.datetime objects are returned from num2date where possible; if True dates which subclass cftime.datetime are returned for all calendars.

Conclusion: esmlab is not datetime friendly.

User Defined Catalog

Brainstorming

Create an abstract class representing a user defined catalog

class Catalog(object):
    """
    An abstract class representing a user defined Catalog.
    All user defined catalogs should subclass it. All subclasses should override
    ``build_catalog``, ``activate_catalog``, ``search``, ``get_files``, 
    """
    def build_catalog(self):
        """Builds catalog from a user defined YAML file"""
        raise NotImplementedError
        
    def activate_catalog(self):
        """Set the active catalog to work with"""
        raise NotImplementedError
        
    def search(self):
        """Find entries matching a query"""
        raise NotImplementedError
        
    def get_files(self):
        """Returns list of files matching a query"""
        raise NotImplementedError

Ccing @matt-long, let's discuss more about this.

Should esmlab functions return xarray objects for scalars?

esmlab version:`2019.1.0
xarray version: 0.11.2
Python version: 3.6.7
Operating System: macos 10.14.2

Description

As discussed in #28, confusion can arise when a scalar value is returned as an xarray object... for example, taking the weighted average of an array where the weights and array have the same dimensions. In those cases, it might be preferable to return just the values attribute of the xarray object.

What I Did

Had a conversation with @andersy005 that led to #28 and then thought more about what I expected to happen

CircleCI workflow improvement

remove duplicate call to _get_weights_and_dims

esmlab/esmlab/statistics.py

Line 147 in 878b80c

x_w_mean = weighted_sum(

Since this function calls weighted_sum, we're call _get_weights_and_dims twice.

We could change this

    x_w_mean = weighted_sum(
        x, weights=weights, dim=op_over_dims, apply_nan_mask=apply_nan_mask_flag
    ) / weights.sum(op_over_dims)

to this

x_w_mean = (x * weights).sum(op_over_dims) / weights.sum(op_over_dims)

resample fails on HadCRUT dataset

I'm trying to use resample to go from monthly to annual means in HadCRUT data. It seems to be having trouble with the time coordinate, and ultimately complains "TypeError: cannot perform reduce with flexible type".

import numpy as np
import xarray as xr
import esmlab
troot = Path("/glade/work/brianpm/HadCRUT")
tanomfil = xr.open_dataset(troot/"HadCRUT.4.6.0.0.median.nc", decode_times=False)
esmlab.resample(tanomfil, 'ann',  time_coord_name='time')

This was using esmlab '2019.4.27'
xarray 0.12.1
numpy 1.16.3
python 3.7.3

Forgot to mention: tried with decode_times=False and True.

hard-coded tb_name=time_bound in compute_time_var

https://github.com/NCAR/esmlab/blob/master/esmlab/utils/time.py#L73
should use dset[tb_name], not dset.time_bound

encoding of time and time_bounds differs in compute_ann_mean results for decode_time=True

If I use xr.open_dataset with decode_times=True (the default) to open a dataset ds, then the values of both ds.time and ds[tb_name] are converted to cftime objects (tb_name=ds.time.attrs['bounds']). If I execute
ds_ann = esmlab.climatology.compute_ann_mean(ds),
then the values of ds.time are also cftime objects, but the values of ds[tb_name] are not.
Is this difference intended? I find it confusing.

Avoid converting Dask arrays into NumPy arrays

Per documentation: http://xarray.pydata.org/en/stable/api.html#id5, .values returns array’s data as a numpy.ndarray. Therefore, we should avoid using .values attribute and use .data attribute which keeps data in their original array container (numpy or dask array) instead.

esmlab/esmlab/utils/time.py

Lines 178 to 180 in 878b80c


	self._ds[self.time_coord_name].values = groupby_coord.values

esmlab/esmlab/regrid.py

Lines 179 to 180 in 878b80c

 # pull data, dims and coords from incoming DataArray 

 data_src = da_in.values

esmlab/esmlab/climatology.py

Lines 71 to 72 in 878b80c

 computed_dset[tm.tb_name] = tm.time_bound - tm.time_bound[0, 0] 

 computed_dset[time_coord_name].values = computed_dset[tm.tb_name].mean(tm.tb_dim).values

Add Contribution Guide

compute_time_var is broken

I am not able to invoke the compute_time_var method on a dataset.

files = ['/glade/scratch/mclong/archive/g.e21.G1850ECOIAF.T62_g17.002/ocn/hist/g.e21.G1850ECOIAF.T62_g17.002.pop.h.0001-01.nc']
ds = xr.open_dataset(files[0], decode_times=False, decode_coords=False)
esmlab.EsmlabAccessor.compute_time_var(ds, midpoint=True, year_offset=0)

Yields the following.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/glade/work/mclong/miniconda3/envs/dev/lib/python3.7/site-packages/xarray/core/common.py in __setattr__(self, name, value)
    186                 # to avoid key lookups with attribute-style access.
--> 187                 self.__getattribute__(name)
    188             except AttributeError:

AttributeError: 'Dataset' object has no attribute 'year_offset'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-11-f321accd20b7> in <module>
----> 1 esmlab.EsmlabAccessor.compute_time_var(ds, midpoint=True, year_offset=0)

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in compute_time_var(self, midpoint, year_offset)
    123         The dataset with time coordinate modified.
    124         """
--> 125         self.year_offset = year_offset
    126         ds = self._ds_time_computed.copy(True)
    127         ds[self.time_coord_name] = self.get_time_decoded(midpoint)

/glade/work/mclong/miniconda3/envs/dev/lib/python3.7/site-packages/xarray/core/common.py in __setattr__(self, name, value)
    190                     "cannot set attribute %r on a %r object. Use __setitem__ "
    191                     "style assignment (e.g., `ds['name'] = ...`) instead to "
--> 192                     "assign variables." % (name, type(self).__name__))
    193         object.__setattr__(self, name, value)
    194 

AttributeError: cannot set attribute 'year_offset' on a 'Dataset' object. Use __setitem__ style assignment (e.g., `ds['name'] = ...`) instead to assign variables.

If I try this:

ds.esmlab.compute_time_var()

I get the following.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-20396bd7bb3d> in <module>
----> 1 ds.esmlab.compute_time_var()

/gpfs/u/home/mclong/codes/esmlab/esmlab/core.py in compute_time_var(self, midpoint, year_offset)
    124         """
    125         self.year_offset = year_offset
--> 126         ds = self._ds_time_computed.copy(True)
    127         ds[self.time_coord_name] = self.get_time_decoded(midpoint)
    128         return ds

AttributeError: 'NoneType' object has no attribute 'copy'

Implement custom accessors on xarray objects

Use xarray's xr.register_dataset_accessor(), and xr.register_dataarray_accessor() to extend xarray for esmlab.
Reimplement https://gist.github.com/andersy005/65988b939edd976d45247b7a1104e678 by taking advantage of xarray's custom accessors.

Tests refactoring

Esmlab tests need refactoring:

Add pytest fictures for data sets being used
Add improved assertion statements

_get_weights_and_dims should accept tuple dim argument

This line

esmlab/esmlab/statistics.py

Line 39 in 878b80c

elif isinstance(dim, list):

should check for iterable, not just list.

Getting scalar values of the weighted average when input is 2D and weights is a 2D array

#!/usr/bin/env python

import xarray as xr
import esmlab

# Read in a CESM climatology file
in_files = []
for n in range(0,12):
  in_files.append("/glade/work/mclong/woa2013v2/POP_gx1v7/woa13_all_n{:02d}_gx1v7.nc".format(n))
ds = xr.open_mfdataset(in_files, decode_times=False)

TAREA = ds.TAREA.isel(time=0)
field = ds.NO3.isel(z_t=0).mean('time')
print(esmlab.statistics.weighted_mean(field, TAREA))

enable `time_coord_name` to be different from "time"

We assume throughout that the time coordinate is named "time", but this is overly restrictive.

We should introduce a variable time_coord_name and accept optional user input for this.

If the user does not supply time_coord_name, attempt to infer by looking for coordinate named "time", or checking for unlimited_dim.

Many instance like "time.month" will have to be changed to variables too:
time_dot_month = '.'.join([time_coord_name, "month"])

	def weighted_sum(x, weights, dim=None):
	"""Reduce DataArray by applying `weighted sum` along some dimension(s).

	Parameters
	----------

	x : DataArray object
	xarray object for which to compute `weighted sum`.

	weights : array_like

	dim : str or sequence of str, optional
	Dimension(s) over which to apply mean. By default `weighted sum`
	is applied over all dimensions.

	Returns
	-------

	reduced : DataArray
	New DataArray object with `weighted sum` applied to its data
	and the indicated dimension(s) removed.
	"""

	sum_over_dims_v = _get_op_over_dims(x, weights, dim)
	if not sum_over_dims_v:
	raise ValueError("Unexpected dimensions for variable {0}".format(x.name))

	x_w_sum = (x * weights).sum(sum_over_dims_v)

	original_attrs, original_encoding = get_original_attrs(x)
	return update_attrs(x_w_sum, original_attrs, original_encoding)

	# pull data, dims and coords from incoming DataArray
	data_src = da_in.values

	computed_dset[tm.tb_name] = tm.time_bound - tm.time_bound[0, 0]
	computed_dset[time_coord_name].values = computed_dset[tm.tb_name].mean(tm.tb_dim).values

ncar / esmlab Goto Github PK

esmlab's Introduction

⚠️⚠️ ESMLab functionality has been moved into https://github.com/NCAR/geocat-comp. ⚠️⚠️

ESMLab: Earth System Model Lab

Installation

esmlab's People

Contributors

Stargazers

Watchers

Forkers

esmlab's Issues

Description

What I Did

Description

What I Did

Description

What I Did

Interactive Debugging (starting at climatology.py line 53):

xarray_extras

Cumulatives

Interpolate

Description

What I Did

Recommend Projects

Recommend Topics

Recommend Org

Jobs

`xarray_extras`