Comments (8)
Note that native xarray just drops variables that can't be resampled in the requested way. It would be a nice fix to avoid that behavior, or at least to throw a warning that certain variables weren't resampled as requested due to their data type.
import numpy as np
import pandas as pd
import xarray as xr
A = xr.DataArray(np.random.rand(100,), dims='time')
A.name = 'ints'
B = xr.DataArray(['str']*100, dims='time')
B.name = 'strs'
ds = xr.merge([A, B])
ds['time'] = pd.date_range('1900', '2000', freq='A')
print(ds)
<xarray.Dataset>
Dimensions: (time: 100)
Coordinates:
* time (time) datetime64[ns] 1900-12-31 1901-12-31 ... 1999-12-31
Data variables:
ints (time) float64 0.8798 0.4835 0.6354 0.354 ... 0.1584 0.7635 0.8525
strs (time) <U3 'str' 'str' 'str' 'str' ... 'str' 'str' 'str' 'str'
ds.resample(time='A').mean()
<xarray.Dataset>
Dimensions: (time: 100)
Coordinates:
* time (time) datetime64[ns] 1900-12-31 1901-12-31 ... 1999-12-31
Data variables:
ints (time) float64 0.8798 0.4835 0.6354 0.354 ... 0.1584 0.7635 0.8525
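The silent drop can be surfaced by comparing variable names before and after the operation. This is a sketch around the toy dataset above, not an xarray feature; the try/except covers newer pandas versions where the 'A' year-end alias was renamed to 'YE':

```python
import warnings

import numpy as np
import pandas as pd
import xarray as xr

# same toy dataset as above: one numeric and one string variable
A = xr.DataArray(np.random.rand(100), dims='time', name='ints')
B = xr.DataArray(['str'] * 100, dims='time', name='strs')
ds = xr.merge([A, B])
try:
    ds['time'] = pd.date_range('1900', '2000', freq='YE')  # pandas >= 2.2
except ValueError:
    ds['time'] = pd.date_range('1900', '2000', freq='A')   # older pandas

try:
    resampled = ds.resample(time='YE').mean()
except ValueError:
    resampled = ds.resample(time='A').mean()

# warn about anything that was silently dropped
dropped = set(ds.data_vars) - set(resampled.data_vars)
if dropped:
    warnings.warn(
        f'variables not resampled due to their dtype: {sorted(dropped)}'
    )
```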
from esmlab.
@brianpm, I am going to look into this and will get back to you.
- Open the dataset with decode_times=True

The issue is caused by the field_status variable's datatype (which is a string). Esmlab is trying to apply resample to all data variables, including field_status, and it is failing due to its data type:
In [14]: tanomfil
Out[14]:
<xarray.Dataset>
Dimensions: (latitude: 36, longitude: 72, time: 2030)
Coordinates:
* latitude (latitude) float32 -87.5 -82.5 -77.5 ... 77.5 82.5 87.5
* longitude (longitude) float32 -177.5 -172.5 ... 172.5 177.5
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2019-02-15
Data variables:
temperature_anomaly (time, latitude, longitude) float32 ...
field_status (time) |S1 ...
One solution is to drop this variable:
In [16]: ds = tanomfil.drop('field_status')
In [17]: esmlab.resample(ds, 'ann', time_coord_name='time')
/glade/work/abanihi/softwares/miniconda3/envs/analysis/lib/python3.7/site-packages/xarray/core/nanops.py:159: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)
Out[17]:
<xarray.Dataset>
Dimensions: (latitude: 36, longitude: 72, time: 170)
Coordinates:
* latitude (latitude) float64 -87.5 -82.5 -77.5 ... 77.5 82.5 87.5
* longitude (longitude) float64 -177.5 -172.5 ... 172.5 177.5
* time (time) datetime64[ns] 1850-07-01T17:00:00 ... 2019-01-31T06:00:00
Data variables:
temperature_anomaly (time, latitude, longitude) float64 nan nan ... nan nan
Attributes:
history: \n2019-05-01 13:14:37.171125 esmlab.resample(<DATASET>, freq="a...
Another solution is to set field_status as a coordinate variable and then call resample:
In [27]: tanomfil = tanomfil.set_coords(['field_status'])
In [28]: tanomfil
Out[28]:
<xarray.Dataset>
Dimensions: (latitude: 36, longitude: 72, time: 2030)
Coordinates:
* latitude (latitude) float32 -87.5 -82.5 -77.5 ... 77.5 82.5 87.5
* longitude (longitude) float32 -177.5 -172.5 ... 172.5 177.5
* time (time) datetime64[ns] 1850-01-16T12:00:00 ... 2019-02-15
field_status (time) |S1 b'f' b'f' b'f' b'f' ... b'p' b'p' b'p' b'p'
Data variables:
temperature_anomaly (time, latitude, longitude) float32 ...
Attributes:
title: HadCRUT4 near-surface temperature ensemble data -...
institution: Met Office Hadley Centre / Climatic Research Unit...
history: Updated at 25/03/2019 16:03:56
source: CRUTEM.4.6.0.0, HadSST.3.1.1.0
comment:
reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P...
version: HadCRUT.4.6.0.0
Conventions: CF-1.0
ensemble_members: 100
ensemble_member_index: 0
In [29]: esmlab.resample(tanomfil, 'ann', time_coord_name='time')
/glade/work/abanihi/softwares/miniconda3/envs/analysis/lib/python3.7/site-packages/xarray/core/nanops.py:159: RuntimeWarning: Mean of empty slice
return np.nanmean(a, axis=axis, dtype=dtype)
Out[29]:
<xarray.Dataset>
Dimensions: (latitude: 36, longitude: 72, time: 170)
Coordinates:
* latitude (latitude) float64 -87.5 -82.5 -77.5 ... 77.5 82.5 87.5
* longitude (longitude) float64 -177.5 -172.5 ... 172.5 177.5
* time (time) datetime64[ns] 1850-07-01T17:00:00 ... 2019-01-31T06:00:00
Data variables:
temperature_anomaly (time, latitude, longitude) float64 nan nan ... nan nan
Attributes:
history: \n2019-05-01 13:20:25.645865 esmlab.resample(<DATASET>, freq="a...
That makes sense.
A nice alternative would be to be able to apply resample to just a DataArray instead of the whole Dataset. Is that possible, or is time_bnds then super problematic?
A nice alternative would be to be able to apply resample to just a DataArray instead of the whole dataset.
One easy solution for this would be to allow a user to pass a list of variables they would like to apply the resample on. For example:
esmlab.resample(tanomfil, 'ann', time_coord_name='time', data_vars=['temperature_anomaly'])
@matt-long, any thoughts?
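A sketch of what such an option could look like. Neither resample_subset nor the data_vars keyword exists in esmlab; the plain annual mean here is a stand-in for esmlab's weighted resample, and the try/except covers the pandas rename of the 'A'/'M' aliases to 'YE'/'ME':

```python
import numpy as np
import pandas as pd
import xarray as xr

def resample_subset(ds, time_coord_name='time', data_vars=None):
    """Resample only the requested data variables (hypothetical API)."""
    if data_vars is None:
        data_vars = list(ds.data_vars)
    subset = ds[list(data_vars)]
    # stand-in for esmlab's weighted annual mean
    try:
        return subset.resample({time_coord_name: 'YE'}).mean()
    except ValueError:  # older pandas spells year-end 'A'
        return subset.resample({time_coord_name: 'A'}).mean()

# two years of monthly data: one numeric and one string variable
ds = xr.Dataset({
    'temperature_anomaly': ('time', np.random.rand(24)),
    'field_status': ('time', ['f'] * 24),
})
try:
    ds['time'] = pd.date_range('2000-01', periods=24, freq='ME')
except ValueError:
    ds['time'] = pd.date_range('2000-01', periods=24, freq='M')

out = resample_subset(ds, data_vars=['temperature_anomaly'])
```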
I imagine you could also do a quick check for variable type when you loop through data_vars. If you encounter a str, datetime, etc., you don't apply the resample there.
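That check can be written with numpy's dtype hierarchy: np.issubdtype(..., np.number) is False for strings, bytes, objects, and datetimes. A sketch (is_resamplable is a hypothetical helper, not esmlab code):

```python
import numpy as np
import xarray as xr

def is_resamplable(da):
    """True only for numeric dtypes (float, int, complex)."""
    return np.issubdtype(da.dtype, np.number)

# variables mirroring the HadCRUT example above
ds = xr.Dataset({
    'temperature_anomaly': ('time', np.random.rand(12).astype('float32')),
    'field_status': ('time', np.array([b'f'] * 12)),  # |S1 dtype
})

keep = [v for v in ds.data_vars if is_resamplable(ds[v])]
skip = [v for v in ds.data_vars if not is_resamplable(ds[v])]
```

Note that complex counts as numeric under np.number; an implementation could whitelist np.floating and np.integer instead if complex data should also be skipped.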
@bradyrx, we are using dset.apply() internally, and I believe this is where the problem is coming from:
Lines 488 to 505 in 8ced4d6
I am curious to know how one could add a check before calling .apply().
Here's a naive solution off the top of my head (using slightly modified weighted mean function here):
def weighted_mean_arr(darr, wgts=1, time_coord_name='time'):
    # if NaNs are present, we need to use individual weights
    cond = darr.isnull()
    ones = xr.where(cond, 0.0, 1.0)
    # annual bins that contain at least one valid value
    mask = darr.resample({time_coord_name: 'A'}).mean(dim=time_coord_name).notnull()
    # weighted sum of the data, and the corresponding sum of weights actually used
    da_sum = (darr * wgts).resample({time_coord_name: 'A'}).sum(dim=time_coord_name)
    ones_out = (ones * wgts).resample({time_coord_name: 'A'}).sum(dim=time_coord_name)
    ones_out = ones_out.where(ones_out > 0.0)
    da_weighted_mean = da_sum / ones_out
    return da_weighted_mean.where(mask)
import numpy as np
import pandas as pd
import xarray as xr
# Generate monthly dummy data with two string variables
A = xr.DataArray(np.random.rand(120,), dims='time')
A.name = 'ints'
B = xr.DataArray(['str']*120, dims='time')
B.name = 'strs'
C = xr.DataArray(['foo']*120, dims='time')
C.name = 'strs2'
ds = xr.merge([A, B, C])
ds['time'] = pd.date_range('1900-01', '1910-01', freq='M')
print(ds)
<xarray.Dataset>
Dimensions: (time: 120)
Coordinates:
* time (time) datetime64[ns] 1900-01-31 1900-02-28 ... 1909-12-31
Data variables:
ints (time) float64 0.3267 0.4233 0.4777 0.6476 ... 0.3189 0.9488 0.9251
strs (time) <U3 'str' 'str' 'str' 'str' ... 'str' 'str' 'str' 'str'
strs2 (time) <U3 'foo' 'foo' 'foo' 'foo' ... 'foo' 'foo' 'foo' 'foo'
# this list probably isn't exhaustive
acceptable_types = [np.dtype('float'), np.dtype('int'), np.dtype('double')]
drop_vars = []
for var in ds.data_vars:
    if ds[var].dtype not in acceptable_types:
        drop_vars.append(var)
# move the variables that can't be resampled into a separate dataset
no_apply = ds[drop_vars]
# apply the function only to the numeric variables
resampled = ds.drop(drop_vars).apply(weighted_mean_arr)
print(resampled)
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 1900-12-31 1901-12-31 ... 1909-12-31
Data variables:
ints (time) float64 0.4236 0.5849 0.5722 0.4475 ... 0.4421 0.5669 0.5151
print(no_apply)
<xarray.Dataset>
Dimensions: (time: 120)
Coordinates:
* time (time) datetime64[ns] 1900-01-31 1900-02-28 ... 1909-12-31
Data variables:
strs (time) <U3 'str' 'str' 'str' 'str' ... 'str' 'str' 'str' 'str'
strs2 (time) <U3 'foo' 'foo' 'foo' 'foo' ... 'foo' 'foo' 'foo' 'foo'
As in, you split the Dataset into two separate Datasets, apply the function to one, and then merge them at the end. The issue is that you now have two time dimensions with different coordinates. At that point, you'd have to rename one of them to something like TIME (which LANL does for some of their netCDF files with differing temporal resolutions). I don't really like this solution, though, as it gets confusing.
In the end, should you really be maintaining a Dataset with different resolutions on the time axis? Maybe xarray has it right to just drop any non-quantitative variables (i.e., just ditch no_apply in this case).
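For completeness, the split/rename/merge route described above can be sketched like this (the TIME rename follows the LANL convention mentioned; this is illustrative, not a recommendation, and the try/except covers the pandas 'A'/'M' to 'YE'/'ME' alias rename):

```python
import numpy as np
import pandas as pd
import xarray as xr

# two years of monthly data: one numeric and one string variable
ds = xr.Dataset({
    'ints': ('time', np.random.rand(24)),
    'strs': ('time', ['str'] * 24),
})
try:
    ds['time'] = pd.date_range('2000-01', periods=24, freq='ME')  # pandas >= 2.2
except ValueError:
    ds['time'] = pd.date_range('2000-01', periods=24, freq='M')   # older pandas

numeric = [v for v in ds.data_vars if np.issubdtype(ds[v].dtype, np.number)]
other = [v for v in ds.data_vars if v not in numeric]

# resample only the numeric variables
try:
    resampled = ds[numeric].resample(time='YE').mean()
except ValueError:
    resampled = ds[numeric].resample(time='A').mean()

# keep the untouched variables on a separately named time dimension
no_apply = ds[other].rename({'time': 'TIME'})
merged = xr.merge([resampled, no_apply])
```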