A better version of the script:
#!/usr/bin/env python
import xarray as xr
import numpy as np
import esmlab
# Read in WOA observational data (in 12 files)
in_files = []
for n in range(0, 12):
    in_files.append("/glade/work/mclong/woa2013v2/POP_gx1v7/woa13_all_n{:02d}_gx1v7.nc".format(n))
ds = xr.open_mfdataset(in_files, decode_times=False)
# Reduce dataset
# 1. Want NO3 averaged over all files
field = ds.NO3.isel(z_t=0).mean('time')
# 2. TAREA is identical across files, so just use the first time dimension of it
TAREA = ds.TAREA.isel(time=0)
# Statistics
print("field min: {:.2f}".format(np.nanmin(field)))
print("field max: {:.2f}".format(np.nanmax(field)))
print("field weighted mean:\n{}".format(esmlab.statistics.weighted_mean(field, TAREA)))
Output:
/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/dask/array/numpy_compat.py:28: RuntimeWarning: invalid value encountered in divide
x = np.divide(x1, x2, out)
field min: 0.00
field max: 33.95
field weighted mean:
<xarray.DataArray ()>
dask.array<shape=(), dtype=float64, chunksize=()>
Coordinates:
z_t float64 500.0
time float64 6.0
(I don't think the numpy_compat.py warning affects anything... numpy doesn't complain about it with nanmin and nanmax, and I can still plot field in other scripts.)
from esmlab.
@mnlevy1981, to make sure I understand: what should the dimensions (shape) of the output of esmlab.statistics.weighted_mean(field, TAREA) be?
With
# 1. Want NO3 averaged over all files
field = ds.NO3.isel(z_t=0).mean('time')
# 2. TAREA is identical across files, so just use the first time dimension of it
TAREA = ds.TAREA.isel(time=0)
field and TAREA have the following shapes:
In [20]: field.shape
Out[20]: (384, 320)
In [21]: TAREA.shape
Out[21]: (384, 320)
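A scalar is what you get when a 2D (nlat, nlon) field is reduced over both of its dimensions. The sketch below mimics the script's reductions on a tiny synthetic Dataset (the dimension and variable names are copied from the script; the sizes and values are made up) and uses plain xarray arithmetic in place of esmlab.statistics.weighted_mean, so it only illustrates the expected shape, not esmlab's internals.

```python
import numpy as np
import xarray as xr

# Hypothetical tiny grid standing in for the 12 gx1v7 files:
# 12 time levels, 2 depths, nlat=4, nlon=5 (values are made up)
ntime, nz, nlat, nlon = 12, 2, 4, 5
no3 = np.arange(ntime * nz * nlat * nlon, dtype=float).reshape(ntime, nz, nlat, nlon)
ds = xr.Dataset(
    {
        "NO3": (("time", "z_t", "nlat", "nlon"), no3),
        "TAREA": (("time", "nlat", "nlon"), np.ones((ntime, nlat, nlon))),
    }
)

# Same reductions as in the script
field = ds.NO3.isel(z_t=0).mean("time")  # dims (nlat, nlon)
TAREA = ds.TAREA.isel(time=0)            # dims (nlat, nlon)

# Weighted mean over every remaining dimension -> 0-d (scalar) result
w_mean = (field * TAREA).sum() / TAREA.sum()
print(field.shape, TAREA.shape, w_mean.shape)  # (4, 5) (4, 5) ()
```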
It should be a scalar quantity. I just realized I could come up with an even simpler example using 1D arrays rather than reading data from disk, but I need to run... I'll try to find time tonight to post more. But the gist is that if
TAREA = [1, 1, 2]
field = [1, 3, 8]
Then I expect the weighted mean to be 5: the weighted sum of field is 1*1 + 1*3 + 2*8 = 20, and the weights sum to 4, so the weighted mean is 20/4 = 5.
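That 1D example can be checked directly in NumPy; np.average with a weights argument computes the same weighted mean. This is a plain NumPy sketch of the arithmetic only, not esmlab's implementation.

```python
import numpy as np

# The 1D example: cell areas act as weights for the field values
TAREA = np.array([1.0, 1.0, 2.0])
field = np.array([1.0, 3.0, 8.0])

weighted_sum = np.sum(field * TAREA)   # 1*1 + 1*3 + 2*8 = 20
w_mean = weighted_sum / np.sum(TAREA)  # 20 / 4 = 5
print(w_mean)                          # 5.0

# NumPy's built-in weighted average gives the same number
print(np.average(field, weights=TAREA))  # 5.0
```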
I think I figured it out:
- Because the data are read with xr.open_mfdataset(), xarray loads them into dask arrays instead of in-memory NumPy arrays. As a result, any subsequent operation is likely to return an xarray object backed by dask arrays, i.e. a lazy result that has not been computed yet.
- There are two solutions:
  - Force xarray to load the entire dataset into memory with
    ds = xr.open_mfdataset(in_files, decode_times=False).load()
    or
  - Tell dask to compute and return an in-memory result
    w_mean = esmlab.statistics.weighted_mean(field, TAREA).data.compute()
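A small sketch of the lazy behaviour and both fixes, assuming dask is installed. The inputs are hypothetical 1D stand-ins (the numbers from the earlier example), .chunk() is used to get dask-backed arrays the way open_mfdataset does, and the weighted mean is written with plain xarray arithmetic rather than esmlab.statistics.weighted_mean.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical 1D stand-ins for field and TAREA (not the WOA data);
# .chunk() wraps the values in dask arrays, much as open_mfdataset does
field = xr.DataArray(np.array([1.0, 3.0, 8.0]), dims="x").chunk({"x": 3})
TAREA = xr.DataArray(np.array([1.0, 1.0, 2.0]), dims="x").chunk({"x": 3})

# Weighted mean with plain xarray arithmetic (stand-in for esmlab's
# weighted_mean): the result is lazy, nothing has been computed yet
w_mean = (field * TAREA).sum() / TAREA.sum()
print(isinstance(w_mean.data, da.Array))  # True

# Solution 1: bring the inputs into memory first; later results are eager
field_mem, TAREA_mem = field.compute(), TAREA.compute()
w_mean_eager = (field_mem * TAREA_mem).sum() / TAREA_mem.sum()
print(isinstance(w_mean_eager.data, da.Array))  # False

# Solution 2: keep the lazy result and ask dask to compute it
value = w_mean.data.compute()  # NumPy scalar, xarray wrapper dropped
print(float(value))  # 5.0

# Variant: .load() computes in place and keeps the xarray object
w_mean.load()
print(float(w_mean))  # 5.0
```

The choice between the two is mostly about memory: loading the whole dataset up front is simple but pulls every file into RAM, while computing only the final result lets dask evaluate just what the reduction needs.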
In [47]: field
Out[47]:
<xarray.DataArray 'NO3' (nlat: 384, nlon: 320)>
array([[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[22.465448, 22.383131, 22.292166, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]],
dtype=float32)
Coordinates:
TLAT (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
TLONG (nlat, nlon) float64 -39.44 -38.31 -37.19 ... -41.08 -40.65 -40.22
z_t float64 500.0
Dimensions without coordinates: nlat, nlon
In [48]: TAREA
Out[48]:
<xarray.DataArray 'TAREA' (nlat: 384, nlon: 320)>
dask.array<shape=(384, 320), dtype=float64, chunksize=(384, 320)>
Coordinates:
TLAT (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
TLONG (nlat, nlon) float64 -39.44 -38.31 -37.19 ... -41.08 -40.65 -40.22
time float64 6.0
Dimensions without coordinates: nlat, nlon
Attributes:
long_name: area of T cells
units: cm^2
In [49]: w_mean = esmlab.statistics.weighted_mean(field, TAREA)
In [51]: w_mean
Out[51]:
<xarray.DataArray ()>
dask.array<shape=(), dtype=float64, chunksize=()>
Coordinates:
z_t float64 500.0
time float64 6.0
In [52]: w_mean.data
Out[52]: dask.array<truediv, shape=(), dtype=float64, chunksize=()>
In [53]: w_mean.data.compute()
Out[53]: 5.146317884602825
In [54]: np.nanmean(field)
Out[54]: 5.7132335
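The weighted result (5.146...) and np.nanmean (5.713...) need not agree: nanmean gives every non-NaN cell the same weight, while the area-weighted mean scales each cell by TAREA. The NumPy sketch below shows that difference on a toy 2x2 grid. nan_weighted_mean is a hypothetical helper that drops the weights at NaN (land) points from the denominator; I haven't checked that esmlab handles NaNs exactly this way.

```python
import numpy as np

def nan_weighted_mean(values, weights):
    """Area-weighted mean that skips NaN (e.g. land) points.

    Hypothetical helper for illustration, not esmlab's implementation:
    weights where `values` is NaN are excluded from the denominator.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(values)
    return np.sum(values[valid] * weights[valid]) / np.sum(weights[valid])

# Toy 2x2 "grid": one land point (NaN) and unequal cell areas
field = np.array([[np.nan, 1.0], [3.0, 8.0]])
area = np.array([[5.0, 1.0], [1.0, 2.0]])

print(nan_weighted_mean(field, area))  # 5.0 (20 / 4)
print(np.nanmean(field))               # 4.0 (12 / 3, each ocean cell counts equally)
```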
@mnlevy1981, oooh, one more thing: if you want your result to be an xarray object, do:
w_mean = esmlab.statistics.weighted_mean(field, TAREA).load()
instead of:
w_mean = esmlab.statistics.weighted_mean(field, TAREA).data.compute()
Thanks! The line
print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().values))
works as expected. Is accessing values directly the preferred way to get a scalar out of an xarray object? It looks like
print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().item()))
is another option.
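For a 0-d DataArray the two options return different types, though both format the same way. A minimal sketch, using a hypothetical 0-d DataArray that holds the value computed earlier in the thread: .values gives a 0-d numpy.ndarray, while .item() gives a built-in Python float.

```python
import numpy as np
import xarray as xr

# Hypothetical 0-d DataArray standing in for the loaded weighted mean
w_mean = xr.DataArray(5.146317884602825)

as_array = w_mean.values   # 0-d numpy.ndarray
as_scalar = w_mean.item()  # built-in Python float
print(type(as_array).__name__, as_array.ndim)  # ndarray 0
print(type(as_scalar).__name__)                # float

# Both work with a float format spec
print("{:.2f}".format(as_array), "{:.2f}".format(as_scalar))  # 5.15 5.15
```

Either works for printing; .item() is handy when a plain Python number is needed downstream (for example, JSON serialization).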
I think this is the appropriate way:
print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().values))
xref: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.values.html#xarray.DataArray.values
@mnlevy1981, should I close this? Or feel free to close it if you are satisfied with the conclusion from last week's meeting.