GithubHelp home page GithubHelp logo

Comments (8)

mnlevy1981 avatar mnlevy1981 commented on July 18, 2024

A better version of the script:

#!/usr/bin/env python

import xarray as xr
import numpy as np
import esmlab

# Read in WOA observational data (in 12 files)
in_files = []
for n in range(0,12):
  in_files.append("/glade/work/mclong/woa2013v2/POP_gx1v7/woa13_all_n{:02d}_gx1v7.nc".format(n))
ds = xr.open_mfdataset(in_files, decode_times=False)

# Reduce dataset
# 1. Want NO3 averaged over all files
field = ds.NO3.isel(z_t=0).mean('time')
# 2. TAREA is identical across files, so just use the first time dimension of it
TAREA = ds.TAREA.isel(time=0)

# Statistics
print("field min: {:.2f}".format(np.nanmin(field)))
print("field max: {:.2f}".format(np.nanmax(field)))
print("field weighted mean:\n{}".format(esmlab.statistics.weighted_mean(field, TAREA)))

Output:

/glade/work/mlevy/miniconda3/envs/NPL-conda/lib/python2.7/site-packages/dask/array/numpy_compat.py:28: RuntimeWarning: invalid value encountered in divide
  x = np.divide(x1, x2, out)
field min: 0.00
field max: 33.95
field weighted mean:
<xarray.DataArray ()>
dask.array<shape=(), dtype=float64, chunksize=()>
Coordinates:
    z_t      float64 500.0
    time     float64 6.0

(I don't think the numpy_compat.py warning affects anything... numpy doesn't complain about it with nanmin and nanmax, and I can still plot field in other scripts)

from esmlab.

andersy005 avatar andersy005 commented on July 18, 2024

@mnlevy1981, to make sure I am getting this, what should be the expected output's dimensions (shape) of esmlab.statistics.weighted_mean(field, TAREA)?

With

# 1. Want NO3 averaged over all files
field = ds.NO3.isel(z_t=0).mean('time')
# 2. TAREA is identical across files, so just use the first time dimension of it
TAREA = ds.TAREA.isel(time=0)

field, and TAREA are as follows:

In [20]: field.shape
Out[20]: (384, 320)

In [21]: TAREA.shape
Out[21]: (384, 320)

from esmlab.

mnlevy1981 avatar mnlevy1981 commented on July 18, 2024

It should be a scalar quantity. I just realized I could come up with an even simpler example using 1D arrays rather than reading data from disk, but I need to run... I'll try to find time tonight to post more. But the gist is that if

TAREA = [1, 1, 2]
field = [1, 3, 8]

Then I expect the weighted mean to be 5: the weighted sum of field is 20, and the weights sum to 4 so the weighted mean is 20/4 = 5.

from esmlab.

andersy005 avatar andersy005 commented on July 18, 2024

I think I figured it out:

  1. Due to xr.open_mfdataset(), xarray loads data into dask arrays instead of in memory NumPy arrays. The implication of this is that any subsequent operation is likely to return an xarray object with dask arrays.
  2. There are two solutions to this:
    • Force xarray to load the entire data set in memory with
ds = xr.open_mfdataset(in_files, decode_times=False).load()

Or

  • Tell Dask to compute and return and an in-memory result
w_mean = esmlab.statistics.weighted_mean(field, TAREA).data.compute()
In [47]: field
Out[47]:
<xarray.DataArray 'NO3' (nlat: 384, nlon: 320)>
array([[      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [22.465448, 22.383131, 22.292166, ...,       nan,       nan,       nan],
       ...,
       [      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [      nan,       nan,       nan, ...,       nan,       nan,       nan],
       [      nan,       nan,       nan, ...,       nan,       nan,       nan]],
      dtype=float32)
Coordinates:
    TLAT     (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
    TLONG    (nlat, nlon) float64 -39.44 -38.31 -37.19 ... -41.08 -40.65 -40.22
    z_t      float64 500.0
Dimensions without coordinates: nlat, nlon

In [48]: TAREA
Out[48]:
<xarray.DataArray 'TAREA' (nlat: 384, nlon: 320)>
dask.array<shape=(384, 320), dtype=float64, chunksize=(384, 320)>
Coordinates:
    TLAT     (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
    TLONG    (nlat, nlon) float64 -39.44 -38.31 -37.19 ... -41.08 -40.65 -40.22
    time     float64 6.0
Dimensions without coordinates: nlat, nlon
Attributes:
    long_name:  area of T cells
    units:      cm^2

In [49]: w_mean = esmlab.statistics.weighted_mean(field, TAREA)

In [51]: w_mean
Out[51]:
<xarray.DataArray ()>
dask.array<shape=(), dtype=float64, chunksize=()>
Coordinates:
    z_t      float64 500.0
    time     float64 6.0

In [52]: w_mean.data
Out[52]: dask.array<truediv, shape=(), dtype=float64, chunksize=()>

In [53]: w_mean.data.compute()
Out[53]: 5.146317884602825

In [54]: np.nanmean(field)
Out[54]: 5.7132335

from esmlab.

andersy005 avatar andersy005 commented on July 18, 2024

@mnlevy1981,
Oooh one more thing, if you want your result to be an xarray object,

do:

w_mean = esmlab.statistics.weighted_mean(field, TAREA).load()

instead of:

w_mean = esmlab.statistics.weighted_mean(field, TAREA).data.compute()

from esmlab.

mnlevy1981 avatar mnlevy1981 commented on July 18, 2024

Thanks! The line

print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().values))

Works as expected. Is accessing values directly the preferred method of getting a scalar out of an xarray object? It looks like

print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().item()))

is another option.

from esmlab.

andersy005 avatar andersy005 commented on July 18, 2024

I think this is the appropriate way:

print("field weighted mean: {:.2f}".format(esmlab.statistics.weighted_mean(field, TAREA).load().values))

xref: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.values.html#xarray.DataArray.values

from esmlab.

andersy005 avatar andersy005 commented on July 18, 2024

@mnlevy1981, should I close this? Or free feel to close it if you are satisfied with the conclusion from our last week's meeting.

from esmlab.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.