Comments (13)
It works now for STD_DEV, but of course the weighted standard deviation is not the standard deviation of the weighted values (doh).
I can provide you with a clean version of the above function for computing a weighted mean, though.
I do not see any solution other than writing actual iris aggregators ourselves, using lazy functions supported by xarray or dask arrays, depending on the iris version we use.
I'll have to dive deep into writing custom aggregators, and into the xarray and dask packages, to provide these aggregators to our code. Once I get there, I'll propose them to iris.
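To make the STD_DEV point concrete, here is a minimal sketch in plain Python (no iris involved). The helper names and the sample numbers are made up for illustration, and the weighted standard deviation uses the simple population-style formula, which is an assumption and not necessarily what an iris aggregator would compute.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def weighted_std(values, weights):
    """Population-style weighted standard deviation around the weighted mean."""
    mean = weighted_mean(values, weights)
    variance = sum(
        w * (x - mean) ** 2 for w, x in zip(weights, values)
    ) / sum(weights)
    return math.sqrt(variance)


def plain_std(values):
    """Unweighted population standard deviation (all weights equal)."""
    return weighted_std(values, [1.0] * len(values))


# Hypothetical sample data, chosen only to keep the arithmetic easy to follow.
values = [1.0, 2.0, 3.0, 4.0]
weights = [0.1, 0.2, 0.3, 0.4]

# What we actually want: the weighted standard deviation.
std_weighted = weighted_std(values, weights)

# What "multiply by the weights, then take STD_DEV" gives instead.
std_of_weighted_values = plain_std([w * x for w, x in zip(weights, values)])

print(std_weighted, std_of_weighted_values)
```

The two numbers differ, so multiplying the data by the weights first only helps for sums and means with suitably normalized weights, not for the standard deviation.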
from esmvalcore.
OK - I found something very useful: here is a little function:
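import iris
import iris.analysis.cartography


def simple_area(cube, coord1, coord2):
    # area_weights needs bounded latitude/longitude coordinates on the cube
    grid_areas = iris.analysis.cartography.area_weights(cube)
    print('LAZY', cube.has_lazy_data())
    # weights must be passed as a keyword argument to collapsed()
    result = cube.collapsed([coord1, coord2],
                            iris.analysis.MEAN,
                            weights=grid_areas)
    print('LAZY', result.has_lazy_data())
    return result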
If you run this on a 500 MB file with the weights applied, peak memory reaches 1.3 GB; if you don't apply the weights, it is blindingly fast and peak memory is 20 MB. Note that with weights the result cube will NOT have lazy data. @bjlittle, why is it that when weights are passed as a keyword argument to collapsed, the data gets realized and performance drops like a rock?
from esmvalcore.
argh! I see now - in cube.py
when passing weights as kwarg the aggregation is not lazy anymore 😢
from esmvalcore.
OK - good resources:
SciTools/iris#3280
SciTools/iris#2418
@ledm how important are those weights in the area average calculations? I.e., do they alter the results by a margin significant enough to justify their use?
from esmvalcore.
Ah, never mind! Omitting the area weights can introduce biases averaging about 10 degrees for temperature (just tested with and without weights).
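A rough illustration of why the weights matter, in plain Python: on a regular latitude grid the cells shrink towards the poles, so an unweighted mean over-represents polar values. The temperature profile below is invented, and cos(latitude) is used as a stand-in for the true cell areas that `area_weights` would derive from the coordinate bounds.

```python
import math

# Hypothetical zonal-mean temperatures (degC) on a regular 10-degree grid.
latitudes = list(range(-85, 90, 10))  # cell centres: -85, -75, ..., 85
temperatures = [30.0 * math.cos(math.radians(lat)) - 5.0 for lat in latitudes]

# Assumption: cell area is taken as proportional to cos(latitude).
weights = [math.cos(math.radians(lat)) for lat in latitudes]

unweighted_mean = sum(temperatures) / len(temperatures)
area_weighted_mean = (
    sum(w * t for w, t in zip(weights, temperatures)) / sum(weights)
)

print(unweighted_mean, area_weighted_mean)
```

With this made-up profile the unweighted mean comes out several degrees colder than the area-weighted one, because the cold high-latitude rows count as much as the much larger tropical ones.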
from esmvalcore.
I'm currently trying to resolve this in c3s_511:
A code snippet to work around weighting memory while increasing cpu time is:
def __apply_fun2cube__(self, cube, dims=None, function=None,
                       incl_weights=True, **kwargs):
    """
    applies function to a sliced cube (memory saving)
    """
    self.__logger__.info("====================================================")
    self.__logger__.info("Running apply_fun2_cube")
    # self.__logger__.info(cube)
    self.__logger__.info(dims)
    self.__logger__.info(kwargs)
    self.__logger__.info("still lazy?")
    self.__logger__.info(cube.has_lazy_data())
    try:
        if "time" not in dims:
            # if "latitude" in dims:
            latlon_list = []
            for latlon in cube.slices(["latitude", "longitude"]):
                # self.__logger__.info(latlon)
                latlon.remove_coord("day_of_month")
                latlon.remove_coord("day_of_year")
                latlon.remove_coord("month_number")
                latlon.remove_coord("year")
                if incl_weights and "latitude" in dims:
                    latlon = latlon * self.map_area
                    latlon.standard_name = cube.standard_name
                    latlon.long_name = cube.long_name
                # self.__logger__.info(latlon)
                latlon = latlon.collapsed(dims,
                                          function,
                                          **kwargs)
                latlon_list.append(latlon)
            cube_list = iris.cube.CubeList(latlon_list)
            # self.__logger__.info(cube_list)
            # self.__logger__.info([c.coords for c in cube_list])
            new_cube = cube_list.merge_cube()
            self.__logger__.info("going path one")
        # except:
        else:
            new_cube = cube.collapsed(dims, function, **kwargs)
            self.__logger__.info("going path two")
    except:
        new_cube = cube.collapsed(dims, function, **kwargs)
        self.__logger__.info("going path three")
    # self.__logger__.info(new_cube)
    # self.__logger__.info(function)
    self.__logger__.info("still lazy?")
    self.__logger__.info(cube.has_lazy_data())
    self.__logger__.info("====================================================")
    return new_cube
You can call the function like:
new_cube = self.__apply_fun2cube__(cube, dims=["latitude"], function=iris.analysis.MEAN)
You see, I'm still debugging and trying to catch everything that kills the function. It works in this way for 1 coord (supposed to work with all usual dimensions in a cube in ESMValTool) staying lazy. You can call it twice, if latitude goes first, in my tests (this is why I need the except... does not work on scalar coordinates!).
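Stripped of the iris specifics, the idea behind the snippet is to collapse one latitude/longitude slice at a time and then stitch the per-slice results back together, so only a single 2D field is in memory at once. A minimal sketch in plain Python, where the generator stands in for `cube.slices(["latitude", "longitude"])` and all names and numbers are hypothetical:

```python
def iter_time_slices(n_times, n_lat, n_lon):
    """Yield one made-up 2D (lat, lon) field per time step, one at a time."""
    for t in range(n_times):
        yield [[float(t + i) for _ in range(n_lon)] for i in range(n_lat)]


def weighted_mean_2d(field, weights):
    """Weighted mean over a single 2D field."""
    numerator = sum(
        w * x
        for row_x, row_w in zip(field, weights)
        for x, w in zip(row_x, row_w)
    )
    denominator = sum(w for row_w in weights for w in row_w)
    return numerator / denominator


n_lat, n_lon = 3, 4
# Hypothetical area weights: the middle latitude row counts double.
area = [[1.0] * n_lon, [2.0] * n_lon, [1.0] * n_lon]

# One collapsed value per time step; only one slice is held at any moment.
series = [
    weighted_mean_2d(field, area)
    for field in iter_time_slices(5, n_lat, n_lon)
]
print(series)
```

The price is the one mentioned above: more Python-level looping, hence more CPU time, in exchange for a smaller memory footprint.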
from esmvalcore.
Plus: It does not work for std_dev, but I could not find out why, yet.
from esmvalcore.
Hi @BenMGeo, good stuff, man! There is an open SciTools/iris GitHub issue, SciTools/iris#3129, that @bjlittle and I talked about last week. You may want to comment on that one so that you and the iris guys can work together on it; we will want a solution straight in iris rather than something in esmvaltool 🍺
from esmvalcore.
Great stuff! 🍺 I suggest getting in touch with the iris guys via the associated SciTools issue and presenting them with the solution. Please mention @bjlittle and myself so we can keep track of the implementation. It's better to have it straight in iris than in esmvaltool, but yeah, cool stuff so far!
from esmvalcore.
@bouweandela good stuff - but have you tested the implementation of SciTools/iris#3299 within ESMValTool?
from esmvalcore.
we can now pass lazy weights to collapse operations eg:
import dask.array as da
import iris
import iris.analysis.cartography


def simple_area(cube, coord1, coord2):
    grid_areas = iris.analysis.cartography.area_weights(cube)
    # wrap the weights in a dask array so the collapse can stay lazy
    grid_areas = da.array(grid_areas)
    result = cube.collapsed([coord1, coord2],
                            iris.analysis.MEAN,
                            weights=grid_areas)
    return result
Both the input data and the output data will be LAZY (iris 2.4), but the execution time of the collapse operation with weights is still about 50 times that without weights, and memory use is 2-3x higher.
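For intuition on why lazy weights make a difference: a weighted mean only needs two running totals, so it can in principle be evaluated chunk by chunk. The sketch below shows that idea in plain Python. It is not how iris or dask implement it, and the helper names and data are hypothetical.

```python
def chunked(seq, size):
    """Yield consecutive pieces of seq with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def streaming_weighted_mean(value_chunks, weight_chunks):
    """Weighted mean from matching chunks, keeping only two running totals."""
    weighted_sum = 0.0
    weight_total = 0.0
    for values, weights in zip(value_chunks, weight_chunks):
        weighted_sum += sum(w * x for w, x in zip(weights, values))
        weight_total += sum(weights)
    return weighted_sum / weight_total


# Made-up data and weights, split into chunks of three elements.
values = [float(v) for v in range(10)]
weights = [1.0, 2.0] * 5

result = streaming_weighted_mean(chunked(values, 3), chunked(weights, 3))
direct = sum(w * x for w, x in zip(weights, values)) / sum(weights)
print(result, direct)
```

If the weights are a fully realized array while the data are chunked, the two cannot be paired up chunk by chunk in this way, which is one plausible reading of why the weights need to be lazy too.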
@bjlittle -> any news on the iris front? 🍺
from esmvalcore.
@valeriupredoi Is this still an issue?
from esmvalcore.
I have extensively tested this in SciTools/iris#5341, so this should not be an issue anymore. Please re-open if necessary.
from esmvalcore.