Comments (13)

BenMGeo commented on July 24, 2024

It works now for STD_DEV, but of course, the weighted standard deviation is not the standard deviation of the weighted values (doh).

I can provide you with a clean function of the above for having a weighted mean, though.

I do not see any other solution than writing actual iris aggregators ourselves, using lazy functions backed by xarray or dask arrays, depending on the iris version we use.

I'll have to dive deep into writing custom aggregators, and into the xarray and dask packages, to provide these aggregators for our code. Once I get there, I'll propose them to iris.
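
The remark above, that the weighted standard deviation is not the standard deviation of the weighted values, can be checked with a few numbers. This is a minimal plain-NumPy sketch, not iris code. The arrays `values` and `weights` are made up for illustration, and the weighted standard deviation is taken here as the population form (weighted mean of squared deviations), which is an assumption about the definition intended.

```python
import numpy as np

# Illustrative values and weights (hypothetical, weights sum to 1)
values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.1, 0.2, 0.3, 0.4])

# Weighted mean: sum(w * x) / sum(w)
weighted_mean = np.average(values, weights=weights)

# Weighted (population) standard deviation:
# square root of the weighted mean of squared deviations
weighted_std = np.sqrt(
    np.average((values - weighted_mean) ** 2, weights=weights)
)

# Standard deviation of the pre-multiplied values: a different quantity
std_of_weighted = np.std(values * weights)

print(weighted_mean, weighted_std, std_of_weighted)
```

The two results differ, so multiplying the data by the weights and then calling an unweighted STD_DEV cannot stand in for a weighted aggregator.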

from esmvalcore.

valeriupredoi commented on July 24, 2024

OK - I found something very useful: here is a little function:

import iris
import iris.analysis.cartography

def simple_area(cube, coord1, coord2):
    # area weights for every grid cell (needs bounds on latitude and longitude)
    grid_areas = iris.analysis.cartography.area_weights(cube)
    print('LAZY', cube.has_lazy_data())
    result = cube.collapsed([coord1, coord2],
                            iris.analysis.MEAN,
                            weights=grid_areas)
    print('LAZY', result.has_lazy_data())
    return result

If you run this on a 500MB file with the weights applied, peak memory reaches 1.3GB; without the weights it is blindingly fast and peak memory is 20MB. Note that with weights the result cube will NOT have lazy data. @bjlittle why does passing weights as a keyword argument to collapsed realize the data, so that performance drops like a rock?

valeriupredoi commented on July 24, 2024

argh! I see now - in cube.py when passing weights as kwarg the aggregation is not lazy anymore 😢

valeriupredoi commented on July 24, 2024

OK - good resources:
SciTools/iris#3280
SciTools/iris#2418

@ledm how important are those weights in the area average calculations? i.e. do they alter the results by a margin significant enough to justify their use?

valeriupredoi commented on July 24, 2024

ah never mind! Leaving out the area weights may introduce biases averaging 10 degrees for temperature (just tested with and without weights)
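
To illustrate why the weights matter, here is a small NumPy sketch under stated assumptions: a regular 5-degree latitude grid, a made-up temperature profile that is warm at the equator and cold at the poles, and cell areas taken as roughly proportional to cos(latitude). None of these numbers come from the test mentioned above.

```python
import numpy as np

# Cell-centre latitudes of a regular 5-degree grid (assumed grid)
lat = np.arange(-87.5, 90.0, 5.0)
lat_rad = np.deg2rad(lat)

# Made-up zonal-mean temperature in K: 300 at the equator, colder near the poles
temperature = 250.0 + 50.0 * np.cos(lat_rad)

# On a regular latitude-longitude grid the cell area is
# roughly proportional to cos(latitude)
weights = np.cos(lat_rad)

unweighted_mean = temperature.mean()
weighted_mean = np.average(temperature, weights=weights)
bias = weighted_mean - unweighted_mean

print(unweighted_mean, weighted_mean, bias)
```

With this profile the unweighted mean comes out several kelvin too cold, because the small polar cells count as much as the large tropical ones.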

BenMGeo commented on July 24, 2024

I'm currently trying to resolve this in c3s_511:

A code snippet that works around the memory cost of weighting, at the price of more CPU time, is:

import iris

def __apply_fun2cube__(self, cube, dims=None, function=None,
                       incl_weights=True, **kwargs):
    """
    applies function to a sliced cube (memory saving)
    """
    self.__logger__.info("====================================================")
    self.__logger__.info("Running apply_fun2_cube")
    self.__logger__.info(dims)
    self.__logger__.info(kwargs)
    self.__logger__.info("still lazy?")
    self.__logger__.info(cube.has_lazy_data())

    try:
        if "time" not in dims:
            latlon_list = []

            # collapse one latitude/longitude slice at a time
            for latlon in cube.slices(["latitude", "longitude"]):

                # remove the auxiliary time coordinates from the slice
                latlon.remove_coord("day_of_month")
                latlon.remove_coord("day_of_year")
                latlon.remove_coord("month_number")
                latlon.remove_coord("year")

                if incl_weights and "latitude" in dims:
                    # weight the slice, then restore the original names
                    latlon = latlon * self.map_area
                    latlon.standard_name = cube.standard_name
                    latlon.long_name = cube.long_name

                latlon = latlon.collapsed(dims, function, **kwargs)
                latlon_list.append(latlon)

            # merge the collapsed slices back into a single cube
            cube_list = iris.cube.CubeList(latlon_list)
            new_cube = cube_list.merge_cube()

            self.__logger__.info("going path one")
        else:
            new_cube = cube.collapsed(dims, function, **kwargs)
            self.__logger__.info("going path two")
    except Exception:
        # fall back to a plain collapse if anything in the sliced path fails
        new_cube = cube.collapsed(dims, function, **kwargs)
        self.__logger__.info("going path three")

    self.__logger__.info("still lazy?")
    self.__logger__.info(cube.has_lazy_data())
    self.__logger__.info("====================================================")

    return new_cube

You can call the function like:

new_cube = self.__apply_fun2cube__(cube, dims=["latitude"], function=iris.analysis.MEAN)

As you can see, I'm still debugging and trying to catch everything that kills the function. In this form it works for one coordinate at a time while staying lazy (it is supposed to work with all the usual dimensions of a cube in ESMValTool). In my tests you can call it twice, as long as latitude goes first. This is why I need the except: the slicing does not work on scalar coordinates!
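
The idea behind the workaround, trading CPU time for memory by processing one latitude/longitude slice at a time, can be sketched without iris. The following is a plain-NumPy illustration under assumptions: `sliced_weighted_mean` is a hypothetical helper, the data has shape (time, lat, lon), and the weights are simple cos(latitude) values rather than real cell areas.

```python
import numpy as np

def sliced_weighted_mean(data, weights):
    """Area-weighted mean per time step, one 2-D slice at a time.

    data:    array of shape (time, lat, lon)
    weights: array of shape (lat, lon)
    """
    total_weight = weights.sum()
    means = np.empty(data.shape[0])
    for i, latlon in enumerate(data):
        # Only one (lat, lon) product exists at a time, instead of a
        # full (time, lat, lon) array of weighted values
        means[i] = (latlon * weights).sum() / total_weight
    return means

# Made-up input: 5 time steps on a 4 x 8 grid
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 4, 8))
lat = np.deg2rad(np.array([-67.5, -22.5, 22.5, 67.5]))
weights = np.repeat(np.cos(lat)[:, np.newaxis], 8, axis=1)

result = sliced_weighted_mean(data, weights)

# Reference: the same weighted mean computed in one go
reference = np.average(data.reshape(5, -1), axis=1, weights=weights.ravel())
```

The loop gives the same numbers as the one-shot calculation; the only difference is that the intermediate weighted array never exceeds the size of a single slice.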

BenMGeo commented on July 24, 2024

Plus: It does not work for std_dev, but I could not find out why, yet.

valeriupredoi commented on July 24, 2024

hi @BenMGeo good stuff, man! There is an open SciTools/iris GitHub issue, SciTools/iris#3129, that @bjlittle and I talked about last week. You may want to comment on that one so that you and the iris guys can work together on it. We want a solution straight in iris rather than something in esmvaltool 🍺

valeriupredoi commented on July 24, 2024

great stuff! 🍺 I suggest getting in touch with the iris guys via the associated SciTools issue and presenting them with the solution. Please mention @bjlittle and myself so we can keep track of the implementation. It's best to have it straight in iris rather than in esmvaltool, but yeah, cool stuff so far!

valeriupredoi commented on July 24, 2024

@bouweandela good stuff - but have you tested the implementation of SciTools/iris#3299 within ESMValTool?

valeriupredoi commented on July 24, 2024

we can now pass lazy weights to collapse operations, e.g.:

import dask.array as da
import iris
import iris.analysis.cartography

def simple_area(cube, coord1, coord2):
    grid_areas = iris.analysis.cartography.area_weights(cube)
    # wrap the weights in a dask array so they are passed lazily
    grid_areas = da.array(grid_areas)
    result = cube.collapsed([coord1, coord2],
                            iris.analysis.MEAN,
                            weights=grid_areas)
    return result

and the input data and the output data will both be LAZY (iris 2.4), but the execution time of the collapse operation with and without weights still differs by a factor of 50, and memory use increases 2-3x with weights.
@bjlittle -> any news on the iris front? 🍺
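
What "lazy" means for a weighted mean can be shown with dask alone. This is a rough sketch outside iris, with made-up stand-ins for the cube data and the area weights; it does not reproduce how iris implements collapsed, only the general deferred-evaluation pattern.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-ins: data (time, lat, lon) and area weights (lat, lon)
data = da.ones((10, 4, 8), chunks=(1, 4, 8))
weights = da.from_array(np.arange(1.0, 33.0).reshape(4, 8), chunks=(4, 8))

# Building the expression only records a task graph; nothing is computed yet
lazy_mean = (data * weights).sum(axis=(1, 2)) / weights.sum()
is_lazy = isinstance(lazy_mean, da.Array)

# The values are produced chunk by chunk only when explicitly requested
computed = lazy_mean.compute()
```

Because each chunk holds a single time step, the weighted product is only ever materialized for one slice at a time during `compute()`.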

bouweandela commented on July 24, 2024

@valeriupredoi Is this still an issue?

schlunma commented on July 24, 2024

I have extensively tested this in SciTools/iris#5341, so this should not be an issue anymore. Please re-open if necessary.
