
Comments (2)

keewis commented on June 12, 2024

(1) sounds good to me, actually, so it's fine to not do this here (and I can just create a PR for GroupBy.agg or something similar).

Edit: basically, let's make use of pydata/xarray#7206

from flox.

dcherian commented on June 12, 2024

It would be a decent bit of complexity to add, and I'm not inclined to add it.

There would be two advantages:

  1. The data are only factorized once, and the integer codes are reused.
  2. We could drastically reduce the number of tasks in the dask graph, at the cost of more complicated code. The number of tasks is reduced because we can make a single task calculate all the necessary intermediates for all reductions.

I'm not sure (1) is worth it, at least for xarray, because after pydata/xarray#7206, we will get this for free by just calling each individual method on a saved GroupBy object.
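To make advantage (1) concrete, here is a minimal sketch in plain NumPy of factorizing once and reusing the integer codes for several reductions. This is not flox's or xarray's implementation; `grouped_reductions` is a hypothetical helper, and `np.unique(..., return_inverse=True)` stands in for whatever factorization the library actually performs.

```python
import numpy as np


def grouped_reductions(values, labels):
    """Compute several grouped reductions from a single factorization.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)

    # Factorize once: `groups` are the sorted unique labels and `codes`
    # maps every element to the index of its group.
    groups, codes = np.unique(labels, return_inverse=True)
    ngroups = len(groups)

    # Every reduction below reuses the same integer codes.
    count = np.bincount(codes, minlength=ngroups)
    total = np.bincount(codes, weights=values, minlength=ngroups)
    maximum = np.full(ngroups, -np.inf)
    np.maximum.at(maximum, codes, values)

    return groups, {
        "count": count,
        "sum": total,
        "mean": total / count,  # every group has at least one member
        "max": maximum,
    }


labels = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
groups, out = grouped_reductions(values, labels)
```

The expensive step (finding unique labels and assigning codes) happens once, and each reduction afterwards is a cheap pass over `codes`.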

I'm not sure (2) is worth it for a couple of cases:

  1. It will also mean that to calculate max only you will calculate every other reduction and then discard it.
  2. If you're writing the output to zarr for example, you lose parallelism again.
  3. It could be an advantage to compute count only once and reuse it for both count and mean, but I'm not sure it's worth it. We could get this advantage by instead breaking up the current algorithm to compute count and sum separately for mean. Then the dask optimizer will handle the shared count computation for us.
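As a rough illustration of the last point, the sketch below splits mean into separate per-chunk count and sum intermediates, using plain NumPy and a list of chunks in place of a dask graph. The function names and the `(codes, values)` chunk layout are assumptions made for this example, not flox's internals. With dask, each intermediate would be its own task, and a count task shared between count and mean would only need to run once.

```python
import numpy as np


def chunk_count(codes, ngroups):
    # Per-chunk intermediate: number of elements in each group.
    return np.bincount(codes, minlength=ngroups)


def chunk_sum(codes, values, ngroups):
    # Per-chunk intermediate: sum of the values in each group.
    return np.bincount(codes, weights=values, minlength=ngroups)


def grouped_count_and_mean(chunks, ngroups):
    """Combine per-chunk intermediates; `count` is computed once and reused.

    `chunks` is a list of (codes, values) pairs, one per chunk.
    Assumes every group appears in at least one chunk.
    """
    counts = sum(chunk_count(codes, ngroups) for codes, _ in chunks)
    sums = sum(chunk_sum(codes, vals, ngroups) for codes, vals in chunks)
    return counts, sums / counts


chunks = [
    (np.array([0, 1, 0]), np.array([1.0, 2.0, 3.0])),
    (np.array([1, 2, 2]), np.array([4.0, 5.0, 7.0])),
]
counts, means = grouped_count_and_mean(chunks, ngroups=3)
```

Because count and sum are separate intermediates, asking for only count never touches the sum computation, which avoids the wasted work described in the first point above.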
