Memory saving output (eko issue, closed)

nnpdf commented on July 21, 2024
Memory saving output

from eko.

Comments (8)

felixhekhorn commented on July 21, 2024

separate Q2 storage: it will be {q2}.npy

at first I wanted to point out that we should use the byte representation, but then I thought maybe we could actually use this to introduce an approximation: we could use log(Q2) with a fixed precision (say 4 digits) and this way save some computations ...
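A minimal sketch of the log(Q2) idea above, assuming "4 digits" means four decimal places of log(Q2). The helper names `q2_key` and `q2_from_key` are hypothetical and not part of eko. Two Q2 values whose logarithms agree after rounding map to the same key, so an already computed operator could be reused instead of recomputed:

```python
import math


def q2_key(q2, digits=4):
    """File-name key built from log(Q2) rounded to a fixed number of decimals."""
    # adding 0.0 turns a rounded -0.0 into 0.0, so both give the same key
    rounded = round(math.log(q2), digits) + 0.0
    return f"{rounded:.{digits}f}"


def q2_from_key(key):
    """Approximate Q2 recovered from a key (exact only up to the rounding)."""
    return math.exp(float(key))


# nearby scales collapse onto the same key, distinct scales do not
same = q2_key(100.0) == q2_key(100.001)
different = q2_key(100.0) != q2_key(101.0)
```

The price of the approximation is that the key no longer identifies Q2 exactly, which is the trade-off against using the byte representation.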

merge separately computed (but input compatible) outputs

  • before being able to merge full output objects, we first need to be able to extend existing outputs
  • a simple condition for this is to dump the theory and operators card into the archive
  • such that we can do Output.load("a.tar").get(2.)
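As a rough illustration of why the cards need to be in the archive, the sketch below merges two outputs only when their dumped cards agree. It models an output as a plain dict; the keys `theory_card`, `operator_card`, `operators` and `q2_grid` are assumptions made for this example and do not describe eko's actual layout:

```python
def strip(card, ignore):
    """Copy of a card dict without the ignored keys."""
    return {key: value for key, value in card.items() if key not in ignore}


def merge_outputs(first, second, ignore=("q2_grid",)):
    """Merge two outputs whose dumped cards agree apart from the Q2 grid."""
    if first["theory_card"] != second["theory_card"]:
        raise ValueError("theory cards differ")
    if strip(first["operator_card"], ignore) != strip(second["operator_card"], ignore):
        raise ValueError("operator cards differ beyond the Q2 grid")
    # on a duplicated Q2 the operator from `second` wins
    operators = {**first["operators"], **second["operators"]}
    operator_card = {**first["operator_card"], "q2_grid": sorted(operators)}
    return {
        "theory_card": first["theory_card"],
        "operator_card": operator_card,
        "operators": operators,
    }
```

Extending an existing output is then the special case where `second` holds only the newly computed Q2 points.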

support separately threshold operators and partial Q2 in Output

actually we could even join them only at loading time ...

they are also dumped on disk, with their own names: thresholds.npz

I'd rather put them in a separate sub-directory (just to keep them out of the way) - or we could even add them as regular operators, since after all they are regular operators leading to exactly the threshold (in the upper scheme)

alecandido commented on July 21, 2024

at first I wanted to point out that we should use the byte representation, but then I thought maybe we could actually use this to introduce an approximation: we could use log(Q2) with a fixed precision (say 4 digits) and this way save some computations ...

This is fine, but I wonder whether Q2 might be simpler than log(Q2): I know the log is more relevant, but plain Q2 is easier to inspect manually, since it requires fewer operations (but we can think about it).

  • before being able to merge full output objects, we first need to be able to extend existing outputs

I was just thinking the other way round: the moment we can merge, we use this to extend.

  • a simple condition for this is to dump the theory and operators card into the archive

I thought it was done, but maybe I'm only doing it for yadism...

actually we could even join them only at loading time ...

I'm not sure about this: it's easier, because in order to extend you just need to drop more .npy files into the archive, but it might be expensive. We can benchmark how long it takes: if it's negligible we can do it, but if the product is expensive I would precompute it (or perhaps do it JIT: the first time you need a product you compute and dump it, unless you call .compile() or something like that, which does it AOT for all of them).
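The JIT versus AOT option can be sketched as a small cache. Everything here is hypothetical: the class `LazyOperators`, its `parts` layout, and the plain-Python `matmul` (a stand-in for a NumPy product) are assumptions for illustration, and the cache lives in memory rather than being dumped to disk:

```python
def matmul(a, b):
    """Plain-Python matrix product for nested lists (stand-in for numpy.matmul)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LazyOperators:
    """Join stored parts into full operators only when they are requested."""

    def __init__(self, parts):
        # parts maps a target Q2 to the matrices to multiply, first applied first
        self.parts = parts
        self.cache = {}

    def get(self, q2):
        # JIT: compute the product on first access, then reuse the cached result
        if q2 not in self.cache:
            result = self.parts[q2][0]
            for part in self.parts[q2][1:]:
                result = matmul(part, result)  # later parts act from the left
            self.cache[q2] = result
        return self.cache[q2]

    def compile(self):
        # AOT: compute every product up front
        for q2 in self.parts:
            self.get(q2)
```

Timing `get` on a cold cache against a warm one would give the benchmark mentioned above.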

I'd rather put them in a separate sub-directory (just to keep them out of the way) - or we could even add them as regular operators, since after all they are regular operators leading to exactly the threshold (in the upper scheme)

I was thinking of storing patches and matching separately, but maybe you're right and there is no point. If we store them as regular operators, we should mark them as thresholds in the metadata.

alecandido commented on July 21, 2024

we have a detailed plan for the output:

Plan

The folder will contain:

  • metadata.yaml, containing output metadata (already present)
  • runcards folder, containing all the runcards needed to reproduce the output
    • in case of products and later manipulations, all the runcards involved are saved to this folder, and the full history needed to reproduce the output is written in metadata.yaml
  • recipes folder: this will contain brief YAML files with the recipes for jobs computing parts
  • parts folder: containing all the partial outputs that have to be Mellin integrated
    • partial patches
    • matching conditions
  • operators folder: containing the final results, after combining the parts
    • the file names will be the binary representation of the Q2 floats plus an extension (all files here will be valid evolution operators)
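One possible reading of "binary representation of the Q2 floats" is the hex encoding of the IEEE 754 double, sketched below. The helper names, the big-endian byte order and the `.npy` extension are assumptions; the plan above does not fix any of them:

```python
import struct


def q2_to_filename(q2, ext=".npy"):
    """Encode a Q2 float as the hex of its 8-byte big-endian double plus an extension."""
    return struct.pack(">d", q2).hex() + ext


def filename_to_q2(name, ext=".npy"):
    """Recover the exact Q2 float from a name produced by q2_to_filename."""
    stem = name[: -len(ext)]
    return struct.unpack(">d", bytes.fromhex(stem))[0]


name = q2_to_filename(2.0)
```

Unlike a decimal string, this encoding round-trips every float exactly, at the cost of names that are hard to read by eye.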

alecandido commented on July 21, 2024

This is also more long-term than #138 at this point: the core part of the memory structure was done in #105, while #138 will address the computation (including the removal of OperatorGrid and the splitting of threshold operators).

Every other feature here is considered a "nice to have", but no more.

alecandido commented on July 21, 2024

@felixhekhorn the part strictly addressing the title is already implemented, and most of the rest will be implemented as a consequence of #138

The only two items here that fall outside #138 are:

  • merge separately computed (but input compatible) outputs
  • split a single output into multiple ones
    • not strictly needed, but it's dual to the former one, and so nice to have

Should we keep this issue for them?

felixhekhorn commented on July 21, 2024

I still consider the items relevant - maybe we can put them into a new issue (with a clearer title)?

alecandido commented on July 21, 2024

Yes, maybe that's the way. If you open the new issue, feel free to close this one, otherwise I'll do it at some point.

felixhekhorn commented on July 21, 2024

Closed in favor of #193
