
Comments (9)

s-kuberski commented on September 28, 2024

Now that the output for the correlator class is final, there is nothing more to add to the format, right? At least from the current perspective? In that case, I'll try to come up with a good way to document the format.

from pyerrors.

fjosw commented on September 28, 2024

I agree. I would also suggest bumping the format version to 1.0.


s-kuberski commented on September 28, 2024

I have now implemented another feature that we talked about, but I am not sure if it should enter the main branch or if I am the only one who would find this neat:
I have written routines to export and import Python dictionaries to JSON. They are implemented as wrappers around load_json and dump_to_json. The export works by:

  • browsing through the dictionary,
  • replacing all supported structures (Obs, Corr, np.ndarray, list) by placeholders and adding the structures to a list,
  • passing that list to dump_to_json, where the dictionary with the placeholders is put into the description of the file (which becomes a dictionary with two keys: the placeholder dict and the original description).

On import, the same thing is done in reverse. The dict may contain anything that is valid in JSON (number, string, boolean, list, dict, None) as well as the above defined supported structures.

So far, my only issue is that I am not able to tell the input routine that the file is a parsed dictionary. If the JSON file is imported using load_json, you get a list with all the structures (and the description, if you choose full_output) instead of a dict.

I am not sure if it is wise to have this feature: in my workflow, it would certainly help to switch from pickles to JSON in intermediate steps. Globally, this output could become confusing when large dicts are written to JSON. However, the placeholder dict in the description could actually be pretty helpful in understanding what is saved in the file, and it is created automatically, so the user would not have to write this description themselves if the keys of the dictionary are self-explanatory.
@fjosw, @JanNeuendorf, what do you think? I could create a pull request as soon as everything is tested.


fjosw commented on September 28, 2024

My feeling is that relational databases like SQLite are better suited for what you want to achieve, but I have no objections if you want to add this functionality. My concerns are that this bloats up the specification of the format and that nested code is not easy to maintain.


s-kuberski commented on September 28, 2024

If one does not have such a database (they don't run well on Lustre!), it could help, though. But I understand your concerns.
I tried to use the format as it is, so that I don't have to change anything in its specification. This means that one has to know in advance whether to parse the JSON file as a list or as a dict, which is not really nice and prevents using the wrapper for reading an arbitrary JSON file and deciding whether it contains a dictionary or not (you can always read the file using load_json and get a valid list of structs). Solving this would need an additional keyword, and I don't want to add one to the format, because dicts are somewhat specific.
I'll think a bit more and then propose an implementation.


fjosw commented on September 28, 2024

In the meantime I implemented io routines for the json.gz format in a separate julia package: https://github.com/fjosw/ADjson.jl


s-kuberski commented on September 28, 2024

In the meantime I implemented io routines for the json.gz format in a separate julia package: https://github.com/fjosw/ADjson.jl

This is a very good solution, thank you!


s-kuberski commented on September 28, 2024

I am still working on the documentation. Meanwhile, I am still not satisfied with the performance for a large number of Obses. When I replace

jsonstring = ''
for chunk in my_encoder(indent=indent, ensure_ascii=False).iterencode(d):
    jsonstring += chunk

by

jsonstring = ''.join(chunk for chunk in my_encoder(indent=indent, ensure_ascii=False).iterencode(d))

the memory consumption increases again (by about 40%), but the time needed for the string creation decreases significantly, such that about 60% of the total time is spent in create_json_string and the rest is needed for writing the file to disk (as opposed to 84% vs. 16% before; this most likely depends on the size of the file/string).
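For reference, the two variants can be compared head-to-head with the standard library alone; this sketch substitutes a plain json.JSONEncoder for pyerrors' custom my_encoder, and the relative behaviour of the two string-building strategies should carry over:

```python
import json
import time

# Toy payload standing in for a serialised list of Obs structures.
d = {"deltas": [[float(i + j) for j in range(16)] for i in range(2000)]}
enc = json.JSONEncoder(indent=1, ensure_ascii=False)

t0 = time.perf_counter()
s_loop = ""
for chunk in enc.iterencode(d):
    s_loop += chunk  # grows one string; peak memory stays low
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
# str.join materialises the generator internally, so all chunks are
# held at once: faster, but with a higher peak memory footprint.
s_join = "".join(enc.iterencode(d))
t_join = time.perf_counter() - t0

assert s_loop == s_join
print(f"loop: {t_loop:.4f} s, join: {t_join:.4f} s")
```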

It is probably more important to have a fast routine than one that needs the minimal amount of memory. Instead of juggling around with our own code, we could resort to faster json implementations such as https://github.com/ijl/orjson or https://github.com/ultrajson/ultrajson . These seem, in general, to be suited to our needs, apart from one aspect:

When using an indent for writing files, all of the standard implementations split up multi-dimensional lists, such that one line is written for each element of deltas. For a correlator based on N configs and T time slices, the file then has N*(T+1) lines instead of N, as in my current implementation. The fast packages do not support changing the encoder, since these are precompiled C routines. This leaves several possibilities:

  • Use a faster package. This would, in principle, still support indentation, but visual inspection of a file becomes much more difficult because there are many more lines. One would have to see which package is most compatible with @fjosw's need to have the string in memory instead of writing directly to a file.
  • Stay with the current setup (or some variation that still uses the json module but is tuned such that speedup and memory requirements are well balanced for the use case). Visual inspection of files (if an indent is used) stays easy.
  • Use some kind of mixture, where the slower setup is chosen if indentation is needed. I am not in favor of this.
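The line-count blow-up is easy to reproduce with the standard json module; here a toy correlator with two "configs" of three "time slices" each stands in for deltas:

```python
import json

# Two "configs" x three "time slices" of deltas.
deltas = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

flat = json.dumps(deltas)              # everything on one line
pretty = json.dumps(deltas, indent=1)  # one line per element plus brackets

print(len(flat.splitlines()))    # 1
print(len(pretty.splitlines()))  # 12
```

A hand-written encoder that keeps each config row on one line would need only one line per config (plus the enclosing brackets) for the same data.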

It kind of boils down to the question of whether we think it is important to be able to write the files such that they are not all on a single line. Maybe we could think about this in order to fix the behavior in a possible release candidate.


fjosw commented on September 28, 2024

I have to say that I never ran into any issues with creating json strings or writing out the corresponding files, so I would leave that totally up to you. I don't have a strong opinion on the indentation. As long as the library can still produce a Python string containing the JSON output, I would not have to alter my current workflow.

