Comments (6)

martindurant commented on July 30, 2024

At the combine stage, we allow a preprocessor argument to modify datasets. We could have something similar for the individual files; having said that, loading the produced JSON and removing the offending tags should not be too difficult.
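As a rough illustration of "loading the produced JSON and removing the offending tags", here is a minimal sketch. It assumes the references use the layout seen later in this thread (a top-level "refs" mapping whose ".zattrs" entries are JSON strings); the helper name `drop_attribute` is hypothetical and not part of kerchunk.

```python
import json


def drop_attribute(references, attribute):
    """Remove `attribute` from every `.zattrs` entry of a reference dict.

    Hypothetical helper: assumes `references["refs"]` maps keys such as
    "sst/.zattrs" to JSON-encoded metadata.
    """
    refs = references["refs"]
    # iterate over a snapshot so that entries can be reassigned safely
    for key, value in list(refs.items()):
        if key != ".zattrs" and not key.endswith("/.zattrs"):
            continue
        # metadata is usually stored as a JSON string, but tolerate plain dicts
        attrs = json.loads(value) if isinstance(value, str) else dict(value)
        if attribute in attrs:
            del attrs[attribute]
            refs[key] = json.dumps(attrs)
    return references
```

For a reference file on disk, this would be wrapped in `json.load` before and `json.dump` after the call.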

from kerchunk.

keewis commented on July 30, 2024

sure, postprocessing the individual files is not too difficult, and I'll do that for now.

However, since the files I investigated are from multiple sources (I think?) it might be good to also catch this here. Am I correct in assuming that these values should not be out-of-sync?

lsterzinger commented on July 30, 2024

@cgentemann this seems similar to the problem you were having a while back with the fill values, did you ever figure that out?

martindurant commented on July 30, 2024

I have a feeling that, in general, there can be many things that are wrong/inconsistent in original data files! I don't know that we can cover them all, but perhaps we can auto-correct common issues. The ability to add custom processing is more powerful, however, and I suspect that many datasets will require some form of it. In fact, I think that kerchunk workflows (in pangeo-forge or not) are an opportunity to apply those fixes so that users don't have to.

keewis commented on July 30, 2024

right, that makes sense. I was thinking that rather than postprocessing the output of .translate() it would be much easier to postprocess with the variable's zarr object, but of course the details of hooks like that are tricky to get right.

for reference, here's my hacky postprocessing function
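import itertools

import ujson


def correct_fill_values(data):
    """Sync each variable's `.zarray` fill_value with its `_FillValue` attribute."""

    def fix_variable(values):
        # nothing to do unless both metadata documents exist and `_FillValue` is set
        zattrs = values.get(".zattrs", {})
        if ".zarray" not in values or "_FillValue" not in zattrs:
            return values

        values[".zarray"]["fill_value"] = zattrs["_FillValue"]
        return values

    def variable_name(item):
        return item[0][0]

    refs = data["refs"]
    # split "<variable>/<item>" keys; top-level keys such as ".zgroup" are skipped
    prepared = (
        (tuple(key.split("/")), value) for key, value in refs.items() if "/" in key
    )
    # keep only the metadata documents and decode them from JSON
    filtered = (
        (key, ujson.loads(value))
        for key, value in prepared
        if key[1] in (".zattrs", ".zarray")
    )
    # collect the metadata documents of each variable into a single dict
    grouped = (
        (name, {n[1]: v for n, v in group})
        for name, group in itertools.groupby(
            sorted(filtered, key=variable_name), key=variable_name
        )
    )
    fixed = ((name, fix_variable(var)) for name, var in grouped)
    # re-encode the metadata so it can be merged back into the references
    flattened = {
        f"{name}/{item}": ujson.dumps(metadata, indent=4)
        for name, var in fixed
        for item, metadata in var.items()
    }
    # dict union with `|` requires Python 3.9+
    data["refs"] = dict(sorted((refs | flattened).items()))
    return data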

martindurant commented on July 30, 2024

it would be much easier to postprocess with the variable's zarr object

Yes, totally agree with this too. We could have a bunch of optional processing functions. As you say, it's tricky, because (for instance) processing the zarr object might not be quite the same as processing the xarray view of the same thing.
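To make the idea of "a bunch of optional processing functions" concrete, here is one possible shape for such hooks. This is only a sketch under assumptions: the hooks receive the decoded `.zarray` and `.zattrs` dicts of each variable rather than a real zarr object, and the names `apply_hooks` and `sync_fill_value` are hypothetical, not kerchunk API.

```python
import json


def sync_fill_value(name, zarray, zattrs):
    """Example hook: copy the `_FillValue` attribute, if set, into the array metadata."""
    if "_FillValue" in zattrs:
        zarray["fill_value"] = zattrs["_FillValue"]
    return zarray, zattrs


def apply_hooks(references, hooks):
    """Run each hook on the decoded metadata of every array in `references`.

    Hypothetical helper: assumes `references["refs"]` stores `.zarray` and
    `.zattrs` entries as JSON strings under "<variable>/<item>" keys.
    """
    refs = references["refs"]
    # every array has a `.zarray` entry, so use those keys to find variables
    names = sorted({key.rsplit("/", 1)[0] for key in refs if key.endswith("/.zarray")})
    for name in names:
        zarray = json.loads(refs[f"{name}/.zarray"])
        had_attrs = f"{name}/.zattrs" in refs
        zattrs = json.loads(refs[f"{name}/.zattrs"]) if had_attrs else {}
        for hook in hooks:
            zarray, zattrs = hook(name, zarray, zattrs)
        refs[f"{name}/.zarray"] = json.dumps(zarray)
        # avoid creating empty `.zattrs` entries that were not there before
        if had_attrs or zattrs:
            refs[f"{name}/.zattrs"] = json.dumps(zattrs)
    return references
```

Hooks written against raw metadata like this would not see xarray's decoded view (for example CF-decoded fill values), which is the mismatch mentioned above.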
