
NWM2.1 reanalysis question (kerchunk, open issue)

fsspec commented on July 30, 2024
NWM2.1 reanalysis question


Comments (6)

martindurant commented on July 30, 2024
In [14]: fs = fsspec.filesystem("s3", anon=True)

In [17]: fs.head("s3://noaa-nwm-retrospective-2-1-pds/forcing/2007/2007010100.LDASIN_DOMAIN1", 8)
Out[17]: b'CDF\x01\x00\x00\x00\x00'

It appears to be "classic netCDF CDF-1 format" (see here). That would need a separate conversion class; the file format looks simpler, but I don't know if the old CDF libraries will be as convenient. If the chunking remains the same, there would be, in principle, no problem combining the different file formats into a global kerchunked dataset.

@rsignell-usgs, any idea why the data format appears to have become older in 2007? Or was this some sort of HDF5 (not CDF) -> CDF (not HDF) evolution?
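For anyone hitting the same thing, a minimal sketch of telling the two on-disk formats apart by their leading magic bytes (the path is the one from the example above; anything you do with the answer afterwards is up to you):

import fsspec

fs = fsspec.filesystem("s3", anon=True)

def nc_flavour(path):
    """Peek at the first 8 bytes to distinguish classic netCDF from netCDF-4/HDF5."""
    magic = fs.head(path, 8)
    if magic[:3] == b"CDF":
        # version byte: \x01 = CDF-1 (classic), \x02 = CDF-2 (64-bit offset), \x05 = CDF-5
        return f"classic netCDF (version byte {magic[3]})"
    if magic == b"\x89HDF\r\n\x1a\n":
        return "netCDF-4 / HDF5"
    return "unknown"

print(nc_flavour("s3://noaa-nwm-retrospective-2-1-pds/forcing/2007/2007010100.LDASIN_DOMAIN1"))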


dialuser commented on July 30, 2024

Just to add some extra observations: all post-2007 NWM2.1 files are not only bigger (540 MB each), but also follow a slightly different naming convention (e.g., 10-digit 2007010100 vs. 12-digit 199602182000).
In any case, I'd appreciate it if people in this forum could help me find a temporary solution. I've spent several days converting the pre-2007 files.


martindurant commented on July 30, 2024

I don't anticipate having time to implement a netCDF<4 scanner in the near term, but perhaps someone else has? At a guess, the files are much larger because there is no compression; but maybe the chunking is still the same.
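A quick way to at least compare variable shapes between the two flavours (sketch only; assumes local copies of one classic and one netCDF-4 file, with illustrative filenames):

import xarray as xr

# Paths are illustrative local copies of one classic and one netCDF-4 forcing file.
ds_classic = xr.open_dataset("2007020101.LDASIN_DOMAIN1")    # CDF-1 / classic
ds_nc4 = xr.open_dataset("202002010100.LDASIN_DOMAIN1")      # netCDF-4 / HDF5

for name, var in ds_classic.data_vars.items():
    other = ds_nc4[name].shape if name in ds_nc4 else "missing"
    print(f"{name}: classic {var.shape} vs netCDF-4 {other}")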


rsignell-usgs commented on July 30, 2024

@dialuser, also note that the NWM2.1 data is already available in Zarr format from
https://registry.opendata.aws/nwm-archive/
Specifically:

aws s3 ls s3://noaa-nwm-retrospective-2-1-zarr-pds/ --no-sign-request

The rechunking and conversion to Zarr was done by @jmccreight, who would likely be able to answer these questions if necessary.
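For completeness, the same listing from Python, plus opening one of the stores with xarray (the store name below is illustrative; substitute whatever the listing actually shows):

import fsspec
import xarray as xr

fs = fsspec.filesystem("s3", anon=True)
print(fs.ls("noaa-nwm-retrospective-2-1-zarr-pds"))   # see which Zarr stores exist

# Store name is a guess -- pick one from the listing above.
store = fsspec.get_mapper("s3://noaa-nwm-retrospective-2-1-zarr-pds/precip.zarr", anon=True)
ds = xr.open_zarr(store)
print(ds)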


jmccreight commented on July 30, 2024

@rsignell-usgs Thanks for pinging me here.
@dialuser I changed jobs and had COVID, so your email fell through the cracks. I was looking for your email recently but could not find it. I have had several inquiries on this exact topic, which also confused me.

Thanks for these questions. The answer is that no single conversion process or person produced all the LDASIN files here. I'm not fully up on what was done, but I ran into some similar (though different) issues myself when processing the data on NCAR systems.

https://github.com/NCAR/rechunk_retro_nwm_v21/blob/da170bf2af462a4a117ceebc39f751d3ba91ea74/precip/symlink_aorc.py#L18
You can see there are essentially three different periods of data with different conventions (at least it's finite, right?).

I can anecdotally confirm what @martindurant uncovered above:

jamesmcc@casper-login2[1017]:/glade/p/cisl/nwc/nwm_forcings/AORC> for ff in $f1 $f2 $f3; do echo $ff: `ncdump -k $ff`;  done
/glade/campaign/ral/hap/zhangyx/AORC.Forcing/2007/200702010000.LDASIN_DOMAIN1: netCDF-4
/glade/p/cisl/nwc/nwm_forcings/AORC/2007020101.LDASIN_DOMAIN1: classic
/glade/p/cisl/nwc/nwm_forcings/AORC/202002010100.LDASIN_DOMAIN1: netCDF-4

I believe the file size difference is because no compression ("deflate level") is available for classic files (as @martindurant pointed out), while _DeflateLevel = 2 is applied to the netCDF-4 files (at least the ones I looked at). I was surprised to see that the netCDF-4 files are chunked: for (time, y, x), _ChunkSizes = 1, 768, 922. As far as I can tell, there is no chunking in the classic files.
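For anyone who wants to check this on the S3 copies directly, a sketch using h5py over fsspec (the 2020 path and the RAINRATE variable name are assumptions based on the naming discussed above; adjust as needed):

import fsspec
import h5py

# Path and variable name are assumptions, not verified against the bucket.
url = "s3://noaa-nwm-retrospective-2-1-pds/forcing/2020/202002010100.LDASIN_DOMAIN1"
with fsspec.open(url, "rb", anon=True) as f, h5py.File(f, "r") as h5:
    dset = h5["RAINRATE"]
    print("chunks:", dset.chunks)                 # expect (1, 768, 922)
    print("compression:", dset.compression,       # expect 'gzip'
          "level:", dset.compression_opts)        # expect 2 (_DeflateLevel)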

I don't expect much can be done on the NCAR/NOAA end at this point except to take note that this is a problem. I will connect you with at least one other user who is interested in this data; perhaps you can collaborate on a solution (I may point them here). It would be nice to see. I honestly did not know that all of this forcing data was part of the release; I thought that only the Zarr precip field I processed was released.


martindurant commented on July 30, 2024

Note on this point:

It appears there is no chunking in the classic (as far as I can tell)

If the blocks are not compressed, then, from a kerchunk point of view, we can pick any chunking we like along the biggest dimension (and the second-biggest, if we choose a chunk size of 1 for the biggest), so it may still be possible to get consistent chunking across the different file formats.
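To illustrate the arithmetic (all numbers below are made up, not read from the files): for an uncompressed, contiguous, row-major 2-D slab, any split along the slowest-varying axis is just an offset-and-length calculation, which is exactly what a kerchunk reference needs:

import numpy as np

# Hypothetical values -- the point is only the offset arithmetic.
ny, nx = 3840, 4608                 # grid size of one 2-D field
itemsize = np.dtype("float32").itemsize
var_offset = 1_000_000              # byte offset of the variable's data within the file
rows_per_chunk = 768                # e.g. match the netCDF-4 files' y chunking

refs = {}
for i, y0 in enumerate(range(0, ny, rows_per_chunk)):
    nrows = min(rows_per_chunk, ny - y0)
    offset = var_offset + y0 * nx * itemsize
    length = nrows * nx * itemsize
    # kerchunk-style reference: key -> [url, offset, length],
    # treating the variable as (time, y, x) chunked (1, rows_per_chunk, nx)
    refs[f"RAINRATE/0.{i}.0"] = ["2007020101.LDASIN_DOMAIN1", offset, length]

print(refs)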

