Comments (6)
In [14]: fs = fsspec.filesystem("s3", anon=True)
In [17]: fs.head("s3://noaa-nwm-retrospective-2-1-pds/forcing/2007/2007010100.LDASIN_DOMAIN1", 8)
Out[17]: b'CDF\x01\x00\x00\x00\x00'
It appears to be the "classic netCDF CDF-1 format" (see here). That would need a separate conversion class; the file format itself is simpler, but I don't know whether the old CDF libraries will be as convenient. If the chunking remains the same, there would in principle be no problem combining the different file formats into a global kerchunked dataset.
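For reference, the netCDF flavour can be told apart from those leading bytes alone (as returned by `fs.head(path, 8)` above). A minimal sketch; the helper name is made up for illustration, not a kerchunk API:

```python
# Hypothetical helper (not part of kerchunk): classify a netCDF file
# from its first few bytes. The magic numbers come from the netCDF and
# HDF5 format specifications.
def sniff_netcdf_format(magic: bytes) -> str:
    """Return a rough format label from a file's leading bytes."""
    if magic.startswith(b"\x89HDF\r\n\x1a\n"):
        return "netCDF-4 (HDF5)"
    if magic.startswith(b"CDF\x01"):
        return "classic (CDF-1)"
    if magic.startswith(b"CDF\x02"):
        return "64-bit offset (CDF-2)"
    if magic.startswith(b"CDF\x05"):
        return "64-bit data (CDF-5)"
    return "unknown"

print(sniff_netcdf_format(b"CDF\x01\x00\x00\x00\x00"))  # classic (CDF-1)
```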
@rsignell-usgs , any idea why it looks like the data format became older in 2007? Or was this some sort of HDF5 (not CDF) -> CDF (not HDF) evolution?
from kerchunk.
Just adding some extra observations: all post-2007 NWM2.1 files are not only bigger (540 MB each), but also follow a slightly different naming convention (e.g., 10-digit 2007010100 vs. 12-digit 199602182000).
In any case, I'd appreciate it if the people in this forum could help me find a temporary solution. I've already spent several days converting pre-2007 files.
from kerchunk.
I don't anticipate having time to implement a netCDF<4 scanner in the near term, but perhaps someone else has? At a guess, the files are much larger because there is no compression; but maybe the chunking is still the same.
from kerchunk.
@dialuser, also note that the NWM2.1 data is already available in Zarr format from
https://registry.opendata.aws/nwm-archive/
Specifically:
aws s3 ls s3://noaa-nwm-retrospective-2-1-zarr-pds/ --no-sign-request
The rechunking-and-conversion-to-zarr was done by @jmccreight who would likely be able to answer these questions if necessary.
from kerchunk.
@rsignell-usgs Thanks for pinging me here.
@dialuser I changed jobs and had COVID, so your email fell through the cracks. I was looking for your email recently but could not find it. I had several inquiries on this exact topic, which also confused me.
Thanks for these questions. The answer is that no single conversion process or person produced all the LDASIN files here. I'm not fully up to speed on what was done, but I ran into some similar (though different) issues myself when processing the data on NCAR systems.
https://github.com/NCAR/rechunk_retro_nwm_v21/blob/da170bf2af462a4a117ceebc39f751d3ba91ea74/precip/symlink_aorc.py#L18
You can see there are essentially 3 different periods of data with different conventions. (At least it's finite, right?)
I can anecdotally confirm what @martindurant uncovered above
jamesmcc@casper-login2[1017]:/glade/p/cisl/nwc/nwm_forcings/AORC> for ff in $f1 $f2 $f3; do echo $ff: `ncdump -k $ff`; done
/glade/campaign/ral/hap/zhangyx/AORC.Forcing/2007/200702010000.LDASIN_DOMAIN1: netCDF-4
/glade/p/cisl/nwc/nwm_forcings/AORC/2007020101.LDASIN_DOMAIN1: classic
/glade/p/cisl/nwc/nwm_forcings/AORC/202002010100.LDASIN_DOMAIN1: netCDF-4
I believe that the file size difference is because no compression ("deflate level") is available for classic files (as @martindurant pointed out), while _DeflateLevel = 2 is applied to the other, netCDF-4 files (that I looked at). I was surprised to see that there is chunking in the netCDF-4 files: for (time, y, x), _ChunkSizes = 1, 768, 922. There appears to be no chunking in the classic files (as far as I can tell).
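A back-of-the-envelope check of what those chunk sizes imply, assuming float32 values (the dtype is an assumption here, not stated above):

```python
# One HDF5 chunk with _ChunkSizes = 1, 768, 922 at 4 bytes per value
# (float32 assumed) comes out to roughly 2.7 MiB uncompressed, which is
# why deflate makes such a difference to the overall file size.
chunk_shape = (1, 768, 922)
bytes_per_value = 4
chunk_bytes = chunk_shape[0] * chunk_shape[1] * chunk_shape[2] * bytes_per_value
print(chunk_bytes)                    # 2832384
print(round(chunk_bytes / 2**20, 2))  # 2.7 (MiB, before compression)
```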
I don't expect much can be done on the NCAR/NOAA end at this point except to take note that this is a problem. I will connect you with at least one other user who is interested in this data; perhaps you can collaborate on a solution (I may point them here). It would be nice to see. I honestly did not know that all this forcing data was part of the release; I thought that only the Zarr precip field that I processed was released.
from kerchunk.
Note on this point:
It appears there is no chunking in the classic (as far as I can tell)
If the blocks are not compressed then, from a kerchunk point of view, we can pick any chunking we like along the biggest dimension (and the second-biggest, if we choose a chunk size of 1 for the biggest), so it may still be possible to get consistency across the different file species.
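The idea above can be sketched as follows: for an uncompressed array stored contiguously, references along the first dimension are just evenly spaced byte ranges. All names and numbers here are illustrative (not taken from the real LDASIN files), and this ignores classic netCDF's interleaved record layout, where the stride between records would differ:

```python
# Illustrative sketch: generate kerchunk-style (url, offset, length)
# references for chunks of shape (1, ny, nx) carved out of one
# contiguously stored, uncompressed array.
def chunk_refs(url, var_offset, shape, itemsize=4):
    """Map each index along axis 0 to a (url, offset, length) reference."""
    nrec, ny, nx = shape
    rec_bytes = ny * nx * itemsize  # bytes in one (1, ny, nx) slab
    return {
        f"{i}.0.0": (url, var_offset + i * rec_bytes, rec_bytes)
        for i in range(nrec)
    }

refs = chunk_refs("s3://bucket/file.nc", var_offset=1024, shape=(3, 768, 922))
print(refs["1.0.0"])  # ('s3://bucket/file.nc', 2833408, 2832384)
```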
from kerchunk.