martindurant commented on July 30, 2024

OK, so the task is to find out why ._get_array_dims failed in this case. Perhaps this is because the file isn't a single netCDF but several netCDFs stored in the hierarchy; I think this is the first such example.

I would set a breakpoint in ._get_array_dims to figure out why ["phony_dim_0", "phony_dim_1"] are not being found.

from kerchunk.

martindurant commented on July 30, 2024

I think you are checking for the case when there are dimensions (i.e., a non-empty shape), but _get_array_dims doesn't populate any names at all.

martindurant commented on July 30, 2024

To investigate, you might want to start with fs.ls(...) to figure out what files you can see, and for metadata ones (".zarray", ...) look at their contents. Maybe check you can access some of the data paths.
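The same inspection can be sketched offline against the reference JSON itself, assuming it uses the version-1 layout (`{"version": 1, "refs": {...}}`) in which metadata keys such as `.zarray` and `.zattrs` hold inline JSON strings and chunk keys usually hold `[url, offset, length]` lists. `list_group` and the toy references (including the `az://bucket/file.h5` URL and the byte offsets) are hypothetical, for illustration only.

```python
import json


def list_group(refs, prefix):
    """Split the keys directly under `prefix` into decoded metadata and chunk names.

    Hypothetical helper; `refs` is the inner "refs" dict of a version-1 reference set.
    """
    prefix = prefix.rstrip("/") + "/"
    metadata, chunks = {}, []
    for key, value in refs.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        if "/" in name:
            continue  # belongs to a nested group or array
        if name.startswith("."):
            # Inline metadata (.zarray, .zattrs, .zgroup) is stored as a JSON string.
            metadata[name] = json.loads(value)
        else:
            # Anything else is a chunk key (usually a [url, offset, length] reference).
            chunks.append(name)
    return metadata, chunks


# Toy references (hypothetical URL and offsets).
spec = {
    "version": 1,
    "refs": {
        "grid/FireMask/.zarray": json.dumps({"shape": [1200, 1200], "chunks": [80, 1200]}),
        "grid/FireMask/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": []}),
        "grid/FireMask/0.0": ["az://bucket/file.h5", 4096, 5206],
    },
}
meta, chunks = list_group(spec["refs"], "grid/FireMask")
print(meta[".zattrs"])  # {'_ARRAY_DIMENSIONS': []}
print(chunks)           # ['0.0']
```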

lsterzinger commented on July 30, 2024

I'm able to see all the files stored for a variable (FireMask) in the HDFEOS/GRIDS/VNP14A1_Grid/Data Fields group.

fs1.ls("HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask")
[{'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/.zarray',
  'type': 'file',
  'size': 269},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/.zattrs',
  'type': 'file',
  'size': 345},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/0.0',
  'type': 'file',
  'size': 5206},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/1.0',
  'type': 'file',
  'size': 4805},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/2.0',
  'type': 'file',
  'size': 3385},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/3.0',
  'type': 'file',
  'size': 2918},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/4.0',
  'type': 'file',
  'size': 3302},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/5.0',
  'type': 'file',
  'size': 2643},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/6.0',
  'type': 'file',
  'size': 2558},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/7.0',
  'type': 'file',
  'size': 4857},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/8.0',
  'type': 'file',
  'size': 5784},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/9.0',
  'type': 'file',
  'size': 5196},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/10.0',
  'type': 'file',
  'size': 7293},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/11.0',
  'type': 'file',
  'size': 6792},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/12.0',
  'type': 'file',
  'size': 5104},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/13.0',
  'type': 'file',
  'size': 3153},
 {'name': 'HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/14.0',
  'type': 'file',
  'size': 5435}]

Looking at the contents of .zarray, I get:

{'chunks': [80, 1200],
 'compressor': {'id': 'zlib', 'level': 4},
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [1200, 1200],
 'zarr_format': 2}

So I don't see any dimension info, but I'm also not sure what's supposed to be in this metadata and what isn't.
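For reference, `.zarray` is Zarr v2 array metadata: it records shape, chunk shape, dtype, compressor and fill value, but not dimension names, which xarray expects in the `_ARRAY_DIMENSIONS` attribute of `.zattrs`. One quick consistency check is that `shape` and `chunks` imply exactly the chunk keys that `fs1.ls` listed. A small sketch, assuming the default `.` dimension separator; `expected_chunk_keys` is a hypothetical helper:

```python
import itertools
import math


def expected_chunk_keys(shape, chunks, sep="."):
    """List the Zarr v2 chunk keys implied by an array's shape and chunk shape."""
    # Number of chunks along each axis, counting a partial final chunk.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    # Each chunk key is its grid index joined with the separator, e.g. "3.0".
    return [sep.join(map(str, idx)) for idx in itertools.product(*(range(n) for n in grid))]


keys = expected_chunk_keys([1200, 1200], [80, 1200])
print(len(keys), keys[0], keys[-1])  # 15 0.0 14.0
```

With shape [1200, 1200] and chunks [80, 1200] there are 15 chunks along the first axis and 1 along the second, i.e. keys 0.0 through 14.0, which matches the listing.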

ajelenak commented on July 30, 2024

This is an HDF-EOS HDF5 file (https://earthdata.nasa.gov/esdis/eso/standards-and-references/hdf-eos5), and its content is not necessarily compatible with xarray's model. Have you used such files with xarray before?

lsterzinger commented on July 30, 2024

Not extensively, but I was able to open the file locally with xarray, and as long as I specified the group the data variables were in, it seemed to work fine.

martindurant commented on July 30, 2024

martindurant commented on July 30, 2024

(btw: the dimensions info is in the .zattrs file)

lsterzinger commented on July 30, 2024

> I notice you are giving a remote protocol (az) but the URLs are to a local file.

What URL are you referring to? The JSON specifies az://modis-006/VNP14A1/08/04/2020001/VNP14A1.A2020001.h08v04.001.2020003132203.h5.

I included a local file to show that xarray can open the file fine locally. I think I might be misunderstanding something?


> Does it work with protocol "file"?

If I give remote_protocol="file" I get the same dimension error.
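One way to rule out a protocol mix-up is to collect the URL schemes that the chunk references actually use and compare them with `remote_protocol`. A sketch, assuming version-1 references in a plain dict; `referenced_protocols` is a hypothetical helper, not a kerchunk function, the toy paths are made up, and treating a scheme-less path as `file` is an assumption:

```python
from urllib.parse import urlsplit


def referenced_protocols(refs):
    """Return the set of URL schemes used by list-valued (chunk) references."""
    protocols = set()
    for value in refs.values():
        if isinstance(value, list) and value:
            # value[0] is the target URL; a bare path has no scheme, so call it "file".
            protocols.add(urlsplit(value[0]).scheme or "file")
    return protocols


refs = {
    "grid/FireMask/.zarray": "{}",
    "grid/FireMask/0.0": ["az://modis-006/VNP14A1/example.h5", 4096, 5206],
    "grid/FireMask/1.0": ["/tmp/example.h5", 9302, 4805],
}
print(sorted(referenced_protocols(refs)))  # ['az', 'file']
```

If this returns a single scheme that matches `remote_protocol`, the protocol is unlikely to be the cause of the error. (Windows drive letters such as `C:` would be misread as schemes by this simple check.)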


> (btw: the dimensions info is in the .zattrs file)

fs1.cat("HDFEOS/GRIDS/VNP14A1_Grid/Data Fields/FireMask/.zattrs") yields:

{'_ARRAY_DIMENSIONS': [],
 'legend': 'Classes:\n0 missing input data\n1 not processed (trim)\n2 not processed (obsolete)\n3 non-fire water\n4 cloud\n5 non-fire land\n6 unknown\n7 fire (low confidence)\n8 fire (nominal confidence)\n9 fire (high confidence)',
 'long_name': 'fire mask',
 'valid_range': [0, 9]}

martindurant commented on July 30, 2024

'_ARRAY_DIMENSIONS': [] - this is clearly wrong! Can you find the same variable in the original and see what the dimensions ought to be?
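A check like this can be automated across the whole reference set: for every array, compare the length of `_ARRAY_DIMENSIONS` in `.zattrs` with the rank implied by `shape` in `.zarray`. A sketch over version-1 references in a plain dict; `mismatched_dims` is a hypothetical helper:

```python
import json


def mismatched_dims(refs):
    """Map array path -> (rank, dims) where _ARRAY_DIMENSIONS is missing or has the wrong length."""
    bad = {}
    for key, value in refs.items():
        if key != ".zarray" and not key.endswith("/.zarray"):
            continue
        prefix = key[: -len(".zarray")]  # "" for a root array, else "group/var/"
        rank = len(json.loads(value)["shape"])
        attrs = json.loads(refs.get(prefix + ".zattrs", "{}"))
        dims = attrs.get("_ARRAY_DIMENSIONS")
        if dims is None or len(dims) != rank:
            bad[prefix.rstrip("/")] = (rank, dims)
    return bad


refs = {
    "grid/FireMask/.zarray": json.dumps({"shape": [1200, 1200]}),
    "grid/FireMask/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": []}),
    "grid/lat/.zarray": json.dumps({"shape": [1200]}),
    "grid/lat/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["y"]}),
}
print(mismatched_dims(refs))  # {'grid/FireMask': (2, [])}
```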

lsterzinger commented on July 30, 2024

Haha yeah that's the problem.

The dimensions of the variable are (1200, 1200)

Opening the file directly with xarray yields

print(xr.open_dataset("./VNP14A1.A2020001.h08v04.001.2020003132203.h5", group="HDFEOS/GRIDS/VNP14A1_Grid/Data Fields"))
<xarray.Dataset>
Dimensions:   (phony_dim_0: 1200, phony_dim_1: 1200)
Dimensions without coordinates: phony_dim_0, phony_dim_1
Data variables:
    FireMask  (phony_dim_0, phony_dim_1) uint8 ...
    MaxFRP    (phony_dim_0, phony_dim_1) float64 ...
    QA        (phony_dim_0, phony_dim_1) uint8 ...
    sample    (phony_dim_0, phony_dim_1) float32 ...

lsterzinger commented on July 30, 2024

The culprit is this line
https://github.com/intake/fsspec-reference-maker/blob/67ccf7111709707d2643458ddc47872ab6d768c4/fsspec_reference_maker/hdf.py#L223

While it does correctly get rank=2 earlier, since dset.shape returns (1200, 1200), the per-dimension scale count len(dset.dims[n]) returns 0; it seems like nothing is attached to either dim.

Part of the problem, I think, is that HDF5 files do not have named dimensions. Some engines, like netCDF4, will name the dimensions phony_dim_x, but it doesn't appear as if that's happening here.

lsterzinger commented on July 30, 2024

Well, I should say the culprit is not really just that line. Even if it did return num_scales=2, it would still fail on
https://github.com/intake/fsspec-reference-maker/blob/67ccf7111709707d2643458ddc47872ab6d768c4/fsspec_reference_maker/hdf.py#L225

since dset.dims[0][0] raises:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/lucass/anaconda3/lib/python3.8/site-packages/h5py/_hl/dims.py", line 74, in __getitem__
    h5ds.iterate(self._id, self._dimension, scales.append, 0)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5ds.pyx", line 167, in h5py.h5ds.iterate
  File "h5py/defs.pyx", line 4300, in h5py.defs.H5DSiterate_scales
RuntimeError: Unspecified error in H5DSiterate_scales (return value <0)

martindurant commented on July 30, 2024

@ajelenak , perhaps you have an opinion on this too?

martindurant commented on July 30, 2024

Can you please share the original data file on anywhere-but-azure?

ajelenak commented on July 30, 2024

The HDF5 file does not have dimension scales, which are the mechanism used to generate the array dimension info in the reference JSON. That explains the h5py error: somehow the code was forced to execute on an HDF5 dataset without dimension scales.

The quick fix is to add the ability to generate phony dims so xarray can handle this file. However, coordinate information is available in HDF-EOS5 files like this one, but I don't know whether xarray currently supports it.

The HDF Group created a command-line tool that makes HDF-EOS5 files netCDF-friendly, but this may not be a workable option here.

martindurant commented on July 30, 2024

Thanks, @ajelenak.
@lsterzinger , when xarray opens the original data, it was able to infer the coordinates here, right? It has a lot of hidden magic inside, of course, but maybe we can find out how that happens.

lsterzinger commented on July 30, 2024

Like @ajelenak said, I believe the netCDF4 engine (which I think xarray uses by default) adds phony dims to an HDF5 file. If you do a ncdump -h on a file, those phony dims will also be there.

martindurant commented on July 30, 2024

So can we just put '_ARRAY_DIMENSIONS': ["phony_dim_x", ...]? Perhaps you could edit the JSON to see if this allows xarray to proceed.

ajelenak commented on July 30, 2024

> the netCDF4 engine (which I think xarray uses by default) adds phony dims to an HDF5 file. If you do a ncdump -h on a file, those phony dims will also be there.

Just to clarify: What ncdump shows in case of netCDF-4 (HDF5) files is a view of the file content, interpreted according to the netCDF data model. So those phony dims are not added to the file.

martindurant commented on July 30, 2024

@ajelenak , I think that's what @lsterzinger means :)

lsterzinger commented on July 30, 2024

Yeah, bad wording on my part. Nothing is added to the file, of course, but ncdump and the netCDF engine interpret the file according to the netCDF spec and return phony dims accordingly.

lsterzinger commented on July 30, 2024

> So can we just put '_ARRAY_DIMENSIONS': ["phony_dim_x", ...]? Perhaps you could edit the JSON to see if this allows xarray to proceed.

@martindurant Yes, this works. I replaced all instances of '_ARRAY_DIMENSIONS' with the added phony dims (one for each variable) and I was able to open the remote file with xarray and engine='zarr'.
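The manual edit described here could be automated as a post-processing step on the references. A sketch, assuming version-1 references in a plain dict; `add_phony_dims` is a hypothetical helper. Its naming rule is an assumption chosen to reproduce the names in the xarray output above: within one group, each axis reuses an existing phony dimension of the same length that the array is not already using, otherwise a new `phony_dim_N` is created. It is not claimed to match netCDF's own rules in every case.

```python
import json


def add_phony_dims(refs):
    """Fill empty or missing _ARRAY_DIMENSIONS with phony_dim_N names (modifies refs in place)."""
    phony = {}  # group path -> lengths, where index i is phony_dim_i in that group
    for key in sorted(refs):
        if key != ".zarray" and not key.endswith("/.zarray"):
            continue
        prefix = key[: -len(".zarray")]  # "" for a root array, else "group/var/"
        attrs = json.loads(refs.get(prefix + ".zattrs", "{}"))
        if attrs.get("_ARRAY_DIMENSIONS"):
            continue  # already named, e.g. from dimension scales
        group = prefix.rstrip("/").rpartition("/")[0]
        lengths = phony.setdefault(group, [])
        names = []
        for size in json.loads(refs[key])["shape"]:
            # Reuse a same-length phony dim this array is not using yet...
            for i, length in enumerate(lengths):
                if length == size and f"phony_dim_{i}" not in names:
                    names.append(f"phony_dim_{i}")
                    break
            else:
                # ...or create a new one in this group.
                lengths.append(size)
                names.append(f"phony_dim_{len(lengths) - 1}")
        attrs["_ARRAY_DIMENSIONS"] = names
        refs[prefix + ".zattrs"] = json.dumps(attrs)
    return refs


refs = {
    "grid/FireMask/.zarray": json.dumps({"shape": [1200, 1200]}),
    "grid/FireMask/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": []}),
    "grid/QA/.zarray": json.dumps({"shape": [1200, 1200]}),
    "grid/QA/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": []}),
}
add_phony_dims(refs)
print(json.loads(refs["grid/QA/.zattrs"]))  # {'_ARRAY_DIMENSIONS': ['phony_dim_0', 'phony_dim_1']}
```

Sharing names by length keeps xarray from seeing two different sizes for one dimension name when arrays in the same group have different shapes.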

martindurant commented on July 30, 2024

Well OK then!
So we need to pick these from a predefined list of, what, up to five labels (x, y, z, t, ...), in the case that the labels don't get generated directly by h5py.

lsterzinger commented on July 30, 2024

I can attempt to have _get_array_dims add these dimensions manually, but I'm not sure what the best way is for it to determine whether they're needed. The dset.dims differ between a netCDF and an HDF5 file, so I think checking that first might be the best way forward.

lsterzinger commented on July 30, 2024

> So we need to pick these from a predefined list of, what, up to five labels (x, y, z, t, ...), in the case that the labels don't get generated directly by h5py.

This sounds like it would work well, but it would make assumptions about which dimension is x, y, z, etc. Maybe just phony_dim_0/phony_dim_1, like what netCDF/ncdump currently do?

martindurant commented on July 30, 2024

> Maybe just phony_dim_0/phony_dim_1

Sure.
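Putting the thread's conclusion together, a dimension-name lookup could fall back to phony names when an axis has no dimension scale, instead of indexing `dset.dims[n][0]` (which raised the `H5DSiterate_scales` error above). A minimal sketch, assuming an h5py-like dataset exposing `.shape`, a `.dims[n]` whose `len()` is the number of attached scales, and scale objects with a `.name` path, as h5py datasets do. `get_array_dims`, `FakeScale` and `FakeDataset` are hypothetical names; this is not kerchunk's actual implementation, and the stubs exist only so the example runs without an HDF5 file.

```python
def get_array_dims(dset):
    """Dimension names for an h5py-like dataset, with phony names where no scale is attached."""
    dims = []
    for n in range(len(dset.shape)):
        num_scales = len(dset.dims[n])  # how many dimension scales axis n has
        if num_scales == 1:
            # Name the axis after its scale's HDF5 path, minus the leading "/".
            dims.append(dset.dims[n][0].name.lstrip("/"))
        elif num_scales == 0:
            # No scale attached: don't touch dset.dims[n][0], fall back to a phony name.
            dims.append(f"phony_dim_{n}")
        else:
            raise RuntimeError(f"axis {n} has {num_scales} dimension scales attached")
    return dims


# Minimal stand-ins so the sketch runs without an HDF5 file.
class FakeScale:
    def __init__(self, name):
        self.name = name


class FakeDataset:
    def __init__(self, shape, scales_per_axis):
        self.shape = shape
        self.dims = scales_per_axis  # dims[n] is the list of scales on axis n


print(get_array_dims(FakeDataset((1200, 1200), [[], []])))  # ['phony_dim_0', 'phony_dim_1']
print(get_array_dims(FakeDataset((3, 4), [[FakeScale("/time")], []])))  # ['time', 'phony_dim_1']
```

Naming by axis index is the simplest reading of the suggestion; it can clash if arrays in one group have different lengths on the same axis, in which case a length-aware scheme would be needed.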
