Comments (22)
Do you mind trying with main branch? We've been trying to nail down the thorny time types.
from kerchunk.
I'll give it a shot.
from kerchunk.
I just tested this on main on my system and got the same result as @abkfenris
from kerchunk.
Yep, still does the same thing on main.
from kerchunk.
So the date that should be 2021-08-01 is 2065-03-01, which is an offset of 15918 days.
The original timestamp of the .nc file is days since 1978-01-01 12:00:00. Coincidentally, that date is also 15918 days from 2021-08-01.
So it seems like MultiZarrToZarr is taking the day offset and applying it to the first datetime in the first data file, instead of applying it to 1978-01-01.
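To check the arithmetic above, here is a minimal sketch using only the standard library. The variable names are mine, and the 12:00:00 part of the units is dropped for simplicity; this only illustrates the suspected mistake, it is not the code in MultiZarrToZarr.

```python
from datetime import date, timedelta

epoch = date(1978, 1, 1)     # reference date from "days since 1978-01-01"
expected = date(2021, 8, 1)  # date the first file should decode to

# Raw value stored in the file: days elapsed since the epoch.
offset = (expected - epoch).days

# Suspected bug: the offset is added to the first decoded date, not the epoch.
wrong = expected + timedelta(days=offset)
# Correct decoding: the offset is added to the epoch.
right = epoch + timedelta(days=offset)

print(offset, wrong, right)  # 15918 2065-03-01 2021-08-01
```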
from kerchunk.
@lsterzinger do you have time/interest to debug? First, I would set debug logging (e.g., fsspec.utils.setup_logging(logger_name="reference-combine")), and then second, set a breakpoint in _build_output where the cftime stuff is to see how the numbers get manipulated.
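If you'd rather not go through the fsspec helper, roughly the same thing can be done with the standard logging module. This is only a sketch: it assumes the combine code logs to a logger named "reference-combine", as in the call above, and the format string is my own choice.

```python
import logging

# Logger name taken from the setup_logging call above (an assumption about
# where the combine code sends its messages).
logger = logging.getLogger("reference-combine")
logger.setLevel(logging.DEBUG)

# Send records to stderr with a timestamp so the combine steps can be followed.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
```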
from kerchunk.
I do have interest in debugging, but my time is a bit limited these days. I might have time this afternoon/weekend to play around with things and see what's going on.
from kerchunk.
Turns out fixing this is much more fun than anything else I have to do today 😉
There was a missing calendar attribute that caused the datetime building to fail. I tested it on my end and it seems to work. @abkfenris can you try out the change in #75 and see if that works for you?
from kerchunk.
Hmm, I still appear to be seeing that offset when trying with your branch.
from kerchunk.
Did you regenerate the .json files? I copy-pasted your generation script directly and regenerated the files (I had to comment out the *_preliminary.nc files because of a 404 error). I attached a zip of my combined.json for reference.
Make sure you're actually running on the branch:
import fsspec_reference_maker
print(fsspec_reference_maker.__version__)
Should result in 0.0.2+3.gcdb6528
from kerchunk.
Yours does open correctly.
I did regenerate the json after rebuilding my environment on the branch and it still gives dates in 2065.
20210819 is no longer preliminary, which is what caused that failure, so removing the _preliminary from that URL should include it.
>>> import fsspec_reference_maker
>>> fsspec_reference_maker.__version__
'0+untagged.174.gcdb6528'
Here's the generated combined.json.zip and the Dockerfile, environment.yml, and test script: environment_and_test_script.zip
from kerchunk.
@abkfenris I don't think it will make a difference, but I did push another change to that branch. Can you try again?
from kerchunk.
I'm still getting the same result with 5072d61
from kerchunk.
That's super weird. I'm also on 5072d61, and I cloned your environment directly, and I get
from kerchunk.
Aha, I didn't have cftime in my environment, so it wasn't executing the code your branch changed, it was instead executing https://github.com/intake/fsspec-reference-maker/pull/75/files#diff-850b631beff65d5bd4abca60a56ef8308e345e5626bbb8a526f15d31c33a752bL192-L194 .
And now with cftime, comparing your branch to the released version, your branch does fix the time offset.
Awesome, thank you!
from kerchunk.
Great to hear!
@martindurant It seems like it's not good that it silently fails in this way if cftime is not installed, meaning this code does not parse the dates correctly
https://github.com/intake/fsspec-reference-maker/blob/5072d614cbb6cfa0f497dece422a953c7c4812ab/fsspec_reference_maker/combine.py#L205-L207
Thoughts?
from kerchunk.
It's a fair point, but I can't think of another way to say "see if this converts as times" (because most coordinates are not time, but it just so happens that all our examples are time series).
Note that @rabernat says we should just rely on xarray, but I haven't figured out yet how (because we have zarr arrays, not xarray datasets).
from kerchunk.
Is that except Exception catching both ImportError when cftime is not available and, it looks like, ValueError when num2pydate can't convert a date?
Maybe throw a warning in the first case, and use the existing handling otherwise?
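Something like the following sketch is what I have in mind. The helper name maybe_decode_times is hypothetical (it is not a function in fsspec-reference-maker), and it assumes num2pydate raises ValueError for units that are not times.

```python
import warnings


def maybe_decode_times(values, units, calendar="standard"):
    """Return decoded datetimes if possible, otherwise `values` unchanged.

    Hypothetical helper for illustration only.
    """
    try:
        import cftime  # optional dependency
    except ImportError:
        # Case 1: cftime is missing, so nothing can be decoded. Warn instead
        # of failing silently.
        warnings.warn(
            "cftime is not installed; time coordinates will not be decoded",
            UserWarning,
            stacklevel=2,
        )
        return values
    try:
        return cftime.num2pydate(values, units, calendar=calendar)
    except ValueError:
        # Case 2 (assumed): the coordinate is not a time, so keep the
        # existing quiet fallback.
        return values
```

Calling maybe_decode_times([15918], "days since 1978-01-01 12:00:00") would then either decode the value or warn that cftime is missing, rather than quietly falling through to the other code path.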
from kerchunk.
@abkfenris, that would be OK - except it may get annoying for those that have no idea what cftime is :)
from kerchunk.
If I'm understanding warning filters right, warnings.simplefilter('ignore:::fsspec_reference_maker,default') would help quiet things down in that case.
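A note on the call above: the 'action:message:category:module' string is the syntax used by the -W command-line option and the PYTHONWARNINGS environment variable, while warnings.simplefilter takes only an action, category, and line number. From Python code, a per-module filter would go through warnings.filterwarnings instead. A minimal sketch, where the function name is hypothetical and the filter assumes the warning is attributed to a module inside the fsspec_reference_maker package:

```python
import warnings


def quiet_reference_maker_warnings():
    """Ignore warnings attributed to fsspec_reference_maker modules.

    Hypothetical helper for illustration only.
    """
    # `module` is a regular expression matched against the start of the
    # module name, so this also covers fsspec_reference_maker.combine.
    warnings.filterwarnings("ignore", module=r"fsspec_reference_maker")


quiet_reference_maker_warnings()
```

Which module a warning is attributed to depends on the stacklevel passed to warnings.warn, so the filter may need to target the calling module instead.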
from kerchunk.
"Note that @rabernat says we should just rely on xarray, but I haven't figured out yet how (because we have zarr arrays, not xarray datasets)."
I have added comments to try to help with this: #70 (comment)
The path we are on will end up re-implementing all of Xarray's coding machinery in fsspec-reference-maker. This is not sustainable. I would suggest refactoring and removing these special-case workarounds as soon as possible, before the technical debt piles up.
from kerchunk.