cloud-drift / clouddrift

CloudDrift accelerates the use of Lagrangian data for atmospheric, oceanic, and climate sciences.

Home Page: https://clouddrift.org/

License: MIT License

Python 100.00%
climate-data climate-science data-structures oceanography python

clouddrift's People

Contributors

kevinsantana11, milancurcic, philippemiron, selipot, vadmbertr

clouddrift's Issues

Implement `RaggedArray.from_awkward()`

Similar to #44, but for instantiating a RaggedArray from an awkward array instance.

Not sure yet whether and to what extent this is useful, but it would at least provide feature parity with xarray.

As in #44, this is already done internally in RaggedArray.from_parquet():

# Internals of RaggedArray.from_parquet(); coords, metadata, data, and
# attrs_variables are dicts initialized earlier in the method.
ds = ak.from_parquet(filename)
attrs_global = ds.layout.parameters["attrs"]
name_coords = ["time", "lon", "lat", "ids"]
for var in name_coords:
    coords[var] = ak.flatten(ds.obs[var]).to_numpy()
    attrs_variables[var] = ds.obs[var].layout.parameters["attrs"]
for var in [v for v in ds.fields if v != "obs"]:
    metadata[var] = ds[var].to_numpy()
    attrs_variables[var] = ds[var].layout.parameters["attrs"]
for var in [v for v in ds.obs.fields if v not in name_coords]:
    data[var] = ak.flatten(ds.obs[var]).to_numpy()
    attrs_variables[var] = ds.obs[var].layout.parameters["attrs"]
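A possible shape for the method, as a sketch: it reuses the extraction above, starting from an existing awkward array instead of one read from Parquet. The constructor's argument order is an assumption here.

import awkward as ak

class RaggedArray:  # sketch: method to add to the existing class
    @classmethod
    def from_awkward(cls, ds: ak.Array):
        # Build the five dictionaries expected by __init__; the argument
        # order in the return statement is a guess.
        coords, metadata, data, attrs_variables = {}, {}, {}, {}
        attrs_global = ds.layout.parameters["attrs"]
        name_coords = ["time", "lon", "lat", "ids"]
        for var in name_coords:
            coords[var] = ak.flatten(ds.obs[var]).to_numpy()
            attrs_variables[var] = ds.obs[var].layout.parameters["attrs"]
        for var in [v for v in ds.fields if v != "obs"]:
            metadata[var] = ds[var].to_numpy()
            attrs_variables[var] = ds[var].layout.parameters["attrs"]
        for var in [v for v in ds.obs.fields if v not in name_coords]:
            data[var] = ak.flatten(ds.obs[var]).to_numpy()
            attrs_variables[var] = ds.obs[var].layout.parameters["attrs"]
        return cls(coords, metadata, data, attrs_global, attrs_variables)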

missing __version__ tag for the package

import clouddrift
print(clouddrift.__version__)

should return the current version. I believe this would also make it easy to set up a GitHub Action that updates the PyPI package when this number changes.
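One common way to provide this, as a sketch rather than a prescription, is to read the installed version from the package metadata in clouddrift/__init__.py:

# Sketch for clouddrift/__init__.py: derive __version__ from the
# installed package metadata (available since Python 3.8).
from importlib.metadata import version

__version__ = version("clouddrift")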

fail to "vectorize" velocity_from_position

I am attempting to apply velocity_from_position to xarray.DataArrays of lon, lat, and time. I have been following a tutorial for a similar situation. With the following ds Dataset:

ds.info()
xarray.Dataset {
dimensions:
	trajectory = 593297 ;
	obs = 1440 ;

variables:
	float64 time(trajectory, obs) ;
	float32 lat(trajectory, obs) ;
	float32 lon(trajectory, obs) ;
	int32 obs(obs) ;
	int64 trajectory(trajectory) ;
}

I can easily do:

u,v = velocity_from_position(ds.lon.isel(trajectory=0),ds.lat.isel(trajectory=0),ds.time.isel(trajectory=0))

or

u2,v2 = xr.apply_ufunc(
    velocity_from_position,
    ds.lon.isel(trajectory=0),
    ds.lat.isel(trajectory=0),
    ds.time.isel(trajectory=0),
    input_core_dims=[["obs"], ["obs"], ["obs"]],
    output_core_dims=[["obs"], ["obs"]],
    dask="allowed",
)

but the following fails:

u2,v2 = xr.apply_ufunc(
    velocity_from_position,  # first the function
    ds.lon.isel(trajectory=slice(0,10)),
    ds.lat.isel(trajectory=slice(0,10)),
    ds.time.isel(trajectory=slice(0,10)),
    input_core_dims=[["obs"], ["obs"], ["obs"]],
    output_core_dims=[["obs"],["obs"]],
    dask="allowed",
    vectorize=True,
)

and the bottom line of the error is

File ~/miniconda3/envs/research/lib/python3.10/site-packages/clouddrift/analysis.py:65, in velocity_from_position(x, y, time, coord_system, difference_scheme)
     57 # Compute dx, dy, and dt
     58 if difference_scheme == "forward":
     59 
     60     # All values except the ending boundary value are computed using the
   (...)
     63 
     64     # Time
---> 65     dt[:-1] = np.diff(time)
     66     dt[-1] = dt[-2]
     68     # Space

ValueError: could not broadcast input array from shape (10,1439) into shape (9,1440)

So I get the error, but I don't understand whether the fix is to call apply_ufunc differently or to make velocity_from_position more flexible.
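For what it's worth, the shapes in the error suggest that np.diff differences along the last axis while dt[:-1] indexes the first, i.e. the function implicitly assumes 1-d input. A hedged sketch of what a 2-d-friendly version of that step could look like, assuming time arrives with shape (trajectory, obs):

import numpy as np

# Inside velocity_from_position: difference time along the last (obs)
# axis and index dt the same way, so 1-d and 2-d inputs both work.
dt = np.empty_like(time)
dt[..., :-1] = np.diff(time, axis=-1)
dt[..., -1] = dt[..., -2]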

Migrate notebooks to clouddrift-examples

The examples can then grow in their own repo and keep the core library repo clean. We can then also separate the dependencies needed for the core library and for examples (#41). We'll link to the examples repo from the docs and the core library README.

environment file issue

Couple of points about the environment file:

  1. Get the following warning when creating the environment:

Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.

  2. Would it be better practice to include version numbers for the packages?

`velocity_from_position`: Handle ragged array

Previous discussion in #68.

@selipot suggested that velocity_from_position should also handle ragged arrays as input. Let's discuss here what these ragged arrays would look like, i.e., is the ragged array in the form of an xarray Dataset as generated by clouddrift, or something else?
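To seed the discussion, a minimal sketch assuming one possible convention: flat 1-d arrays plus a rowsize array giving the number of observations per trajectory. The function and argument names are assumptions.

import numpy as np
from clouddrift.analysis import velocity_from_position

def velocity_from_ragged(lon, lat, time, rowsize):
    # Apply velocity_from_position one trajectory at a time; rowsize[k]
    # is the number of observations in trajectory k.
    u = np.empty_like(lon)
    v = np.empty_like(lat)
    i = 0
    for n in rowsize:
        u[i : i + n], v[i : i + n] = velocity_from_position(
            lon[i : i + n], lat[i : i + n], time[i : i + n]
        )
        i += n
    return u, v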

Should docs be built and deployed on push to clouddrift/*.py files?

As I understand it, the docs are built and deployed when there is a push to files in docs/:

name: Docs
on:
  push:
    branches: [ main ]
    paths:
      - 'docs/**'

However, docstrings are sourced from Python module files and we need a manual re-build if, say, no new module is added (no changes to docs/) but a docstring in a .py file is updated. I've been running the docs workflow manually in such cases.

Is it reasonable to also build and deploy the docs on changes to *.py files? In other words:

name: Docs

on:
  push:
    branches: [ main ]
    paths:
      - 'docs/**'
      - 'clouddrift/*.py'

Building docs fails

See https://github.com/Cloud-Drift/clouddrift/actions/runs/3935783550/jobs/6731763496.

The relevant bit is:

Theme error:
An error happened in rendering the page api.
Reason: UndefinedError("'logo' is undefined")
make: *** [Makefile:20: html] Error 2
Error: Process completed with exit code 2.

This error goes away for me locally after I have commented out the html_theme_options in docs/conf.py. However, it doesn't seem to go away in GitHub Actions and I don't understand why.

@philippemiron do you have an idea?

Rename `ragged_array` -> `RaggedArray`

The Python style guide recommends TitleCase for class names.

This is a very minor issue. Newcomers to the library who are familiar with Python may at first glance get the impression that

from clouddrift import ragged_array

is a function and not a class. Since the project is very young and there are few if any users, I think it'd be good to address this now and be consistent with the Python style guide for the public API.

Upgrade to awkward v2

Awkward v2 was released on Dec 9. Since we don't require awkward<2 in the pyproject.toml dependencies field, the awkward import will need to change from

import awkward._v2 as ak

to

import awkward as ak

and require awkward>=2.0.0 in pyproject.toml.

Thanks to Ibis Gonzalez for reporting.

What dependencies are needed?

In pyproject.toml we have

dependencies = [
    "numpy>=1.22.4",
    "pandas>=1.4.2",
    "xarray>=2022.3.0",
    "netcdf4>=1.5.8",
    "pyarrow>=8.0.0",
    "zarr>=2.11.3",
    "numba>=0.53.1",
    "tqdm>=4.64.0",
    "fsspec>=2022.3.0",
    "awkward==1.9.0",
]

From my scanning of the project, we're not using fsspec, pandas, zarr, or numba. Can they be safely removed? We're not using netCDF4 directly, but it is an optional dependency of xarray.

Syntax error when importing clouddrift with Python 3.8

$ ipython
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import clouddrift
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [1], line 1
----> 1 import clouddrift

File ~/Work/clouddrift/venv/lib/python3.8/site-packages/clouddrift/__init__.py:1
----> 1 from clouddrift.dataformat import *

File ~/Work/clouddrift/venv/lib/python3.8/site-packages/clouddrift/dataformat.py:9
      5 from typing import Tuple, Optional
      6 from tqdm import tqdm
----> 9 class ragged_array:
     10     def __init__(
     11         self,
     12         coords: dict,
   (...)
     16         attrs_variables: Optional[dict] = {},
     17     ):
     18         self.coords = coords

File ~/Work/clouddrift/venv/lib/python3.8/site-packages/clouddrift/dataformat.py:29, in ragged_array()
     22     self.attrs_variables = attrs_variables
     23     self.validate_attributes()
     25 @classmethod
     26 def from_files(
     27     cls,
     28     indices: list,
---> 29     preprocess_func: Callable[[int], xr.Dataset],
     30     vars_coords: dict,
     31     vars_meta: list = [],
     32     vars_data: list = [],
     33     rowsize_func: Optional[Callable[[int], int]] = None,
     34 ):
     35     """Generate ragged arrays archive from a list of trajectory files
     36 
     37     Args:
   (...)
     46         obj: ragged array class object
     47     """
     48     # if no method is supplied, get the dimension from the preprocessing function

TypeError: 'ABCMeta' object is not subscriptable

As I understand it, subscripting the abstract classes in collections.abc (e.g. Callable[[int], xr.Dataset]) was not supported until Python 3.9.

We can remove the subscript here, or we can require Python 3.9. I recommend the latter.
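For reference, a sketch of the 3.8-compatible alternative, should we decide to keep supporting it:

# typing.Callable and typing.Optional remain subscriptable on Python 3.8,
# unlike collections.abc.Callable, which supports [] only from 3.9.
from typing import Callable, Optional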

Function to compute velocity from positions

Let's discuss the API for this function.

def velocity_from_positions(
    x: xr.DataArray,
    y: xr.DataArray,
    time: xr.DataArray,
    coords: str = "spherical",  # x is lon and y is lat; can also be "cartesian", where x is easting and y is northing
    order: int = 1,  # can also be 2 for centered difference; we can discuss if higher orders may be desired
) -> Tuple[xr.DataArray, xr.DataArray]:
    ...

I wonder whether we should avoid the name coords, to prevent confusion with Xarray's special coords. Perhaps coord_system or coordinate_system?

Until there is native support for ragged arrays in Xarray, the user should be careful to not run this on multiple consecutive trajectories. It will be easy for this function to detect that (e.g. jump in time and space) and issue a warning.
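A sketch of that safeguard; the threshold of 3x the median time step is an arbitrary assumption:

import warnings
import numpy as np

# Inside velocity_from_positions, after receiving time: flag suspiciously
# large time gaps, which suggest concatenated trajectories.
dt = np.diff(time)
if np.any(dt > 3 * np.median(dt)):
    warnings.warn("Large time jump detected; input may contain multiple trajectories.")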

GDP 6-hourly dataset versioning

Right now preprocessing-6hourly.ipynb generates files such as gdp_6h_v2.00*.nc, yet the version 2.00 label applies only to the hourly dataset. I think the versioning of the 6-hourly dataset is done with a cutoff date. The latest date in the dataset created with that code is 2021-01-04T00:00:00.000000000, so I am guessing the cutoff is December 2020. This needs to be checked with @RickLumpkin and Bertrand (is he on GitHub?).

best way to create a clouddrift kernel for jupyter lab?

What's the best way to create a clouddrift kernel for jupyter lab?

After creating the clouddrift environment with conda I had to

  1. conda activate clouddrift
  2. conda install ipython ipykernel
  3. ipython kernel install --user --name=clouddrift
  4. Then, in another environment with jupyter lab installed, launch jupyter lab and select the clouddrift kernel.

Is this the right way to go about this, i.e., to use jupyter lab or a notebook to run some of the clouddrift examples locally?

remove or stash timeseries.py

Please remove/stash/delete timeseries.py. It only contains a spectrum function, which is incorrect. And it should certainly not be in the docs :)

Should we commit executed Jupyter notebooks?

Currently, the notebooks under examples/ are committed with cleared cells.

Should we commit executed notebooks? An advantage of doing this is that the notebook becomes readable in the browser, with no need to run it. This is for people who want to get a taste of it without having to set it up locally.

A disadvantage is a little more burden on maintenance. If the output of the cells changes (e.g. due to the change of the implementation or the API), the notebooks would need to be updated as well. If there are graphics in the notebooks (and there aren't yet), then the PNG images are encoded as strings which become part of the notebook JSON file. This can significantly increase the size of the git repo, although not in any problematic way.

All this said, I'm in favor of committing executed notebooks to the repo. I'm curious what you think.

GLAD dataset adapter

Part of #53.

Can be adapted from clouddrift-examples/data/glad.py into clouddrift/adapters.py.

Public website for the project

The main structure of the website is there (docs/), but it needs updates before public release.

  • Modify the general description of the project.
  • Include the prototype of the library.

GDP sorted RA

  1. Read the directory files (https://www.aoml.noaa.gov/ftp/pub/phod/buoydata/)
  2. Sort by death_date and/or deployment.

Update GDP `preprocess.py` in clouddrift

Based on the latest version used for the Jupyter notebook, it should be more flexible in querying data over HTTPS/FTP or from the ERDDAP server.

Includes a way to fetch:

  • by specific IDs;
  • only alive drifters;
  • by date;
  • by some attributes.

All of this should be possible by initially reading the metadata file (dirfl_1_5000.dat).
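A minimal sketch of the idea, assuming whitespace-delimited columns with the drifter ID first (the actual layout of dirfl_1_5000.dat should be checked):

import pandas as pd

# Read the metadata file once, then derive the list of drifter IDs to
# fetch; the column layout is an assumption.
meta = pd.read_csv("dirfl_1_5000.dat", sep=r"\s+", header=None)
ids = meta[0].tolist()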

lint

Ok, I was really confused by this linting error.

The files are fine, but there is a circular import... see, for example, what happens when you run:

$ black haversine.py dataformat.py
All done! ✨ 🍰 ✨
2 files left unchanged.
(research) pmiron@m2air ~/Downloads/clouddrift/clouddrift
$ Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/multiprocessing/__init__.py", line 16, in <module>
    from . import context
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/multiprocessing/context.py", line 6, in <module>
    from . import reduction
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/multiprocessing/reduction.py", line 16, in <module>
    import socket
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/socket.py", line 54, in <module>
    import os, sys, io, selectors
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/selectors.py", line 12, in <module>
    import select
  File "/Users/pmiron/Downloads/clouddrift/clouddrift/select.py", line 1, in <module>
    import awkward as ak
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/site-packages/awkward/__init__.py", line 7, in <module>
    import awkward._nplikes
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/site-packages/awkward/_nplikes.py", line 7, in <module>
    import numpy
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/site-packages/numpy/__init__.py", line 140, in <module>
    from . import core
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/site-packages/numpy/core/__init__.py", line 100, in <module>
    from . import _add_newdocs_scalars
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/site-packages/numpy/core/_add_newdocs_scalars.py", line 9, in <module>
    import platform
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/platform.py", line 119, in <module>
    import subprocess
  File "/Users/pmiron/micromamba/envs/research/lib/python3.10/subprocess.py", line 223, in <module>
    _PopenSelector = selectors.SelectSelector
AttributeError: partially initialized module 'selectors' has no attribute 'SelectSelector' (most likely due to a circular import)

Long story short, the issue is the name of the module select.py. If you rename it to anything else, it works. My guess is that black (via multiprocessing) ends up importing our select.py instead of the standard library's select module (https://docs.python.org/3/library/select.html).

The solution is to change the name of the select.py module, but to be honest I don't have any good suggestions for a new name.

conda package

The package is now available on PyPI, but a few steps are required to put this into a conda feedstock. I am waiting for awkward array >=1.9.0 (link) to be available on conda, which would greatly simplify building the package and setting up the environment. This should also fix the binder environment.

Note that the clouddrift library uses awkward array v2, which should officially be out by the end of Q4 2022. For now it is accessed using import awkward._v2 as ak, and much of what we need is only available from >1.8.0. Their conda package is currently on 1.8.0, but it is evolving pretty fast!

speed up processing of numerical datasets

With numerical model output, it is not efficient to loop through the trajectories when we can simply identify the fill value and reshape the data into ragged arrays.
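A sketch of the idea; the function name, argument names, and NaN default are assumptions:

import numpy as np

def to_ragged(arr, fill_value=np.nan):
    # arr has shape (trajectory, obs); fill_value marks unused slots.
    valid = ~np.isnan(arr) if np.isnan(fill_value) else arr != fill_value
    rowsize = valid.sum(axis=1)   # number of observations per trajectory
    return arr[valid], rowsize    # flat ragged data plus row sizes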

How to interpolate a gridded field onto the drifter locations?

I am trying a solution on the branch sst-interp in examples/interp_cci_drifters.ipynb with the CCI SST analysis global dataset, which is hosted on an AWS data repository. To make the interpolation manageable I am looping over trajectories using xarray. There is probably a better solution.
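One candidate for that better solution is xarray's pointwise (vectorized) indexing: passing indexer DataArrays that share a dimension makes interp select points rather than a grid. A sketch, where the dataset, variable, and coordinate names are assumptions:

import xarray as xr

# Pointwise interpolation: the three indexers share the "obs" dimension,
# so interp returns one value per drifter observation instead of a grid.
sst = field["analysed_sst"].interp(
    lon=xr.DataArray(ds.lon.values.ravel(), dims="obs"),
    lat=xr.DataArray(ds.lat.values.ravel(), dims="obs"),
    time=xr.DataArray(ds.time.values.ravel(), dims="obs"),
)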

Extract/Select

  • Include a general function to filter by any attribute, taking as input a dictionary of variables and ranges.
  • To extract a region we could pass a dictionary {'lon': [min, max], 'lat': [min, max], 'time': [min, max]} (see the sketch below).
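A minimal sketch of such a filter, assuming an xarray Dataset with an obs dimension; all names are placeholders:

import numpy as np

def subset(ds, ranges):
    # ranges maps variable names to (min, max); combine the conditions
    # into a single boolean mask along the obs dimension.
    mask = np.ones(ds.dims["obs"], dtype=bool)
    for var, (lo, hi) in ranges.items():
        mask &= (ds[var].values >= lo) & (ds[var].values <= hi)
    return ds.isel(obs=np.where(mask)[0])

# e.g. subset(ds, {"lon": (-98, -78), "lat": (18, 31)})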

Lagrangian observations datasets

What list of datasets should we use as examples for the project?

  • GulfDrifters: A consolidated surface drifter dataset for the Gulf of Mexico (link)
  • Strateole-2: Long-duration balloon flights at the tropical tropopause (link)
  • WHOI gliders (link)
  • AOML gliders (link)

Implement `RaggedArray.from_xarray()`

As discussed with @selipot on 10/20, it's in scope for clouddrift to allow constructing a RaggedArray instance from an xarray.Dataset.

RaggedArray.from_netcdf() already does this internally, i.e.:

with xr.open_dataset(filename) as ds:
    nb_traj = ds.dims["traj"]
    nb_obs = ds.dims["obs"]
    attrs_global = ds.attrs
    for var in ds.coords.keys():
        coords[var] = ds[var].data
        attrs_variables[var] = ds[var].attrs
    for var in ds.data_vars.keys():
        if len(ds[var]) == nb_traj:
            metadata[var] = ds[var].data
        elif len(ds[var]) == nb_obs:
            data[var] = ds[var].data
        else:
            print(
                f"Error: variable '{var}' has unknown dimension size of "
                f"{len(ds[var])}, which is not traj={nb_traj} or obs={nb_obs}."
            )
        attrs_variables[var] = ds[var].attrs

Some assumptions are currently made about the dimension names (assumed to be "traj" and "obs"). The method should allow the user to specify the dimension names to use, in the absence of an established convention.
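A sketch of the proposed classmethod signature; the parameter names for the user-specified dimensions are assumptions:

import xarray as xr

class RaggedArray:  # sketch: method to add to the existing class
    @classmethod
    def from_xarray(cls, ds: xr.Dataset, dim_traj: str = "traj", dim_obs: str = "obs"):
        # Same extraction as from_netcdf() above, but starting from an
        # in-memory Dataset and using the user-supplied dimension names.
        ...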

RaggedArray from numerical

I am following the example dataformat-numerical.ipynb to convert the output of an Ocean Parcels simulation to a ragged array and save it to a NetCDF file, but I do not understand how the time variable is handled and/or whether the units can be specified. The NetCDF file written by Parcels contains the variable time in units of seconds since a pivot date, but the NetCDF file written by clouddrift after converting to a ragged array seems to be in minutes since the origin of the experiment. I dug through dataformat.py to understand but could not figure it out.

`velocity_from_position`: Handle n-d arrays

Previous discussion in #68.

This issue is to discuss whether and how velocity_from_position should handle n-d arrays. Some specific questions:

  • Is 2-d sufficient for most use cases, or is n-d also useful? The obvious second dimension here is trajectories.
  • If not, what are the use cases for n-d?

failed install of clouddrift on HPC triton at UM

Installing the developer version of clouddrift fails even after loading a module to get a recent version of cmake. The error is below. Note that it says clouddrift built successfully, but it is not available in my environment.
...
copying pyarrow/tests/data/parquet/v0.7.1.some-named-index.parquet -> build/lib.linux-ppc64le-cpython-310/pyarrow/tests/data/parquet
running build_ext
creating /tmp/pip-install-dc72ev4z/pyarrow_d29963b5e592436b82282ff9a8c4a3d3/build/cpp
-- Running CMake for PyArrow C++
cmake -DARROW_BUILD_DIR=build -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_INSTALL_PREFIX=/tmp/pip-install-dc72ev4z/pyarrow_d29963b5e592436b82282ff9a8c4a3d3/build/dist -DPYTHON_EXECUTABLE=/home/selipot/miniconda3/envs/research/bin/python3.10 -DPython3_EXECUTABLE=/home/selipot/miniconda3/envs/research/bin/python3.10 -DPYARROW_CXXFLAGS= -DPYARROW_WITH_DATASET=off -DPYARROW_WITH_PARQUET_ENCRYPTION=off -DPYARROW_WITH_HDFS=off -DPYARROW_WITH_FLIGHT=off /tmp/pip-install-dc72ev4z/pyarrow_d29963b5e592436b82282ff9a8c4a3d3/pyarrow/src
-- The C compiler identification is GNU 8.3.1
-- The CXX compiler identification is GNU 8.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rh/devtoolset-8/root/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/devtoolset-8/root/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:63 (find_package):
By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Arrow", but
CMake did not find one.

    Could not find a package configuration file provided by "Arrow" with any of
    the following names:
  
      ArrowConfig.cmake
      arrow-config.cmake
  
    Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
    "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
    provides a separate development package or SDK, be sure it has been
    installed.
  
  
  -- Configuring incomplete, errors occurred!
  See also "/tmp/pip-install-dc72ev4z/pyarrow_d29963b5e592436b82282ff9a8c4a3d3/build/cpp/CMakeFiles/CMakeOutput.log".
  error: command '/share/builds/ppcle/spack/opt/spack/linux-rhel7-power9le/gcc-8.3.1/cmake-3.20.2-ghhqkkvhbflxpgxyzumbfrip46j4ga3f/bin/cmake' failed with exit code 1
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyarrow
Successfully built clouddrift
Failed to build pyarrow
ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects
