
hdmf-zarr's People

Contributors

alejoe91, bendichter, mavaylon1, oruebel, rly


hdmf-zarr's Issues

Create release on Conda

A Conda release is not a must, but it would be nice to have and would be a good learning experience.

[Feature]: Parallel Write Support for HDMF-Zarr

What would you like to see added to HDMF-ZARR?

Parallel Write Support for HDMF-Zarr

Allow NWB files written using the Zarr backend to leverage multiple threads or CPUs to enhance speed of operation.
Objectives

  • Zarr is built to support efficient Python parallelization strategies, both multi-processing and multi-threading
  • HDMF-Zarr currently handles all write operations (including buffering and slicing) without exposing the necessary controls to enable these strategies
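
For context, plain Zarr already supports concurrent writes as long as each worker targets a chunk-aligned region. A minimal sketch, independent of hdmf-zarr and using only the public zarr API (all names and sizes are illustrative):

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import zarr

# Pre-allocate the array; each worker then fills one chunk-aligned slab,
# so no synchronization between workers is required.
z = zarr.open("example.zarr", mode="w", shape=(1000, 1000),
              chunks=(100, 1000), dtype="f8")

def write_slab(i):
    # Rows [i*100, (i+1)*100) correspond to exactly one chunk.
    z[i * 100:(i + 1) * 100, :] = np.random.rand(100, 1000)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_slab, range(10)))

The feature request is essentially about exposing this kind of control through the io.write() stack.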

Approach and Plan
  • Identify the best injection point for parallelization parameters in the io.write() stack of HDMF-Zarr
Progress and Next Steps
TODO
Background and References

Is your feature request related to a problem?

No response

What solution would you like?

Identify the best injection point for parallelization parameters in the io.write() stack of HDMF-Zarr (as controlled via the NWBZarrIO)

This essentially revolves around the line https://github.com/hdmf-dev/hdmf/blob/2f9ec567ebe1df9fccb05f139d2f669661e50018/src/hdmf/backends/hdf5/h5_utils.py#L61 from the main repo (which might be what is used to delegate the command here as well?).

Do you have any interest in helping implement the feature?

Yes.


[Feature] Explore adding support for more Zarr storage backends

Explore adding support for other relevant Zarr backend stores to ZarrIO. See https://zarr.readthedocs.io/en/stable/api/storage.html for a list of possibly relevant stores. A few relevant tasks related to this issue are:

  • Add new data stores to ZarrIO

    • Add support for the SQLite store in ZarrIO
    • Add support for a read-only ZipStore in ZarrIO. The ZipStore in Zarr is not mutable, i.e., writes to datasets must be aligned with chunks, and attributes must be added all at once, since files cannot be updated in the zip archive once created. Because of this, it will be challenging to support write with ZipStore in the current implementation. Users would instead need to write with the DirectoryStore and zip the folder afterwards (see the sketch after this list).
    • Evaluate adding support for other Zarr stores (e.g., database stores). Resolution of links may need to be handled differently for different stores.
  • Add tests and tutorials

    • Add unit tests for NWBZarrIO to test writing with multiple different storage backends
    • Add tutorial for using different data backends
  • ZarrIO updates

    • Update handling of references on export when using file-based Zarr stores to make sure links are created correctly
    • Update handling of references on read to ensure references to the new stores are resolved correctly on read

This issue is also related to #62, which added support for different variants of the DirectoryStore (e.g., TempStore and NestedDirectoryStore).
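
A minimal sketch of the write-then-zip workaround mentioned above, using only the standard library and the public zarr API (paths are illustrative):

import shutil

import zarr

# Write with a DirectoryStore as usual ("example.zarr" is the resulting
# directory), then zip the folder once the write is complete.
shutil.make_archive("example_zarr", "zip", "example.zarr")

# Read the data back through a read-only ZipStore.
store = zarr.storage.ZipStore("example_zarr.zip", mode="r")
root = zarr.open(store, mode="r")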

Setup CodeCov

  • Set up the CodeCov pipeline for the repo
  • Require minimum coverage for patches as part of PRs
  • Add a CodeCov badge to the README.md

Pipelines failing due to codecov

CI pipelines are failing with:

ERROR: Could not find a version that satisfies the requirement codecov==2.1.12 (from versions: 2.1.13)

PyNWB warnings in plot_convert_nwb_hdf5.py

The tutorial plot_convert_nwb_hdf5.py currently raises the following warnings.

/Users/oruebel/Devel/nwb/pynwb/src/pynwb/core.py:47: UserWarning: OpticalSeries 'StimulusPresentation_encoding': The number of frame indices in 'starting_frame' should have the same length as 'external_file'.
  warn(error_msg)
/Users/oruebel/Devel/nwb/pynwb/src/pynwb/core.py:47: UserWarning: OpticalSeries 'StimulusPresentation_encoding': Either external_file or data must be specified (not None), but not both.
  warn(error_msg)

These warnings were added in the latest version of PyNWB, and they appear to stem from an issue with the data on DANDI itself rather than from the tutorial or HDMF-Zarr. Changing the dataset that is being used should address this issue.

Setup readthedocs

  • Set up the readthedocs build for the dev branch and the stable release
  • Add docs badge to the README

Add support for ruff and the other modern tooling that is now in HDMF

To keep in line with HDMF:
The Python Packaging Authority is gradually phasing out setup.py in favor of pyproject.toml. We make the same change here.

This involves removing versioneer as a dependency due to challenges to get it to play nicely with pyproject.toml. Using setuptools_scm for setting the package version appears to be an adequate replacement.

We will also now use the popular black and ruff tools to impose a strict, mostly uncompromising style on the code base. ruff replaces flake8 and isort; it is significantly faster, sorts imports, and performs additional checks. Running these tools on the code base involves changing basically every Python file...

Finally, to help automate the usage of black, ruff, and codespell, I recommend that developers install and use pre-commit, which runs these tools as well as several other helpful utility checks to clean up the code and identify issues prior to every commit.

Fix support for external links on export

Exporting of HDF5 files with external links is not yet fully implemented/tested. tests/unit/test_io_zarr.py defines several test cases for this scenario that are not yet passing and that need to be addressed to complete support for external links on export, e.g.:

  • test_soft_link_dataset
  • test_external_link_group
  • test_external_link_dataset
  • test_external_link_link
  • test_attr_reference
  • test_append_data
  • test_append_external_link_data
  • test_append_external_link_copy_data
  • test_export_dset_refs
  • test_export_cpd_dset_refs
  • hdmf.backends.hdf5.h5tools.HDF5IO uses the export_source argument on export. Need to check whether we may need to use it here as well to address this issue.

Arrays possibly being transposed when converting NWB files from HDF5 to ZARR

The tutorial for converting NWB data from HDF5 to Zarr currently shows the following warnings (see also https://hdmf-zarr.readthedocs.io/en/latest/tutorials/plot_convert_nwb.html#read-the-zarr-file-back-in):

[Screenshot: warning output shown when reading the converted Zarr file]

In particular, the warning beginning "Length of data does not match length of timestamps. Your data may be transposed. Time should be on " should be looked at, as it appears that (some) arrays may for some reason be transposed in the conversion.

Update tox.ini to use test_gallery.py and fix gallery-python-3.7 tests

  1. Update tox.ini to use test_gallery.py to be in line with HDMF.
  2. Currently, both linux-gallery-python3.7-minimum and windows-gallery-python3.7-minimum pass locally when running "python test.py --example" but fail during the GitHub checks. I've also tested a version of test_gallery.py in a branch by running "python test_gallery.py"; however, this returns an error regarding missing files (refer to the attached image).
    [Screenshot: missing-file error from running test_gallery.py]

[Bug]: Test are failing with latest HDMF

What happened?

The latest HDMF adds the abstract HDMFIO.can_read method. Several tests use a "dummy" OtherIO class that is missing this method. As a result, those tests fail because the OtherIO class can no longer be instantiated.

TypeError: Can't instantiate abstract class OtherIO with abstract method can_read
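
A minimal sketch of the fix, assuming the dummy class subclasses HDMFIO directly; the trivial method bodies are illustrative:

from hdmf.backends.io import HDMFIO

class OtherIO(HDMFIO):
    """Dummy IO class used only for testing, not for real I/O."""

    @staticmethod
    def can_read(path):
        # Implement the new abstract method so the class can be
        # instantiated again; a dummy backend never claims a file.
        return False

    def read_builder(self):
        pass

    def write_builder(self, builder, **kwargs):
        pass

    def open(self):
        pass

    def close(self):
        pass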

Steps to Reproduce

See e.g., https://github.com/hdmf-dev/hdmf/actions/runs/5504859898/jobs/10031532667?pr=890

Traceback

See e.g., https://github.com/hdmf-dev/hdmf/actions/runs/5504859898/jobs/10031532667?pr=890

Operating System

Linux

Python Executable

Conda

Python Version

3.9

Package Versions

No response


Add PyNWB tests

NeurodataWithoutBorders/pynwb#1018 updates the PyNWB test harness to add ZarrIO to the roundtrip tests, which in turn runs all HDF5 roundtrip tests defined in PyNWB for Zarr as well. This requires changing the test harness in PyNWB; instead, it would be useful to be able to “inject” new I/O backends into the PyNWB test harness so that we can specify those tests here, rather than implementing this in PyNWB and making PyNWB dependent on hdmf-zarr.

Note: The following changes from NeurodataWithoutBorders/pynwb#1018 have already been ported:

  • docs/notebooks/zarr_file_conversion_test.ipynb from the PyNWB PR has been ported to docs/gallery/plot_convert_nwb.py here
  • The changes in src/pynwb/__init__.py from the PyNWB PR have been added in src/hdmf_zarr/nwb.py here
    However, the changes to the tests have not been ported (at least not fully), and the test harness of PyNWB has undergone some refactoring in the meantime, so we'll need to check how best to implement these tests.

Dimension warning for ElectricalSeries in NWBZarrIO Tutorial

The tutorial for Creating NWB files using NWBZarrIO currently raises the following warnings:

/home/docs/checkouts/readthedocs.org/user_builds/hdmf-zarr/envs/dev/lib/python3.7/site-packages/pynwb/ecephys.py:93: UserWarning: The second dimension of data does not match the length of electrodes. Your data may be transposed.
  warnings.warn("The second dimension of data does not match the length of electrodes. Your data may be "
/home/docs/checkouts/readthedocs.org/user_builds/hdmf-zarr/envs/dev/lib/python3.7/site-packages/pynwb/ecephys.py:93: UserWarning: The second dimension of data does not match the length of electrodes. Your data may be transposed.
  warnings.warn("The second dimension of data does not match the length of electrodes. Your data may be "

It appears that those warnings are due to errors in the initialization of the test data in the tutorial itself, rather than a bug in the library; a sketch of the likely fix follows.
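
For reference, pynwb emits this warning when the second dimension of data does not equal the number of electrodes referenced by the ElectricalSeries. A minimal sketch of correctly shaped test data (sizes are illustrative):

import numpy as np

# Time on the first axis, channels on the second: an ElectricalSeries
# referencing 4 electrodes should carry data of shape (n_samples, 4).
n_samples, n_electrodes = 1000, 4
data = np.random.rand(n_samples, n_electrodes)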

Add support for storing region references

Region references are not yet fully implemented in ZarrIO. Implementing region references will require updating:

  1. ZarrReference to add a region key to support storing the selection for the region (a possible encoding is sketched after this list),
  2. ZarrIO.__get_ref to support passing in the region definition to be added to the ZarrReference,
  3. ZarrIO.write_dataset, which already partially implements the required logic for creating region references by checking for hdmf.build.RegionBuilder inputs but will likely need updates as well,
  4. ZarrIO.__read_dataset to support reading region references, which may also require updates to ZarrIO.__parse_ref and ZarrIO.__resolve_ref,
  5. and possibly other parts of ZarrIO.
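
As an illustration of item 1, a possible encoding of the region inside a ZarrReference-style mapping; the layout of the region value is hypothetical, not a settled design:

# Zarr attributes must be JSON-serializable, so the selection is encoded
# as plain lists rather than slice objects (hypothetical encoding).
region_ref = {
    "source": ".",                   # file containing the target dataset
    "path": "/acquisition/ts/data",  # path of the dataset within that file
    "region": [[0, 100], [0, 3]],    # per-axis [start, stop] selection
}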

Support lazy read of object references

Currently, object references are always loaded and resolved on read. To avoid potentially reading and resolving large numbers of references on read, it would be ideal if references could be resolved lazily.

See also:

if has_reference:
    try:
        # TODO Should implement a lazy way to evaluate references for Zarr
        data = deepcopy(data[:])
        self.__parse_ref(kwargs['maxshape'], obj_refs, reg_refs, data)
    except ValueError as e:
        raise ValueError(str(e) + " zarr-name=" + str(zarr_obj.name) + " name=" + str(name))
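
One possible direction, sketched purely as an illustration (resolve_one_ref is a hypothetical helper, not an existing ZarrIO method): wrap each stored reference in a small proxy that defers resolution until first access.

class LazyReference:
    """Resolve a stored Zarr reference only when first accessed."""

    def __init__(self, io, zarr_ref):
        self._io = io          # the ZarrIO instance that read the file
        self._ref = zarr_ref   # the raw reference as stored on disk
        self._target = None    # cached result of the resolution

    def get(self):
        if self._target is None:
            # resolve_one_ref is hypothetical; today resolution happens
            # eagerly during read.
            self._target = self._io.resolve_one_ref(self._ref)
        return self._target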

[Bug]: DeployRelease

What happened?

I made a release that had a bug due to using Python 3.10 tox tests instead of 3.11. We merged a fix for that. I did a manual release but noticed that the PyPI password secret was being sliced at a special character. I am assuming the workflow failed because I had done a manual release earlier and the version already existed on PyPI. As a result, this bug report is a point of reference for the next release in case it fails. If it passes, this bug will be removed.

Steps to Reproduce

Run the workflow for deploy_release

Traceback

Successfully installed Pygments-2.15.1 SecretStorage-3.3.3 bleach-6.0.0 certifi-2023.7.22 cffi-1.15.1 charset-normalizer-3.2.0 cryptography-41.0.2 docutils-0.20.1 idna-3.4 importlib-metadata-6.8.0 jaraco.classes-3.3.0 jeepney-0.8.0 keyring-24.2.0 markdown-it-py-3.0.0 mdurl-0.1.2 more-itertools-9.1.0 pkginfo-1.9.6 pycparser-2.21 readme-renderer-40.0 requests-2.31.0 requests-toolbelt-1.0.0 rfc3986-2.0.0 rich-13.4.2 six-1.16.0 twine-4.0.2 urllib3-2.0.4 webencodings-0.5.1 zipp-3.16.2
hdmf_zarr-0.3.0-py3-none-any.whl
hdmf_zarr-0.3.0.tar.gz
/home/runner/work/_temp/3aa02331-9c9c-4b3b-ae0b-61969c85efb9.sh: line 4: M7je3: command not found
usage: twine upload [-h] [-r REPOSITORY] [--repository-url REPOSITORY_URL]
                    [-s] [--sign-with SIGN_WITH] [-i IDENTITY] [-u USERNAME]
                    [-p PASSWORD] [--non-interactive] [-c COMMENT]
                    [--config-file CONFIG_FILE] [--skip-existing]
                    [--cert path] [--client-cert path] [--verbose]
                    [--disable-progress-bar]
                    dist [dist ...]
twine upload: error: the following arguments are required: dist

Operating System

Linux

Python Executable

Python

Python Version

3.11

Package Versions

No response


Zarr links should be relative to root

Hi guys,

I was able to successfully export sorting info + waveforms and the electrode table to NWB-Zarr (using neuroconv).

I performed the conversion remotely and then downloaded the resulting files. When I try to read the file locally, I get a bad link error:

ValueError                                Traceback (most recent call last)
Cell In [5], line 4
      1 nwbfile_path = "/home/alessio/Documents/data/debug/ecephys_632269_2022-10-13_15-41-42_zarr.nwb"
      3 io = NWBZarrIO(nwbfile_path, "r")
----> 4 nwbfile = io.read()

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/backends/io.py:38, in HDMFIO.read(self, **kwargs)
     35 @docval(returns='the Container object that was read in', rtype=Container)
     36 def read(self, **kwargs):
     37     """Read a container from the IO source."""
---> 38     f_builder = self.read_builder()
     39     if all(len(v) == 0 for v in f_builder.values()):
     40         # TODO also check that the keys are appropriate. print a better error message
     41         raise UnsupportedOperation('Cannot build data. There are no values.')

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:937, in ZarrIO.read_builder(self)
    935 @docval(returns='a GroupBuilder representing the NWB Dataset', rtype='GroupBuilder')
    936 def read_builder(self):
--> 937     f_builder = self.__read_group(self.__file, ROOT_NAME)
    938     return f_builder

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:973, in ZarrIO.__read_group(self, zarr_obj, name)
    971 # read sub groups
    972 for sub_name, sub_group in zarr_obj.groups():
--> 973     sub_builder = self.__read_group(sub_group, sub_name)
    974     ret.set_group(sub_builder)
    976 # read sub datasets

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:973, in ZarrIO.__read_group(self, zarr_obj, name)
    971 # read sub groups
    972 for sub_name, sub_group in zarr_obj.groups():
--> 973     sub_builder = self.__read_group(sub_group, sub_name)
    974     ret.set_group(sub_builder)
    976 # read sub datasets

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:973, in ZarrIO.__read_group(self, zarr_obj, name)
    971 # read sub groups
    972 for sub_name, sub_group in zarr_obj.groups():
--> 973     sub_builder = self.__read_group(sub_group, sub_name)
    974     ret.set_group(sub_builder)
    976 # read sub datasets

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:982, in ZarrIO.__read_group(self, zarr_obj, name)
    979     ret.set_dataset(sub_builder)
    981 # read the links
--> 982 self.__read_links(zarr_obj=zarr_obj, parent=ret)
    984 self._written_builders.set_written(ret)  # record that the builder has been written
    985 self.__set_built(zarr_obj, ret)

File ~/anaconda3/envs/nwb/lib/python3.9/site-packages/hdmf_zarr/backend.py:1008, in ZarrIO.__read_links(self, zarr_obj, parent)
   1006     l_path = os.path.join(link['source'], link['path'].lstrip("/"))
   1007 if not os.path.exists(l_path):
-> 1008     raise ValueError("Found bad link %s in %s to %s" % (link_name, self.__path, l_path))
   1010 target_name = str(os.path.basename(l_path))
   1011 target_zarr_obj = zarr.open(l_path, mode='r')

ValueError: Found bad link device in /home/alessio/Documents/data/debug/ecephys_632269_2022-10-13_15-41-42_zarr.nwb to results/ecephys_632269_2022-10-13_15-41-42_zarr.nwb/general/devices/Device

After debugging, the l_path is indeed the path on my remote machine. I think saving links relative to the zarr root should fix it.
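
A minimal sketch of the proposed behavior; the function names are illustrative:

import os

def make_link_relative(link_source, zarr_root):
    # On write: store the link source relative to the Zarr root instead
    # of as an absolute path, so the file stays valid after a move.
    return os.path.relpath(link_source, start=zarr_root)

def resolve_link(link_source, zarr_root):
    # On read: resolve the stored relative source against the current
    # location of the Zarr root.
    return os.path.normpath(os.path.join(zarr_root, link_source))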

[Feature]: Remove support for python 3.7

What would you like to see added to HDMF-ZARR?

Remove all python 3.7 options and requirements.

Is your feature request related to a problem?

No response

What solution would you like?

Remove all python 3.7 options and requirements.

Do you have any interest in helping implement the feature?

Yes.


[Bug]: Could not find already-built Builder for DynamicTable 'electrodes' in BuildManager

What happened?

Attempted a basic NWB file write using the Zarr backend and hit a snag; unsure how to proceed.

Steps to Reproduce

from pynwb.testing.mock.file import mock_NWBFile
from pynwb.testing.mock.ecephys import mock_ElectricalSeries
from hdmf_zarr import NWBZarrIO

nwbfile = mock_NWBFile()
nwbfile.add_acquisition(mock_ElectricalSeries())

with NWBZarrIO(path="/home/jovyan/Downloads/test_zarr.nwb", mode="w") as io:
    io.write(nwbfile)

Traceback

/opt/conda/lib/python3.10/site-packages/hdmf_zarr/backend.py:92: UserWarning: The ZarrIO backend is experimental. It is under active development. The ZarrIO backend may change any time and backward compatibility is not guaranteed.
  warnings.warn(warn_msg)
---------------------------------------------------------------------------
ReferenceTargetNotBuiltError              Traceback (most recent call last)
Cell In[9], line 2
      1 with NWBZarrIO(path="/home/jovyan/Downloads/test_zarr.nwb", mode="w") as io:
----> 2     io.write(nwbfile)

File /opt/conda/lib/python3.10/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File /opt/conda/lib/python3.10/site-packages/hdmf_zarr/backend.py:160, in ZarrIO.write(self, **kwargs)
    158 """Overwrite the write method to add support for caching the specification"""
    159 cache_spec = popargs('cache_spec', kwargs)
--> 160 super(ZarrIO, self).write(**kwargs)
    161 if cache_spec:
    162     self.__cache_spec()

File /opt/conda/lib/python3.10/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File /opt/conda/lib/python3.10/site-packages/hdmf/backends/io.py:56, in HDMFIO.write(self, **kwargs)
     54 """Write a container to the IO source."""
     55 container = popargs('container', kwargs)
---> 56 f_builder = self.__manager.build(container, source=self.__source, root=True)
     57 self.write_builder(f_builder, **kwargs)

File /opt/conda/lib/python3.10/site-packages/hdmf/utils.py:645, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    643 def func_call(*args, **kwargs):
    644     pargs = _check_args(args, kwargs)
--> 645     return func(args[0], **pargs)

File /opt/conda/lib/python3.10/site-packages/hdmf/build/manager.py:188, in BuildManager.build(self, **kwargs)
    184     self.logger.debug("Using prebuilt %s '%s' for %s '%s'"
    185                       % (result.__class__.__name__, result.name,
    186                          container.__class__.__name__, container.name))
    187 if root:  # create reference builders only after building all other builders
--> 188     self.__add_refs()
    189     self.__active_builders.clear()  # reset active builders now that build process has completed
    190 return result

File /opt/conda/lib/python3.10/site-packages/hdmf/build/manager.py:233, in BuildManager.__add_refs(self)
    230 call = self.__ref_queue.popleft()
    231 self.logger.debug("Adding ReferenceBuilder with call id %d from queue (length %d)"
    232                   % (id(call), len(self.__ref_queue)))
--> 233 call()

File /opt/conda/lib/python3.10/site-packages/hdmf/build/objectmapper.py:952, in ObjectMapper.__set_attr_to_ref.<locals>._filler()
    948 def _filler():
    949     self.logger.debug("Setting reference attribute on %s '%s' attribute '%s' to %s"
    950                       % (builder.__class__.__name__, builder.name, spec.name,
    951                          attr_value.__class__.__name__))
--> 952     target_builder = self.__get_target_builder(attr_value, build_manager, builder)
    953     ref_attr_value = ReferenceBuilder(target_builder)
    954     builder.set_attribute(spec.name, ref_attr_value)

File /opt/conda/lib/python3.10/site-packages/hdmf/build/objectmapper.py:895, in ObjectMapper.__get_target_builder(self, container, build_manager, builder)
    893 target_builder = build_manager.get_builder(container)
    894 if target_builder is None:
--> 895     raise ReferenceTargetNotBuiltError(builder, container)
    896 return target_builder

ReferenceTargetNotBuiltError: electrodes (root/acquisition/ElectricalSeries/electrodes): Could not find already-built Builder for DynamicTable 'electrodes' in BuildManager

Operating System

Linux

Python Executable

Conda

Python Version

3.11

Package Versions

DANDI Hub basic kernel on 6/15/2023 with only hdmf-zarr installed manually


Remove test.py

Testing using test.py is deprecated. Tests should be run using either pytest or python test_gallery.py. Let's remove test.py to reduce confusion.

Add test to ensure links keep working after files are moved

In #44 and #46 we changed references to use relative paths to improve the portability of files. During debugging, we verified that links continue to function when file paths change by changing the current working directory. For future testing, we should add a test where we generate a file with references, move the file to a different path, and then open the file with different relative and absolute paths (and from different Python working directories) to make sure links continue to function when files are moved. See the example below from the tutorial, which can be turned into a unit test (a sketch of the missing move step follows the example):

###############################################################################
# Test opening the file
# ---------------------
with NWBZarrIO(path=path, mode="r") as io:
    infile = io.read()

###############################################################################
# Test opening with the absolute path instead
# -------------------------------------------
with NWBZarrIO(path=absolute_path, mode="r") as io:
    infile = io.read()

###############################################################################
# Test changing the current directory
# ------------------------------------
import os
os.chdir(os.path.abspath(os.path.join(os.getcwd(), "../")))
with NWBZarrIO(path=absolute_path, mode="r") as io:
    infile = io.read()
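
The missing piece relative to this issue is the actual move of the file; a sketch of the additional step (paths are illustrative):

import shutil

# Move the file to a different location and re-open it from there to
# verify that the stored links still resolve.
new_path = os.path.join("..", "moved_test_zarr.nwb")
shutil.move(absolute_path, new_path)
with NWBZarrIO(path=new_path, mode="r") as io:
    infile = io.read()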

Save object id's as part of links and references

  • To enhance the portability of links and references, it would be nice to store the object_id of the target in addition to the relative path when a link/reference points to an external file. This will be useful for error checking and can also help resolve links in case paths are not valid.
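
A hypothetical sketch of such a link entry; the source and path keys mirror what is already stored, while object_id is the proposed addition:

# Hypothetical link entry with the target container's UUID stored next
# to the relative path for error checking and link recovery.
link = {
    "source": "../other_file.nwb",
    "path": "/general/devices/Device",
    "object_id": "0f3a1b9c-0000-0000-0000-000000000000",  # illustrative UUID
}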

[Documentation]: favicon

What would you like changed or added to the documentation and why?

Add favicon to docs

Do you have any interest in helping write or edit the documentation?

No.


Rename Mixin classes in test_io_convert

Rename the Mixin classes used to implement conversion tests from TestCaseConvertMixin to MixinTestCaseConvert to avoid issues with pytest picking up the abstract mixin classes as actual tests.

[Bug]: Min req tests failing on `import zarr`

What happened?

The nightly macos-python3.7-minimum tests have been failing for 2 days. See stacktrace.

Steps to Reproduce

See https://github.com/hdmf-dev/hdmf-zarr/actions/runs/5301947163/jobs/9596558206

Traceback

==================================== ERRORS ====================================
________________ ERROR collecting tests/unit/test_io_convert.py ________________
ImportError while importing test module '/Users/runner/work/hdmf-zarr/hdmf-zarr/tests/unit/test_io_convert.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../hostedtoolcache/Python/3.7.17/x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/test_io_convert.py:40: in <module>
    from hdmf_zarr.backend import (ZarrIO,
src/hdmf_zarr/__init__.py:1: in <module>
    from .backend import ZarrIO
src/hdmf_zarr/backend.py:12: in <module>
    import zarr
.tox/py37-minimum/lib/python3.7/site-packages/zarr/__init__.py:2: in <module>
    from zarr.codecs import *
.tox/py37-minimum/lib/python3.7/site-packages/zarr/codecs.py:2: in <module>
    from numcodecs import *
.tox/py37-minimum/lib/python3.7/site-packages/numcodecs/__init__.py:32: in <module>
    from numcodecs.bz2 import BZ2
.tox/py37-minimum/lib/python3.7/site-packages/numcodecs/bz2.py:1: in <module>
    import bz2 as _bz2
../../../hostedtoolcache/Python/3.7.17/x64/lib/python3.7/bz2.py:19: in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
E   ModuleNotFoundError: No module named '_bz2'
__________________ ERROR collecting tests/unit/test_zarrio.py __________________
ImportError while importing test module '/Users/runner/work/hdmf-zarr/hdmf-zarr/tests/unit/test_zarrio.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../hostedtoolcache/Python/3.7.17/x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/test_zarrio.py:12: in <module>
    from tests.unit.base_tests_zarrio import (BaseTestZarrWriter,
tests/unit/base_tests_zarrio.py:13: in <module>
    import zarr
.tox/py37-minimum/lib/python3.7/site-packages/zarr/__init__.py:2: in <module>
    from zarr.codecs import *
.tox/py37-minimum/lib/python3.7/site-packages/zarr/codecs.py:2: in <module>
    from numcodecs import *
.tox/py37-minimum/lib/python3.7/site-packages/numcodecs/__init__.py:32: in <module>
    from numcodecs.bz2 import BZ2
.tox/py37-minimum/lib/python3.7/site-packages/numcodecs/bz2.py:1: in <module>
    import bz2 as _bz2
../../../hostedtoolcache/Python/3.7.17/x64/lib/python3.7/bz2.py:19: in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
E   ModuleNotFoundError: No module named '_bz2'

Operating System

Linux

Python Executable

Conda

Python Version

3.7

Package Versions

No response

Code of Conduct

conda-linux-python3.7-minimum test failing

py37-minimum create: /home/runner/work/hdmf-zarr/hdmf-zarr/.tox/py37-minimum
ERROR: invocation failed (exit code 1), logfile: /home/runner/work/hdmf-zarr/hdmf-zarr/.tox/py37-minimum/log/py37-minimum-0.log
================================== log start ===================================
AttributeError: 'dict' object has no attribute 'select'

=================================== log end ====================================
ERROR: InvocationError for command /usr/share/miniconda/envs/true/bin/python3.7 -m virtualenv --download --python /usr/share/miniconda/envs/true/bin/python3.7 py37-minimum (exited with code 1)
___________________________________ summary ____________________________________
ERROR:   py37-minimum: InvocationError for command /usr/share/miniconda/envs/true/bin/python3.7 -m virtualenv --download --python /usr/share/miniconda/envs/true/bin/python3.7 py37-minimum (exited with code 1)

Setup branch protections

Set up GitHub branch protections for the dev branch (similar to the setup in HDMF) to: i) require PRs and prevent direct commits to the dev branch, and ii) require that PRs pass the main CI checks (see #10).

Add support for dtype and shape on ZarrDataIO

In HDMF, hdmf-dev/hdmf#747 added the ability to set up datasets on write ahead of time without having the actual data. This was accomplished by adding optional shape and dtype parameters on DataIO. ZarrDataIO currently does not support these parameters; see:

# NOTE: dtype and shape of the DataIO base class are not yet supported by ZarrDataIO.
# These parameters are used to create empty data to allocate the data but
# leave the I/O to fill the data to the user.
super(ZarrDataIO, self).__init__(data=data,
                                 dtype=None,
                                 shape=None)

To match functionality with H5DataIO, it would be useful to add support for shape and dtype to ZarrDataIO and update the ZarrIO backend to support creation of empty datasets from ZarrDataIO objects that only have the shape and dtype specified but contain no actual data. The changes needed in ZarrDataIO should be fairly minimal (i.e., essentially just updating the docval and handling in __init__). The main changes required should be in ZarrIO, to check for the case when ZarrDataIO.data is empty and support creation of empty Zarr datasets. A sketch of the desired usage follows.
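
A sketch of the desired usage once implemented, mirroring what H5DataIO already supports (parameter names assumed from the DataIO base class):

import numpy as np
from hdmf_zarr.utils import ZarrDataIO

# Allocate an empty (1000 x 32) float32 dataset at write time; the data
# itself would be filled in later, outside of io.write().
empty_data = ZarrDataIO(data=None, shape=(1000, 32), dtype=np.float32)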

Add roundtrip test for Zarr with DataChunkIterator

#72 fixed an issue where the zarr_dtype attribute was not set on write when a DataChunkIterator is being used (which in turn caused an error on read). We should add roundtrip tests using DataChunkIterator for write to cover this case; a sketch of the write side follows.
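
A sketch of the write side of such a test; the container/IO wiring is omitted and the generator is illustrative:

import numpy as np
from hdmf.data_utils import DataChunkIterator

# Wrap a generator in a DataChunkIterator; writing a dataset backed by
# this iterator through ZarrIO is the case that previously left the
# zarr_dtype attribute unset.
dci = DataChunkIterator(
    data=(np.arange(100, dtype="float64") for _ in range(10)),
)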
