grabbit's Introduction

grabbit

Get grabby with file trees

Overview

Grabbit is a lightweight Python 3 package for simple queries over filenames within a project. It's geared towards projects or applications with highly structured filenames that allow useful queries to be performed without having to inspect the file metadata or contents.

Installation

$ pip install grabbit

Or, if you like to (a) do things the hard way or (b) live on the bleeding edge:

$ git clone https://github.com/grabbles/grabbit
$ cd grabbit
$ python setup.py develop

Quickstart

Suppose we've already defined (or otherwise obtained) a grabbit JSON configuration file that looks like this. And suppose we also have some kind of many-filed project that needs managing. Maybe it looks like this:

├── dataset_description.json
├── participants.tsv
├── sub-01
│   ├── ses-1
│   │   ├── anat
│   │   │   ├── sub-01_ses-1_T1map.nii.gz
│   │   │   └── sub-01_ses-1_T1w.nii.gz
│   │   ├── fmap
│   │   │   ├── sub-01_ses-1_run-1_magnitude1.nii.gz
│   │   │   ├── sub-01_ses-1_run-1_magnitude2.nii.gz
│   │   │   ├── sub-01_ses-1_run-1_phasediff.json
│   │   │   ├── sub-01_ses-1_run-1_phasediff.nii.gz
│   │   │   ├── sub-01_ses-1_run-2_magnitude1.nii.gz
│   │   │   ├── sub-01_ses-1_run-2_magnitude2.nii.gz
│   │   │   ├── sub-01_ses-1_run-2_phasediff.json
│   │   │   └── sub-01_ses-1_run-2_phasediff.nii.gz
│   │   ├── func
│   │   │   ├── sub-01_ses-1_task-rest_acq-fullbrain_run-1_bold.nii.gz
│   │   │   ├── sub-01_ses-1_task-rest_acq-fullbrain_run-1_physio.tsv.gz
│   │   │   ├── sub-01_ses-1_task-rest_acq-fullbrain_run-2_bold.nii.gz
│   │   │   ├── sub-01_ses-1_task-rest_acq-fullbrain_run-2_physio.tsv.gz
│   │   │   ├── sub-01_ses-1_task-rest_acq-prefrontal_bold.nii.gz
│   │   │   └── sub-01_ses-1_task-rest_acq-prefrontal_physio.tsv.gz
│   │   └── sub-01_ses-1_scans.tsv
│   ├── ses-2
│   │   ├── fmap
│   │   │   ├── sub-01_ses-2_run-1_magnitude1.nii.gz
│   │   │   ├── sub-01_ses-2_run-1_magnitude2.nii.gz
│   │   │   ├── sub-01_ses-2_run-1_phasediff.json
│   │   │   ├── sub-01_ses-2_run-1_phasediff.nii.gz
│   │   │   ├── sub-01_ses-2_run-2_magnitude1.nii.gz
│   │   │   ├── sub-01_ses-2_run-2_magnitude2.nii.gz
│   │   │   ├── sub-01_ses-2_run-2_phasediff.json
│   │   │   └── sub-01_ses-2_run-2_phasediff.nii.gz

We can initialize a grabbit Layout object like so:

from grabbit import Layout
config_file = 'my_config.json'
project_root = '/my_project' 
layout = Layout(project_root, config_file)

The Layout instance is a lightweight container for all of the files in the project directory. It automatically detects any entities found in the file paths, and allows us to perform simple but relatively powerful queries over the file tree. The entities are defined in a JSON configuration file (or explicitly added via add_entity() calls). For example, we might have "subject", "session", "run", and "type" entities defined as follows:

entities = [
    {
      "name": "subject",
      "pattern": "(sub-\\d+)",
      "directory": "{{root}}/{subject}"
    },
    {
      "name": "session",
      "pattern": "(ses-\\d)",
      "directory": "{{root}}/{subject}/{session}"
    },
    {
      "name": "run",
      "pattern": "(run-\\d+)"
    },
    {
      "name": "type",
      "pattern": ".*_(.*?)\\."
    }
]

In each case, the "name" key defines the name of the entity, and the "pattern" key defines the regular expression to search for within each file path. These are the only two mandatory keys. Notice that we use regex groups to define the unique ID to capture for each entity. This allows us to match files to entities, but keep only part of the match as the unique identifier (e.g., if we wanted to detect 'sub-05' as a subject, but keep only '05' as the subject ID, we could use the pattern "sub-(\d+)").
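
To make the capture-group behavior concrete, here is a minimal sketch that uses only Python's standard re module rather than grabbit itself. The path comes from the example tree above; the assumption that patterns are applied as a regex search over the file path is ours.

```python
import re

path = "sub-01/ses-1/func/sub-01_ses-1_task-rest_acq-fullbrain_run-1_bold.nii.gz"

# Group wraps the whole match: the ID keeps the "sub-" prefix.
full_id = re.search(r"(sub-\d+)", path).group(1)

# Group wraps only the digits: the file still matches, but the ID is "01".
short_id = re.search(r"sub-(\d+)", path).group(1)

# The "type" pattern from the config captures the last underscore-delimited
# chunk before the first dot that follows it.
file_type = re.search(r".*_(.*?)\.", path).group(1)

print(full_id, short_id, file_type)  # sub-01 01 bold
```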

For entities where each instance is associated with a directory (e.g., each subject's data is contained in a single directory), we can also specify the full directory path. Notice that we can refer to other entities within the path--e.g., "{{root}}/{subject}/{session}" will substitute unique subject and session IDs when they are detected ({{root}} is a magic constant that is always replaced with the root directory of the project specified at Layout initialization).
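
One way to picture the substitution is a two-pass str.format call. This is only a sketch: expand_directory is a hypothetical helper, and grabbit's internal implementation may differ.

```python
def expand_directory(template, root, **entities):
    """Hypothetical helper: fill a directory template with entity IDs."""
    # First pass: "{{root}}" is an escaped brace pair, so it survives as the
    # literal "{root}" while "{subject}", "{session}", etc. are filled in.
    partially_filled = template.format(**entities)
    # Second pass: substitute the project root.
    return partially_filled.format(root=root)


directory = expand_directory(
    "{{root}}/{subject}/{session}",
    root="/my_project",
    subject="sub-01",
    session="ses-1",
)
print(directory)  # /my_project/sub-01/ses-1
```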

Getting unique values and counts

Once we've initialized a Layout, we can do simple things like counting and listing all unique values of a given entity:

>>> layout.unique('subject')
['sub-09', 'sub-05', 'sub-08', 'sub-01', 'sub-02', 'sub-06', 'sub-04', 'sub-03', 'sub-07', 'sub-10']

>>> layout.count('run')
2
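
Conceptually, both calls boil down to collecting the distinct IDs that an entity's pattern captures. The sketch below reproduces that idea with the standard library only; unique_values is a hypothetical function, and it sorts its result, whereas the output above is unordered.

```python
import re

filenames = [
    "sub-01_ses-1_run-1_phasediff.nii.gz",
    "sub-01_ses-1_run-2_phasediff.nii.gz",
    "sub-01_ses-2_run-1_phasediff.nii.gz",
    "sub-01_ses-1_T1w.nii.gz",  # no run entity in this name
]


def unique_values(names, pattern):
    """Return the sorted set of IDs captured by `pattern` across `names`."""
    matches = (re.search(pattern, name) for name in names)
    return sorted({m.group(1) for m in matches if m})


runs = unique_values(filenames, r"(run-\d+)")
print(runs)       # ['run-1', 'run-2']
print(len(runs))  # 2
```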

Querying and filtering

Counting is kind of trivial; everyone can count! More usefully, we can run simple logical queries, returning the results in a variety of formats:

>>> files = layout.get(subject='sub-0[12]', run=1, extensions='.nii.gz')
>>> files[0]
File(filename='sub-02/ses-1/fmap/sub-02_ses-1_run-1_magnitude1.nii.gz', subject='sub-02', run='run-1', session='ses-1', type='magnitude1')

>>> [f.path for f in files]
['sub-02/ses-2/fmap/sub-02_ses-2_run-1_phasediff.nii.gz',
 'sub-01/ses-2/func/sub-01_ses-2_task-rest_acq-fullbrain_run-1_bold.nii.gz',
 'sub-02/ses-1/fmap/sub-02_ses-1_run-1_phasediff.nii.gz',
 ...,
 ]

In the above snippet, we retrieve all files with subject id 1 or 2 and run id 1 (notice that any entity defined in the config file can be used as a filtering argument), and with a file extension of .nii.gz. The returned result is a list of named tuples, one per file, allowing direct access to the defined entities as attributes.
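
The filtering logic can be sketched without grabbit as follows. The File tuple and the query function are hypothetical stand-ins, and the matching here is a loose regex search (so run=1 would also match 'run-10'); a real implementation would likely anchor the pattern.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the named tuples returned by get().
File = namedtuple("File", ["filename", "subject", "run"])

files = [
    File("sub-01_ses-1_run-1_bold.nii.gz", "sub-01", "run-1"),
    File("sub-02_ses-1_run-1_phasediff.nii.gz", "sub-02", "run-1"),
    File("sub-02_ses-1_run-1_phasediff.json", "sub-02", "run-1"),
    File("sub-02_ses-1_run-2_bold.nii.gz", "sub-02", "run-2"),
    File("sub-03_ses-1_run-1_bold.nii.gz", "sub-03", "run-1"),
]


def query(candidates, extensions=None, **filters):
    """Keep files whose entity values match every regex in `filters`."""
    kept = []
    for f in candidates:
        if extensions and not f.filename.endswith(extensions):
            continue
        if all(re.search(str(rx), getattr(f, name)) for name, rx in filters.items()):
            kept.append(f)
    return kept


hits = query(files, subject="sub-0[12]", run=1, extensions=".nii.gz")
print([f.filename for f in hits])
print(hits[0].subject)  # entities are available as attributes
```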

Some other examples of get() requests:

>>> # Return all unique 'session' directories
>>> layout.get(target='session', return_type='dir')
['sub-08/ses-1',
 'sub-06/ses-2',
 'sub-01/ses-2',
 ...
 ]

>>> # Return a list of unique file types available for subject 1
>>> layout.get(target='type', return_type='id', subject=1)
['T1map', 'magnitude2', 'magnitude1', 'scans', 'bold', 'phasediff', 'T1w', 'physio']

For convenience, it's also possible to create getters for all entities when initializing the Layout, by passing dynamic_getters=True:

>>> layout = Layout(project_root, dynamic_getters=True)
>>> # Now we can call, e.g., get_subjects()
>>> layout.get_subjects()
['sub-09', 'sub-05', 'sub-08', 'sub-01', 'sub-02', 'sub-06', 'sub-04', 'sub-03', 'sub-07', 'sub-10']

Internally, the get_{entity}() methods are simply partial functions of the main get() method that set target={entity}. So you can still pass all of the other arguments (e.g., to filter subjects by any of the other entities or return subject directories rather than unique IDs by specifying return_type='dir').
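
The wiring can be illustrated with functools.partial. TinyLayout below is a hypothetical miniature written for this example, not grabbit's Layout, and it pluralizes entity names by naively appending "s".

```python
from functools import partial


class TinyLayout:
    """Hypothetical miniature of a Layout, just to show the getter wiring."""

    def __init__(self, values_by_entity):
        self._values = values_by_entity
        # Create one get_<entity>s() method per entity, each a partial of get().
        for name in values_by_entity:
            setattr(self, "get_%ss" % name, partial(self.get, target=name))

    def get(self, target, return_type="id"):
        ids = sorted(self._values[target])
        if return_type == "dir":
            return ["/my_project/%s" % i for i in ids]
        return ids


tiny = TinyLayout({"subject": {"sub-01", "sub-02"}})
print(tiny.get_subjects())                   # ['sub-01', 'sub-02']
print(tiny.get_subjects(return_type="dir"))  # ['/my_project/sub-01', '/my_project/sub-02']
```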

By default, .get() calls will return either absolute or relative paths, with behavior dictated by the project root passed in when the Layout was created (i.e., if an absolute project root was provided, returned paths will also be absolute, and similarly for relative project roots). You can force the Layout to always return absolute paths by setting Layout(absolute_paths=True).

For everything else, there's pandas

If you want to run more complex queries, grabbit provides an easy way to return the full project tree (or a subset of it) as a pandas DataFrame:

>>> # Return all session 1 files as a pandas DataFrame
>>> layout.as_data_frame(session=1)

Each row is a single file, and each defined entity is automatically mapped to a column in the DataFrame.

grabbit's Issues

Handle entity names that conflict with reserved keywords

In cases where an Entity uses a reserved keyword as its name (e.g., class), exceptions can occur for some .get() queries (e.g., when return_type='tuple', because reserved keywords can't be attributes). We need to find some workaround for this--e.g., setting a different name internally, or adopting a convention of appending underscores, etc. See bids-standard/pybids#142 for relevant discussion.
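
The failure mode and the underscore convention mentioned above can be reproduced with the standard library alone. safe_field_name is a hypothetical helper illustrating one possible workaround, not a decision taken in grabbit.

```python
import keyword
from collections import namedtuple


def safe_field_name(name):
    """Append an underscore when `name` is a reserved keyword (assumed convention)."""
    return name + "_" if keyword.iskeyword(name) else name


entity_names = ["subject", "class"]

# Using the raw names fails, because "class" cannot be a namedtuple field.
try:
    namedtuple("File", entity_names)
    raw_names_ok = True
except ValueError:
    raw_names_ok = False

File = namedtuple("File", [safe_field_name(n) for n in entity_names])
f = File(subject="sub-01", class_="control")
print(raw_names_ok, f.class_)  # False control
```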

"parse_file_entities" unintended behavior with domain = None

When domains is not specified to parse_file_entities, there are a few bugs:

target = '/datasets/SherlockMerlin/derivatives/fmriprep/sub-37/func/sub-37_task-MerlinMovie_run-1_bold_space-MNI152NLin2009cAsym_preproc.nii.gz'
bids_dir = '/datasets/SherlockMerlin'
preproc_dir = '/datasets/SherlockMerlin/derivatives/fmriprep'
layout = BIDSLayout(bids_dir, config=[('bids', bids_dir), ('derivatives', preproc_dir)])

The following both return the same output:

layout.files[target].entities
layout.parse_file_entities(target, domains=['bids'])
{'subject': '37', 'task': 'MerlinMovie', 'run': '1', 'type': 'preproc', 'modality': 'func'}

So far so good, but:

layout.parse_file_entities(target, domains=None)  

First, this fails because it tries to loop over domains, which is None. Even with that fixed, the result is {}. Because of how this function works, the original file is also overwritten in the file index, so calling with domains=['bids'] afterwards also returns {}.

Probably the easy fix here is to default to a domain, and not allow None, since that will simply create an empty file object and replace the original contents in the index.

note: GPL licensed code

Just noting that grabbit/external/inflect.py is borrowed code released under GPL-3+. Although there is nothing "illegal" here, since the grabbit code base is under MIT, which is compatible with the GPL, in effect it makes grabbit as a whole fall under the GPL when used. As such, it might cause problems if someone decides to use grabbit, or to use it via pybids, within a project that is not under a GPL-compatible license.

Use a relational database

Grabbit started out as a minimalistic tool for managing/manipulating filenames, but we're now at the point where the internal model is getting too complicated to justify a pure Python solution. A lot of the queries are now complex enough that a SQL-based solution would be much more elegant and performant. I'll likely reimplement grabbit to use peewee internally in the next few days. The API should change minimally if at all--though I'll probably use the opportunity to introduce some other breaking changes (e.g., #57 ). Comments are welcome.

Files excluded from one domain are included in another

Say you have a Layout with nested folders: /domain and /domain/sub-domain.
If you use a layout.json file to exclude files from the sub-domain (e.g. /domain/sub-domain/layout.json), the files will be excluded from the sub-domain's file list, but they will still be included in the broader-scoped domain. Thus, if you use .get, you will be querying all files, including those excluded.

IMO, if a sub-domain is nested within another one, it should manage the files within that domain, and they should be automatically excluded from the broader domain, to avoid conflicts.
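
A sketch of the proposed rule, in which every file belongs only to the innermost domain whose root contains it. The owning_domain function and the domain names are hypothetical and only illustrate the suggestion.

```python
def owning_domain(path, domain_roots):
    """Return the name of the most deeply nested domain whose root contains `path`."""
    containing = [
        (root, name)
        for name, root in domain_roots.items()
        if path == root or path.startswith(root.rstrip("/") + "/")
    ]
    if not containing:
        return None
    # The longest root is the innermost (most specific) domain.
    return max(containing, key=lambda pair: len(pair[0]))[1]


roots = {"outer": "/domain", "inner": "/domain/sub-domain"}
print(owning_domain("/domain/sub-domain/excluded.json", roots))  # inner
print(owning_domain("/domain/data.json", roots))                 # outer
```

With this rule, a file excluded by the inner domain's configuration would never be picked up by the outer domain, because the outer domain does not own it.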

pypi package is broken

In [2]: from grabbit import Layout
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-e10b6e141794> in <module>()
----> 1 from grabbit import Layout

/Users/filo/anaconda3/lib/python3.5/site-packages/grabbit-0.0.2-py3.5.egg/grabbit/__init__.py in <module>()
----> 1 from .core import *

/Users/filo/anaconda3/lib/python3.5/site-packages/grabbit-0.0.2-py3.5.egg/grabbit/core.py in <module>()
      3 import re
      4 from collections import defaultdict, OrderedDict, namedtuple
----> 5 from grabbit.external import string_types, inflect
      6 from os.path import join, exists, basename, dirname
      7 import os

/Users/filo/anaconda3/lib/python3.5/site-packages/grabbit-0.0.2-py3.5.egg/grabbit/external/__init__.py in <module>()
      1 from six import string_types
----> 2 import inflect

ImportError: No module named 'inflect'

Return paths with consistent separators

Currently grabbit returns paths that mix forward slashes and backslashes:

D:/OpenfMRI/ds114_with_dates/sub-08\ses-retest\func\sub-08_ses-retest_task-covertverbgeneration_bold.nii.gz

instead of, on Windows:

D:\OpenfMRI\ds114_with_dates\sub-08\ses-retest\func\sub-08_ses-retest_task-covertverbgeneration_bold.nii.gz

or, on Linux/OS X:

/data/OpenfMRI/ds114_with_dates/sub-08/ses-retest/func/sub-08_ses-retest_task-covertverbgeneration_bold.nii.gz
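
For illustration, the standard library can already normalize the mixed path shown above. This sketch uses ntpath and posixpath explicitly so that it runs on any platform; whether grabbit should normalize at indexing time or at return time is left open here.

```python
import ntpath
import posixpath

mixed = ("D:/OpenfMRI/ds114_with_dates/sub-08\\ses-retest\\func\\"
         "sub-08_ses-retest_task-covertverbgeneration_bold.nii.gz")

# ntpath implements Windows path rules on any platform; on Windows itself,
# os.path is ntpath, so os.path.normpath gives the same result there.
windows_style = ntpath.normpath(mixed)

# For a forward-slash form, replace the Windows separator explicitly.
posix_style = posixpath.normpath(mixed.replace("\\", "/"))

print(windows_style)
print(posix_style)
```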

Hierarchical specifications

Currently grabbit assumes that all folders/files below the project root obey the same spec. For a variety of reasons, it would be good to allow hierarchical specifications, so that on the initial scan, each folder would be checked for its own .json spec, and if one is found, the entities defined within that file will be used for any files below. Internally, it's probably best to handle this by initializing multiple Layout objects and maintaining a common index across them (to enable returning of files that match entities shared across specs).

add 'object' option for `return_type` argument in `get()`

Strangely, there's no current option to retrieve the original File object when specifying the return_type argument in Layout.get(). This is kind of silly, as there are all kinds of cases where a user might prefer to have a File rather than a namedtuple containing entities as properties (which is what's currently returned by the default return_type='tuple'). It would be trivial to add an 'obj' or 'object' option that returns the File objects directly instead of converting them to namedtuples first.

Extract entities from filename without updating Layout

Currently there's no way to extract entities from a File without updating the Layout instance and associated Entity objects. This is a glaring oversight; we should expose something like an .index_file method that returns entities without mutating the Layout or its properties in any way.

Fix build_path to look in domains

Currently, build_path is broken, because the path_patterns in the config files, which used to be stored in the Layout instance, are now tied to individual Domain objects that build_path doesn't check. As a result, Layout.path_patterns is always an empty list. We should add a domains argument to build_path that extracts the specified path_patterns from the passed domains, or uses all available domains if domains=None. See write_contents_to_file for a similar implementation, and bids-standard/pybids#213 for more context.

Write out, and reconstruct Layout from, index file

Currently grabbit constructs the full file index at initialization time, after scanning the entire project directory. This is inefficient, and also rules out important use cases (e.g., in DataLad) where there's a need to run queries on a project even though the file names and/or contents aren't locally available. We should add methods to (a) write out a generic, plaintext representation of the internal index, and (b) allow construction of the Layout object from a saved file.
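
As a rough sketch of what (a) and (b) could look like, the snippet below round-trips a toy index through JSON. The index structure (file path mapped to entity values) and the two function names are assumptions made for this example.

```python
import json
import os
import tempfile

# Hypothetical in-memory index: file path -> entity values.
index = {
    "sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz": {
        "subject": "sub-01", "session": "ses-1", "type": "T1w"},
    "sub-01/ses-1/sub-01_ses-1_scans.tsv": {
        "subject": "sub-01", "session": "ses-1", "type": "scans"},
}


def save_index(idx, filename):
    """Write the index as plain-text JSON."""
    with open(filename, "w") as handle:
        json.dump(idx, handle, indent=2, sort_keys=True)


def load_index(filename):
    """Rebuild the index from a saved file, without touching the data files."""
    with open(filename) as handle:
        return json.load(handle)


index_file = os.path.join(tempfile.mkdtemp(), "index.json")
save_index(index, index_file)
restored = load_index(index_file)
print(restored == index)  # True
```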

Add default value to path pattern segments

When building paths, it would be extremely helpful to be able to assign default values to path pattern segments (i.e., if the entity isn't explicitly passed, a default value is plugged into the resulting path).
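
A minimal sketch of the idea, assuming defaults are supplied as a separate dictionary. The build_path signature and the pattern syntax shown here are illustrative only and are not grabbit's actual API.

```python
def build_path(pattern, entities, defaults=None):
    """Fill `pattern`, falling back to `defaults` for entities not passed in."""
    values = dict(defaults or {})
    values.update(entities)  # explicitly passed entities win over defaults
    return pattern.format(**values)


pattern = "{subject}/{session}/{subject}_{session}_{type}.nii.gz"
path = build_path(
    pattern,
    {"subject": "sub-01", "type": "bold"},
    defaults={"session": "ses-1"},
)
print(path)  # sub-01/ses-1/sub-01_ses-1_bold.nii.gz
```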

Allow Layout merging

There may occasionally be a need to merge multiple Layout objects into a single Layout. Practically, this shouldn't entail doing much other than updating the File and Entity containers of the first passed Layout with the corresponding objects from later Layouts. I've already written a working implementation of this for pybids, but it should probably be cleaned up a bit (there may be side effects I haven't thought about) and moved to grabbit.

Failure during tests

Hello, I am looking to package your software for Gentoo Linux.

I am having issues with one test in particular (see end of log):

>>> Emerging (1 of 1) dev-python/grabbit-0.1.0::local
 * grabbit-0.1.0.tar.gz BLAKE2B SHA512 size ;-) ...                                                      [ ok ]
>>> Unpacking source...
>>> Unpacking grabbit-0.1.0.tar.gz to /var/tmp/portage/dev-python/grabbit-0.1.0/work
>>> Source unpacked in /var/tmp/portage/dev-python/grabbit-0.1.0/work
>>> Preparing source in /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0 ...
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0 ...
>>> Source configured.
>>> Compiling source in /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0 ...
 * python2_7: running distutils-r1_run_phase distutils-r1_python_compile
/usr/bin/python2.7 setup.py build
running build
running build_py
creating /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit
copying grabbit/utils.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit
copying grabbit/__init__.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit
copying grabbit/core.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit
creating /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/extensions
copying grabbit/extensions/writable.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/extensions
copying grabbit/extensions/__init__.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/extensions
copying grabbit/extensions/hdfs.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/extensions
creating /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/external
copying grabbit/external/inflect.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/external
copying grabbit/external/six.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/external
copying grabbit/external/__init__.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python2_7/lib/grabbit/external
warning: build_py: byte-compiling is disabled, skipping.

 * python3_4: running distutils-r1_run_phase distutils-r1_python_compile
/usr/bin/python3.4 setup.py build
running build
running build_py
creating /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit
copying grabbit/utils.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit
copying grabbit/__init__.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit
copying grabbit/core.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit
creating /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/extensions
copying grabbit/extensions/writable.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/extensions
copying grabbit/extensions/__init__.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/extensions
copying grabbit/extensions/hdfs.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/extensions
creating /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/external
copying grabbit/external/inflect.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/external
copying grabbit/external/six.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/external
copying grabbit/external/__init__.py -> /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0-python3_4/lib/grabbit/external
warning: build_py: byte-compiling is disabled, skipping.

>>> Source compiled.
>>> Test phase: dev-python/grabbit-0.1.0
 * python2_7: running distutils-r1_run_phase python_test
============================================= test session starts ==============================================
platform linux2 -- Python 2.7.14, pytest-3.2.2, py-1.4.34, pluggy-0.4.0 -- /usr/bin/python2.7
cachedir: .cache
rootdir: /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0, inifile:
plugins: cython-0.1.0
collecting ... collected 37 items

grabbit/tests/test_core.py::TestFile::test_init PASSED
grabbit/tests/test_core.py::TestFile::test_matches PASSED
grabbit/tests/test_core.py::TestFile::test_named_tuple PASSED
grabbit/tests/test_core.py::TestEntity::test_init PASSED
grabbit/tests/test_core.py::TestEntity::test_matches PASSED
grabbit/tests/test_core.py::TestEntity::test_unique_and_count PASSED
grabbit/tests/test_core.py::TestEntity::test_add_file PASSED
grabbit/tests/test_core.py::TestLayout::test_init[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_absolute_paths[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_querying[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_natsort[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_unique_and_count[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_get_nearest[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_index_regex[local] FAILED
grabbit/tests/test_core.py::TestLayout::test_save_index[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_load_index[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_clone[local] PASSED
grabbit/tests/test_core.py::test_merge_layouts[local] PASSED
grabbit/tests/test_core.py::TestLayout::test_init[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_absolute_paths[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_querying[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_natsort[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_unique_and_count[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_get_nearest[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_index_regex[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_save_index[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_load_index[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_clone[hdfs] SKIPPED
grabbit/tests/test_core.py::test_merge_layouts[hdfs] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_dynamic_getters[/var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/data/7t_trt-/var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/specs/test.json] PASSED
grabbit/tests/test_core.py::TestLayout::test_dynamic_getters[hdfs://localhost:9000/grabbit/test/data/7t_trt-hdfs://localhost:9000/grabbit/test/specs/test.json] SKIPPED
grabbit/tests/test_core.py::TestLayout::test_entity_mapper PASSED
grabbit/tests/test_extensions.py::TestWritableFile::test_build_path PASSED
grabbit/tests/test_extensions.py::TestWritableFile::test_build_file ERROR
grabbit/tests/test_extensions.py::TestWritableLayout::test_write_files PASSED
grabbit/tests/test_extensions.py::TestWritableLayout::test_write_contents_to_file PASSED
grabbit/tests/test_extensions.py::TestWritableLayout::test_write_contents_to_file_defaults PASSED

==================================================== ERRORS ====================================================
______________________________ ERROR at setup of TestWritableFile.test_build_file ______________________________
file /var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/test_extensions.py, line 45
      def test_build_file(self, writable_file, tmpdir, caplog):
E       fixture 'caplog' not found
>       available fixtures: cache, capfd, capsys, doctest_namespace, monkeypatch, pytestconfig, record_xml_property, recwarn, tmpdir, tmpdir_factory, writable_file
>       use 'pytest --fixtures [testpath]' for help on them.

/var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/test_extensions.py:45
=================================================== FAILURES ===================================================
______________________________________ TestLayout.test_index_regex[local] ______________________________________

self = <test_core.TestLayout instance at 0x7f099871a320>
bids_layout = <grabbit.core.Layout object at 0x7f0998093050>
layout_include = <grabbit.core.Layout object at 0x7f099808b250>

    def test_index_regex(self, bids_layout, layout_include):
        targ = os.path.join(bids_layout.root, 'derivatives', 'excluded.json')
        assert targ not in bids_layout.files
        targ = os.path.join(layout_include.root, 'models',
                            'excluded_model.json')
>       assert targ not in layout_include.files
E       AssertionError: assert '/var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/data/7t_trt/models/excluded_model.json' not in {'/var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/data/7t_trt/dataset_description.json': <g....1.0/work/grabbit-0.1.0/grabbit/tests/data/7t_trt/participants.tsv': <grabbit.core.File object at 0x7f09986d8ed0>, ...}
E        +  where {'/var/tmp/portage/dev-python/grabbit-0.1.0/work/grabbit-0.1.0/grabbit/tests/data/7t_trt/dataset_description.json': <g....1.0/work/grabbit-0.1.0/grabbit/tests/data/7t_trt/participants.tsv': <grabbit.core.File object at 0x7f09986d8ed0>, ...} = <grabbit.core.Layout object at 0x7f099808b250>.files

grabbit/tests/test_core.py:261: AssertionError
=========================== 1 failed, 23 passed, 12 skipped, 1 error in 0.45 seconds ===========================

Can you help me out?

Add type declaration to entity specification

This is an oversight on my part, but there should really be a field for type declarations in the entity specification. Those can then be enforced when entity values are first read in, ensuring that we avoid ambiguous cases (e.g., should '1' be read in as a str or int?).

For example:

{
    "entities": [
        {
            "name": "subject",
            "pattern": ".*sub-([a-zA-Z0-9]+)",
            "directory": "{{root}}/{subject}",
            "type": "str"
        }
    ]
}
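
To show how such a declaration could be enforced when values are read in, here is a small sketch. The TYPE_CASTERS mapping, the cast_entity_value function, and the "run" spec are assumptions introduced for illustration.

```python
# Assumed mapping from the proposed "type" strings to Python callables.
TYPE_CASTERS = {"str": str, "int": int, "float": float}


def cast_entity_value(raw_value, entity_spec):
    """Coerce a captured string according to the entity's declared type."""
    caster = TYPE_CASTERS[entity_spec.get("type", "str")]
    return caster(raw_value)


subject_spec = {"name": "subject", "pattern": ".*sub-([a-zA-Z0-9]+)", "type": "str"}
run_spec = {"name": "run", "pattern": "run-(\\d+)", "type": "int"}

print(repr(cast_entity_value("01", subject_spec)))  # '01'
print(repr(cast_entity_value("1", run_spec)))       # 1
```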

Reconsider over-eager warning

I suspect this is due to a refactoring in which directories, and not domains, are the objects iterated over, but this warning is too easily triggered:

grabbit/grabbit/core.py

Lines 446 to 450 in a4eb518

if name in self.domains:
    msg = ("Domain with name '{}' already exists; returning existing "
           "Domain configuration.".format(name))
    warnings.warn(msg)
    return self.domains[name]

For example:

layout = gb.BIDSLayout([(bids_dir, 'bids'), (preproc_dir, ['bids', 'derivatives'])])

Because the 'bids' domain appears twice, this warning is displayed. However, I expect this to be very common among multi-root use cases, so we should reconsider this warning.

I see three possibilities:

  1. Re-think the potentially conflicting behavior, given the new invocation patterns, and trigger in that case.
  2. Do some work to predict whether an actual conflict may arise, and trigger only if that is the case.
  3. Remove the warning because there are no actual conflicts.

Failing tests

The tests currently fail. I've tested it out in a Fedora build, and in a fresh virtual environment:

(ins)[asinha@ankur  grabbit(master=)]$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

(ins)[asinha@ankur  grabbit(master=)]$ python3 -m venv ~/dump/grabbit-virt
(ins)[asinha@ankur  grabbit(master=)]$ source ~/dump/grabbit-virt/bin/activate
(ins)(grabbit-virt) [asinha@ankur  grabbit(master=)]$ python setup.py install
running install
running bdist_egg
running egg_info
creating grabbit.egg-info
writing grabbit.egg-info/PKG-INFO
writing dependency_links to grabbit.egg-info/dependency_links.txt
writing top-level names to grabbit.egg-info/top_level.txt
writing manifest file 'grabbit.egg-info/SOURCES.txt'
reading manifest file 'grabbit.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'grabbit.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/grabbit
copying grabbit/_version.py -> build/lib/grabbit
copying grabbit/core.py -> build/lib/grabbit
copying grabbit/utils.py -> build/lib/grabbit
copying grabbit/__init__.py -> build/lib/grabbit
creating build/lib/grabbit/external
copying grabbit/external/six.py -> build/lib/grabbit/external
copying grabbit/external/__init__.py -> build/lib/grabbit/external
copying grabbit/external/inflect.py -> build/lib/grabbit/external
creating build/lib/grabbit/extensions
copying grabbit/extensions/hdfs.py -> build/lib/grabbit/extensions
copying grabbit/extensions/__init__.py -> build/lib/grabbit/extensions
copying grabbit/extensions/writable.py -> build/lib/grabbit/extensions
UPDATING build/lib/grabbit/_version.py
set build/lib/grabbit/_version.py to '0.2.0'
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/grabbit
creating build/bdist.linux-x86_64/egg/grabbit/external
copying build/lib/grabbit/external/six.py -> build/bdist.linux-x86_64/egg/grabbit/external
copying build/lib/grabbit/external/__init__.py -> build/bdist.linux-x86_64/egg/grabbit/external
copying build/lib/grabbit/external/inflect.py -> build/bdist.linux-x86_64/egg/grabbit/external
copying build/lib/grabbit/_version.py -> build/bdist.linux-x86_64/egg/grabbit
copying build/lib/grabbit/core.py -> build/bdist.linux-x86_64/egg/grabbit
creating build/bdist.linux-x86_64/egg/grabbit/extensions
copying build/lib/grabbit/extensions/hdfs.py -> build/bdist.linux-x86_64/egg/grabbit/extensions
copying build/lib/grabbit/extensions/__init__.py -> build/bdist.linux-x86_64/egg/grabbit/extensions
copying build/lib/grabbit/extensions/writable.py -> build/bdist.linux-x86_64/egg/grabbit/extensions
copying build/lib/grabbit/utils.py -> build/bdist.linux-x86_64/egg/grabbit
copying build/lib/grabbit/__init__.py -> build/bdist.linux-x86_64/egg/grabbit
byte-compiling build/bdist.linux-x86_64/egg/grabbit/external/six.py to six.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/external/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/external/inflect.py to inflect.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/_version.py to _version.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/core.py to core.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/extensions/hdfs.py to hdfs.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/extensions/__init__.py to __init__.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/extensions/writable.py to writable.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/utils.py to utils.cpython-36.pyc
byte-compiling build/bdist.linux-x86_64/egg/grabbit/__init__.py to __init__.cpython-36.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying grabbit.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying grabbit.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying grabbit.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying grabbit.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
grabbit.external.__pycache__.six.cpython-36: module references __path__
creating dist
creating 'dist/grabbit-0.2.0-py3.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing grabbit-0.2.0-py3.6.egg
creating /home/asinha/dump/grabbit-virt/lib/python3.6/site-packages/grabbit-0.2.0-py3.6.egg
Extracting grabbit-0.2.0-py3.6.egg to /home/asinha/dump/grabbit-virt/lib/python3.6/site-packages
Adding grabbit 0.2.0 to easy-install.pth file

Installed /home/asinha/dump/grabbit-virt/lib/python3.6/site-packages/grabbit-0.2.0-py3.6.egg
Processing dependencies for grabbit==0.2.0
Finished processing dependencies for grabbit==0.2.0
(ins)(grabbit-virt) [asinha@ankur  grabbit(master=)]$ python setup.py test
running test
Searching for pytest>=3.3.0
Reading https://pypi.org/simple/pytest/
Downloading https://files.pythonhosted.org/packages/11/c4/cfb5f51f401cd54bbaaacff530c96827422a29dca2683ff314e4938444c9/pytest-3.6.2-py2.py3-none-any.whl#sha256=90898786b3d0b880b47645bae7b51aa9bbf1e9d1e4510c2cfd15dd65c70ea0cd
Best match: pytest 3.6.2
Processing pytest-3.6.2-py2.py3-none-any.whl
Installing pytest-3.6.2-py2.py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs
writing requirements to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/pytest-3.6.2-py3.6.egg/EGG-INFO/requires.txt

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/pytest-3.6.2-py3.6.egg
Searching for six>=1.10.0
Reading https://pypi.org/simple/six/
Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl#sha256=832dc0e10feb1aa2c68dcc57dbb658f1c7e65b9b61af69048abc87a2db00a0eb
Best match: six 1.11.0
Processing six-1.11.0-py2.py3-none-any.whl
Installing six-1.11.0-py2.py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/six-1.11.0-py3.6.egg
Searching for py>=1.5.0
Reading https://pypi.org/simple/py/
Downloading https://files.pythonhosted.org/packages/f3/bd/83369ff2dee18f22f27d16b78dd651e8939825af5f8b0b83c38729069962/py-1.5.4-py2.py3-none-any.whl#sha256=e31fb2767eb657cbde86c454f02e99cb846d3cd9d61b318525140214fdc0e98e
Best match: py 1.5.4
Processing py-1.5.4-py2.py3-none-any.whl
Installing py-1.5.4-py2.py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/py-1.5.4-py3.6.egg
Searching for pluggy<0.7,>=0.5
Reading https://pypi.org/simple/pluggy/
Downloading https://files.pythonhosted.org/packages/ba/65/ded3bc40bbf8d887f262f150fbe1ae6637765b5c9534bd55690ed2c0b0f7/pluggy-0.6.0-py3-none-any.whl#sha256=e160a7fcf25762bb60efc7e171d4497ff1d8d2d75a3d0df7a21b76821ecbf5c5
Best match: pluggy 0.6.0
Processing pluggy-0.6.0-py3-none-any.whl
Installing pluggy-0.6.0-py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/pluggy-0.6.0-py3.6.egg
Searching for more-itertools>=4.0.0
Reading https://pypi.org/simple/more-itertools/
Downloading https://files.pythonhosted.org/packages/85/40/90c3b0393e12b9827381004224de8814686e3d7182f9d4182477f600826d/more_itertools-4.2.0-py3-none-any.whl#sha256=6703844a52d3588f951883005efcf555e49566a48afd4db4e965d69b883980d3
Best match: more-itertools 4.2.0
Processing more_itertools-4.2.0-py3-none-any.whl
Installing more_itertools-4.2.0-py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs
writing requirements to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/more_itertools-4.2.0-py3.6.egg/EGG-INFO/requires.txt

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/more_itertools-4.2.0-py3.6.egg
Searching for attrs>=17.4.0
Reading https://pypi.org/simple/attrs/
Downloading https://files.pythonhosted.org/packages/41/59/cedf87e91ed541be7957c501a92102f9cc6363c623a7666d69d51c78ac5b/attrs-18.1.0-py2.py3-none-any.whl#sha256=4b90b09eeeb9b88c35bc642cbac057e45a5fd85367b985bd2809c62b7b939265
Best match: attrs 18.1.0
Processing attrs-18.1.0-py2.py3-none-any.whl
Installing attrs-18.1.0-py2.py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs
writing requirements to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/attrs-18.1.0-py3.6.egg/EGG-INFO/requires.txt

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/attrs-18.1.0-py3.6.egg
Searching for atomicwrites>=1.0
Reading https://pypi.org/simple/atomicwrites/
Downloading https://files.pythonhosted.org/packages/0a/e8/cd6375e7a59664eeea9e1c77a766eeac0fc3083bb958c2b41ec46b95f29c/atomicwrites-1.1.5-py2.py3-none-any.whl#sha256=a24da68318b08ac9c9c45029f4a10371ab5b20e4226738e150e6e7c571630ae6
Best match: atomicwrites 1.1.5
Processing atomicwrites-1.1.5-py2.py3-none-any.whl
Installing atomicwrites-1.1.5-py2.py3-none-any.whl to /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs

Installed /home/asinha/Documents/02_Code/00_repos/01_others/grabbit/.eggs/atomicwrites-1.1.5-py3.6.egg
running egg_info
writing grabbit.egg-info/PKG-INFO
writing dependency_links to grabbit.egg-info/dependency_links.txt
writing top-level names to grabbit.egg-info/top_level.txt
reading manifest file 'grabbit.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'grabbit.egg-info/SOURCES.txt'
running build_ext
six (unittest.loader._FailedTest) ... ERROR
inflect (unittest.loader._FailedTest) ... ERROR
hdfs (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: six (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: six
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
ModuleNotFoundError: No module named 'grabbit.external.six.six'


======================================================================
ERROR: inflect (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: inflect
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
ModuleNotFoundError: No module named 'grabbit.external.six.inflect'


======================================================================
ERROR: hdfs (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: hdfs
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/home/asinha/Documents/02_Code/00_repos/01_others/grabbit/grabbit/extensions/hdfs.py", line 3, in <module>
    from hdfs import Config
ModuleNotFoundError: No module named 'hdfs'


----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (errors=3)
Test failed: <unittest.runner.TextTestResult run=3 errors=3 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=3 errors=3 failures=0>

The last error goes away once hdfs is installed, so hdfs may need to be added to the dependencies in setup.py. I'm not sure about the other two errors yet.

Thanks :)

Domain root should default to directory containing config file

It looks like if no root is specified in a directory-specific domain (or if root == '.'), then the root of that domain is set to the global (Layout-level) root. I'm confused by this behavior. I'd have intuitively thought that, by default, the domain's root would be the directory its config file is taken from, and that relative paths would be relative to that directory rather than to the layout root. In fact, it seems to go against the purpose of the domains mechanism to have to specify the relative path of the domain from the layout root in the domain-specific config file, when that information could be taken from the location of that file (by default). Am I misunderstanding something?

As an example, adapted from the examples notebook:

stamps = gb.Layout('../grabbit/tests/data/valuable_stamps/', absolute_paths=False, config_filename='dir_config.json', config='../grabbit/tests/specs/stamps.json')
[(k, d.root) for (k, d) in stamps.domains.items()]

gives

[('stamps', '../grabbit/tests/data/valuable_stamps/'),
 ('usa_stamps', '../grabbit/tests/data/valuable_stamps/')]

while I would have expected

[('stamps', '../grabbit/tests/data/valuable_stamps/'),
 ('usa_stamps', '../grabbit/tests/data/valuable_stamps/USA/')]
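
To make the expected behavior concrete, here is a minimal sketch of the resolution rule described above. It is not grabbit's code: `resolve_domain_root` is a hypothetical helper, and the only assumption is that the path of the directory-level config file is known when the domain is created.

```python
import os


def resolve_domain_root(config_path, root=None):
    """Hypothetical helper: choose the root for a directory-level domain.

    config_path: path of the directory-specific config file.
    root: the 'root' value from that config file, if any.
    """
    config_dir = os.path.dirname(config_path)
    # No root (or '.') means "the directory that holds the config file".
    if root in (None, '', '.'):
        return config_dir
    # Absolute roots are taken as given.
    if os.path.isabs(root):
        return root
    # Relative roots are resolved against the config file's directory,
    # not against the layout-level root.
    return os.path.normpath(os.path.join(config_dir, root))
```

With this rule, a `dir_config.json` found in the `USA` directory would yield a domain rooted at `USA` without the config having to repeat that path.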

Add support for arbitrary functions when indexing files and entities

Per discussion with @yarikoptic, @mih, and @jbpoline, we should add support for arbitrary named functions in JSON config files that allow grabbit to hand a file off to some other function that then returns entity key/value pairs to include in the File's entity map. This will enable DataLad to wrap grabbit for all single-project indexing/querying, and generally make the package more useful in a variety of contexts.
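
One possible shape for this, as a rough sketch only: a registry maps the function names that a JSON config could mention to Python callables, and each callable receives a file path and returns entity key/value pairs. The registry, the decorator, and the `file_extension` extractor are all hypothetical names for illustration, not part of grabbit.

```python
import os

# Hypothetical registry: name used in the JSON config -> callable.
ENTITY_FUNCTIONS = {}


def register(name):
    """Decorator that exposes a function under a config-visible name."""
    def decorator(func):
        ENTITY_FUNCTIONS[name] = func
        return func
    return decorator


@register('file_extension')
def file_extension(path):
    """Example extractor: report the final extension as an 'ext' entity."""
    return {'ext': os.path.splitext(path)[1]}


def extract_entities(path, function_names):
    """Call each named function on the path and merge the returned pairs."""
    entities = {}
    for name in function_names:
        entities.update(ENTITY_FUNCTIONS[name](path))
    return entities
```

A wrapper such as DataLad could then register its own extractors under names of its choosing and reference them from the config.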

Using lists of patterns for an entity.

Is it currently possible to pass a list of patterns to be searched for to an entity? The pybids BIDSLayout class inherits directly from Layout and does not override the get function. I was not able to pass a list of patterns to get successfully:

>>> from bids.grabbids import BIDSLayout
>>> root = 'fake_data/BIDS-examples-1-enh-ds054/ds054/'
>>> layout = BIDSLayout(root)
>>> len(layout.get(ext='.nii'))
0
>>> len(layout.get(ext='.nii.gz'))
16
>>> len(layout.get(ext=['.nii', '.nii.gz']))
0
>>> 

Are there any changes I should make to how I am calling get?
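
For clarity, this is the behavior I was hoping for, written as a standalone sketch. It says nothing about how Layout.get is actually implemented; `matches` is a hypothetical function that uses exact comparison and treats a list as "any of these values".

```python
def matches(entities, **filters):
    """Return True if a file's entities satisfy every filter.

    A filter value may be a single value or a list/tuple/set of
    acceptable values (exact comparison, no regex in this sketch).
    """
    for name, wanted in filters.items():
        if name not in entities:
            return False
        if isinstance(wanted, (list, tuple, set)):
            allowed = wanted
        else:
            allowed = [wanted]
        if entities[name] not in allowed:
            return False
    return True


# Made-up entity maps standing in for indexed files.
files = [
    {'subject': '01', 'ext': '.nii.gz'},
    {'subject': '01', 'ext': '.json'},
    {'subject': '02', 'ext': '.nii'},
]
selected = [f for f in files if matches(f, ext=['.nii', '.nii.gz'])]
```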

Fails on BIDS datasets that have a directory starting with . somewhere in their paths

Originally (https://github.com/datalad/datalad/issues/2150) I thought the issue was a symlinked directory somewhere in the path, so I was trying to figure out where the path realpath-ing happens. Long story short: the actual issue with my TMPDIR=/home/yoh/.tmp (which is a symlink to /tmp) is that .tmp/ appears in the path, since ATM https://github.com/INCF/pybids/blob/master/bids/grabbids/config/bids.json#L4 leads to the exclusion of paths that have such a component anywhere within them.

I wondered: maybe the relative path under the dataset root could be considered instead of the full path? (I will submit a tentative fix in this vein.)
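
A rough sketch of the idea, not the actual fix: the exclusion pattern is applied to the path relative to the dataset root, so hidden directories above the root no longer matter. The `HIDDEN` regex here is an assumed stand-in for the rule in bids.json, and `is_excluded` is a hypothetical name.

```python
import posixpath
import re

# Assumed exclusion rule: any path component that starts with a dot.
HIDDEN = re.compile(r'(^|/)\.[^/]')


def is_excluded(path, root):
    """Test the exclusion rule against the path relative to the root."""
    rel = posixpath.relpath(path, root)
    return bool(HIDDEN.search(rel))


root = '/home/yoh/.tmp/ds'
inside = '/home/yoh/.tmp/ds/sub-01/anat/sub-01_T1w.nii.gz'
hidden = '/home/yoh/.tmp/ds/.git/config'
```

Here `is_excluded(inside, root)` is False even though the full path contains `.tmp/`, while `is_excluded(hidden, root)` is still True because `.git` sits under the dataset root.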

'six' needs to be added to install_requires in setup.py

  File "/opt/conda/lib/python3.5/site-packages/grabbit/__init__.py", line 1, in <module>
    from .core import *
  File "/opt/conda/lib/python3.5/site-packages/grabbit/core.py", line 5, in <module>
    from grabbit.external import string_types, inflect
  File "/opt/conda/lib/python3.5/site-packages/grabbit/external/__init__.py", line 1, in <module>
    from six import string_types

File path construction from entities

Following on discussion in the pybids repo (#63), we should add support for constructing a new path to a File object given its existing Entity values. This will require extending the JSON config spec to allow rules for defining new paths, including mandatory and optional fields.
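
As a starting point for discussion, here is a minimal sketch of what such a rule could look like. The pattern syntax is an assumption made up for this example and not an agreed spec: `{name}` marks a mandatory field, and a `[...]` section is optional and is dropped when any entity it mentions is missing. `build_path` is a hypothetical name.

```python
import re

OPTIONAL = re.compile(r'\[([^\[\]]*)\]')   # [ ... ] optional section
FIELD = re.compile(r'\{(\w+)\}')           # {entity} placeholder


def build_path(entities, pattern):
    """Fill a path pattern from a dict of entity values (sketch only)."""
    def fill(text):
        return FIELD.sub(lambda m: str(entities[m.group(1)]), text)

    def fill_optional(match):
        section = match.group(1)
        names = FIELD.findall(section)
        # Keep the optional section only if all of its entities exist.
        if all(name in entities for name in names):
            return fill(section)
        return ''

    path = OPTIONAL.sub(fill_optional, pattern)
    # Anything still in braces is mandatory.
    missing = [n for n in FIELD.findall(path) if n not in entities]
    if missing:
        raise ValueError('missing mandatory entities: %s' % missing)
    return fill(path)


PATTERN = 'sub-{subject}[/ses-{session}]/sub-{subject}[_ses-{session}]_{suffix}.nii.gz'
without_session = build_path({'subject': '01', 'suffix': 'T1w'}, PATTERN)
with_session = build_path({'subject': '01', 'session': 1, 'suffix': 'T1w'}, PATTERN)
```

In this sketch `without_session` comes out as `sub-01/sub-01_T1w.nii.gz` and `with_session` as `sub-01/ses-1/sub-01_ses-1_T1w.nii.gz`; leaving out `subject` raises a ValueError.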

Add test dataset that spans multiple non-overlapping directories

To avoid bugs like the one addressed in #73, we should modify one of the test datasets to include an additional directory outside the root of the existing structure. This will ensure that we implicitly run most/all tests against more complex projects that include multiple file hierarchies.

get_nearest() returns wrong values

In [11]: layout.get_nearest("/data/sub-100307/fmap/sub-100307_acq-EMOTIONLR_dir-1_epi.json")
Out[11]: '/data/sub-100307/fmap/sub-100307_acq-EMOTIONLR_dir-1_epi.json'

In [12]: layout.get_nearest("/data/sub-100307/fmap/sub-100307_acq-EMOTIONLR_dir-2_epi.json")
Out[12]: '/data/sub-100307/fmap/sub-100307_acq-EMOTIONLR_dir-1_epi.json'

In [14]: !ls -al /data/sub-100307/fmap/
total 10786
drwxr-xr-x 2 root root    4096 Oct  7  2016 .
drwxr-xr-x 2 root root       0 Oct  7  2016 ..
-rwxr-xr-x 1 root root     130 Sep  9  2016 sub-100307_acq-EMOTIONLR_dir-1_epi.json
-rwxr-xr-x 1 root root 4191491 Sep  8  2016 sub-100307_acq-EMOTIONLR_dir-1_epi.nii.gz
-rwxr-xr-x 1 root root     129 Oct  7  2016 sub-100307_acq-EMOTIONLR_dir-2_epi.json
-rwxr-xr-x 1 root root 4190583 Sep  8  2016 sub-100307_acq-EMOTIONLR_dir-2_epi.nii.gz
-rwxr-xr-x 1 root root  787669 Sep  3  2016 sub-100307_acq-forT1w_magnitude1.nii.gz
-rwxr-xr-x 1 root root  774702 Sep  3  2016 sub-100307_acq-forT1w_magnitude2.nii.gz
-rwxr-xr-x 1 root root      95 Sep  3  2016 sub-100307_acq-forT1w_phasediff.json
-rwxr-xr-x 1 root root 1080990 Sep  3  2016 sub-100307_acq-forT1w_phasediff.nii.gz

This is the cause of bids-standard/pybids#40
