neurodatawithoutborders / matnwb
A Matlab interface for reading and writing NWB files
License: BSD 2-Clause "Simplified" License
It looks like the way to read data is to call DataStub.load, e.g. file.acquisition.get('lfp').data.load, which reads the entire matrix. Is it possible to read only a section of the data, as with h5read?
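For comparison, partial reads are cheap at the HDF5 level; a minimal h5py sketch (the file name and dataset path are made up for the example):

```python
# Sketch: hyperslab (partial) reads with h5py. Only the requested rows
# are fetched from disk; the rest of the dataset is never loaded.
import h5py
import numpy as np

# build a small example file so the snippet is self-contained
with h5py.File('example.nwb', 'w') as f:
    f.create_dataset('/acquisition/lfp/data',
                     data=np.arange(100).reshape(10, 10))

with h5py.File('example.nwb', 'r') as f:
    dset = f['/acquisition/lfp/data']
    section = dset[2:5, :]  # reads only rows 2-4
print(section.shape)  # (3, 10)
```

MATLAB's h5read offers the same capability through its start/count arguments, which is presumably what a sectioned DataStub.load would wrap.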
File generated with Python:
from pynwb import NWBFile, NWBHDF5IO
from datetime import datetime

nwbfile = NWBFile('source', ' ', ' ',
                  datetime.now(), datetime.now(),
                  institution='University of California, San Francisco',
                  lab='Chang Lab')
nwbfile.add_trial({'start': 0.0, 'end': 1.0})
with NWBHDF5IO('test_trials.nwb', 'w') as io:
    io.write(nwbfile)
generates a file you can download here
nwb = nwbRead('test_trials.nwb')
gives the following error:
Error using types.util.checkConstraint (line 15)
Property `tablecolumn.end` should be one of type(s) { 'TableColumn' }.
Error in types.util.checkSet>@(nm,val)types.util.checkConstraint(pname,nm,namedprops,constraints,val) (line 10)
@(nm, val)types.util.checkConstraint(pname, nm, namedprops, constraints, val));
Error in types.untyped.Set/validateAll (line 51)
obj.fcn(mk, obj.map(mk));
Error in types.util.checkSet (line 11)
val.validateAll();
Error in types.core.DynamicTable/validate_tablecolumn (line 61)
types.util.checkSet('tablecolumn', struct(), constrained, val);
Error in types.core.DynamicTable/set.tablecolumn (line 46)
obj.tablecolumn = obj.validate_tablecolumn(val);
Error in types.core.DynamicTable (line 39)
obj.tablecolumn = types.util.parseConstrained('types.core.NWBDataInterface', 'types.core.TableColumn', varargin{:});
Error in io.parseGroup (line 68)
parsed = eval([typename '(kwargs{:})']);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in nwbRead (line 20)
nwb = io.parseGroup(filename, info);
Without the line nwbfile.add_trial({'start': 0.0, 'end': 1.0}) in the Python script, the MATLAB loading works as expected.
We need to make sure shape/dimension names are in the right order when you query them from MATLAB.
Now that we have the tests (99%) working, are there good options for setting up continuous integration? It would be great if we could have these tests run automatically on PRs so we can see if the merge would break any tests. Here's a MATLAB guide on setting it up using Jenkins. Is there a way of doing this for free?
What's going on??
It'd be really helpful to those like me trying to read and understand how the package works to have (MATLAB-style) documentation for each class and function. Particularly given the common use of abbreviations for variable and function names, and the lack of comments, it's quite hard to follow the logic!
It is much less verbose to create structs in MATLAB than StructMaps, for example:
>> s.datasets.d1 = [1 2 3];
vs.
>> s = util.StructMap('datasets', util.StructMap('d1', [1 2 3]));
But if you pass the former into the Group constructor, it will accept it and fail later when you do a subsref or subsasgn:
>> s.datasets.d1 = [1 2 3];
>> g = types.untyped.Group(s);
>> g.d1
Reference to non-existent field 'map'.
Error in types.untyped.Group/findsubprop (line 124)
if ~isempty(obj.(pn)) && isKey(obj.(pn).map, nm)
Error in types.untyped.Group/subsref (line 46)
pn = findsubprop(obj, mainsub);
A good practice might be, rather than attempting to account for struct vs StructMap throughout the code, convert all structs to StructMaps when they are received from the user. Then the rest of the code can assume everything is a StructMap.
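That boundary-conversion pattern can be sketched in Python terms (StructMap here is a minimal stand-in, not the real util.StructMap):

```python
# Sketch of "convert at the boundary": recursively turn plain dicts
# (the analogue of MATLAB structs) into the internal map type once,
# so downstream code only ever sees one representation.
class StructMap:
    """Minimal stand-in for util.StructMap."""
    def __init__(self, **fields):
        self.map = {k: to_structmap(v) for k, v in fields.items()}

def to_structmap(value):
    # accept either representation from the user; normalize dicts
    if isinstance(value, dict):
        return StructMap(**value)
    return value  # already a StructMap or a plain value

s = to_structmap({'datasets': {'d1': [1, 2, 3]}})
print(s.map['datasets'].map['d1'])  # [1, 2, 3]
```

After the boundary call, every code path can assume the StructMap invariant instead of branching on struct vs. StructMap throughout.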
generateCore('schema/core/nwb.namespace.yaml')
error:
The class file.Dataset has no Constant property or Static method named 'procdims'.
Error in file.procdims (line 15)
[subsz, subnm] = file.Dataset.procdims(dimopt, shapeopt);
Error in file.Dataset (line 74)
[obj.shape, obj.dimnames] = file.procdims(dims, shape);
Error in file.Group (line 105)
ds = file.Dataset(datasetiter.next());
Error in file.fillClass>processClass (line 120)
class = file.Group(node);
Error in file.fillClass (line 6)
[processed, classprops, inherited] = processClass(name, namespace, pregen);
Error in file.writeNamespace (line 13)
fwrite(fid, file.fillClass(k, namespace, pregenerated), 'char');
Error in generateCore (line 30)
file.writeNamespace(namespaceMap('core'));
The schema has a new table called Units which cannot be read by matnwb.
nwbRead('test_units.nwb')
Error using types.util.checkDtype (line 58)
Property `id` must be a types.core.ElementIdentifiers.
Error in types.core.DynamicTable/validate_id (line 56)
val = types.util.checkDtype('id', 'types.core.ElementIdentifiers', val);
Error in types.core.DynamicTable/set.id (line 42)
obj.id = obj.validate_id(val);
Error in types.core.DynamicTable (line 37)
obj.id = p.Results.id;
Error in io.parseGroup (line 68)
parsed = eval([typename '(kwargs{:})']);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in nwbRead (line 20)
nwb = io.parseGroup(filename, info);
The generated (written by writeClass) class constructor currently ignores unknown key-value pairs. It should probably error (or at least warn) to alert the user:
>> ts = types.TimeSeries('unknown_key', 2); % unknown_key is simply ignored
It is very easy to misspell a key and think you set the value when you did not:
>> ts = types.TimeSeries('decription', 2); % meant to set description but there is a typo
Filtered test:
>> nwbtest('ProcedureName', 'testWriteClassConstructorThrowsOnUnknownArgument')
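For illustration, the requested strictness can be sketched outside MATLAB; this Python stand-in (make_timeseries and KNOWN_KEYS are invented names, not matnwb's API) rejects unknown keys instead of silently ignoring them:

```python
# Sketch: a constructor that fails loudly on unknown keyword arguments.
# KNOWN_KEYS is illustrative, not the real schema-derived key set.
KNOWN_KEYS = {'data', 'description', 'unit'}

def make_timeseries(**kwargs):
    unknown = set(kwargs) - KNOWN_KEYS
    if unknown:
        raise TypeError('unknown argument(s): %s' % ', '.join(sorted(unknown)))
    return kwargs

make_timeseries(description='ok')        # accepted
try:
    make_timeseries(decription='typo!')  # misspelled key now fails loudly
except TypeError as e:
    print(e)  # unknown argument(s): decription
```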
Is there any way to use matnwb to tell HDF5 to apply compression to a dataset?
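For reference, compression is a dataset-creation property at the HDF5 level, so whatever matnwb exposes would ultimately reduce to this; a minimal h5py sketch (file name is illustrative):

```python
# Sketch: requesting gzip compression at dataset-creation time with h5py.
# h5py enables chunked storage automatically when compression is requested.
import h5py
import numpy as np

with h5py.File('compressed_example.h5', 'w') as f:
    f.create_dataset('data', data=np.zeros((1000, 8)),
                     compression='gzip', compression_opts=4)

with h5py.File('compressed_example.h5', 'r') as f:
    comp = f['data'].compression
print(comp)  # gzip
```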
Breaking it down, the following will be needed for implementation:
DataStub
work for this?

The writeClass and parseClass functions are long and contain a lot of functionality, which makes targeted unit testing more difficult. There are private functions in these files that make the code more readable, but these functions cannot be called directly for testing. We might want to consider breaking these functions apart into smaller function files (writeDataset, writeAttribute, writeLink, etc.)
The main example being the trials group under NWBFile:
- doc: 'Data about experimental trials'
  name: trials
  neurodata_type_inc: DynamicTable
  quantity: '?'
  datasets:
  - doc: The start time of each trial
    attributes:
    - name: description
      value: the start time of each trial
      dtype: text
      doc: Value is 'the start time of each trial'
    name: start
    neurodata_type_inc: TableColumn
    dtype: float
  - doc: The end time of each trial
    attributes:
    - name: description
      value: the end time of each trial
      dtype: text
      doc: Value is 'the end time of each trial'
    name: end
    neurodata_type_inc: TableColumn
    dtype: float
Expected behavior should be that NWBFile checks for the class along with any additions to the class itself. At the moment, only the class validation occurs. This isn't necessarily breaking, but validation is weaker, and it allows users to break from the schema because matnwb never properly checks for these context-dependent values on export.
In the code
dev = types.core.Device( ...
    'source', ns_path);
file.general_devices.set('dev1', dev);
eg = types.core.ElectrodeGroup('source', ns_path, ...
    'description', 'a test ElectrodeGroup', ...
    'location', 'unknown', ...
    'device', types.untyped.SoftLink('/general/devices/dev1'));
file.general_extracellular_ephys.set('elec1', eg);
I don't like that you have to know the path of dev1 in order to link it. I think that makes it hard for new users that might not know the hierarchical structure of nwb-schema. How would you feel about having file.general_devices.set optionally output the SoftLink object if an output is requested?
So the new syntax would be
dev = types.core.Device( ...
    'source', ns_path);
dev_link = file.general_devices.set('dev1', dev);
eg = types.core.ElectrodeGroup('source', ns_path, ...
    'description', 'a test ElectrodeGroup', ...
    'location', 'unknown', ...
    'device', dev_link);
file.general_extracellular_ephys.set('elec1', eg);
I made this for myself. Any interest in adding it to this project, or should I make this type of stuff in a separate repo? I think this would work nicely as a method of TimeSeries (e.g. LFP.load), but I'm not sure the best way to do that since it's generated code.
function data = loadTimeSeriesData(timeseries, interval, downsample_factor, electrode)
%LOADTIMESERIESDATA loads data within a time interval from a timeseries
%
%   DATA = loadTimeSeriesData(TIMESERIES, INTERVAL, DOWNSAMPLE_FACTOR, ELECTRODE)
%   TIMESERIES: matnwb TimeSeries object
%   INTERVAL: [start end] in seconds
%   DOWNSAMPLE_FACTOR: default = 1
%   ELECTRODE: default = [] (all electrodes). Takes a 1-indexed integer
%   (NOT AN ARRAY).
%
%   Works whether timestamps or starting_time & rate are stored. Assumes
%   timestamps are sorted in ascending order.

if ~exist('interval', 'var')
    interval = [0 Inf];
end
if ~exist('downsample_factor', 'var') || isempty(downsample_factor)
    downsample_factor = 1;
end
if ~exist('electrode', 'var')
    electrode = [];
end

dims = timeseries.data.dims;

if interval(1)
    if isempty(timeseries.starting_time)
        start_ind = fastsearch(timeseries.timestamps, interval(1), 1);
    else
        fs = timeseries.starting_time_rate;
        t0 = timeseries.starting_time;
        if interval(1) < t0
            error('interval bounds outside of time range');
        end
        % convert the time offset to a 1-based integer sample index
        start_ind = floor((interval(1) - t0) * fs) + 1;
    end
else
    start_ind = 1;
end

if isfinite(interval(2))
    if isempty(timeseries.starting_time)
        end_ind = fastsearch(timeseries.timestamps, interval(2), -1);
    else
        fs = timeseries.starting_time_rate;
        t0 = timeseries.starting_time;
        if interval(2) > (t0 + dims(1) / fs) % duration is nsamples/rate
            error('interval bounds outside of time range');
        end
        end_ind = round((interval(2) - t0) * fs);
    end
else
    end_ind = Inf;
end

start = ones(1, length(dims));
start(end) = start_ind;
count = fliplr(dims);
count(end) = floor((end_ind - start_ind) / downsample_factor);
if ~isempty(electrode)
    start(end-1) = electrode;
    count(end-1) = 1;
end

if downsample_factor == 1
    data = timeseries.data.load(start, count)';
else
    stride = ones(1, length(dims));
    stride(end) = downsample_factor;
    data = timeseries.data.load(start, count, stride)';
end
Why is cellstr used for text dtypes? It seems like a string would be more natural.
For example, why do I have to do this:
>> ts = types.TimeSeries;
>> ts.source = {'hello'}
Instead of this:
>> ts.source = 'hello'
For OS compatibility's sake.
Attempting to get the value of a field in a StructMap array produces an error:
Error in util.StructMap/subsref (line 37)
o = obj.map(subv);
Failing test:
>> nwbtest('ProcedureName', 'testStructMapArrayDotSubsref')
I'm trying to access linked data, e.g. lfp.electrodes.data, which gives:
RegionView with properties:
path: '/general/extracellular_ephys/electrodes'
view: [1×1 types.untyped.ObjectView]
region: {[1 338]}
type: 'H5T_STD_REF_DSETREG'
reftype: 'H5R_DATASET_REGION'
but I don't see any methods to actually follow that link to the data. Is there a convenient way to do this?
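For what it's worth, the underlying H5R machinery can be driven directly; a hedged h5py sketch of creating and then following a region reference (file and dataset names are illustrative, not from a real NWB file):

```python
# Sketch: dereferencing an HDF5 region reference with h5py.
# f[ref] yields the dataset the reference points at;
# indexing that dataset with the same ref applies the region selection.
import h5py
import numpy as np

with h5py.File('regionref_example.h5', 'w') as f:
    dset = f.create_dataset('electrodes', data=np.arange(10))
    ref = dset.regionref[1:4]  # a region covering elements 1..3
    target = f[ref]            # dereference: the dataset behind the ref
    data = target[ref]         # apply the region selection to get values
print(data)  # [1 2 3]
```

A convenience method on matnwb's RegionView would presumably do the same two steps: resolve the path, then read only the referenced region.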
generateCore('schema/core/nwb.namespace.yaml');
Error using save
Cannot create 'core.mat' because 'namespaces' does not exist.
Error in generateCore (line 23)
save(fullfile('namespaces','core.mat'), '-struct', 'cs');
You can fix this by creating a namespaces directory.
Assigning IDs to warnings makes it possible to suppress them without suppressing all warnings. Assigning IDs to errors makes it easier to distinguish between errors thrown by the code. Not having error IDs, I cannot easily test, for instance, if a particular action causes a particular error.
Some of the warnings and errors in the code already have something that looks like an ID, but it's in the message:
error('Group:subsasgn: please specify whether this numeric value is in ''attributes'' or ''datasets''')
These should be pulled out of the message and assigned to the message ID:
error('Group:subsasgn', 'Please specify whether this numeric value is in ''attributes'' or ''datasets''')
Or a warning/error ID naming strategy should be defined for the project.
Datasets are transposed after being written to disk by nwbExport and read back by nwbRead:
Actual Value:
0 1 2 3 4 5 6 7 8 9
Expected Value:
0
1
2
3
4
5
6
7
8
9
I believe this is because nwbExport transposes the matrices but nwbRead does not transpose them back.
Failing test:
>> nwbtest('Name', 'tests.system.TimeSeriesIOTest/testRoundTrip')
The spec says the Attribute.value key should be assigned a constant value for an attribute (see http://schema-language.readthedocs.io/en/latest/specification_language_description.html#value ). However, it is mapped as only a default value for a property that can be changed.
Failing test:
>> nwbtest('ProcedureName', 'testWriteAttributeWithValue')
This doesn't break IO, but it will allow table columns that do not match the schema.
All Groups created share the same StructMaps for property values. This causes a lot of strange behavior, for example:
>> g1 = types.untyped.Group();
>> g1.g2 = types.untyped.Group();
>> g3 = types.untyped.Group()
g3 =
Group with properties:
attributes: {}
datasets: {}
links: {}
groups: {'g2'}
classes: {}
See: https://www.mathworks.com/help/matlab/matlab_oop/properties-containing-objects.html. Properties assigned object default values should be assigned in the constructor (instead of in the property block).
Failing tests:
>> nwbtest('ProcedureName', 'testGroupConstructorDoesNotCarryState')
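The underlying pitfall, a default object created once and shared by every instance, has a well-known Python analogue, which may make the fix clearer:

```python
# Python analogue of the shared-default pitfall: the default list is
# built once at definition time and shared by all instances.
class Group:
    def __init__(self, groups=[]):  # BUG: one list shared by every Group
        self.groups = groups

g1 = Group()
g1.groups.append('g2')
g3 = Group()
print(g3.groups)  # ['g2'] -- state leaked across instances

# Fix: create the default inside the constructor, which is exactly what
# the MathWorks doc recommends for MATLAB handle-object properties.
class FixedGroup:
    def __init__(self, groups=None):
        self.groups = [] if groups is None else groups
```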
When a dataset has a text dtype and a multi-dimensional shape with a null dimension, the validation function will error when the generated class is instantiated.
For example, ElectrodeGroup has a channel_coordinates dataset with the following spec:
name: channel_coordinates
shape:
- null
- 3
Trying to instantiate the generated class:
>> types.ElectrodeGroup
Error using types.ElectrodeGroup/validate_channel_coordinates (line 147)
ElectrodeGroup.channel_coordinates: val must have shape [~,3]
Error in types.ElectrodeGroup/set.channel_coordinates (line 62)
obj.channel_coordinates = validate_channel_coordinates(obj, val);
Error in types.ElectrodeGroup (line 54)
obj.(field) = p.Results.(field);
Failing test:
>> nwbtest('ProcedureName', 'testWriteDatasetWithNullShape')
PyNWB only writes datasets to HDF5 if they have been set. MatNWB writes all datasets to disk even if they have not been assigned/set by the user or a default value. I'm not sure if this discrepancy will cause issues in the future but it is inefficient to create these empty datasets and it means you have a bunch of empty datasets crowding up the HDF5 file (not great when viewed in HDFView for instance).
Tab completion is an important part of discovery (for me at least). However, with all the subsref and properties trickiness, tab completion no longer works when you're navigating deeply within a hierarchy:
>> f = nwbfile;
>> ts = types.TimeSeries;
>> f.acquisition.timeseries = types.untyped.Group;
>> f.acquisition.timeseries.test_timeseries = ts;
>> f.acquisition.timeseries. % Tab completion says "No Completions Found" here
It would be great if this worked. It may not be possible but maybe someone can investigate it.
There's a new feature in pynwb that caches the specs used to generate the file within the file itself. Here's an example:
from pynwb import NWBFile, NWBHDF5IO
from datetime import datetime
nwbfile = NWBFile('source', ' ', ' ',
                  datetime.now(), datetime.now(),
                  institution='University of California, San Francisco',
                  lab='Chang Lab')

with NWBHDF5IO('test_cache.nwb', 'w') as io:
    io.write(nwbfile, cache_spec=True)
Here is the produced file.
If only the core is used, it caches the text of core.namespace.yaml. If extensions are used, it also stores those. This allows users to automatically load data even if the data is defined by an external spec, and allows the extensions to be passed with the data without worrying about versions of extensions, etc.
Would it be possible to read these specs and automatically generate the defined classes?
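Assuming the cached spec lands under a /specifications group (an assumption about pynwb's layout, not verified here), reading it back is a plain HDF5 walk; a sketch using a stand-in file:

```python
# Sketch: recover cached spec text from an NWB file. The file and the
# 'specifications/...' layout are stand-ins built here for illustration.
import h5py

with h5py.File('test_cache_demo.nwb', 'w') as f:
    f.create_dataset('specifications/core/2.0/namespace',
                     data='namespaces: ...')  # stand-in for the YAML text

with h5py.File('test_cache_demo.nwb', 'r') as f:
    specs = []
    f['specifications'].visititems(
        lambda name, obj: specs.append(name)
        if isinstance(obj, h5py.Dataset) else None)
print(specs)  # ['core/2.0/namespace']
```

Each such dataset holds a namespace/spec document, which is exactly what generateCore/generateExtension consume, so feeding them to the generator directly seems feasible.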
Here's an example with an extension:
from pynwb.spec import NWBDatasetSpec, NWBNamespaceBuilder, NWBGroupSpec

name = 'test'
ns_path = name + '.namespace.yaml'
ext_source = name + '.extensions.yaml'

test_obj = NWBGroupSpec(
    neurodata_type_def='TestObj',
    doc='test object',
    datasets=[NWBDatasetSpec(name='filler_data', doc='data doc', dtype='int',
                             shape=(None, 1))],
    neurodata_type_inc='NWBDataInterface')

ns_builder = NWBNamespaceBuilder(name + ' extensions', name)
ns_builder.add_spec(ext_source, test_obj)
ns_builder.export(ns_path)
from datetime import datetime
from pynwb import load_namespaces, register_class, NWBFile, NWBHDF5IO
from pynwb.core import NWBDataInterface
from pynwb.form.utils import docval, popargs
from collections import Iterable

# load custom classes
load_namespaces(ns_path)

@register_class('TestObj', 'test')
class TestObj(NWBDataInterface):
    __nwbfields__ = ('filler_data',)

    @docval({'name': 'name', 'type': str, 'doc': 'name'},
            {'name': 'source', 'type': str, 'doc': 'source?'},
            {'name': 'filler_data', 'type': Iterable, 'doc': 'data'})
    def __init__(self, **kwargs):
        name, source, filler_data = popargs('name', 'source', 'filler_data',
                                            kwargs)
        super(TestObj, self).__init__(name, source, **kwargs)
        self.filler_data = filler_data

nwbfile = NWBFile(source='0',
                  session_description='0',
                  identifier='0',
                  session_start_time=datetime(1900, 1, 1))
nwbfile.add_acquisition(TestObj(name='name', source='source', filler_data=[1]))

with NWBHDF5IO('test_extension.nwb', 'w') as io:
    io.write(nwbfile, cache_spec=True)
Here is the generated file.
The schema files have currently been copied and pasted into the MatNWB repository. A better approach may be to use a submodule that points to the schema repo. I can think of some pros/cons for both approaches, so this change is not a definite win but something to think about. Submodule pros: avoids copy/paste mistakes, makes the source of the schema files clear (as well as git rev). Cons: confusing for new users cloning the repo (need to use --recursive flag to get submodules)
Hi matnwb experts,
sorry if this is a naive question.
I tried importing a NWB dataset to matlab for the first time. I ran the command "generateCore" and received an error (pasted below). I also tried to run it using the nwb.namespace.yaml from pynwb.
Am I missing a Schema function?
I ran it on Matlab R2017b and using matnwb code updated 6/6/18 from github.
Thank you,
Stephan
generateCore('/pathto/matnwb/schema/core/nwb.namespace.yaml');
Warning: Invalid file or directory '/pathto/pynwb/src/pynwb/jar/schema.jar'.
In javaclasspath>local_validate_dynamic_path (line 271)
In javaclasspath>local_javapath (line 187)
In javaclasspath (line 124)
In javaaddpath (line 71)
In util.generateSchema (line 2)
In generateCore (line 22)
Undefined function or variable 'Schema'.
Error in yaml.getSourceInfo (line 2)
schema = Schema();
Error in util.generateSchema (line 5)
schema = yaml.getSourceInfo(localpath, filenames{:});
Error in generateCore (line 22)
cs = util.generateSchema(core);
Attempting to explicitly assign a value to a "true" property of a group throws an error:
>> g1 = types.untyped.Group();
>> g2 = types.untyped.Group();
>> g1.groups.g2 = g2
Error using containers.Map/subsref
The specified key is not present in this container.
Error in util.StructMap/subsasgn (line 59)
tempobj = obj.map(subv);
Error in types.untyped.Group/subsasgn (line 77)
obj = builtin('subsasgn', obj, s, r);
I believe this is important because if a Group cannot distinguish between an attribute and a dataset, you need a way to explicitly tell it which to use.
>> g1.a1 = [1 2 3]
Error using types.untyped.Group/subsasgn (line 69)
Group:subsasgn: please specify whether this numeric value is in 'attributes' or 'datasets'
Failing test:
>> nwbtest('ProcedureName', 'testGroupDotSubsasgnGroup')
Would it be possible to make it so the variable nwbfile.nwb_version is auto-populated?
I am having trouble reading the UnitTimes datatype. Specifically, when I try to import data with UnitTimes, I get the following error:
>> nwb=nwbRead('/Users/bendichter/Desktop/Buzsaki/SenzaiBuzsaki2017/test.nwb');
Error using hdf5lib2
Incorrect number of uint8 values passed in reference parameter.
Error in H5R.get_name (line 34)
name = H5ML.hdf5lib2('H5Rget_name', loc_id, ref_type, ref, useUtf8);
Error in io.parseReference (line 7)
target = H5R.get_name(did, reftype, data);
Error in io.parseDataset (line 24)
data = io.parseReference(did, tid, H5D.read(did));
Error in io.parseGroup (line 14)
ds = io.parseDataset(filename, ds_info, fp);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in nwbRead (line 20)
nwb = io.parseGroup(filename, info);
I have tracked this down to an error building the spike_times_index component of UnitTimes, which is a list of range references. This is data written using pynwb. Could this be due to the switch of dimensions mentioned in the README? I'll try to replicate the error using a file written in matnwb as well.
When a dataset is inherited with dtype "any" and the inheriting type redefines it as something more specific, like "text", the generated class errors on instantiation: the default value is assigned per the dtype of the superclass dataset, but the validation function is based on the dtype of the dataset in the subclass.
For example, AnnotationSeries inherits from TimeSeries. TimeSeries.data has a dtype of "any". AnnotationSeries.data has a dtype of "text". The generated types.AnnotationSeries fails during construction:
>> types.AnnotationSeries
Error using types.AnnotationSeries/validate_data (line 64)
AnnotationSeries: data must be a cell string
Error in types.TimeSeries/set.data (line 99)
obj.data = validate_data(obj, val);
Error in types.TimeSeries (line 71)
obj.(field) = p.Results.(field);
Error in types.AnnotationSeries (line 32)
obj = obj@types.TimeSeries(varargin{:});
Failing test:
>> nwbtest('ProcedureName', 'testSmokeInstantiateCore')
Since names are used as field names of objects, matnwb does not allow them to have spaces. pynwb does allow spaces. Maybe a compromise would be to automatically replace all spaces with underscores in matnwb.
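The proposed compromise could be sketched as follows (a Python stand-in; to_field_name is an invented helper, not matnwb code):

```python
# Sketch: map NWB object names to valid field names by replacing
# spaces with underscores, erroring when no valid name can be formed.
def to_field_name(name):
    field = name.replace(' ', '_')
    if not field.replace('_', '').isalnum():
        raise ValueError('cannot make a valid field name from %r' % name)
    return field

print(to_field_name('my unit times'))  # my_unit_times
```

The reverse mapping (underscores back to spaces) would be ambiguous, so the original names would still need to be stored somewhere for round-tripping with pynwb.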
>> generateExtension('/Users/bendichter/dev/to_nwb/to_nwb/extensions/ecog/ecog.namespace.yaml');
Out of memory. The likely cause is an infinite recursion
within the program.
files are here: https://github.com/bendichter/to_nwb/tree/master/to_nwb/extensions/ecog
file = nwbfile( ...
'source', 'sdf', ...
'session_description', 'a test NWB File', ...
'identifier', 'sdf', ...
'session_start_time', datestr(now, 'yyyy-mm-dd HH:MM:SS'), ...
'file_create_date', datestr(now, 'yyyy-mm-dd HH:MM:SS'));
ts = types.core.TimeSeries('source', 'source',...
'starting_time',0,...
'starting_time_rate',3,...
'data',[1,2,3],...
'data_units','V?');
file.acquisition.set('test', ts);
nwb_path = 'test.nwb';
nwbExport(file, nwb_path)
nwb_read = nwbRead(nwb_path);
Undefined function 'keys' for input arguments of type 'types.untyped.DataStub'.
Error in io.parseGroup>elide (line 77)
ekeys = keys(set);
Error in io.parseGroup (line 58)
props(etp) = elide(etp, props);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in nwbRead (line 20)
nwb = io.parseGroup(filename, info);
Error in TDT2NWB (line 146)
nwb_read = nwbRead(nwb_path);
Interestingly, this is not an issue for derived classes like ElectricalSeries. I may be entering something wrong here, but it would be better if errors occurred on write rather than on read.
I updated the syntax on the python tests and now most of the errors are minor data type equality errors, e.g. int32 -> int64.
Verification failed in tests.system.TimeSeriesIOTest/testInFromPyNWB.
----------------
Test Diagnostic:
----------------
Values for property 'data' are not equal
---------------------
Framework Diagnostic:
---------------------
verifyEqual failed.
--> Classes do not match.
Actual Class:
int64
Expected Class:
int32
Actual Value:
10×1 int64 column vector
100
110
120
130
140
150
160
170
180
190
Expected Value:
10×1 int32 column vector
100
110
120
130
140
150
160
170
180
190
I would expect NWBFile.nwb_version to be assigned by MatNWB and not the user. Right now this is left blank unless the user specifies it. PyNWB assigns this version.
Failing test:
>> nwbtest('Name', 'tests.system.NWBFileIOTest/testInFromPyNWB')
The following pull requests on the nwb-schema (and PyNWB repo respectively) added new neurodata_types to the schema to improve support for trial data.
NeurodataWithoutBorders/pynwb#536
NeurodataWithoutBorders/nwb-schema#173
In particular, the DynamicTable and TableColumn types are new. DynamicTable is essentially a column-based table consisting of an arbitrary number of TableColumns. In this way users can add arbitrary metadata columns for trial data without having to write extensions. It also adds a new group, trials, for storing trial data.
There are no new features in terms of the schema language, i.e., these are changes to only the schema itself. It would be great if we could catch MatNWB up to the schema, as support for trial data was one of the central needs identified during the hackathons.
@nclack @ln-vidrio
ElectricalSeries.data is defined by the spec as:
name: data
shape:
- - null
- - null
  - null
Attempting to assign a 2d matrix to data on the generated class causes an error:
>> es = types.ElectricalSeries;
>> es.data = ones(2)
Error using types.ElectricalSeries/validate_data (line 65)
ElectricalSeries.data: val must have [1,2] dimensions
Error in types.TimeSeries/set.data (line 99)
obj.data = validate_data(obj, val);
Failing test:
>> nwbtest('ProcedureName', 'testWriteDatasetWithMultipleNullShapes')
I am trying to load a file with a large LFP data block. The shape is saved as 50461375x80 and the type is int16. When I try to run nwbRead I run into several problems. The error I receive is:
Error using double
Requested 4036910000x1 (30.1GB) array exceeds maximum array
size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive.
See array size limit or preference panel for more
information.
Error in types.util.checkDtype (line 40)
val = eval([type '(val)']);
Error in types.core.ElectricalSeries/validate_data (line 33)
val = types.util.checkDtype('data', 'double', val);
Error in types.core.TimeSeries/set.data (line 96)
obj.data = obj.validate_data(val);
Error in types.core.TimeSeries (line 72)
obj.data = p.Results.data;
Error in types.core.ElectricalSeries (line 17)
obj = obj@types.core.TimeSeries(varargin{:});
Error in io.parseGroup (line 68)
parsed = eval([typename '(kwargs{:})']);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in io.parseGroup (line 26)
subg = io.parseGroup(filename, g_info);
Error in nwbRead (line 20)
nwb = io.parseGroup(filename, info);
The data is int16, and it appears that it is being cast as a double. Also, the requested array is 4036910000x1, but it should be 50461375x80. With a little keyboard-mode exploration I found g_info.Datasets(1).Datatype.Class: 'H5T_INTEGER' and .Type: 'H5T_STD_I16LE', which looks right to me, so matnwb appears to be reading this information correctly, but maybe not applying it. If the strategy is to import as a double and then recast to the desired type, I think that's going to continue to cause RAM issues and should probably be refactored. Also, g_info.Datasets(1).Dataspace.Size: [80 50461375], which I think should be transposed. NWB is pretty particular about the time dimension always being first.
The file I am trying to import is quite large (6 GB), and I think you might be able to investigate these issues without it, but if you'd like it, let me know the best way to share it with you.
Datasets that have multiple possible shapes including a vector shape cannot be assigned vector values:
>> mc = testpack.TestClass();
>> mc.testDataset = 1; % testDataset should accept a vector shape but it won't allow it
Error using testpack.TestClass/validate_testDataset (line 47)
TestClass.testDataset: val must have shape [2,3]
Error in testpack.TestClass/set.testDataset (line 26)
obj.testDataset = validate_testDataset(obj, val);
This issue may be related to the use of ndims in the validation function. ndims will always return a value greater than or equal to 2, per https://www.mathworks.com/help/matlab/ref/ndims.html. The validation function assumes ndims will return 1 on a scalar value.
Failing test:
>> nwbtest('ProcedureName', 'testWriteDatasetWithMultipleShapes')
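One possible fix, sketched in Python (the real validator is MATLAB; shape_matches is an invented helper): collapse a MATLAB vector's obligatory two dimensions before comparing against a 1-D spec shape:

```python
# Sketch: shape check that tolerates MATLAB's ndims >= 2 for vectors.
# `spec` entries of None mean "any length", mirroring null in the schema.
def shape_matches(actual, spec):
    if len(spec) == 1 and len(actual) == 2 and 1 in actual:
        actual = (max(actual),)  # collapse 1xN / Nx1 to a 1-D shape
    return len(actual) == len(spec) and all(
        s is None or s == a for s, a in zip(spec, actual))

print(shape_matches((1, 1), (None,)))  # True: scalar/vector accepted
print(shape_matches((2, 3), (2, 3)))   # True: exact match
print(shape_matches((4, 3), (None,)))  # False: genuinely 2-D
```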
The Group constructor currently ignores unknown fields in the given struct. It should probably error (or at least warn) to alert the user:
>> s.unknownField = [1 2 3];
>> g = types.untyped.Group(s)
g =
Group with properties:
attributes: {}
datasets: {}
links: {}
groups: {}
classes: {}
Filtered test:
>> nwbtest('ProcedureName', 'testGroupConstructorThrowsOnUnknownArgument')
Many of the tests writing a file in MatNWB and trying to open it in PyNWB fail. Most appear to fail because MatNWB writes empty datasets when the datasets have not been assigned/set by the user or a default value. PyNWB does not expect these empty datasets. Not sure if that’s considered a PyNWB or MatNWB issue.
Failing tests:
>> nwbtest('Name', 'tests.system.*/*PyNWB')
I believe this is an issue with pynwb not following spec strictly enough: NeurodataWithoutBorders/pynwb#592. Creating an issue here for record-keeping
Some of the attribute and dataset dtypes listed in the schema spec do not appear to be handled properly. For instance, I would expect ascii, utf, utf8, utf-8 to be parsed as string like text but they are left as is. Others are handled in unexpected ways (float is parsed as double instead of single).
Failing tests:
>> nwbtest('ProcedureName', 'testParseAttributeWithDtype')
>> nwbtest('ProcedureName', 'testParseDatasetWithDtype')