The sparc2nwb from sparc-fair-codeathon

Purpose

This is the repository for Team 1 of the 2021 SPARC FAIR Codeathon. Supported by the NIH Common Fund, SPARC is an open access data sharing resource with high-value datasets, maps, tools, and computational studies in the field of bioelectronic medicine that ultimately aims to improve targeting for more specifically designed neuromodulation therapies.

Project Goals

Identifying the Problem

A major problem our team has identified is that data within the SPARC Portal is stored in a variety of different file formats and is not standardized. This heterogeneity limits data sharing as well as interoperability between programming languages, which slows progress in the field. Our team sought methods that would overcome this challenge and improve the FAIRness of SPARC data.

Creating a Solution

Our project goal was to improve the readability and accessibility of SPARC data by standardizing the format in which the data is stored. We achieved this goal by first converting raw data stored on the SPARC Portal into NWB format. Once the data was in NWB format, we then created APIs to extract the data out of the NWB files so that researchers can manipulate the data for analyses in multiple programming languages.

What is Neurodata Without Borders (NWB)?

Neurodata Without Borders (NWB) is an NIH-based initiative to create a cross-platform standard for neurophysiology data storage and sharing.

The NWB file format allows users to store raw and processed data, together with the associated metadata, in a single standardized format. Common file formats used in experimentation (e.g., .csv, .xlsx, .json, .m, .py) can be converted into .nwb format, and the stored information can then be extracted into the programming language of choice (e.g., Matlab, Python, or C++) for processing and analysis. NWB is a dynamic format and does not impose one fixed folder structure for storing data across domains. Instead, the folder structure depends heavily on the experimental design of the study and the type of data that was collected. The figure below illustrates the data storage structure for electrophysiological data, as an example (Source).

Description of data used to develop the pipeline

Our team created the tools and the code to convert data and metadata from an optophysiological study dataset within the SPARC Portal into NWB format. The study (protocol here, manuscript here) aimed to characterize porcine and human neuronal responses to mechanical compression and tension using immunohistochemical techniques.

Rationale for choosing this dataset to illustrate the use of our tools

We chose this dataset to illustrate the use of our tools for multiple reasons. First, optophysiological research methods are common within the SPARC Portal. By providing a template to convert this type of data to NWB, our project outcomes have a high impact in that more users can utilize our tools with minimal changes needed. We document the process of how we built the NWB file structure so that users can tailor our tools for different types of data as well.

Second, when choosing this dataset, we considered the impact potential and what would be helpful for the field. We ultimately chose to work with optophysiological data from a study that tested the neuronal response to mechanical stimulation in the porcine colon. This research contributed to investigating the underlying factors of inflammatory bowel disease (IBD), a condition which is a steeply growing public health crisis.

The research community has recognized the critical nature of this crisis and has committed substantial resources to investigating the underlying factors of IBD. The number of publications on colon disease has been consistently large in recent years, and the field needs comparative studies of neuronal mechanisms across species. These factors highlight the importance of open communication and data sharing in this subfield. Colon disease research is urgently needed, but the data is hard to access. We therefore chose an optophysiological dataset because this type of data is highly relevant for SPARC users, and we chose data from colon disease research because it is a steadily growing area of study.

User guide

Step 1: Find the dataset you wish to convert to NWB format on the SPARC Portal

Step 2: View the template to determine whether the chosen dataset complies with the structure our tools expect

Template dataset structure

The current conversion APIs in the following step require the user to organize the dataset in the format of the template dataset structure below.

In manifest.xlsx, filename is the dataset filename including its path, and timestamp is the time at which the experiment was acquired.

In samples.xlsx, the following columns are required: subject_id, age, specimen type, sex, species, protocol title, specimen, and anatomical location.

In subjects.xlsx, the following columns are required: subject_id and Weight_kg.

If the dataset is in the same raw storage format, use the GUI in the following step to enter the dataset file path and convert it to an NWB file. If it is in a different format, you can either rearrange the raw dataset to match the template or alter the conversion script in the following step to suit your specific needs.
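Before running a converter, it can help to confirm that the three spreadsheets carry the required columns. The following is a minimal sketch and is not part of the repository's tools: the function names are hypothetical, and the column spellings are copied from the description above, so they may need adjusting to match the actual files (for example, if a header uses underscores instead of spaces). It works on plain lists of header names, so it does not depend on any particular spreadsheet reader.

```python
# Required columns, copied from the template description above.
# The exact spellings are an assumption; adjust them to your files.
REQUIRED_COLUMNS = {
    "manifest.xlsx": ["filename", "timestamp"],
    "samples.xlsx": [
        "subject_id", "age", "specimen type", "sex", "species",
        "protocol title", "specimen", "anatomical location",
    ],
    "subjects.xlsx": ["subject_id", "Weight_kg"],
}


def missing_columns(file_name, header):
    """Return the required columns that are absent from a header row."""
    present = {str(col).strip() for col in header}
    return [col for col in REQUIRED_COLUMNS[file_name] if col not in present]


def check_template(headers):
    """Map each template file to its missing columns (empty list = OK).

    `headers` maps a file name to the list of column names found in it.
    A file that is absent from `headers` is treated as having no columns.
    """
    return {
        name: missing_columns(name, headers.get(name, []))
        for name in REQUIRED_COLUMNS
    }


if __name__ == "__main__":
    # Hypothetical header rows. With pandas installed, a header could be
    # obtained with list(pandas.read_excel(path, nrows=0).columns).
    example = {
        "manifest.xlsx": ["filename", "timestamp", "description"],
        "samples.xlsx": ["subject_id", "age", "sex", "species"],
        "subjects.xlsx": ["subject_id", "Weight_kg"],
    }
    for name, missing in check_template(example).items():
        print(name, "OK" if not missing else f"missing: {missing}")
```

An empty list for every file suggests the dataset matches the template. Any reported names are the columns to add or rename before conversion.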

Step 3: Convert the raw data files to NWB format

You have three options for converting your data to NWB format. Documentation and further instruction regarding these tools can be found in the respective folders within this repository.

Option 1: Use Python-based GUI
Option 2: Use conversion script in Python
Option 3: Use conversion script in Matlab

In this example, converted data within the NWB file includes:

Timeframe (i.e., timestamp in frame number)
Neuronal response (i.e., changes in fluorescence in response to stimulus)

And converted metadata within the NWB file includes:

Subject Metadata (Age, Genotype, Subject ID, Sex, Weight, Species, and Description)
Session Metadata (Session Description, Identifier, Session start time, File creation date, Institution, Lab, Experimenter, Experiment Description, Related Publications, and Keywords)
Mechanical stimulus type (i.e., stretch or compression)
Specific neuron within a group that responded to stimulus

Further documentation regarding where the raw data is stored within the NWB file during the conversion process is located here.
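To give a feel for how the fields listed above map onto an NWB file, here is a minimal sketch using the PyNWB library. It is an illustration under stated assumptions, not the repository's conversion script: the helper names, the series name `neuronal_response`, the unit, and the choice of a generic `TimeSeries` in the acquisition group are all hypothetical, and the repository's tools may store the data elsewhere in the file. PyNWB is imported only inside `write_nwb`, so the metadata helpers can be used without it.

```python
from datetime import datetime, timezone

OPTIONAL_SESSION_FIELDS = {
    "institution", "lab", "experimenter", "experiment_description",
    "related_publications", "keywords",
}


def build_session_kwargs(description, identifier, start_time, **optional):
    """Collect session metadata into keyword arguments for pynwb.NWBFile."""
    unknown = set(optional) - OPTIONAL_SESSION_FIELDS
    if unknown:
        raise ValueError(f"unsupported session fields: {sorted(unknown)}")
    if start_time.tzinfo is None:
        # Assumption: naive start times are treated as UTC.
        start_time = start_time.replace(tzinfo=timezone.utc)
    kwargs = {
        "session_description": description,
        "identifier": identifier,
        "session_start_time": start_time,
    }
    kwargs.update(optional)
    return kwargs


def build_subject_kwargs(subject_id, age, sex, species, weight_kg,
                         genotype=None, description=None):
    """Collect subject metadata into keyword arguments for pynwb.file.Subject."""
    kwargs = {
        "subject_id": subject_id,
        "age": age,  # NWB expects an ISO 8601 duration such as "P90D"
        "sex": sex,
        "species": species,
        "weight": f"{weight_kg} kg",
    }
    if genotype is not None:
        kwargs["genotype"] = genotype
    if description is not None:
        kwargs["description"] = description
    return kwargs


def write_nwb(path, session_kwargs, subject_kwargs, frames, responses,
              stimulus_type):
    """Write one response trace to an NWB file (requires pynwb)."""
    from pynwb import NWBFile, NWBHDF5IO, TimeSeries
    from pynwb.file import Subject

    nwbfile = NWBFile(subject=Subject(**subject_kwargs), **session_kwargs)
    series = TimeSeries(
        name="neuronal_response",  # hypothetical series name
        data=[float(r) for r in responses],
        unit="a.u.",  # assumed unit for the fluorescence change
        # Frame numbers are stored as-is. NWB timestamps are nominally in
        # seconds, so divide by the frame rate if it is known.
        timestamps=[float(f) for f in frames],
        description=f"Change in fluorescence; stimulus: {stimulus_type}",
    )
    nwbfile.add_acquisition(series)
    with NWBHDF5IO(path, "w") as io:
        io.write(nwbfile)
```

The file creation date is not passed explicitly because PyNWB fills it in when the `NWBFile` object is created.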

Step 4: Extract (meta)data out of NWB file

You can extract the desired (meta)data out of the NWB file using either Matlab- or Python-based APIs located here. Further documentation regarding how to navigate the contents of the NWB file is included within the folder contents.

Step 5: Process and analyze (meta)data

You are now able to view and manipulate the contents of the dataset in either Python or Matlab.
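As a rough illustration of Steps 4 and 5 in Python, the sketch below reads one series back with PyNWB and finds the frame with the largest response. It is not the repository's extraction API: the function names are hypothetical, and it assumes the file holds a `TimeSeries` with explicit timestamps in the acquisition group under the hypothetical name `neuronal_response`. PyNWB is imported only inside `load_series`, so `peak_response` works on any pair of plain lists.

```python
def load_series(path, series_name="neuronal_response"):
    """Return (timestamps, values) for one acquisition series (requires pynwb).

    Assumes the series was written with explicit timestamps.
    """
    from pynwb import NWBHDF5IO

    with NWBHDF5IO(path, "r") as io:
        nwbfile = io.read()
        series = nwbfile.acquisition[series_name]
        # Copy the values out before the underlying HDF5 file is closed.
        return list(series.timestamps[:]), list(series.data[:])


def peak_response(frames, values):
    """Return (frame, value) at the largest absolute response."""
    if len(frames) != len(values) or len(values) == 0:
        raise ValueError("frames and values must be non-empty and equal in length")
    index = max(range(len(values)), key=lambda i: abs(values[i]))
    return frames[index], values[index]


if __name__ == "__main__":
    # Hypothetical trace: fluorescence change per frame.
    frames = [0, 1, 2, 3, 4]
    values = [0.0, 0.1, 0.8, 0.3, 0.05]
    print(peak_response(frames, values))
```

With a real file, the two calls chain together as `peak_response(*load_series("converted.nwb"))`, where the file name is a placeholder.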

Additional References

Manuscript

A manuscript that details the process of creating these tools, including open source code and data used in our project, is currently in progress. Check back soon!

SPARC2NWB Team

Marielle Darwin | Ananth Reddy | Derek Chang | Patrick Chuang
