mne-tools / mne-bids-pipeline Goto Github PK

View Code? Open in Web Editor NEW

137.0 14.0 65.0 1.01 GB

Automatically process entire electrophysiological datasets using MNE-Python.

Home Page: https://mne.tools/mne-bids-pipeline/

License: BSD 3-Clause "New" or "Revised" License

Python 99.06% Shell 0.78% Makefile 0.16%

bids mne-python group-analysis pipeline python neuroscience

mne-bids-pipeline's Introduction

MNE-BIDS-Pipeline

MNE-BIDS-Pipeline is a full-flegded processing pipeline for your MEG and EEG data.

It operates on data stored according to the Brain Imaging Data Structure (BIDS).
Under the hood, it uses MNE-Python.

💡 Basic concepts and features

🏆 Automated processing of MEG and EEG data from raw data to inverse solutions.
🛠️ Configuration via a simple text file.
📘 Extensive processing and analysis summary reports.
🧑‍🤝‍🧑 Process just a single participant, or as many as several hundreds of participants – in parallel.
💻 Execution via an easy-to-use command-line utility.
🆘 Helpful error messages in case something goes wrong.
👣 Data processing as a sequence of standard processing steps.
⏩ Steps are cached to avoid unnecessary recomputation.
⏏️ Data can be "ejected" from the pipeline at any stage. No lock-in!
☁️ Runs on your laptop, on a powerful server, or on a high-performance cluster via Dask.

📘 Installation and usage instructions

Please find the documentation at mne.tools/mne-bids-pipeline.

❤ Acknowledgments

The original pipeline for MEG/EEG data processing with MNE-Python was built jointly by the Cognition and Brain Dynamics Team and the MNE Python Team, based on scripts originally developed for this publication:

M. Jas, E. Larson, D. A. Engemann, J. Leppäkangas, S. Taulu, M. Hämäläinen, A. Gramfort (2018). A reproducible MEG/EEG group study with the MNE software: recommendations, quality assessments, and good practices. Frontiers in neuroscience, 12. https://doi.org/10.3389/fnins.2018.00530

The current iteration is based on BIDS and relies on the extensions to BIDS for EEG and MEG. See the following two references:

Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8

Niso, G., Gorgolewski, K. J., Bock, E., Brooks, T. L., Flandin, G., Gramfort, A., Henson, R. N., Jas, M., Litvak, V., Moreau, J., Oostenveld, R., Schoffelen, J., Tadel, F., Wexler, J., Baillet, S. (2018). MEG-BIDS, the brain imaging data structure extended to magnetoencephalography. Scientific Data, 5, 180110. https://doi.org/10.1038/sdata.2018.110

mne-bids-pipeline's People

Contributors

Stargazers

Watchers

mne-bids-pipeline's Issues

incorrect test dataset gets downloaded

If I do:

$ python tests/download_test_data.py ds000248

I get download for eeg_matching_pennies:

/home/mainak/anaconda2/envs/mne/lib/python3.6/site-packages/numba/decorators.py:146: RuntimeWarning: Caching is not available when the 'parallel' target is in use. Caching is now being disabled to allow execution to continue.
  warnings.warn(msg, RuntimeWarning)

----------------------
datalad installing "eeg_matchingpennies"
datalad get data "sub-05" for "eeg_matchingpennies"

@agramfort can you replicate?

Use log abscissa in PSD plots when goal is to spot bad channels

As discussed in #123

defaults

we should review all defaults in the config.py, for instance this one:

https://github.com/mne-tools/mne-study-template/blob/4ea6f4d617f2dc6a06589693cc9ad058941fb3b1/config.py#L115

`_audiovis_` naming

In the files that we genrate _audiovis_ is all over the place.

ie:

https://github.com/mne-tools/mne-study-template/blob/f33cb39a12419ab56a399de0ccf8a860a9137af0/02-apply_maxwell_filter.py#L32

Should the code be example agnostic? or is it really part of the example?

explain better some parameters setup for ICA

Since one has the choice to high-pass filter the data or not, there are several tricky questions:
(1) it seems that the ICA rejection works best if the high-pass filter is 1 Hz
(2) what if one does not want to high-pass the data, what are the options?

Handling of multiple sessions

If I'm not mistaken, currently we process each session separately (only concatenate their runs when needed). Now in experimental paradigms I've implemented in the past, the runs were often split across multiple sessions because otherwise the experiment would simply take too long; but in the end, all data has to be analyzed (filtered, ERP computation etc) as one. I don't think this is currently supported here and would probably need some thinking and a few tweaks in a number of places

Test datasets' empty-room recordings not stored in BIDS-compatible way?

So I'm working on emptyroom processing, and I wanted to test my code with ds000246 and ds000248. Turns out that mne-bids' get_matched_empty_room() cannot discover the empty-room recordings because they're stored without the ses entity in their path:

https://github.com/OpenNeuroDatasets/ds000246/tree/master/sub-emptyroom/meg
https://github.com/OpenNeuroDatasets/ds000248/tree/master/sub-emptyroom/meg

Before opening an issue upstream, I wanted to ask whether this is BIDS-compliant? If it is, mne-bids needs to be adjusted. If it's not – idk, we'd have to switch to other datasets for testing? Or get those datasets fixed?

The BIDS validator doesn't complain, btw.

cc @sappelhoff

Add testing with 3D backend

We should test with PyVista on our CI system.

CI for mne study template

With @sappelhoff we agreed today that a better workflow for MNE study template would be to set up the CIs to use datalad to fetch different datasets. Since it's relatively lightweight to get one subject with datalad, this can be used to run tests on many different datasets and work in a test driven manner. So, the workflow would be something like:

add datalad to Circle CI in one pull request
next pull request, add one dataset to .circle/config.yml and iterate until circle passes
next pull request, add another dataset and iterate until circle passes

This way, you are guaranteed not to break datasets that already work. Go go go @sappelhoff !

Handling of bad channels in experiments with multiple runs and sessions

This is somewhat related to #97, but probably more urgent.

Say we have an experimental session that consists of multiple runs (or blocks). Currently, we apply bad channel detection and SSS on each block individually. This can lead to situations where a given sensor is marked as bad in one run, but not in another; and after applying the Maxwell filter, we will be unable to tell :) I'm not sure if that's good practice? What I would have done intuitively is: scan all runs for bad channels, note those down, form the union of all bad channels and mark them as bad in ALL runs before proceeding with any interpolation / SSS. What's your stance on this?

This question could even be expanded beyond runs, to sessions. But here I feel more comfortable to treat sessions differentially.

Should 03-extract_events.py be dropped?

It seems this file is a leftover from the time before we had Annotations (or before mne-bids started storing events as Annotations).

For example, the docstring says:
https://github.com/mne-tools/mne-study-template/blob/280bde6e98d78c7b888e878e292d30cf867e20f2/03-extract_events.py#L6-L7

Now, we don't even have or use config.stim_channel anymore. And the events file created by this script is only ever mentioned again in 05a-run_ica.py, albeit not used.

The produced event file is picked up by the Report.parse_folder call in 99-make_reports.py, which creates an event plot; however, these events do not necessarily correspond to config.event_id in the sense that

all events are included here, not just the ones specified in config.event_id, and
the events are not labeled (i.e., just have an event number and not a "human-readable" name)

Proposal

We should remove 03-extract_events.py altogether and instead change 99-make_reports.py such that it reads events and event labels from annotations, and passes these values to mne.viz.plot_events() so we get properly labeled event plots of only the events the user deemed relevant (via config.event_id).

ds000248 dataset doesn't have `participants.tsv`

While working on #84, I realized that the following datasets currently used in CI testing don't come with a participants.tsv, meaning reading them via the latest mne-bids will fail:

ds000248
~~eeg_matchingpennies~~

How should we go about this?

How to deploy for multiple analyses

In the case where one would like to run the pipeline

on the same dataset multiple times with different settings, or
on different datasets

it is unclear to me what's the best practice regarding mne-study-template deployment.

While the readme suggests (point 3) that one may define an MNE_BIDS_STUDY_CONFIG environment variable to use one global installation of the pipeline with different configurations, the code makes it clear that this is really just intended for running the CI tests. So – what gives? Currently I have multiple copies of mne-study-template, each with their own config.py, to analyze different projects.

This seems cumbersome, and it's also related to the question @agramfort recently raised: Should the pipeline be a loose collection of scripts, or maybe an installable package?

One issue that often arises with "global" packages / installations is that one is usually pinned to just one package version. However, with the advance of virtual environments this should be less of an issue these days. But even so, one could make use of the version identifier in config.py to automatically retrieve the correct pipeline version from GitHub. PsychoPy has a similar functionality where it allows users to specify the exact package version the experiment should be run with.

What are your general thoughts on this matter?

tagging @jasmainak @agramfort

Updating ds000248 (MNE "sample" dataset): Move fine-calibration and crosstalk files

OpenNeuro dataset ds000248 ships with fine-calibration and cross-talk files. Since we've now standardized naming of those files, I would like to update the dataset by moving:

from	to
`/derivatives/ct_sparse_mgh.fif`	`/sub-01/meg/sub-01_acq-crosstalk_meg.fif`
`/derivatives/sss_cal_mgh.dat`	`/sub-01/meg/sub-01_acq-calibration_meg.dat`

@jasmainak I see you're the one who published the dataset on OpenNeuro. Do you think you could update it? Or grant me access so I could do it myself?

Thanks!

Script names proposal

This is the naming that @SophieHerbst proposes for the moment. This is open to discussion once we have an updated flowchart.
This issue is just to keep track of them

temporal filtering before maxwell filtering?

Together with @stevenMgraham we were trying to use the MNE study template for our analysis.

However, we realized that the study template is doing something different from what the biomag demo repo does. It temporally filters first and then applies maxwell filtering. The biomag demo applies maxwell filtering first.

This seems wrong to me because you'd temporally spread the artifacts before the maxwell filter which would make data overall worse. Instead you want to do maxwell filtering first and then apply temporal filtering on the cleaned data. Thoughts @agramfort ?

Make "make report" work

Magic how little lines you need for a new dataset :)
We should have one dataset that also does make source and ideally make report we could see as artifact on circle

Originally posted by @agramfort in #45 (comment)

related to #4 ... but the codebase has moved considerably

To Do

turn mne-somato-data into git-annex dataset to be downloaded via datalad
make mne-somato-data work in study template with make source
make make report work

Add interactive / single subject mode ?

For beginners and anyone who needs to check their data and debug I would find it really helpful to add an interactive mode for the scripts.
I usually do

figure = raw.plot(n_channels = 50,butterfly=True, group_by='position')

just after importing to check for bad sensors again.
Then, I plot events, single trial epochs, ICA components, ...
The report might be a bit overwhelming for newbies. What do you think?

Parallelize testing

Can we somehow make Circle run all those tests in parallel? They're not dependent of one another, so at least in theory this should be no problem…

select and reject ICA components

Right now 04-artifact_correction_ica.py just runs the ICA but doesn't do anything with it.
We need a new scripts to visualize and select the components:
https://mne-tools.github.io/stable/auto_tutorials/plot_artifacts_correction_ica.html

and to reject them

ica.apply(realData, exclude=exclude_ICA)

Add SSP and ICA to empty-room processing?

Currently these steps are simply omitted

explain how to get BIDS converted sample data

What is the intended way to get the sample data in BIDS?

I tried make fetch, but it misses some dependencies:

sophie@Sophies-MacBook-Pro-2:~/repos/mne-study-template$ make fetch
python ./tests/download_test_data.py --dataset=
Traceback (most recent call last):
File "./tests/download_test_data.py", line 5, in
import datalad.api as dl
ModuleNotFoundError: No module named 'datalad'
make: *** [fetch] Error 1

How about a link to https://mne.tools/mne-bids/auto_examples/convert_mne_sample.html#sphx-glr-auto-examples-convert-mne-sample-py ?
However, this crashes because of the emptyroom, see here.

ds000246 downloads failing

ds000246 seems to have gotten corrupted on OpenNeuro; I've contacted OpenNeuro support. Until this is resolved, our CIs will fail.

Allow convenient testing of different datasets with the Makefile

Instead of having to edit the path in the config.py every time one wants to run a new dataset, we could perhaps have some helper function that can be triggered with the Makefile.

Add proper logging mechanism

… instead of using print() calls.

use pytest for tests

As we discussed here, we should pytest.parametrize and as much of the pytest infrastructure as possible rather than reinventing the wheel.

Any thoughts about nipype?

This seems like an interesting and potentially useful project.

I'm curious if there was any thought to making a nipype (https://nipype.readthedocs.io/en/latest/) type interfaces?

I don't use nipype very often but I have found it useful in the past to standardize some large batch processing. Just a thought.

add more options to makefile

there should be:

$ make clean

to remove the derivatives folder

and if the output file of a script already exists, then do not re-run it. I think it's how many people would use it. It's a bit annoying to re-run everything from the start. Opinions?

remove duplication of make_bids_basename

there is a lot of duplicated code with make_bids_basename: https://github.com/mne-tools/mne-study-template/blob/master/05b-run_ssp.py#L58

If we want the scripts to be extensible, they should be simplified somehow

StratifiedKFold with shuffling in the decoding step?

Currently we create the cross-validation object in our decoding step (08) of the pipeline via:
https://github.com/mne-tools/mne-study-template/blob/b61a5ca66aaef1f631d7ce2def3b1cde5d611729/08-sliding_estimator.py#L80-L81

By default, StratifiedKFold does not shuffle, meaning that the passed random_state doesn't have any effect (it produces a warning, though).

So – should we enable shuffling? Intuitively I would say yes, but want to hear your opinion, @agramfort

Evoked plot ordinates only contain UNIT, but no actual LABEL

See the abscissa for comparison, which has label and unit:

Probably needs to be fixed upstream in MNE, but just leaving this here because I just stumbled upon this while browsing our build artifacts…

update get_values_for_key -> get_entity_vals

can you update this in mne-study-template? :) don't forget about it ;-)

Originally posted by @jasmainak in mne-tools/mne-bids#260 (comment)

Improve Reports

The generated reports currently leave much to be desired. I will update this list as I discover more missing features / issues, and of course tick them off if issues get resolved :)

there's no Epochs drop log (I believe we used to have one, once upon a time?) (mne-tools/mne-python#7990)
PSD plots for raw data missing (#155)
SSP projectors are missing (mne-tools/mne-python#7991)
decoding results are not included (#184)
TFR results are not included
I don't think we're currently including diagnostic info re ICA (#163)
some results appear twice, because we're using parse_folder() and later adding these elements again, manually
the buttons in the top-right toggle visibility of sections, however I suppose most users would assume they're navigation links, so their behavior is confusing

Setup ICA's n_components

The right way to set the n_components is this:

meg_maxfilter_rank = raw.copy().pick_types(meg=True).estimate_rank()
ica = mne.preprocessing.ICA(n_components=meg_maxfilter_rank,
                            method=method,
                            random_state=42)

Plus there was the problem of centring the PCA. (check https://github.com/massich/epilepsy-biomag-2018)

Originally posted by @massich in #2

config.trigger_time_shift is currently unused

And also lacks clarification of what exactly it is supposed to do.

Use MNE config for placing the test data of the study template

@jasmainak @sappelhoff can you see how to use this variable to download the data:

In [3]: mne.utils.get_config('MNE_DATA')
Out[3]: '/Users/alex/mne_data'

it's better than hardcoding the ~/data folder.

Originally posted by @agramfort in #35 (comment)

issues with MNE study template

Yesterday, with @larsoner we asked some workshop participants to try out the code here. Some of the limitations I observed:

the config file is a bit hard to parse. If you want to try just one script, it's not clear which params in the config file must be touched or changed. Perhaps a more explicit import at the top of each file would solve this issue:

from config import blah

It assumes a certain structure to the data but it's not BIDS. This means there are many paths that need to be defined in the config. I think for people using the study template, we should make it as simple as possible -- they should just have to point to the root and then it should throw informative errors if certain files were expected, e.g., a list of participants, an events file etc. cc @sappelhoff
the code assumes fif file format. If we use, mne_bids.io.read_raw_bids, it should be able to triage over different files without making this assumption

Remove anonymization

As @jasmainak pointed out, anonymization should happen on the BIDS level, before the study template is run.

Naming scheme of inverse solutions

Currenly, we do this for STCs:

https://github.com/mne-tools/mne-study-template/blob/8887a945a09274f64abe879e7d25a8d3c6752d5a/13-make_inverse.py#L71-L73

producing filenames like:

Cam-CAN_CC110033_mne_dSPM_inverse-audiovis1200Hz-lh.stc

and this for fsaverage-based STCs (later used to create grand mean):

https://github.com/mne-tools/mne-study-template/blob/8887a945a09274f64abe879e7d25a8d3c6752d5a/14-group_average_source.py#L45-L48

producing filenames like

mne_dSPM_inverse_fsaverage-audiovis1200Hz-lh.stc

inside derivatives/mne-study-template/sub-CC110033/meg

Now -- with which one should we stick? I suggest the latter, because it's more concise and typically the data are organized in properly-named directory structures anyway?

More ICA iterations

The low default number of MNE's ICA iterations should be increased, at least for EEG, since in my experience MNE's default is often not sufficient for ICA to converge; and / or the algorithm should be switched to Picard.

Temporally restricted channel interpolation

Probably needs to be solved upstream, but dropping it here for initial discussion

Currently, bad channel detection via find_bad_channels_maxwell() works on a global scale: if a user-defined threshold is exceeded frequently enough, the respective channel is marked as bad. It will typically be be interpolated via Maxwell filtering as the next step.

Now @SophieHerbst is running into two issues with this approach:

If flux spikes occur only rarely and / or their amplitude is only of "medium" size, the threshold will not be exceeded often enough to classify the channel as bad. Lowering the threshold (or required number the threshold must be exceeded), on the other hand, leads to many false-positives in other channels.
Flux spikes are a transient artifact; outside of those spikes, the channel may deliver usable data. However, currently this information will be lost if the entire channel is marked as bad and is subsequently interpolated.

A potential approach to tackle both issues could be to:

Run a flux spike detection without global thresholding – just return the temporal location of the spikes
Annotate those spikes, e.g. as FLUX_SPIKE, and keep track of the affected channel(s)
Interpolate channel(s) during that time period

Thoughts?

Write manually selected bad channels to disk

We currently write a table of channels that are bad after running find_bad_channels_maxwell to disk.

But in cases where the user chooses to work in interactive mode (see #131) and potentially changes the bads selection in the interactive Raw browser after automated bad channel detection has been run, we don't update that table. This should probably be changed.

run tests also locally

Currently, all the tests are in the circle CI script. That's fine, but sometimes you want to run just one command locally and see if everything works. Perhaps this could be added to one script which can be called from circle?

Cutting a release

cc @agramfort @jasmainak @sappelhoff

I want to make a first tagged release, ideally very soon (end of this week / sometime next week). I will try to polish some things and improve the documentation. Do you have anything in particular that you'd definitely want to see in a first release?

btw I'd like to follow a "release often, release early" approach with the Study Template, as long as we're adding changes so frequently.

Could not find input data file matching

Hello! I am completely new to EEG/MEG let alone mne-tools, but I am trying my best to get this software working on my computer.

I've downloaded a dataset from openneuro (ds000247 - OMEGA Resting State), and trying to run mne-study-template. When I run "make sensor", I ran into the following error message.

hayashis@haswell:~/git/mne-study-template(master*) $ export BIDS_ROOT=/mnt/3t/testdata/ds000247-download
hayashis@haswell:~/git/mne-study-template(master*) $ make sensor
python3 01-import_and_filter.py

Processing subject: 0006
------------------------
Traceback (most recent call last):
  File "01-import_and_filter.py", line 155, in <module>
    main()
  File "01-import_and_filter.py", line 151, in main
    config.sessions))
  File "01-import_and_filter.py", line 149, in <genexpr>
    parallel(run_func(subject, run, session) for subject, run, session in
  File "01-import_and_filter.py", line 77, in run_filter
    '"{}"'.format(search_str))
ValueError: Could not find input data file matching: "/mnt/3t/testdata/ds000247-download/sub-0006/ses-0001/meg/sub-0006_ses-0001_task-noise_run-01_meg*"
make: *** [Makefile:14: sensor] Error 1

Could you help me troubleshoot this problem?

Error: Surface outer skull is not completely inside surface outer skin

Dear all,

Regarding the DS117 dataset. When I employ the conductivity for 3 layers (EEG), I got the next error (subject 006):

However, employing the same subject, but with conductivity for 1 layer (MEG), there is not any error:

In the case of the subject 002, there is not problem with conductivity for 3 layers (EEG):

I only require to use the EEG data (I know that the 3-layer is unreliable). Is there any solution for the first error message? or do you suggest me only take the users such as 002?. The idea is to continue with the forward estimation and the inverse stepts.

By the way, thanks for your proposal MNE pipeline ;).

Handling datasets with multiple simultaneous recording modalities

Currently we ask users to specify the kind of the recording to be analyzed; so e.g. if config.kind='meg', we will look for data in the meg/ sub-directory.

Now it occurred to me that according to BIDS, in cases where you have a simultaneous recording of several modalities from the same recording device, you store all of them in the same place.

Concrete example: your device supports recording MEG and EEG simultaneously with the same sampling frequency –> you would consider the EEG data as kind of "auxiliary" data of the MEG measurement, and store the dataset e.g. as sub-01/meg/sub-01_run-01_meg.fif.

In comparison, if you record MEG and EEG simultaneously, but use a separate EEG amplifier, you would store the MEG data as sub-01/meg/sub-01_run-01_meg.fif and the EEG data as sub-01/eeg/sub-01_run-01_eeg.fif.

What this boils down to is that when we receive kind='meg', we could still end up loading a dataset that contains EEG channels. This is e.g. the case with the ds000248 dataset, as is demonstrated in this table which was created based on kind='meg' processing, yet there are still EEG channels in the data. When during empty-room processing I wanted to mark the same channels as bad as in the respective experimental measurement, I would get an error message because the experimental data had an EEG channel in info['bads'], yet the empty-room recording naturally doesn't contain any EEG channels.

I therefore suggest that for now, we run sth like raw.pick_types(kind) in the pipeline as early as possible to avoid any downstream issues like the one I encountered above.

We also need to discuss how to deal with this kind of "multi-modal" data in general. Assume a user has an M/EEG dataset (same device, so saved in the same file in meg/) and wants to limit analysis to EEG, this is currently not possible by setting config.kind.

Would like to hear your opinion on this.

Add "make clean" functionality to Makefile

Should remove the mne-study-template derivatives.

Can I be a member? :-)

Hello, I wanted to ask if you could grant me access rights to merge PRs? Would that be possible?

Allow noise covariance calculation from empty-room recordings

Currently we only allow noise covariance estimation based on the pre-stimulus period. @SophieHerbst requested an easy way to plug in empty-room recordings here. Will need to dig into mne-bids to figure out what are our options here. Suggestions and ideas welcome!