
neurostuff / nimare

177 stars, 7 watchers, 56 forks, 31.63 MB

Coordinate- and image-based meta-analysis in Python

Home Page: https://nimare.readthedocs.io

License: MIT License

Languages: Python 96.89%, Dockerfile 0.53%, Makefile 0.11%, TeX 1.94%, Smarty 0.52%

Topics: meta-analysis, python, neuroimaging

nimare's Introduction

NiMARE: Neuroimaging Meta-Analysis Research Environment

A Python library for coordinate- and image-based meta-analysis.

[Badges: latest version, PyPI Python version, GitHub repository, DOI, license, test status, documentation status, Codecov, code style: black, chat at https://mattermost.brainhack.org/brainhack/channels/nimare, RRID:SCR_017398, paper, preprint]

Currently, NiMARE implements a range of image- and coordinate-based meta-analytic algorithms, as well as several advanced meta-analytic methods, such as automated annotation and functional decoding.

Installation

Please see our installation instructions for information on how to install NiMARE.

Installation with pip

pip install nimare

Local installation (development version)

pip install git+https://github.com/neurostuff/NiMARE.git

Citing NiMARE

If you use NiMARE in your research, we recommend citing the Zenodo DOI associated with the NiMARE version you used, as well as the Aperture Neuro journal article for the NiMARE Jupyter book. You can find the Zenodo DOI associated with each NiMARE release at https://zenodo.org/record/6642243.

# This is the Aperture Neuro paper.
@article{Salo2023,
  doi = {10.52294/001c.87681},
  url = {https://doi.org/10.52294/001c.87681},
  year = {2023},
  volume = {3},
  pages = {1--32},
  author = {Taylor Salo and Tal Yarkoni and Thomas E. Nichols and Jean-Baptiste Poline and Murat Bilgel and Katherine L. Bottenhorn and Dorota Jarecka and James D. Kent and Adam Kimbler and Dylan M. Nielson and Kendra M. Oudyk and Julio A. Peraza and Alexandre Pérez and Puck C. Reeders and Julio A. Yanes and Angela R. Laird},
  title = {NiMARE: Neuroimaging Meta-Analysis Research Environment},
  journal = {Aperture Neuro}
}

# This is the Zenodo citation for version 0.0.11.
@software{salo_taylor_2022_5826281,
  author       = {Salo, Taylor and
                  Yarkoni, Tal and
                  Nichols, Thomas E. and
                  Poline, Jean-Baptiste and
                  Kent, James D. and
                  Gorgolewski, Krzysztof J. and
                  Glerean, Enrico and
                  Bottenhorn, Katherine L. and
                  Bilgel, Murat and
                  Wright, Jessey and
                  Reeders, Puck and
                  Kimbler, Adam and
                  Nielson, Dylan N. and
                  Yanes, Julio A. and
                  Pérez, Alexandre and
                  Oudyk, Kendra M. and
                  Jarecka, Dorota and
                  Enge, Alexander and
                  Peraza, Julio A. and
                  Laird, Angela R.},
  title        = {neurostuff/NiMARE: 0.0.11},
  month        = jan,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {0.0.11},
  doi          = {10.5281/zenodo.5826281},
  url          = {https://doi.org/10.5281/zenodo.5826281}
}

To cite NiMARE in your manuscript, we recommend something like the following:

We used NiMARE v0.0.11 (RRID:SCR_017398; Salo et al., 2022; Salo et al., 2023).

Contributing

Please see our contributing guidelines for more information on contributing to NiMARE.

We ask that all contributions to NiMARE respect our code of conduct.

nimare's People

Contributors

62442katieb, adelavega, akimbler, alexenge, aperezlebel, bilgelm, chrisgorgo, djarecka, eglerean, ghisvail, jdkent, julioaperaza, koudyk, liuzhenqi77, ptdz, puckr, ryanhammonds, satra, shotgunosine, tsalo, tyarkoni, willforan, yifan0330


nimare's Issues

Output files should follow BIDS-like naming convention

The output filenames can be changed to use key-value pairs, much like BIDS. We can also take some cues from the PALM conventions. (See the example after the checklist below.)

EDIT 2020/08/01: TS adding a checklist.

  • Implement naming convention for output files. (#224)
  • Document naming convention in docs. (#333)
  • Create flat json describing the convention, based on the PyBIDS config files. (#338)
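
For illustration, output names built from key-value pairs might look like the following (the entities shown are assumptions for discussion, not a settled convention):

z_desc-ale_corr-FWE_method-montecarlo.nii.gz
logp_desc-mkda_level-cluster_corr-FWE_method-montecarlo.nii.gz

Each underscore-separated pair records one analysis choice, which keeps filenames machine-parseable in the same way BIDS and PALM outputs are.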

Sleuth converter failing on multiline experiment names

The Sleuth-to-JSON converter fails on any experiment labels that span more than one line. This applies both to very long names that are wrapped and to grouped experiments (one of the options when exporting from Sleuth). It should be pretty easy to fix; see the sketch below.
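
A minimal sketch of a parsing loop that merges consecutive header lines, assuming Sleuth's convention that every "//"-prefixed line before a coordinate block belongs to the header (parse_sleuth is a hypothetical helper, not the converter's actual code):

# Merge consecutive "//" header lines so multiline experiment
# names and grouped experiments stay together.
def parse_sleuth(lines):
    experiments = []
    header, coords = [], []
    for line in lines:
        line = line.strip()
        if line.startswith("//"):
            if coords:  # a new header starts only after a coordinate block
                experiments.append((header, coords))
                header, coords = [], []
            header.append(line.lstrip("/ ").strip())
        elif line:
            coords.append([float(value) for value in line.split()])
    if header or coords:
        experiments.append((header, coords))
    return experiments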

Should we include text extraction/automated annotation tools?

Both functional decoding and content-based meta-analyses will require some way of labeling experiments/papers. By grabbing labeled images from Neurovault and crowd-annotated experiments from Brainspell, we can build datasets with manual annotations, but I was thinking that it might be a good idea to support automated annotation methods as well.

We could have a submodule of nimare.extract for text extraction (e.g., downloading abstracts, tf-idf vectorization of abstracts with a data-driven ontology like Neurosynth, LDA topic modeling, GCLDA topic modeling, Cognitive Atlas term extraction, and hierarchical expansion of CogAt labels). Some elements are currently implemented in Neurosynth, while I’ve attempted to implement others in cogat-extraction.
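
As a sketch of the tf-idf piece, scikit-learn can restrict the vectorizer to an ontology-derived vocabulary (the abstracts and term list below are placeholders):

# Hypothetical abstracts keyed by ID, plus a toy ontology vocabulary.
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = {"pmid-1": "working memory task ...", "pmid-2": "acute pain ..."}
vocabulary = ["pain", "working memory", "emotion"]

# ngram_range=(1, 2) lets multiword terms like "working memory" match.
vectorizer = TfidfVectorizer(vocabulary=vocabulary, ngram_range=(1, 2))
weights = vectorizer.fit_transform(list(abstracts.values()))
# `weights` is a (documents x terms) matrix usable as annotations.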

Odd behavior of MKDA/KDA kernels with odd radii

The MKDA and KDA kernels use code I took from Neurosynth, and return weird results when I use an odd radius on a 2mm template. I haven't run it on a 1mm template yet, but I plan to soon. The image below shows what I'm talking about. Here is the current version of the kernel generation code.
[Image: mkda_ma_maps]
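
For reference, a minimal sketch of how a spherical kernel gets built in voxel space (sphere_kernel is hypothetical, not the Neurosynth-derived code linked above). On a 2mm template, a 5mm radius becomes 2.5 voxels, and any integer truncation of that fractional radius is a likely source of the lopsided spheres shown above:

import numpy as np

def sphere_kernel(radius_mm, voxel_size_mm):
    # Convert the radius to voxel units; odd mm radii on a 2mm grid
    # yield fractional voxel radii (e.g., 5 mm -> 2.5 voxels).
    r_vox = radius_mm / voxel_size_mm
    n = int(np.ceil(r_vox))
    grid = np.mgrid[-n:n + 1, -n:n + 1, -n:n + 1]
    dist = np.sqrt((grid ** 2).sum(axis=0))
    return (dist <= r_vox).astype(int)

print(sphere_kernel(5, 2).sum())  # voxel count inside a 5 mm sphere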

Add diagnostics module

  • Visual representation for diagnostics of incoming data (e.g., plot coordinates outside brain)
  • Model review
  • Quality control on results
  • Remove coordinates from outside of masks in kernel estimators (#37)

Store estimator and params in MetaResult

It's a good idea to have the MetaResult object store basic provenance information. At minimum, we should store the estimator class name and its parameters (from get_parameters).
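
A minimal sketch of what that might look like (the maps argument and get_parameters method are assumptions based on the description above):

class MetaResult:
    def __init__(self, estimator, maps):
        self.maps = maps
        # Basic provenance: the estimator's class name and parameters.
        self.estimator_name = type(estimator).__name__
        self.estimator_params = estimator.get_parameters()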

Add ROI data type

To facilitate ROI-level meta-analyses, it probably makes sense to add an 'ROI' data type under contrasts (in addition to 'coordinates', 'images', 'connmat', etc.). Alternatively, maybe 'parcellation' is more general. Basically, we need to allow for a list of (possibly overlapping) multi-voxel parcels, each with an associated value (or set of values), a definition of the parcel (e.g., an image mask), a method of extraction, etc.

Refactor GCLDA model

The GCLDA model implemented here merges the Dataset and Model classes implemented in the GCLDA package. As a result, the code is a bit bloated and probably inefficient. We need to clean up the model and double-check that my previous refactor didn't introduce any bugs. If there's any way for us to speed it up, that would be great too.

Zenodo

  • Add NiMARE to Zenodo
  • Enable github integration
  • Add .zenodo.json
  • Add information about .zenodo.json to the PR template and contributing guidelines

Incorporating adapted MIT licensed code

For the image-based meta-analyses, we need to convert T images to Z. Dr. Sochat has a great package for doing just that (https://github.com/vsoch/TtoZ), which is released under the MIT license. I don't want to call the whole package, which is set up for the command line, so I'd like to take the core ~25 lines of code and use it here in a function in nimare.utils. Is there an appropriate way to do that?
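
For context, the core conversion is small. A minimal sketch (this is not Dr. Sochat's code; her implementation uses Hughett's method, which keeps numerical precision for large |t| where the naive survival-function approach underflows):

import numpy as np
from scipy import stats

def t_to_z(t_values, dof):
    p = stats.t.sf(t_values, dof)  # one-sided p-values from t
    return stats.norm.isf(p)       # z-statistics with matching p-values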

Add File Drawer Frequency algorithm to diagnostics module

The File Drawer Frequency tool estimates the number of non-significant contrasts associated with a given large meta-analytic sample. We can implement this as a diagnostic that takes in a Dataset object and returns the predicted number of null findings that one would have included in the sample if not for publication bias. This may also be useful for generating and incorporating simulated null findings into meta-analyses to determine if meta-analytic results are robust to the file drawer problem. The latter application is mostly speculation, though.
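
One classical estimator of this quantity is Rosenthal's (1979) fail-safe N; a minimal sketch, assuming per-study z-statistics are available (the File Drawer Frequency tool itself may use a different method):

import numpy as np

def failsafe_n(z_values, alpha_z=1.645):
    # Number of zero-effect studies needed to raise the combined
    # (Stouffer) p-value above a one-tailed alpha of 0.05.
    z = np.asarray(z_values, dtype=float)
    return (z.sum() ** 2) / (alpha_z ** 2) - z.size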

Refactoring discussion thread

Following discussion, I thought it would be good to start a thread for discussion of the proposed refactoring of the codebase. The main suggestion so far is to rework the API so it's centered around a distinction between Transformer classes whose transform method returns a Dataset object, and Estimator classes whose fit method returns a MetaResult object. It sounds like almost every existing class fits fairly comfortably into one of these hierarchies. The package structure might then look something like this:

nimare.dataset (`Dataset`, `Study`, other data classes)
nimare.transformers
--nimare.transformers.base (base `MetaTransformer` class etc.)
--nimare.transformers.annotate (annotation tools)
--nimare.transformers.kernel (kernel estimation methods)
--nimare.transformers.decomposition (dimensionality-reducing transformations)
nimare.estimators
--nimare.estimators.base (base `MetaResult` class)
--nimare.estimators.cbma (CBMA estimators)
--nimare.estimators.ibma (IBMA estimators)
--nimare.estimators.decode (decoding methods)
--nimare.estimators.parcellate (parcellation methods)
nimare.utils
--nimare.utils.io (conversion utilities to/from NIMADS format and other formats)
--nimare.utils.stats (stats utilities)
--nimare.utils.cli (command-line interface)
nimare.workflows (tools for constructing workflows/pipelines)
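
For concreteness, a minimal sketch of the two base classes implied by this split (method signatures are assumptions from the discussion, not a final design):

class Transformer:
    def transform(self, dataset):
        """Return a new Dataset with derived data (e.g., kernel maps)."""
        raise NotImplementedError

class Estimator:
    def fit(self, dataset):
        """Return a MetaResult containing statistical maps."""
        raise NotImplementedError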

Thoughts?

Remove FSL dependency for IBMAs if possible

It would be preferable to limit NiMARE to Python dependencies, but we currently use FSL for the fixed and mixed effects GLM IBMAs. If possible, it would be good to find pure Python tools that we could use for this. Nistats may have the means to do this.
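
As a sketch of what a pure-Python fixed-effects IBMA could look like, an inverse-variance-weighted combination needs only NumPy (this is one candidate approach, not necessarily numerically equivalent to FSL's GLM tools):

import numpy as np

def ffx_ivw(betas, variances):
    # betas, variances: (studies x voxels) arrays of contrast
    # estimates and their sampling variances.
    weights = 1.0 / variances
    ffx_beta = (weights * betas).sum(axis=0) / weights.sum(axis=0)
    ffx_se = np.sqrt(1.0 / weights.sum(axis=0))
    return ffx_beta / ffx_se  # z-statistic map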

Should we use NIDM-Results?

NIDM-Results objects contain all of the information we could need from contrasts for NiMARE, plus the tools for parsing those objects (i.e., PyNIDM) seem pretty useful. I don’t know how stable the specification or the tools are (it looks like both are under constant development), but it seems like we could build the NiMARE database/dataset classes around the NIDM-Results objects.

I also don't know whether it's possible to build the Results objects with partial information (e.g., just coordinates and metadata), but NeuroVault supports them, and we could request that Brainspell output its database in that format as well. That should make it easier to download and merge everything, not to mention it would automatically link SE and contrast images for IBMAs (which has been vexing me).

Use Nipype for MALLET interface

We can include Nipype as an optional dependency and can wrap MALLET in an Interface instead of using poorly formatted subprocess calls.

Numbafy the code

@tsalo have you tried using Numba's JIT compiler to speed up parts of the meta-analytic computations? The stuff in stats will almost certainly benefit. The large estimator classes/functions like _run_ale probably won't improve much right now, but we should try to refactor them into smaller (and ideally reusable) chunks with an eye to numbafication.
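
As a sketch of the kind of function that should numbafy well, here is a null-distribution lookup in nopython mode (null_to_p is illustrative, not the current stats code):

import numpy as np
from numba import njit

@njit
def null_to_p(value, null_dist):
    # Proportion of the null distribution at least as extreme as `value`.
    hits = 0
    for null_value in null_dist:
        if null_value >= value:
            hits += 1
    return hits / null_dist.size

null_to_p(3.1, np.random.normal(size=100000))  # compiles on first call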

Search methods to support in Datasets

Since NiMARE Datasets should support coordinates, images, annotations, and metadata, searching will be more involved than it is in Neurosynth (I assume). Currently I'm working on a few methods based on the data type.

For coordinates, we want to be able to search by an ROI mask, or by a seed coordinate with a distance threshold or fixed radius.
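
A minimal sketch of the fixed-radius case, assuming coordinates live in a DataFrame with id/x/y/z columns (that layout is an assumption):

import numpy as np

def search_by_radius(coordinates, seed, radius_mm):
    # Return IDs of studies reporting a focus within `radius_mm` of `seed`.
    xyz = coordinates[["x", "y", "z"]].to_numpy()
    dist = np.linalg.norm(xyz - np.asarray(seed, dtype=float), axis=1)
    return coordinates.loc[dist <= radius_mm, "id"].unique()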

For annotations we can have searches based on full labels with a threshold for each label's value (since some annotation methods return weights instead of binary yes/no decisions), as well as more web-like searching.

For images, we need to be able to search based on image type (T, Z, contrast, etc.). Other than that, I'm not sure how one would typically search a Dataset for relevant images based on characteristics of the images themselves.

Is there any other metadata we should explicitly make searchable?

Have kernel classes return `Dataset` instances

The API might be more intuitive if the Kernel classes returned a Dataset instance, with the resulting images appended to the .images list of every Contrast. Per discussion with @tsalo, the internal logic of the .fit calls could move to module level for efficiency (e.g., if one needs to build up a null distribution of 10,000 sets of images without copying the Dataset that many times in memory).

Command line integration for common workflows

We want to be able to call common workflows (e.g., running full ALE on Sleuth text file) from the command line. We also want to include a boilerplate description of what was done for each workflow that people can use in their methods sections.
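
A minimal sketch of such an entry point (subcommand and option names are assumptions, not the final interface):

import argparse

def main():
    parser = argparse.ArgumentParser(prog="nimare")
    subparsers = parser.add_subparsers(dest="workflow", required=True)
    ale = subparsers.add_parser("ale", help="Run an ALE meta-analysis")
    ale.add_argument("sleuth_file", help="Sleuth-format text file")
    ale.add_argument("--output-dir", default=".")
    args = parser.parse_args()
    # Dispatch to the workflow and print the methods boilerplate here.
    print(args.workflow, args.sleuth_file, args.output_dir)

if __name__ == "__main__":
    main()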

Output cluster-level inference maps with cluster log(P) values

We currently output most cluster-level inference maps with Z or metric (e.g., ALE) values, thresholded at some alpha. We can (and should) output them with cluster-level log(P) values and no threshold. This would also eliminate the need for cluster-level threshold arguments.

Originally proposed by @nicholst.
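
A minimal sketch of the conversion, assuming cluster-level p-values are already in hand (the cap is an assumption, used to avoid infinite values at p = 0):

import numpy as np

def p_to_logp(p_values, cap=10.0):
    p = np.clip(np.asarray(p_values, dtype=float), 10.0 ** -cap, 1.0)
    return -np.log10(p)  # unthresholded; users choose their own alpha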

Add badges to README and docs

Ultimately, we will want to include, at minimum, equivalents of the following:

Latest Version
PyPI - Python Version
DOI
License
CircleCI
Documentation Status
Codecov
Join the chat at https://brainhack.slack.com/messages/CFAC93KJB
Docker
Binder

Are there any others people would like to see added?

NiMARE tutorial page

I took a stab at generating a Nipype-style tutorial page for NiMARE:
https://github.com/bilgelm/nimare_tutorial
https://bilgelm.github.io/nimare_tutorial/

To update the tutorials:

  • fork the repo
  • open index.ipynb (or any *.ipynb file inside notebooks/) in jupyter
  • make edits inside the first cell, then run the first cell and save
  • run update_pages.sh, which will generate an html file from each ipynb file
  • push to GitHub
  • in the GitHub repository Settings, set the GitHub Pages to be built from the /docs folder (thanks @djarecka for helping figure this out):

[Screenshot: GitHub Pages settings]

  • Note: I had to select a theme in order for the site to be published successfully, but this doesn't affect the layout.

The tutorial is far from complete:

  • Most hyperlinks are just placeholders. (At the moment, only the "NiMARE" link under "Introduction" and the links under "Workflow Examples" lead to their respective pages, and those pages are not complete either.)
  • Need logos.
  • Binder hasn't been set up.
  • ipynb files in https://github.com/neurostuff/NiMARE/tree/master/examples have been copied into this tutorial repository. It'd be better to link them rather than make separate copies.
  • Someone else (@tsalo ?) should take over ownership of this tutorials repo.

Model storage

A few of the tools that are going to go into NiMARE will require using trained models. In some cases, like GCLDA, NiMARE should provide the tools for training the model, but it may be time-consuming enough that users might want to just use pre-trained models that are saved somewhere. In other cases, like the CogPO classifiers from the ATHENA project, NiMARE won't be able to provide the training materials, because the raw data (i.e., labels and texts) won't be publicly available.

I am considering using figshare, but wasn't sure if there were any other resources that are better or are more commonly used in the field. Does anyone have any thoughts?

Transforming coordinates and images

@tyarkoni At what point should the coordinates and images be transformed to the same template? I know you had some thoughts on that, but I can't remember the specifics. I'm currently working on a function to download images from NeuroVault and coordinates from NeuroSynth (or maybe that should be brainspell) and merge the two into one Dataset object, and I'm not sure if I should transform everything into the same space as part of compiling the database or if I should leave it until a later step (e.g., the meta-analysis stage).

Markdown versus ReStructuredText

I'm curious whether @tsalo or anyone else has thoughts on Markdown versus reStructuredText for the documentation. In my project, I bit the bullet and pretty much have everything in reStructuredText so it can be easily rendered by Sphinx, but Markdown is more common in the wild.

Expand docstrings with math

Once a Sphinx-based site is up, including the relevant math in the docstrings will make it easier for users to understand the different tools, including the meta-analysis algorithms, the topic modeling methods, and the functional decoders.
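
For example, numpydoc docstrings can embed LaTeX through the math directive (the function below is hypothetical):

import numpy as np

def stouffers(z_maps):
    r"""Combine z-statistic maps with Stouffer's method.

    Notes
    -----
    For :math:`k` maps, the combined statistic is

    .. math:: Z = \frac{\sum_{i=1}^{k} Z_i}{\sqrt{k}}
    """
    return np.sum(z_maps, axis=0) / np.sqrt(z_maps.shape[0])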

Dependencies

Is there a set of dependencies I should limit NiMARE to? For one, I was thinking that nltools.mask.create_spheres would make the KDA and MKDA kernel estimators much simpler, but I know that nltools is not as mature a package as something like nibabel.

Text storage in Dataset

I think we can add a dataset.texts DataFrame that can contain different kinds of text (e.g., abstracts).
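
A minimal sketch (the column names are assumptions):

import pandas as pd

texts = pd.DataFrame(
    {
        "id": ["study-01", "study-02"],  # hypothetical study IDs
        "abstract": ["First abstract ...", "Second abstract ..."],
    }
)
# Other text types (e.g., full article text) could be added as columns.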

Meta-analytic algorithms

Here's a list of the methods we plan to support and their current statuses in my branch:

  • Kernel-based coordinate-based meta-analyses:
    • ALE
    • SCALE
    • MKDA Density Analysis
    • MKDA Chi2 Analysis (Neurosynth speed-optimized version)
    • MKDA Chi2 Analysis with empirical null
    • KDA
  • Model-based coordinate-based meta-analyses:
    • BHICP (available implementation only in C++)
    • HPGRF/BHPGM (available implementation only in C++)
    • SBLFR
    • SBR (cannot currently be run on a full brain; no source code available)
  • Image-based meta-analyses:
    • Fisher's
    • Stouffer's FFX
    • Stouffer's RFX with theoretical null
    • Stouffer's RFX with empirical null
    • Weighted Stouffer's
    • FFX GLM
    • RFX GLM with theoretical null
    • RFX GLM with empirical null
    • MFX GLM

Coordinates from example pain analysis do not match source ttl files

Create tutorials for common uses

We would like tutorials for:

  1. Loading in data from NeuroVault, Neurosynth, etc.
  2. Generating modeled activation maps for coordinates
  3. Running meta-analyses
  4. Common workflows (MACM, metaICA, etc.)

Boilerplate

In order to incentivize use and contributions from developers of new methods, we'll want to generate some kind of boilerplate with citations beyond what duecredit would create.

Continuous integration and test coverage

I've only recently started working on writing tests in #15, but would like to set up CI and coverage analysis. I was thinking of using Travis and Coveralls, but am not invested in either. Does anyone have any preferred tools?

Logo design

It might be nice to have a logo for NiMARE. Any ideas are welcome, but there are a few possibilities already brought up:
