neurostuff / pymare

PyMARE: Python Meta-Analysis & Regression Engine

Home Page: https://pymare.readthedocs.io

License: MIT License

Topics: python, meta-analysis

pymare's Introduction

PyMARE: Python Meta-Analysis & Regression Engine

A Python library for mixed-effects meta-regression (including meta-analysis).


PyMARE is alpha software under heavy development; we reserve the right to make major changes to the API.

Quickstart

Install PyMARE from PyPI:

pip install pymare

Or for the bleeding-edge GitHub version:

pip install git+https://github.com/neurostuff/pymare.git

Suppose we have parameter estimates from 8 studies, along with corresponding variances, and a single (fixed) covariate:

import numpy as np

y = np.array([-1, 0.5, 0.5, 0.5, 1, 1, 2, 10])  # study-level estimates
v = np.array([1, 1, 2.4, 0.5, 1, 1, 1.2, 1.5])  # study-level variances
X = np.array([1, 1, 2, 2, 4, 4, 2.8, 2.8])      # a fixed study-level covariate

We can conduct a mixed-effects meta-regression with restricted maximum-likelihood (ReML) estimation using PyMARE's high-level meta_regression function:

from pymare import meta_regression

result = meta_regression(y, v, X, names=['my_cov'], add_intercept=True,
                         method='REML')
print(result.to_df())

This produces the following output:

         name   estimate        se   z-score     p-val  ci_0.025   ci_0.975
0  intercept  -0.106579  2.993715 -0.035601  0.971600 -5.974153   5.760994
1     my_cov   0.769961  1.113344  0.691575  0.489204 -1.412153   2.952075

Alternatively, we can achieve the same outcome using PyMARE's object-oriented API (which the meta_regression function wraps):

from pymare import Dataset
from pymare.estimators import VarianceBasedLikelihoodEstimator

# A handy container we can pass to any estimator
dataset = Dataset(y, v, X)
# Estimator class for likelihood-based methods when variances are known
estimator = VarianceBasedLikelihoodEstimator(method='REML')
# All estimators expose a fit_dataset() method that takes a `Dataset`
# instance as the first (and usually only) argument.
estimator.fit_dataset(dataset)
# Post-fitting we can obtain a MetaRegressionResults instance via .summary()
results = estimator.summary()
# Print summary of results as a pandas DataFrame
print(results.to_df())

And if we want to be even more explicit, we can avoid the Dataset abstraction entirely (though we'll lose some convenient validation checks):

estimator = VarianceBasedLikelihoodEstimator(method='REML')

# X must be 2-d; this is one of the things the Dataset implicitly handles.
X = X[:, None]

estimator.fit(y, v, X)

results = estimator.summary()

pymare's People

Contributors

jdkent, julioaperaza, nicholst, tsalo, tyarkoni


pymare's Issues

Python 3.5 support

Given the use of f-strings, we need to drop Python 3.5 support, but we can include 3.8. I can also add tests for 3.7 and 3.8 to the CircleCI config.

Invalid data type for einsum

I am trying to run a meta-analysis with the DerSimonian-Laird estimator on a toy example, but I am getting an odd error in einsum.

Data

meta_data.txt

educ age intercept educ_var age_var intercept_var site sample_size
0 -0.0343665 0.0685226 0.621957 0.053402 0.00518936 0.382996 a 833
1 0.337121 0.0831343 -0.754383 0.257379 0.0256497 1.6269 b 42
2 -0.131197 0.0794553 -0.551759 0.341987 0.027644 2.39215 c 22
3 0.56336 0.125616 -4.06326 0.29117 0.0278645 2.23611 d 12
4 1.47761 0.0410448 -4.48134 0.772219 0.0675251 3.46781 e 10
5 0.398824 -0.0435086 3.97413 0.687461 0.0618968 5.08491 f 7
6 0.122459 -0.016405 4.29702 0.437787 0.0430945 2.63758 g 18

Code

import pandas as pd
from pymare.estimators import DerSimonianLaird
from pymare import Dataset

meta_df = pd.read_table("meta_data.txt")

metamodel = DerSimonianLaird()
dset = Dataset(
    y=meta_df[["age", "educ"]].values, 
    v=meta_df[["age_var", "educ_var"]].values,
    add_intercept=True,
)
metamodel.fit_dataset(dset)

Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-26c7cae08262> in <module>
      5     add_intercept=True,
      6 )
----> 7 metamodel.fit_dataset(dset)

~/anaconda/envs/python3/lib/python3.6/site-packages/pymare/estimators/estimators.py in fit_dataset(self, dataset, *args, **kwargs)
     88 
     89         all_kwargs.update(kwargs)
---> 90         self.fit(*args, **all_kwargs)
     91         self.dataset_ = dataset
     92 

~/anaconda/envs/python3/lib/python3.6/site-packages/pymare/estimators/estimators.py in fit(self, y, v, X)
    195 
    196         # Estimate initial betas with WLS, assuming tau^2=0
--> 197         beta_wls, inv_cov = weighted_least_squares(y, v, X, return_cov=True)
    198 
    199         # Cochrane's Q

~/anaconda/envs/python3/lib/python3.6/site-packages/pymare/stats.py in weighted_least_squares(y, v, X, tau2, return_cov)
     24 
     25     # Einsum indices: k = studies, p = predictors, i = parallel iterates
---> 26     wX = np.einsum('kp,ki->ipk', X, w)
     27     cov = wX.dot(X)
     28 

<__array_function__ internals> in einsum(*args, **kwargs)

~/anaconda/envs/python3/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(out, optimize, *operands, **kwargs)
   1348         if specified_out:
   1349             kwargs['out'] = out
-> 1350         return c_einsum(*operands, **kwargs)
   1351 
   1352     # Check the kwargs to avoid a more cryptic error later, without having to

TypeError: invalid data type for einsum

Add effect size module

We should add functionality that allows easy conversion between effect size measures.
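As one example of the kind of conversion such a module could cover, here is a standard two-sample Cohen's d from summary statistics (a sketch, not PyMARE API):

import numpy as np

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Two-sample Cohen's d using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / np.sqrt(pooled_var)

print(cohens_d(10.0, 2.0, 30, 9.0, 2.5, 28))  # ~0.44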

RTD build failing

There seem to be a couple of issues with the RTD build, but the one I'm currently seeing is a problem with sphinx_gallery_conf:

Configuration error:
Unknown key(s) in sphinx_gallery_conf:
'ignore_patterns', did you mean one of ['ignore_pattern', 'filename_pattern']?

I think I've seen this before, and I think it's due to a change in the Sphinx Gallery extension's interface. We can pin the appropriate version and fix the key.
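If it is the key rename, the fix in docs/conf.py would look something like this (the pattern value below is a placeholder):

sphinx_gallery_conf = {
    # The installed Sphinx Gallery expects the singular key, per the error
    # message above; "ignore_patterns" is unknown to it.
    "ignore_pattern": r"_utils\.py",  # placeholder regex
}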

Make CodeCov CI thresholds more liberal

Currently, PRs register as failing if there is any decrease in coverage. This is flagging perfectly good PRs (e.g., #51). We should loosen up the thresholds in the CI settings.

Datasets do not check input array shapes/sizes

To reproduce the problem, use the following code:

from pymare import core, estimators

y = [
    [2, 4, 6],  # estimates for first study's three datasets
    [3, 2, 1],  # estimates for second study's three datasets
]
v = [
    [100, 100, 100],  # estimate variance for first study's three datasets
    [8, 4, 2],  # estimate variance for second study's three datasets
]
X = [  # all "dataset"s must have the same regressors
    [5, 9],  # regressors for first study
    [2, 8],  # regressors for second study
    [5, 5],  # regressors for imaginary third study
]

dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"], add_intercept=False)
est = estimators.WeightedLeastSquares().fit_dataset(dataset)
results = est.summary()

Here's the traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b48664cc0a71> in <module>
     16 
     17 dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"], add_intercept=False)
---> 18 est = estimators.WeightedLeastSquares().fit_dataset(dataset)
     19 results = est.summary()

~/Documents/tsalo/PyMARE/pymare/estimators/estimators.py in fit_dataset(self, dataset, *args, **kwargs)
     97 
     98         all_kwargs.update(kwargs)
---> 99         self.fit(*args, **all_kwargs)
    100         self.dataset_ = dataset
    101 

~/Documents/tsalo/PyMARE/pymare/estimators/estimators.py in fit(self, y, X, v)
    196             v = np.ones_like(y)
    197 
--> 198         beta, inv_cov = weighted_least_squares(y, v, X, self.tau2, return_cov=True)
    199         self.params_ = {"fe_params": beta, "tau2": self.tau2, "inv_cov": inv_cov}
    200         return self

~/Documents/tsalo/PyMARE/pymare/stats.py in weighted_least_squares(y, v, X, tau2, return_cov)
     33 
     34     # Einsum indices: k = studies, p = predictors, i = parallel iterates
---> 35     wX = np.einsum("kp,ki->ipk", X, w)
     36     cov = wX.dot(X)
     37 

<__array_function__ internals> in einsum(*args, **kwargs)

/opt/miniconda3/lib/python3.8/site-packages/numpy/core/einsumfunc.py in einsum(out, optimize, *operands, **kwargs)
   1357         if specified_out:
   1358             kwargs['out'] = out
-> 1359         return c_einsum(*operands, **kwargs)
   1360 
   1361     # Check the kwargs to avoid a more cryptic error later, without having to

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,2)->(2,3) (2,3)->(3,newaxis,2) 

Note that the exception is raised when the Estimator is fitted, rather than when the Dataset is initialized, and it's the product of a mathematical operation, rather than an explicit check.

NOTE: Related to #37.
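A minimal sketch of the kind of explicit init-time check this suggests (`_check_shapes` is a hypothetical helper, not current PyMARE behavior):

import numpy as np

def _check_shapes(y, v=None, X=None):
    """Hypothetical validation that Dataset.__init__ could run on its inputs."""
    y = np.asarray(y)
    if v is not None and np.asarray(v).shape != y.shape:
        raise ValueError(f"v has shape {np.asarray(v).shape}; expected {y.shape}.")
    if X is not None and np.asarray(X).shape[0] != y.shape[0]:
        raise ValueError(
            f"X has {np.asarray(X).shape[0]} rows, but y has {y.shape[0]} studies."
        )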

Revisit permutation test methodology

In working on #101, I've come across a few things in the permutation test methods that confuse me.

First, the permutation tests loop over datasets and parallelize across permutations. This makes sense in a non-imaging context, when you won't have many, if any, parallel datasets. However, in neuroimaging meta-analyses, you'll typically have many more parallel datasets (e.g., voxels) than permutations. Would it make sense to flip the approach in PyMARE, or would that cause too many problems for non-imaging meta-analyses?

Second, I'm comparing PyMARE's approach to Nilearn's permuted_ols function. I've noticed that there are a few steps in Nilearn's procedure that aren't in PyMARE, including some preprocessing done on the target_vars (y), tested_vars (X), and confounding_vars (also X). Should we (1) adopt this step and/or (2) treat confounding variables differently from tested variables?

Support estimators that don't require study-level variances

Currently, all supported estimators depend on the availability of both estimates and variances from individual studies. We should add estimation capabilities that relax this requirement.

One option is to add separate estimation methods like Fisher's method, Stouffer's z, or a t-test-style random-effects model, but an alternative I think I prefer is to add a weights argument to any estimator that accepts a variances argument (currently all of them). Then, passing an array to weights would take precedence over 1/variance, but users could also specify weights based on, e.g., 1/sqrt(N), or just unit weights, giving us the equivalent of the standalone methods without having to add non-standard and generally crummy estimators to the package.
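A hypothetical sketch of what that might look like (the `weights` argument does not currently exist in PyMARE):

import numpy as np

n = np.array([40, 25, 60, 30])  # per-study sample sizes (made-up values)
weights = 1.0 / np.sqrt(n)      # e.g., 1/sqrt(N) weighting

# Proposed usage (not current PyMARE API):
# estimator = WeightedLeastSquares()
# estimator.fit(y, v=None, X=X, weights=weights)  # weights override 1/v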

Problems with PyStan

PyStan's API changed with version 3, which requires Python 3.7+. I have tried to update the StanMetaRegression code to work with the new API in #66, but what appears to be the same general method is now failing.

At the moment, I am going to probably just skip the StanMetaRegression test and add a warning to the class's docstring. I don't have the time or expertise necessary to debug this ATM.

Convert permutation tests to functions

MetaRegressionResult.permutation_test and CombinationTestResult.permutation_test both require access to the Result object, which is a bit much for NiMARE Estimators to hold onto between the fitting and correction steps. If we could convert the methods to functions, and possibly distinguish the null distribution creation step from the correction step, I think that could help NiMARE.

This stems from neurostuff/NiMARE#278.

Implement NiMARE IBMA estimators in PyMARE

  • Fisher's
  • Stouffer's (t-test on z-statistics)
  • Stouffer's with empirical null distribution (null derived from sign flipping)
  • Weighted Stouffer's
  • RFX GLM (t-test on estimates)
  • RFX GLM with empirical null distribution (null derived from sign flipping)
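For reference, minimal numpy/scipy sketches of the first two methods in this list (textbook formulas, not PyMARE's implementations):

import numpy as np
from scipy import stats

def fishers(p):
    """Fisher's method: -2 * sum(log(p)) ~ chi2 with 2k degrees of freedom."""
    p = np.asarray(p)
    chi2 = -2 * np.sum(np.log(p))
    return stats.chi2.sf(chi2, df=2 * len(p))

def stouffers(z, weights=None):
    """(Weighted) Stouffer's: sum(w*z) / sqrt(sum(w^2)) ~ N(0, 1)."""
    z = np.asarray(z)
    w = np.ones_like(z) if weights is None else np.asarray(weights)
    return stats.norm.sf(np.sum(w * z) / np.sqrt(np.sum(w**2)))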

Decide on a specific scope of PyMARE

Need to clarify, once and for all, which meta-analysis functionality will and will not be supported by PyMARE.

Presently we have, for continuous Gaussian responses:

  • Fixed effects meta-regression
  • Random effects meta-regression
  • Random effects meta-regression, where study-level variance is unknown
  • "Combining" measures, Fisher's & Stouffer's (coming soon)

The plan is to provide conversion tools so that the following data types, which don't exactly fit into that framework, can be converted and used:

  • Correlation coefficients
  • Cohen's d effect sizes & standardised mean differences

The long list of possible to-be-included tools includes:

  • Cochran's Q - Heterogeneity statistic
  • 'Stouffer's Meta Regression', basically a fixed effects meta-analysis on Z-scores -- not a standard method and never considered in traditional meta-analysis, but might have a use in NMA
  • 'Stouffer's Random Effects Regression' - Analyze Z-scores as if they were Gaussian response with homogeneous variance
  • Permutation - But need to consider how to do this vectorised, considering the NMA use case.

(NMA=Neuroimaging Meta Analysis)

Expressions file not packaged with library

When you pip install pymare, the expressions file is not bundled with the library, which causes an error when importing the library.

>>> import pymare
In /Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: 
The savefig.frameon rcparam was deprecated in Matplotlib 3.1 and will be removed in 3.3.
In /Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: 
The verbose.level rcparam was deprecated in Matplotlib 3.1 and will be removed in 3.3.
In /Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: 
The verbose.fileo rcparam was deprecated in Matplotlib 3.1 and will be removed in 3.3.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/pymare/__init__.py", line 2, in <module>
    from .effectsize import (OneSampleEffectSizeConverter,
  File "/Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/pymare/effectsize/__init__.py", line 1, in <module>
    from .base import (OneSampleEffectSizeConverter, TwoSampleEffectSizeConverter,
  File "/Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/pymare/effectsize/base.py", line 11, in <module>
    from .expressions import select_expressions
  File "/Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/pymare/effectsize/expressions.py", line 48, in <module>
    one_sample_expressions, two_sample_expressions = _load_expressions()
  File "/Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/pymare/effectsize/expressions.py", line 36, in _load_expressions
    expr_list = json.load(open(path, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/Users/tsalo/anaconda/envs/python3/lib/python3.6/site-packages/pymare/effectsize/expressions.json'

Thanks to @akimbler for discovering the bug.

CombinationTest p-values incorrect

I am working on a more comprehensive example for Estimators, and I noticed that the p-value for Stouffer's results was close to 1.0, even though I don't think it should be.

Please see this notebook for the code used to see this.

EDIT: To summarize the thread below: the p-values are being set, but they're incorrect. We just need to change the scipy.stats function used to calculate the p-values to fix the bug.
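For illustration, this is the kind of tail mix-up that produces p-values near 1 (a hedged guess at the class of bug, shown with scipy directly):

from scipy import stats

z = 3.0  # strongly positive combined z-statistic
print(stats.norm.cdf(z))  # ~0.9987: lower-tail mass, reads as p close to 1
print(stats.norm.sf(z))   # ~0.0013: upper-tail p-value, the intended quantity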

PyMARE to estimate on higher dimensional data

We are trying to use a pymare-based second-level estimator (WeightedLeastSquares) in fitlins. See fitlins issue #290.

cc: @effigies

Currently, I am not sure how to use pymare to fit the effect_maps from the first-level model at the subject level. Each subject has 4 runs. Here's what I tried to do:

The following are the estimates and variances we get as output from nistats' FirstLevelModel.compute_contrasts:

>>> print(filtered_effects)
['/home/shank/Stanford/multiverse/temp_stats/run/01_loss_effect_size.nii.gz',
'/home/shank/Stanford/multiverse/temp_stats/run/02_loss_effect_size.nii.gz',
'/home/shank/Stanford/multiverse/temp_stats/run/03_loss_effect_size.nii.gz', 
'/home/shank/Stanford/multiverse/temp_stats/run/04_loss_effect_size.nii.gz']

>>> print(filtered_variances)
 ['/home/shank/Stanford/multiverse/temp_stats/run/01_loss_effect_variance.nii.gz',
'/home/shank/Stanford/multiverse/temp_stats/run/02_loss_effect_variance.nii.gz',
'/home/shank/Stanford/multiverse/temp_stats/run/03_loss_effect_variance.nii.gz',
'/home/shank/Stanford/multiverse/temp_stats/run/04_loss_effect_variance.nii.gz']

Next, I tried to squeeze them into a single NumPy array and use the WeightedLeastSquares estimator:

effect_data = np.squeeze([nb.load(effect).get_fdata(dtype='f4')
                          for effect in filtered_effects])

variance_data = np.squeeze([nb.load(effect).get_fdata(dtype='f4')
                            for effect in filtered_variances])

>>> print(effect_data.shape, variance_data.shape)

(4, 97, 115, 97) (4, 97, 115, 97)

When I try to fit the estimator, I get the following error:

X = np.array(coll.X)
print(X)
[[1]
 [1]
 [1]
 [1]]

d = pymare.Dataset(effect_data, variance_data, X)
estimator = WeightedLeastSquares()
estimator.fit_dataset(d)
Error
ValueError                                Traceback (most recent call last)
<ipython-input-23-431d31eb7e6f> in <module>
     35 
     36 estimator = WeightedLeastSquares()
---> 37 estimator.fit_dataset(d)
     38 
     39 # print(estimator.summary())

~/miniconda3/envs/fitlins38/lib/python3.8/site-packages/pymare/estimators/estimators.py in fit_dataset(self, dataset, *args, **kwargs)
     88 
     89         all_kwargs.update(kwargs)
---> 90         self.fit(*args, **all_kwargs)
     91         self.dataset_ = dataset
     92 

~/miniconda3/envs/fitlins38/lib/python3.8/site-packages/pymare/estimators/estimators.py in fit(self, y, X, v)
    161         if v is None:
    162             v = np.ones_like(y)
--> 163         beta, inv_cov = weighted_least_squares(y, v, X, self.tau2,
    164                                                return_cov=True)
    165         self.params_ = {'fe_params': beta, 'tau2': self.tau2, 'inv_cov': inv_cov}

~/miniconda3/envs/fitlins38/lib/python3.8/site-packages/pymare/stats.py in weighted_least_squares(y, v, X, tau2, return_cov)
     24 
     25     # Einsum indices: k = studies, p = predictors, i = parallel iterates
---> 26     wX = np.einsum('kp,ki->ipk', X, w)
     27     cov = wX.dot(X)
     28 

<__array_function__ internals> in einsum(*args, **kwargs)

~/miniconda3/envs/fitlins38/lib/python3.8/site-packages/numpy/core/einsumfunc.py in einsum(out, optimize, *operands, **kwargs)
   1348         if specified_out:
   1349             kwargs['out'] = out
-> 1350         return c_einsum(*operands, **kwargs)
   1351 
   1352     # Check the kwargs to avoid a more cryptic error later, without having to

ValueError: operand has more dimensions than subscripts given in einstein sum, but no '...' ellipsis provided to broadcast the extra dimensions.
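A hedged suggestion for this case: PyMARE's estimators operate on 2-D arrays of shape (n_studies, n_parallel_datasets), so flattening the spatial dimensions before constructing the Dataset may be the way forward (continuing the variables defined above):

# Flatten each 3-D map so that voxels become parallel datasets: (4, 97*115*97).
y2d = effect_data.reshape(effect_data.shape[0], -1)
v2d = variance_data.reshape(variance_data.shape[0], -1)

d = pymare.Dataset(y2d, v2d, X)
estimator = WeightedLeastSquares()
estimator.fit_dataset(d)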

Automatically compute the Jacobian for use in minimization

Currently the ML/REML estimates are generated without analytic gradients (scipy's BFGS solver falls back on finite-difference approximations). This works fine for the test cases, but is likely to be slower and less stable than using exact gradients. It should be straightforward to compute the Jacobian of the log-likelihood functions via JAX/autograd and pass it to scipy.optimize.minimize alongside the negative LL.
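A minimal sketch of that approach, assuming JAX is installed; the toy intercept-only likelihood below is an illustration, not PyMARE's actual log-likelihood:

import jax
import jax.numpy as jnp
import numpy as np
from scipy.optimize import minimize

jax.config.update("jax_enable_x64", True)  # float64 for stable optimization

def neg_log_likelihood(theta, y, v):
    """Toy (mu, log tau2) negative log-likelihood for a random-effects mean."""
    mu, log_tau2 = theta
    total_var = v + jnp.exp(log_tau2)  # tau2 parameterized on the log scale
    return 0.5 * jnp.sum(jnp.log(total_var) + (y - mu) ** 2 / total_var)

grad_nll = jax.grad(neg_log_likelihood)  # analytic gradient via autodiff

y = np.array([-1.0, 0.5, 0.5, 0.5, 1.0, 1.0, 2.0, 10.0])
v = np.array([1.0, 1.0, 2.4, 0.5, 1.0, 1.0, 1.2, 1.5])

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(y, v),
               jac=grad_nll, method="BFGS")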

Run CI regularly (nightly?)

At the moment, this package isn't really under active development, so we should probably have some kind of regular CI running with the most recent versions of all of PyMARE's dependencies.

Improved packaging

The current setup.py is pretty barebones. We should beef it up, add versioneer support, make sure we have a separate installation option that includes the optional dependencies (e.g., PyStan, ArviZ), etc.

Read expression list from JSON file and allow user to load custom list

With an eye to NiMARE integration, it would be nice to allow users to override the list of expressions used in effect size conversion. An easy way to support this would be to read the expressions in from JSON, and then create a manager/config class that can take an external JSON file containing the expressions.
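A rough sketch of the proposed loading path (`ExpressionManager` and the JSON layout are assumptions, not existing PyMARE API):

import json

class ExpressionManager:
    """Hypothetical manager holding the expression list used for conversions."""

    def __init__(self, path="expressions.json"):
        self.load(path)

    def load(self, path):
        """Load (or override) the expression list from a JSON file."""
        with open(path) as f:
            self.expressions = json.load(f)  # e.g., a list of expression dicts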

Add permutation-based p-value and CIs

Currently, for the frequentist estimators, p-values and CIs are computed using WLS for the fixed effects and the Q-profile method for tau^2. We may want to consider adding support for permutation-based stats.
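For reference, a minimal sketch of what a sign-flipping permutation test for the overall effect could look like in an intercept-only model (plain numpy, two-sided test of the inverse-variance-weighted mean; not PyMARE API):

import numpy as np

def sign_flip_pvalue(y, v, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the weighted mean via sign flipping."""
    rng = np.random.default_rng(seed)
    w = 1.0 / v
    observed = abs(np.sum(w * y) / np.sum(w))
    flips = rng.choice([-1.0, 1.0], size=(n_perm, len(y)))
    null = np.abs((flips * (w * y)).sum(axis=1) / np.sum(w))
    return (1 + np.sum(null >= observed)) / (1 + n_perm)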

Clarify or check Estimator input shapes

I was trying to run one of the Estimators using weighted_least_squares without initializing a Dataset and was getting confusing errors from numpy.einsum before I realized that inputs need to be 2D no matter what. We can coerce 1D inputs to 2D with a new Estimator._validate_inputs() method, or we can just update the docstrings to clarify requirements.

BTW, based on variable convention, I think it's reasonable to assume that X must be 2D, but it's not obvious that y, v, etc. should be 2D as well, and indeed, there's an obvious error about shape if X is 1D, but no shape check for y, v, etc.

To replicate:

import numpy as np
import pymare

y = np.random.random(10)
v = np.random.random(10) ** 2
X = np.random.random((10, 1))
est = pymare.estimators.DerSimonianLaird()
est.fit(y=y, v=v, X=X)
print(est.results.to_df())

Results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-41d64261b276> in <module>()
      3 X = np.random.random((10, 1))
      4 est = pymare.estimators.DerSimonianLaird()
----> 5 est.fit(y=y, v=v, X=X)
      6 print(est.results.to_df())

~/Documents/tsalo/PyMARE/pymare/estimators/estimators.py in fit(self, dataset, **kwargs)
     78                     kwargs[name] = getattr(dataset, name)
     79 
---> 80         self.params_ = self._fit(**kwargs)
     81         self.dataset_ = dataset
     82 

~/Documents/tsalo/PyMARE/pymare/estimators/estimators.py in _fit(self, y, v, X)
    180 
    181         # Estimate initial betas with WLS, assuming tau^2=0
--> 182         beta_wls, inv_cov = weighted_least_squares(y, v, X, return_cov=True)
    183 
    184         # Cochrane's Q

~/Documents/tsalo/PyMARE/pymare/stats.py in weighted_least_squares(y, v, X, tau2, return_cov)
     24 
     25     # Einsum indices: k = studies, p = predictors, i = parallel iterates
---> 26     wX = np.einsum('kp,ki->ipk', X, w)
     27     cov = wX.dot(X)
     28 

<__array_function__ internals> in einsum(*args, **kwargs)

~/anaconda/envs/python3/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs)
   1354     # If no optimization, run pure einsum
   1355     if optimize_arg is False:
-> 1356         return c_einsum(*operands, **kwargs)
   1357 
   1358     valid_einsum_kwargs = ['out', 'dtype', 'order', 'casting']

ValueError: einstein sum subscripts string contains too many subscripts for operand 1
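A minimal sketch of the proposed Estimator._validate_inputs() coercion described above (the method name comes from this issue and is not existing PyMARE API):

import numpy as np

def _validate_inputs(y, v=None, X=None):
    """Hypothetical helper: coerce 1-D inputs to (n_studies, 1) arrays."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y[:, None]  # single parallel dataset
    if v is not None:
        v = np.asarray(v)
        if v.ndim == 1:
            v = v[:, None]
    if X is not None:
        X = np.asarray(X)
        if X.ndim == 1:
            X = X[:, None]  # single predictor column
    return y, v, X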

Maximum likelihood-based Estimators return singleton standard errors

import numpy as np
import pymare

y = np.random.random((20, 973))
v = np.random.random((20, 973))

print("VarianceBasedLikelihoodEstimator")
est = pymare.estimators.VarianceBasedLikelihoodEstimator(method="reml")
est.fit_dataset(pymare.Dataset(y=y, v=v))
est_summary = est.summary()
fe_stats = est_summary.get_fe_stats()
for name, arr in fe_stats.items():
    print(f"{name}: {arr.shape}")

print("\nHedges")
est = pymare.estimators.Hedges()
est.fit_dataset(pymare.Dataset(y=y, v=v))
est_summary = est.summary()
fe_stats = est_summary.get_fe_stats()
for name, arr in fe_stats.items():
    print(f"{name}: {arr.shape}")
VarianceBasedLikelihoodEstimator
est: (1, 973)
se: (1, 1)
ci_l: (1, 973)
ci_u: (1, 973)
z: (1, 973)
p: (1, 973)

Hedges
est: (1, 973)
se: (1, 973)
ci_l: (1, 973)
ci_u: (1, 973)
z: (1, 973)
p: (1, 973)

Note that se for VarianceBasedLikelihoodEstimator is (1, 1), but for Hedges it's (1, 973). I don't know if this is the expected behavior, but it surprised me in neurostuff/NiMARE#691.

Extend effect size calculation

After discussion, we've decided to do the following:

  • Retain the current EffectSizeCalculator in more or less the same form.
  • Wrap the ES calculator with a function that passes all inputs through to the calculator, with the exception of a measure argument that specifies the target measure for the analysis. The returned value is a Dataset whose key attributes are set to the transformed values (e.g., if measure="d", the estimates are the output of EffectSizeCalculator.to_d(), and so on). This function is roughly equivalent to metafor's escalc (sketched after this list).
  • Evaluate whether we can adopt some/all of metafor's argument names as inputs to EffectSizeCalculator (and in the list of Expressions).
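A hypothetical sketch of the escalc-style wrapper described above; the name is borrowed from metafor and the signature is an assumption:

from pymare import Dataset

def escalc(measure, calculator):
    """Convert the calculator's inputs to `measure` and return a Dataset
    whose estimates are the transformed values (hypothetical, not PyMARE API)."""
    estimates = getattr(calculator, f"to_{measure}")()  # e.g., measure="d"
    return Dataset(y=estimates)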

Negative tau2 error in DerSimonianLaird random effects estimation

Updated summary

There is a bug in Estimators wherein, during random-effects stats estimation, tau2 may drop below zero and raise an exception. I don't know what's causing this, but for documentation of the actual bug, see #40 (comment).

To replicate (on my laptop):

import numpy as np
from pymare.stats import q_profile
y = np.array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, -5.2165329e-01, 7.4963748e-01, 3.3219385e-01, -1.2351226e+00, 1.7527617e+00, -3.0110748e+00,  1.0946554e+02, -4.8926108e-02, 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -6.2120113e+01, -2.8613630e+01, -1.2308966e+01,  5.3221474e+00,  0.0000000e+00, -9.2870388e+00])
v = np.array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.54480810e-01, 1.21734442e-01, 2.25415160e-01, 1.74327830e+00, 6.00752247e-01, 1.43908420e+00, 3.25888647e+03, 4.66549950e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.91479907e+03, 2.15909521e+03, 2.17931152e+03, 1.63775589e+02, 0.00000000e+00, 1.01077698e+02])
X = np.array([[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]])
alpha = 0.05
q_cis = q_profile(y=y, X=X, v=v, alpha=alpha)

Original post

I tried running the DerSimonianLaird estimator on the 21-pain dataset we use in NiMARE to test IBMA methods. There is apparently some mismatch between NiMARE's default brain mask and the maps in the Dataset, so some voxels have all zeroes for both beta and varcope maps. These zero voxels lead to divide-by-zero issues in the estimator, leading to NaNs in the tau2 estimates and an exception when trying to get RE stats.

For an example, see this notebook.

Add mega-analysis vs. meta-analysis example

@jdkent and I were working on a data exercise comparing mega- and meta-analysis for the ABCD-ReproNim course. The general approach (take a multi-site dataset, run a random-intercepts model that accounts for site, and compare that to a "meta-analysis" in which each site is treated as its own study) would be a good addition to the PyMARE documentation.

Here is a basic attempt at this: https://github.com/tsalo/misc-notebooks/blob/master/run_mega_and_meta.ipynb

Move statistic conversion functions from NiMARE to PyMARE

If we want PyMARE to be flexible, then I think we need to account for different types of data. NiMARE already includes a number of functions for converting between different statistical values, so I think we should migrate those functions over here and add some examples using them.

The new datasets module should ultimately be able to provide us with a range of meta-analytic datasets that have different types of data that may need to be converted before we can put them in Dataset objects.

Add optional study identifiers to `Dataset`

This stems from #91. If we want to make it easy to identify outliers, we need to be able to identify individual studies. We could perhaps add an id parameter to Dataset.__init__(), which would default to a simple integer index.
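A tiny sketch of the proposed default (`ids` and `_default_ids` are hypothetical, not existing PyMARE API):

import numpy as np

def _default_ids(y, ids=None):
    """Fall back to a simple integer index when no study ids are given."""
    n_studies = np.asarray(y).shape[0]
    return np.arange(n_studies) if ids is None else np.asarray(ids)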

Sympy 1.10 removes `sympy.core.compatibility`

This stems from neurostuff/NiMARE#654.

ImportError while loading conftest '/home/runner/work/NiMARE/NiMARE/nimare/tests/conftest.py'.
nimare/__init__.py:11: in <module>
    from . import (
nimare/decode/__init__.py:3: in <module>
    from . import continuous, discrete, encode
nimare/decode/continuous.py:14: in <module>
    from ..meta.cbma.base import CBMAEstimator
nimare/meta/__init__.py:3: in <module>
    from . import ibma, kernel
nimare/meta/ibma.py:7: in <module>
    import pymare
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pymare/__init__.py:2: in <module>
    from .effectsize import (OneSampleEffectSizeConverter,
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pymare/effectsize/__init__.py:1: in <module>
    from .base import (OneSampleEffectSizeConverter, TwoSampleEffectSizeConverter,
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pymare/effectsize/base.py:11: in <module>
    from .expressions import select_expressions
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/pymare/effectsize/expressions.py:9: in <module>
    from sympy.core.compatibility import exec_
E   ImportError: cannot import name 'exec_' from 'sympy.core.compatibility'

Results objects break if user called fit() rather than fit_dataset()

The various results classes all currently require access to the estimator's last-fitted Dataset, which won't exist now, following the API change that allows fit() to be called with numpy arrays. The solution is either to implicitly create and store a Dataset (easier, but counterproductive, since the point of eliminating Dataset inputs from fit() was to cut down overhead) or to directly store references to the input arrays in the estimator rather than to a Dataset (in which case we have to add code to every estimator and lose some of the benefits of the Dataset abstraction).

Handling of missing data in Estimators

Per @tyarkoni in #40 (comment):

I don't recommend treating voxels with missing data for studies as if the estimates and variances are 0. My naive expectation is that PyMARE estimators will always return a NaN value if you do that (if they don't, let me know!), which would be fine since you could just mask those voxels out. BUT even if that's true, I don't think you want to rely on PyMARE to do the right thing here. If you know you have missing data for at least one study at a voxel, I suggest either setting the estimates and variances for those studies to NaN or (better) just masking them out before you hand off to PyMARE in the first place.

Using DerSimonianLaird as an example estimator, here is the behavior I've found:

  • Any NaNs in y --> tau2 = NaN
  • All zeros in y --> tau2 = 0
  • Any NaNs in v --> tau2 = NaN
  • Any zeros in v --> tau2 = NaN

I think the behavior we want is to ignore any studies with NaNs in either y or v, and maybe raise an error when v is 0 (with the recommendation to fill missing data with NaNs instead of zeros) and raise a warning when y is 0. And of course, if all studies have NaN, then all parameters estimated by the estimator should be NaN.

It would then be the user's responsibility to ensure that missing data is represented with NaNs. On NiMARE's side, I can open an issue to do this automatically in MetaEstimator.fit(). I don't think masking is feasible for NiMARE, given that missing data may vary by both study and voxel, so we'd end up with varying numbers of studies contributing to each voxel's meta-analysis.

WDYT?
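For concreteness, a minimal sketch of the masking approach recommended above, for a single dataset (per-voxel masking in NiMARE would be more involved):

import numpy as np
from pymare import Dataset
from pymare.estimators import DerSimonianLaird

y = np.array([0.2, np.nan, 0.5, 0.9, -0.1])
v = np.array([0.1, 0.2, 0.0, 0.3, 0.15])  # zero variance treated as missing too

keep = ~np.isnan(y) & ~np.isnan(v) & (v > 0)
dset = Dataset(y=y[keep], v=v[keep])
DerSimonianLaird().fit_dataset(dset)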

Releases on Zenodo

Now that PyMARE is releasing on PyPI, should we also do the same on Zenodo?

Beef up the results objects

The current MetaRegressionResults object is pretty barebones and intended only as a prototype. Minimally, we should:

  • Implement the skeleton summary() and plot() methods.
  • Add a __repr__ method (possibly just aliased to summary()) that includes information about the dataset and estimation (the current to_df() method provides only estimate details).
  • Consider better representations of the internal parameters and associated stats, which are currently all stored in a dict.

Settings for metafor comparison

I'm having trouble determining how metafor was called to produce the values we're testing against in our CI. @tyarkoni do you happen to have that info available?

For example, to compare DerSimonian-Laird estimator results, I did the following:

metafor

library(metafor)

y <- c(-1, 0.5, 0.5, 0.5, 1, 1, 2, 10)
v <- c(1, 1, 2.4, 0.5, 1, 1, 1.2, 1.5)

mod <- rma(y, v, method="DL")
print(mod)
Random-Effects Model (k = 8; tau^2 estimator: DL)

tau^2 (estimated amount of total heterogeneity): 7.6337 (SE = 4.8874)
tau (square root of estimated tau^2 value):      2.7629
I^2 (total heterogeneity / total variability):   88.02%
H^2 (total variability / sampling variability):  8.35

Test for Heterogeneity:
Q(df = 7) = 58.4539, p-val < .0001

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub 
  1.7679  1.0491  1.6852  0.0920  -0.2883  3.8241  . 

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

PyMARE

import numpy as np
from pymare import Dataset, estimators

y = np.array([-1, 0.5, 0.5, 0.5, 1, 1, 2, 10])
v = np.array([1, 1, 2.4, 0.5, 1, 1, 1.2, 1.5])
X = np.ones(y.shape)
dset = Dataset(y=y, v=v, X=X)

est = estimators.DerSimonianLaird()
est.fit_dataset(dset)
res = est.summary()
res.to_df()
        name  estimate        se  z-score      p-val  ci_0.025  ci_0.975
0  intercept  0.884331  0.528961  1.67182  0.0945589 -0.152414   1.92108
1          0  0.884331  0.528961  1.67182  0.0945589 -0.152414   1.92108

Summary

The results look quite different (the metafor estimate is roughly twice the PyMARE estimate), so I think I'm doing it wrong.
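One hedged possibility: Dataset adds an intercept by default, so passing X = np.ones(y.shape) yields two identical columns that split the pooled effect between them (0.884331 is roughly half of metafor's 1.7679, and to_df() shows two identical rows). Dropping the explicit column of ones may reconcile the two (continuing the code above):

dset = Dataset(y=y, v=v)  # intercept added automatically; no duplicate column
est = estimators.DerSimonianLaird()
est.fit_dataset(dset)
print(est.summary().to_df())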

Use masked arrays in computations

Related to #9, we should add support for masked arrays wherever possible—this will allow vectorized estimation even when the studies in parallel datasets differ (i.e., users pass in NaN values in different studies for different datasets).
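A small illustration of the mechanism (assumption: estimators would need to route their computations through np.ma for this to work end to end):

import numpy as np

# Different studies missing in different parallel datasets (columns):
y = np.ma.masked_invalid([[1.0, np.nan],
                          [2.0, 3.0],
                          [np.nan, 4.0]])
print(y.mean(axis=0))  # column means that skip the masked studies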
