
rishi-kulkarni / hierarch


Resampling-Based Hypothesis Testing for Python

License: MIT License

Python 100.00%
hypothesis-tests resampling-strategies bootstrapping-statistics permutation-statistics

hierarch's Introduction

hierarch

A Hierarchical Resampling Package for Python

Version 1.1.6

hierarch is a package for hierarchical resampling (bootstrapping, permutation) of datasets in Python. Because cluster-aware resampling ultimately requires explicit for loops, hierarch uses Numba to accelerate many of its key functions.

hierarch has several functions to assist in performing resampling-based (and therefore distribution-free) hypothesis tests, confidence interval calculations, and power analyses on hierarchical data.
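To make "cluster-aware resampling" concrete, here is a minimal two-level bootstrap sketch in plain NumPy, assuming balanced data stored as a 2-D array (one row per cluster, e.g. per animal, and one column per measurement). The functions two_level_bootstrap and bootstrap_mean_ci are hypothetical and not part of hierarch's API; hierarch's own Numba-compiled routines are not reproduced here. The per-cluster loop is the kind of Python loop that Numba is used to compile.

```python
import numpy as np

def two_level_bootstrap(data, rng):
    """Hypothetical helper: one cluster-aware bootstrap resample of a balanced 2-D array.

    data[i, j] is measurement j of cluster i. Clusters are drawn with
    replacement first, then measurements are drawn with replacement
    within each chosen cluster.
    """
    n_clusters, n_per = data.shape
    out = np.empty_like(data)
    # Explicit per-cluster loop: this is the part Numba-style compilation speeds up.
    for i, c in enumerate(rng.integers(n_clusters, size=n_clusters)):
        out[i] = data[c, rng.integers(n_per, size=n_per)]
    return out

def bootstrap_mean_ci(data, n_boot=2000, alpha=0.05, seed=None):
    """Hypothetical helper: percentile confidence interval for the grand mean."""
    rng = np.random.default_rng(seed)
    means = np.array([two_level_bootstrap(data, rng).mean() for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])
```

Resampling clusters first keeps the between-cluster variability in the bootstrap distribution; drawing individual measurements directly would treat them as independent and typically produce intervals that are too narrow.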

Table of Contents

  1. Introduction
  2. Setup
  3. Documentation
  4. Citation

Introduction

Design-based randomization tests represent the platinum standard for significance analyses [1, 2, 3]: they produce probability statements that depend only on the experimental design, not at all on less-than-verifiable assumptions about the probability distributions of the data-generating process. Researchers can use hierarch to quickly perform automated design-based randomization tests for experiments with arbitrary levels of hierarchy.
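As a rough illustration of what "design-based" means, the following sketch re-randomizes treatment labels across clusters (the experimental units that were actually randomized), never across the individual measurements nested inside them. It is a Monte Carlo approximation written with NumPy only; the function cluster_permutation_test and its difference-in-means statistic are assumptions for illustration, not hierarch's implementation.

```python
import numpy as np

def cluster_permutation_test(values, cluster, cluster_treatment, n_perm=9999, seed=None):
    """Hypothetical helper: two-sided randomization test of a difference in means.

    values[k]            : measurement k
    cluster[k]           : index (0..n_clusters-1) of the cluster measurement k belongs to
    cluster_treatment[c] : 0/1 treatment label of cluster c, as assigned in the design
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    cluster = np.asarray(cluster)
    labels = np.asarray(cluster_treatment)

    def stat(lab):
        t = lab[cluster]  # copy each cluster's label to its measurements
        return values[t == 1].mean() - values[t == 0].mean()

    observed = stat(labels)
    # Shuffle labels among clusters only, mirroring how the design randomized them.
    null = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    # Add-one correction so the Monte Carlo p-value is never exactly zero.
    p = (1 + np.sum(np.abs(null) >= np.abs(observed))) / (1 + n_perm)
    return observed, p
```

With six clusters split 3/3 there are only 20 distinct label assignments, and the observed assignment and its mirror image always count as at least as extreme, so no number of extra measurements per cluster can push the two-sided p-value below about 2/20 = 0.1.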

[1] Tukey, J.W. (1993). Tightening the Clinical Trial. Controlled Clinical Trials, 14(4), 266-285.

[2] Millard, S.P., Krause, A. (2001). Applied Statistics in the Pharmaceutical Industry. Springer.

[3] Berger, V.W. (2000). Pros and cons of permutation tests in clinical trials. Statistics in Medicine, 19(10), 1319-1328.

Setup

Dependencies

  • numpy
  • pandas (for importing data)
  • numba
  • scipy (for power analysis)

Installation

The easiest way to install hierarch is via PyPI.

pip install hierarch

Alternatively, you can install from Anaconda.

conda install -c rkulk111 hierarch

Documentation

Check out our user guide on Read the Docs.

Citation

If hierarch helps you analyze your data, please consider citing it. The manuscript also contains a set of simulations validating hierarchical randomization tests in a variety of conditions.

Kulkarni RU, Wang CL, Bertozzi CR (2022) Analyzing nested experimental designs—A user-friendly resampling method to determine experimental significance. PLoS Comput Biol 18(5): e1010061. https://doi.org/10.1371/journal.pcbi.1010061

hierarch's People

Contributors

dependabot[bot], rishi-kulkarni


hierarch's Issues

Allow users to receive only the resampling indices (make hierarch agnostic to specific statistical tests).

I am currently working with data from the Human Connectome Project (HCP). The dataset contains many siblings, some of them dizygotic or even monozygotic twins. For essentially all of my statistical tests, I would have to account for this nested structure when computing p-values (which I would like to derive from permutation tests) or confidence intervals (which I would like to obtain by bootstrapping). This is what the demographic data look like:

[Image: example of the HCP demographic data]

The problem is that I would like to conduct multivariate tests (specifically, CCA). This means that besides the demographic information that could be used to conduct a hierarchical resampling procedure, I also have matrices X and y that contain the "actual data" and that I would use to perform my statistical test. Looking at the Hypothesis Testing section, hierarch currently seems to be restricted to simple univariate tests such as t-tests and ANOVA? Do you think it would be feasible to allow users to obtain only the resampled indices so they could apply their own tests?

My current approach, which is quite tedious, is:
1.) Use FSL-Palm to derive exchangeability blocks using the hcp2blocks.m function.
2.) Use a Python port of the corresponding FSL-Palm permutation function (skpalm.permutations.quickperms) to derive a matrix of resampled indices.
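One way to decouple resampling from the test statistic, as this issue asks, is to generate index arrays first and let the user apply any statistic (a CCA, for instance) to the re-ordered rows. The sketch below is a hypothetical, NumPy-only illustration: within_block_permutations and permutation_pvalue are not hierarch functions, and the sketch implements only the simplest exchangeability scheme (shuffling within blocks such as families), not the multi-level blocks that hcp2blocks.m produces.

```python
import numpy as np

def within_block_permutations(blocks, n_perm, seed=None):
    """Hypothetical helper: return an (n_perm, n) array of row indices.

    Each row is a permutation of range(n) that only shuffles observations
    within the same exchangeability block (e.g. a family).
    """
    rng = np.random.default_rng(seed)
    blocks = np.asarray(blocks)
    n = blocks.size
    perms = np.empty((n_perm, n), dtype=np.intp)
    for i in range(n_perm):
        idx = np.arange(n)
        for b in np.unique(blocks):
            members = np.flatnonzero(blocks == b)
            idx[members] = rng.permutation(members)  # shuffle inside this block only
        perms[i] = idx
    return perms

def permutation_pvalue(x, y, perms, statistic):
    """Hypothetical helper: apply a user-supplied statistic to y re-ordered by each permutation."""
    observed = statistic(x, y)
    null = np.array([statistic(x, y[p]) for p in perms])
    # Two-sided, add-one corrected Monte Carlo p-value.
    return (1 + np.sum(np.abs(null) >= np.abs(observed))) / (1 + len(perms))
```

Because the first function returns plain row indices, Y[perms[i]] works the same way for a 2-D array of multivariate outcomes, so the choice of test stays entirely with the user.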

Implement a helper function to create a design matrix?

Hi! As an add-on to #127, it would be nice not only to obtain hierarchically resampled indices (so that users are not tied to the predefined tests provided by hierarch) but also to have a helper function that creates the design matrix needed as input for generating those indices in the first place. For example, right now I am using

[...] FSL-Palm to derive exchangeability blocks using the hcp2blocks.m function

This function takes over the tedious work of defining the exchangeability-block (EB) matrix manually. For example, consider the following dataframe. Some subjects in the HCP sample are unrelated to anyone else (Families 2 and 5 each contain only one subject). Some subjects are related (they share a family ID) but are regular siblings. Other subjects are also related and, on top of that, are monozygotic (MZ) or dizygotic (DZ) twins. Here it gets a bit philosophical: MZ twins can be considered clones, whereas for DZ twins the hcp2blocks.m function lets the user decide whether they should be treated as regular siblings or as clones. In my case, however, I also have repeated measures for each subject, which hcp2blocks.m cannot handle. I fitted a mixed model (value ~ 1 + (1 | Subject)) and would like to obtain hierarchically resampled indices that respect the hierarchical structure of the HCP dataset. Do you think it would make sense to implement such a function?

[Image: screenshot of the example dataframe]

Here's the code to regenerate the dataframe:

import numpy as np
import pandas as pd

# 8 subjects, each measured under conditions 1, 2 and 3 (24 rows in total)
df = pd.DataFrame({
    "Subject": np.repeat([1, 2, 3, 4, 5, 6, 7, 8], 3),
    "Family_ID": np.repeat([1, 2, 3, 4, 1, 4, 5, 3], 3),
    "ZygosityGT": np.repeat(["MZ", "NoTwin", "NoTwin", "DZ", "MZ", "DZ", "NoTwin", "NoTwin"], 3),
    "condition": np.tile([1, 2, 3], 8),
    "value": [
        0.7739560485559633, 0.4388784397520523, 0.8585979199113825,
        0.6973680290593639, 0.09417734788764953, 0.9756223516367559,
        0.761139701990353, 0.7860643052769538, 0.12811363267554587,
        0.45038593789556713, 0.37079802423258124, 0.9267649888486018,
        0.6438651200806645, 0.82276161327083, 0.44341419882733113,
        0.2272387217847769, 0.5545847870158348, 0.06381725610417532,
        0.8276311719925821, 0.6316643991220648, 0.7580877400853738,
        0.35452596812986836, 0.9706980243949033, 0.8931211213221977,
    ],
})
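A helper of the kind requested might integer-code the nesting levels of the dataframe above, outermost first. The sketch below is an assumption, not hierarch's API, and is much simpler than hcp2blocks.m: MZ twins in the same family are merged into one hypothetical "genetic unit" (treated as clones), DZ twins are merged only if dz_as_clones=True, every other subject is its own unit, and repeated measures are left as the rows. The name nested_design_matrix is hypothetical, and the block rebuilds only the grouping columns of the example so that it runs on its own.

```python
import numpy as np
import pandas as pd

def nested_design_matrix(df, dz_as_clones=False):
    """Hypothetical helper: integer-coded cluster labels, outermost level first.

    Returns an (n_rows, 3) array with columns [family, genetic unit, subject].
    Assumes each family contains at most one twin pair of each zygosity.
    """
    clone_types = {"MZ", "DZ"} if dz_as_clones else {"MZ"}
    is_clone = df["ZygosityGT"].isin(clone_types)
    # Twins treated as clones share a key derived from their family;
    # everyone else gets a key derived from their own subject ID.
    unit_key = np.where(
        is_clone,
        "F" + df["Family_ID"].astype(str) + "_" + df["ZygosityGT"],
        "S" + df["Subject"].astype(str),
    )
    return np.column_stack([
        pd.factorize(df["Family_ID"])[0],
        pd.factorize(unit_key)[0],
        pd.factorize(df["Subject"])[0],
    ])

# Grouping columns of the example dataframe (8 subjects x 3 conditions).
df = pd.DataFrame({
    "Subject": np.repeat([1, 2, 3, 4, 5, 6, 7, 8], 3),
    "Family_ID": np.repeat([1, 2, 3, 4, 1, 4, 5, 3], 3),
    "ZygosityGT": np.repeat(["MZ", "NoTwin", "NoTwin", "DZ", "MZ", "DZ", "NoTwin", "NoTwin"], 3),
})
design = nested_design_matrix(df)
```

Each column could then serve as the cluster labels for one level of a hierarchical permutation or bootstrap, with the subject column grouping the repeated measures.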

Create bootstrapped datasets for regression problems?

Hi @rishi-kulkarni, I would like to use your package to create a list of bootstrapped datasets (again from the HCP data), but I noticed that hierarch.resampling.Bootstrapper.fit() expects a value for y that defines treatment and control groups. However, the HCP dataset has no treatment and control groups; in other words, all my analyses are regression problems. Is it still possible to generate bootstrapped datasets with your functions when there are no groups?

To recap: I would like to generate n bootstrapped datasets from the HCP dataset, in which subjects can belong to the same family or even be twins. I need a function that respects this structure so that the resampled datasets preserve it.
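A bootstrap that needs no treatment column can be sketched with pandas by resampling clusters level by level: families with replacement, then subjects with replacement within each drawn family, keeping all repeated measures of each drawn subject. This is a hypothetical helper (hierarchical_bootstrap is not hierarch's Bootstrapper), and the "value" column below holds random placeholders rather than the values from the issue.

```python
import numpy as np
import pandas as pd

def hierarchical_bootstrap(df, levels, n_boot, seed=None):
    """Hypothetical helper: return n_boot cluster-aware bootstrap copies of df.

    levels lists clustering columns from outermost to innermost, e.g.
    ["Family_ID", "Subject"]. Clusters are drawn with replacement at each
    level, within the cluster chosen one level up; every row of a chosen
    innermost cluster (e.g. all repeated measures of a subject) is kept.
    """
    rng = np.random.default_rng(seed)

    def resample(sub, depth):
        if depth == len(levels):
            return [sub]
        groups = [g for _, g in sub.groupby(levels[depth], sort=False)]
        pieces = []
        for k in rng.integers(len(groups), size=len(groups)):
            pieces.extend(resample(groups[k], depth + 1))
        return pieces

    boots = []
    for _ in range(n_boot):
        pieces = resample(df, 0)
        # "draw" gives each resampled innermost cluster a unique ID, so a subject
        # drawn twice is not merged into a single cluster downstream.
        boots.append(pd.concat(
            [p.assign(draw=i) for i, p in enumerate(pieces)], ignore_index=True
        ))
    return boots

# Grouping columns of the example dataframe; "value" holds random placeholders.
df = pd.DataFrame({
    "Subject": np.repeat([1, 2, 3, 4, 5, 6, 7, 8], 3),
    "Family_ID": np.repeat([1, 2, 3, 4, 1, 4, 5, 3], 3),
    "condition": np.tile([1, 2, 3], 8),
    "value": np.random.default_rng(0).random(24),
})
boots = hierarchical_bootstrap(df, ["Family_ID", "Subject"], n_boot=100, seed=42)
```

Each copy can then be passed to any regression or mixed model; using the draw column as the grouping factor (for example (1 | draw) instead of (1 | Subject)) keeps duplicated subjects distinct. Because whole families are drawn, the row count varies between copies.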
