cduvallet / microbiomehd

Cross-disease comparison of case-control gut microbiome studies

Makefile 10.24% Python 89.16% Shell 0.59%
reproducible-paper reproducible-research microbiome

microbiomehd's Introduction

MicrobiomeHD: the human gut microbiome in Health and Disease

This repo contains the code to reproduce all of the analyses in "Meta analysis of microbiome studies identifies shared and disease-specific patterns", Duvallet et al. 2017.

The raw data is available on Zenodo: DOI

More information on the raw data on Zenodo is in the db/ folder of this repo.

Reproducing analyses

The paper is accompanied by a Makefile, which you can use to re-make all of the analyses, figures, and tables in the paper.

The supplementary files are included in the final/supp-files/ folder of this repo. Most of the folders in this repo are currently empty, and will get populated with the results from make. The folders data/user_input and data/lit_search are the other two folders with provided information.

make will download the data from Zenodo, clean and process the raw data, perform all of the analyses in the paper, and make all of the figures and tables.

Note that make does not do any of the random forest parameter search-related analyses or figures. This part takes a very long time, and should be done as a background process.
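
One way to run the parameter search as a background process is sketched below. The helper name `start_in_background` and the log file name are illustrative assumptions and are not part of this repo. The sketch is POSIX-only (it relies on `os.setsid`) and is written to run under either Python 2.7 or Python 3.

```python
import os
import subprocess


def start_in_background(command, log_path):
    """Start `command` in its own session and append its output to log_path.

    Returns the Popen object so the caller can poll or wait on it.
    """
    log_file = open(log_path, "ab")
    devnull = open(os.devnull, "rb")
    try:
        process = subprocess.Popen(
            command,
            stdin=devnull,             # no terminal input needed
            stdout=log_file,
            stderr=subprocess.STDOUT,  # keep errors in the same log
            preexec_fn=os.setsid,      # detach from the terminal's session
        )
    finally:
        # The child process keeps its own copies of these file handles.
        log_file.close()
        devnull.close()
    return process


# Hypothetical usage, from the main directory of the repo:
# start_in_background(["make", "rf_params"], "rf_params.log")
```

Because the process runs in its own session, closing the terminal should not interrupt it; progress can be followed by reading the log file.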

Finally, re-building the PhyloT tree takes some time and re-orders the genera differently than in the paper (because there are multiple representations for each tree). The tree used in the paper is provided in data/analysis_results/. If you want to skip re-making the tree, you should run make tree --touch before running make.
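
The two-step sequence above can be scripted as follows. This is a minimal sketch: the function name `build_without_remaking_tree` is an assumption, and the `run` parameter exists only so the commands can be inspected without actually being executed. With GNU make, `--touch` marks a target as up to date instead of running its recipe.

```python
import subprocess


def build_without_remaking_tree(run=subprocess.check_call):
    """Mark the tree target as up to date, then run the full build."""
    # --touch updates the target's timestamp instead of running its recipe,
    # so the tree provided with the repo is kept as-is.
    run(["make", "tree", "--touch"])
    # The default target then builds everything else.
    run(["make"])
```

Calling `build_without_remaking_tree()` from the main directory runs both commands in order and stops at the first one that fails.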

Other things you can make separately:

  • figures: all of the figures
    • main_figures: just the figures in the main text
    • supp_figures: supplementary figures
  • tables: all of the tables, in both Markdown and tex formats
  • analysis: all of the analysis files, but none of the figures or tables
  • rf_params: the random forest parameter search analysis. Note that this is not included in any of the other make commands.
  • supp_files: the supplementary files, which are also included in the repo and don't technically need to be re-made
  • tree: the phyloT tree used to order genera in final figures.

Installing

To re-make all of the analyses, you'll first need to install the required modules.

You should probably do this in a Python 2 virtual environment. Unfortunately, many of the packages I used are no longer available in conda, so if you use anaconda you'll need to first create an empty conda environment, install pip, and then pip install the packages. If you don't use anaconda and/or have an alternative preferred way of making virtual environments, that's fine too (but I can't confirm that it will all work out). From the main directory, type:

conda create -n microbiomeHD python=2.7
source activate microbiomeHD
conda install pip
pip install -r requirements.txt

Note that all of these scripts were written in and for Python 2. Also, several important modules used throughout have since made backward-incompatible changes, so you should install the old versions specified in requirements.txt or else be plagued by many import errors.
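
If import errors do show up, a quick way to look for version drift is to compare the pinned versions against what is actually installed. The sketch below assumes that requirements.txt pins packages with `name==version` lines, which is also the format that `pip freeze` prints. The function names are illustrative and not part of this repo.

```python
def parse_pins(requirements_text):
    """Return {package_name: version} for every `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed_or_None)} where versions differ."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

For example, passing the contents of requirements.txt and the output of `pip freeze` through `parse_pins`, then handing both results to `find_mismatches`, lists every package whose installed version differs from the pinned one. A package that is not installed at all is reported with `None`.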

You also need to install the NCBI EDirect command line tools for making the tree. Instructions on how to do that are in the NCBI documentation.

Then you just run make:

make

And voila! A paper!

Directory structure

This repo's structure follows what's recommended by Cookiecutter Data Science. Some of the generated files that I think might be useful to you are already included in this repo. Other files are made by various scripts in src/.

data

All data-related files are (or will be) in data/:

  • user_input: user-inputted files, including:
    • results_folder.yaml file containing metadata on all the datasets [included]
    • list_of_tar_files.txt file that's used to download the raw data from Zenodo [included]
  • lit_search: manual curation of the results reported in the original publications [included]
  • analysis_results: files created by the analyses (e.g. q-values, random forest AUCs, etc) [made, except for the phyloT tree which is included]
  • raw_otu_tables: the raw OTU tables as downloaded from Zenodo
  • clean_tables: OTU tables and metadata in feather format, with "cleaned" data (i.e. only samples with both metadata and 16S data are kept, OTUs and samples with too few reads are removed, etc.)
  • tree: files associated with the phyloT tree. Note that the final tree and its direct prerequisites are included in this repo. If you want to re-make the tree from scratch, delete any of these files before running make. Making the tree is dicier and involves a manual step for you at http://phylot.biobyte.de/. Re-making the tree will also change the order of genera so that they no longer match the ordering in the paper exactly (since the linear order of phylogenetic groups doesn't matter).

source code

All of the code is in the src/ folder:

  • analysis: all of the code used to perform any analyses
  • data: data-related code, i.e. to download the raw data from Zenodo and to clean up the raw OTU tables and metadata files
  • final: code used to make the final figures, tables, and supplementary files
  • util: various functions and modules used in other scripts

figures, tables, and supplementary files

The Supplementary Files, Figures, and Tables are in the final/ folder. Some are made by make, others are included with this repo.

  • figures: figures [made]
  • tables: tables, in Markdown, tex, and tab-delimited formats [made]
  • supp-files: supplementary files [included]

microbiomehd's Issues

Unable to generate `data/tree/phyloT_tree.newick`

At a certain point, the Makefile tells me: "Go to http://phylot.biobyte.de/ and generate tree from data/tree/ncbi_ids.clean.for_phyloT. Press any key to continue once you've added the tree file in data/tree/phyloT_tree.newick." When I go to that website, upload the ncbi_ids.clean.for_phyloT file, and click generate tree, I get the error "Error: At least 2 IDs are required to generate a tree." Indeed, the file ncbi_ids.clean.for_phyloT only contains one number: 239934.

Was I supposed to click something on the phyloT website? I just uploaded the file and didn't change anything else.
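
A quick check before uploading can confirm whether the file itself is the problem. The sketch below assumes the file holds one NCBI ID per line, which is a guess about its layout. The function names are illustrative and not part of this repo, and the minimum of two IDs comes from the phyloT error message quoted above.

```python
def read_ncbi_ids(path):
    """Return the non-empty lines of the file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def has_enough_ids(path, minimum=2):
    """True if the file contains at least `minimum` distinct IDs."""
    return len(set(read_ncbi_ids(path))) >= minimum
```

If `has_enough_ids("data/tree/ncbi_ids.clean.for_phyloT")` returns False, the upload will fail regardless of the options chosen on the website, which points at the earlier step that wrote the file rather than at phyloT.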

Elevate Your Reproducible Analysis with MicrobiomeStat alongside MicrobiomeHD

Dear MicrobiomeHD Team,

I hope this message finds you in good health and high spirits. Your meticulous work on the MicrobiomeHD, offering a seamless and comprehensive platform for reproducibility of the analyses, is laudable and provides an invaluable asset for the scientific community exploring the human gut microbiome in health and disease.

In the same spirit of advancing robust and reproducible microbiome analysis, I am excited to introduce you to MicrobiomeStat, a tool that stands as an exemplar of ensuring consistency and reliability in microbiome data analysis. MicrobiomeStat integrates effortlessly with existing workflows, such as MicrobiomeHD, and provides additional tools to ensure the comprehensive analysis of microbiome data with ease and accuracy.

Key Attributes of MicrobiomeStat:

  • Seamless Integration: Effortlessly integrate MicrobiomeStat with MicrobiomeHD and augment the analytical capabilities to ensure a deeper insight into microbiome data.
  • Commitment to Reproducibility: With a focus on reproducibility, MicrobiomeStat ensures that each step of the microbiome data analysis is robust, verifiable, and reliable.
  • Advanced Analytical Suite: Enhance the analysis with a range of tools provided by MicrobiomeStat for an exhaustive and detailed exploration of microbiome data.

By integrating MicrobiomeStat into the MicrobiomeHD workflow, you take a significant step forward in ensuring the robustness and reproducibility of microbiome research, further contributing to the advancements in understanding the human gut microbiome in health and disease.

I kindly invite you to explore the potentials of MicrobiomeStat by visiting the GitHub repository: MicrobiomeStat on GitHub

Your consideration of MicrobiomeStat as an addition to your remarkable suite of tools for microbiome analysis is highly appreciated. I am available for any further discussion or inquiries regarding MicrobiomeStat and eagerly await your positive response.

Wishing you continued success in your significant work.
Demo of MicrobiomeStat.pdf

Warm regards,
Chen YANG

Install issues

I'm having a couple install issues when attempting to install the requirements on NYU's HPC cluster and using virtualenv to make sure it's a clean install. I'm using pip3 install -r requirements.txt to try and get everything.

  • you should probably mention somewhere that it requires python 3. I'm one of those luddites who still uses 2 by default (version 0.5.1 of scikit-bio is only python 3 compatible)

  • feather requires Cython but when installing with pip for some reason it doesn't get installed first (raises ModuleNotFoundError: No module named 'Cython'). running pip3 install Cython seems to fix this.

  • I think feather also requires numpy and doesn't install it properly (ModuleNotFoundError: No module named 'numpy'). Again, running pip3 install numpy seems to fix it.
