greenelab / pathcore-t-analysis

This repository is in support of the PathCORE-T paper (https://doi.org/10.1101/147645). It contains all the code and necessary data/metadata to repeat all analyses in the paper.

Home Page: https://pathcore-demo.herokuapp.com

License: BSD 3-Clause "New" or "Revised" License

Languages: Shell 2.93%, Python 13.88%, Jupyter Notebook 83.19%
Topics: gene-expression, feature-extraction, pathway-analysis, supplement

pathcore-t-analysis's Introduction

Overview

This repository contains the scripts to run the analyses described in the PathCORE-T paper. Running ./ANALYSIS.sh is sufficient to reproduce the results in the paper. To use PathCORE-T in your own analyses, please review the sections of this README from "The PathCORE-T analysis workflow" onwards.

We released two Python packages for PathCORE-T. Both packages are used in this analysis repository.

The data directory

A README is provided in the ./data directory with details about the scripts to download and/or process datasets, data source citations, etc.

The figures directory

All figures in the PathCORE-T paper are also available here.

The jupyter-notebooks directory

Scripts used to generate Figure 3 and Supplemental Figure 2 are provided in notebook format. We have found that we can offer greater detail about each of the figures in this format.

Tutorials

This directory also contains two notebooks that users can read through or run when they are getting started with PathCORE-T analysis.

The PathCORE-T analysis workflow

Please review one of the analysis_<dataset>_<model>.sh scripts for an example of the workflow.

In the figure below, (a) is used to generate the weight matrix and (b) specifies the inputs to the PathCORE-T analysis in (c):

[Figure: PathCORE-T analysis workflow diagram]

Scripts (in order of execution):

  1. run_network_creation.py

    Iterates through a directory of weight matrices generated by a feature construction algorithm that has been applied to a transcriptomic dataset. Multiple weight matrices can be constructed from the same algorithm initialized with different random seeds. The eADAGE example uses multiple weight matrices, whereas the two NMF examples only use one weight matrix.

  2. run_permutation_test.py

    Iterates through a directory of network files and applies a permutation test to the networks to determine edge significance. If there is more than 1 network file in the directory, the networks are combined to make a single aggregate network. Edges that are significant under their corresponding nulls (generated by the permutation test) are kept in the final network.
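
The two steps above can be illustrated with a minimal, self-contained sketch. It is not the code in run_network_creation.py or run_permutation_test.py. It assumes each feature has already been reduced to a list of enriched pathways (the `toy` input is made-up data), counts how many features each pathway pair co-occurs in, and then builds a null by shuffling pathway labels across features while keeping the number of pathways per feature fixed. The function names, the shuffling scheme, and the 0.05 cutoff are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import combinations


def build_network(feature_pathways):
    """Count, for every pathway pair, the features in which both appear."""
    edges = Counter()
    for pathways in feature_pathways:
        # sorted(set(...)) gives each unordered pair one canonical key
        for pair in combinations(sorted(set(pathways)), 2):
            edges[pair] += 1
    return edges


def permute_features(feature_pathways, rng):
    """Shuffle pathway labels across features, keeping each feature's size."""
    pool = [p for pathways in feature_pathways for p in pathways]
    rng.shuffle(pool)
    permuted, start = [], 0
    for pathways in feature_pathways:
        permuted.append(pool[start:start + len(pathways)])
        start += len(pathways)
    return permuted


def significant_edges(feature_pathways, n_permutations=1000, alpha=0.05, seed=0):
    """Keep edges whose observed weight is rarely reached under the null."""
    rng = random.Random(seed)
    observed = build_network(feature_pathways)
    exceed = Counter()
    for _ in range(n_permutations):
        null = build_network(permute_features(feature_pathways, rng))
        for edge, weight in observed.items():
            if null[edge] >= weight:
                exceed[edge] += 1
    # Empirical p-value with a +1 correction so it is never exactly zero
    return {
        edge: weight
        for edge, weight in observed.items()
        if (exceed[edge] + 1) / (n_permutations + 1) <= alpha
    }


# Hypothetical toy input: one list of enriched pathways per feature
toy = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
network = build_network(toy)
kept = significant_edges(toy, n_permutations=200)
```

When several networks exist (for example, one per random seed), summing their `Counter` objects is one simple way to form an aggregate network before filtering.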

Additional:

  • constants directory

    This module allows for import of two dictionaries: GENE_SIGNATURE_DEFINITIONS and SHORTEN_PATHWAY_NAMES. These are intended to be modified when you need to run PathCORE-T using a feature construction algorithm and/or pathway definitions different from those in our case studies.

    In most cases, the files in constants should be the only ones you need to modify to run an analysis of your own.

  • utils.py

    Utility functions for file reading & processing.
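
To illustrate the kind of definitions the constants module is meant to hold, the sketch below shows one hypothetical way to express a gene signature rule and a pathway-name shortening map. Everything here is an assumption made for the example: the names `EXAMPLE_SIGNATURE_DEFINITIONS`, `EXAMPLE_SHORTEN_NAMES`, `signature_by_std` and `shorten`, the 2.5 standard deviation cutoff, and the replacement strings. Check the constants directory for the actual structure of GENE_SIGNATURE_DEFINITIONS and SHORTEN_PATHWAY_NAMES before editing them.

```python
from statistics import mean, pstdev


def signature_by_std(gene_weights, n_std=2.5):
    """Split genes into positive and negative signatures for one feature.

    gene_weights maps a gene identifier to its weight in the feature.
    Genes more than n_std standard deviations above (below) the mean
    weight form the positive (negative) signature.
    """
    values = list(gene_weights.values())
    center, spread = mean(values), pstdev(values)
    positive = {g for g, w in gene_weights.items() if w > center + n_std * spread}
    negative = {g for g, w in gene_weights.items() if w < center - n_std * spread}
    return positive, negative


# Hypothetical stand-ins, not the dictionaries shipped in the repository
EXAMPLE_SIGNATURE_DEFINITIONS = {"example_algorithm": signature_by_std}
EXAMPLE_SHORTEN_NAMES = {"biosynthesis": "biosyn.", "metabolism": "metab."}


def shorten(pathway_name, replacements=EXAMPLE_SHORTEN_NAMES):
    """Apply each long-form to short-form replacement to a pathway name."""
    for long_form, short_form in replacements.items():
        pathway_name = pathway_name.replace(long_form, short_form)
    return pathway_name
```

Keeping the signature rule behind a dictionary lookup means a different feature construction algorithm can be supported by adding one entry rather than changing the analysis scripts.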

Web application database setup

Here we describe the steps taken to prepare the database that backs the PathCORE-T demo application. The demo application is built on the Flask microframework and deployed on Heroku. The database is a MongoDB instance hosted on mLab.

Both Heroku and mLab provide free tier options for their services.

Note that analysis_Paeruginosa_eADAGE.sh passes the --metadata flag to run_network_creation.py. This must happen before the web application setup, which is carried out by running web_db_Paeruginosa_eADAGE.sh.

Scripts (in order of execution):

  • web_initialize_db.py

    Creates the following collections:

    • genes: Stores the gene identifiers. Assumes these can be retrieved from the first column (the row names/index) of the transcriptomic dataset. For the PAO1 example, we provided an additional file (for more information, see data/README.md) that has the common names corresponding to the gene locus tags specified in the compendium.

    • pathways: Stores the pathway & definition information from the pathway definitions file.

    • sample_labels: Stores the sample labels and the corresponding normalized expression values. Assumes the labels can be retrieved from the first row (the header) of the transcriptomic dataset and each column is the vector of expression values corresponding to that sample.

    • network_edges: Stores the network files in the networks directory created by running run_network_creation.py.

    • network_feature_signatures: Stores the feature gene signature information in the metadata directory created by running run_network_creation.py ... --metadata

    • network_feature_pathways: Stores the feature pathway definitions in the metadata directory created by running run_network_creation.py ... --metadata

    • sample_annotations: Specific to the PAO1 example, we store additional information about the samples in the compendium that can be displayed on the web application (for more information about the sample annotations file, see data/README.md).

  • web_edge_page_data.py

    Creates the collection pathcore_edge_data. All information needed for an edge page is stored here. For example, the script computes gene odds ratios and sample "summary" expression scores, and creates heatmaps based on these values.
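
The layout assumptions stated above for the genes and sample_labels collections (gene identifiers in the first column, sample labels in the header row) can be made concrete with a short sketch. It parses a small expression table into the two kinds of records and does not connect to MongoDB; it is not the code in web_initialize_db.py. The tab delimiter, the field names (`gene`, `sample`, `expression`), the function name, and the toy table are assumptions for illustration.

```python
import csv
import io


def parse_expression_table(handle, delimiter="\t"):
    """Split an expression table into gene records and per-sample records."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    sample_names = header[1:]  # first header cell labels the gene column
    genes = []
    columns = [[] for _ in sample_names]
    for row in reader:
        if not row:
            continue  # skip blank lines
        genes.append(row[0])
        for column, value in zip(columns, row[1:]):
            column.append(float(value))
    gene_docs = [{"gene": gene} for gene in genes]
    sample_docs = [
        {"sample": name, "expression": column}
        for name, column in zip(sample_names, columns)
    ]
    return gene_docs, sample_docs


# Hypothetical two-gene, two-sample table
toy_table = io.StringIO(
    "gene\tsampleA\tsampleB\n"
    "PA0001\t0.5\t0.1\n"
    "PA0002\t0.2\t0.9\n"
)
gene_docs, sample_docs = parse_expression_table(toy_table)
```

Each returned list holds plain dictionaries, which is the shape a document database client typically accepts for bulk inserts.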

Additional:

  • utils_setup_PAO1_example.py

    Utility functions in support of the PAO1 example. Gets the gene common names and sample annotations information.

PathCORE-T web application setup

Step 1: mLab setup

  • Register for an mLab account at mLab.com.
  • Create new: Create a free sandbox database (0.5 GB).
  • Database Users tab: Add a user with write access to the new database.
  • Create a credentials file (see example-mLab-credentials.yml).

Step 2: Run web_initialize_db.py

Step 3: Run web_edge_page_data.py

Fork the PathCORE-T-demo repository. Follow the setup instructions in the repository's README. Update or remove any text or code specific to the eADAGE-based, KEGG PAO1 case study so that the web application accurately describes and supports your analysis.

pathcore-t-analysis's People

Contributors

kathyxchen

pathcore-t-analysis's Issues

Update figures based on response to reviewers round 3

Figures to be revised based on reviewer comments:

  • Figure 1 (high-level workflow)
  • Figure 5 (PAO1/KEGG eADAGE network)
  • Figure S1 (NMF weight distributions)
  • Figure S2 (PAO1/KEGG NMF network) - REMOVE
  • Rearrange order of Figs 2-4.

Rename from PathCORE to PathCORE-T

At minimum, substitutions need to take place in the README of this repository. Documentation in scripts should also be updated (and/or we should add a note in the README about the change & possible inconsistencies).

Include an example of the PathCORE analysis workflow, from model construction on a dataset to the final network.

Provide a Jupyter notebook example so that users can simply read through it if they do not want to run code. They would also have the option of running the notebook to experiment with the example themselves.

  • Update the jupyter-notebooks README
  • Make sure a user knows that they have to download the PAO1 compendium (e.g. ./download_data.sh) before running the jupyter notebook themselves.
