greenelab / core-accessory-interactome

Investigating the functional relationship between P. aeruginosa core and accessory genes.

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 98.43% Python 1.52% R 0.06% Shell 0.01%
pseudomonas-aeruginosa accessory-genes notebook experiment gene-expression

core-accessory-interactome's Introduction

Identifying the interaction of core and accessory genes in P. aeruginosa

Alexandra J Lee, Georgia Doing, Samuel L. Neff, Deborah A Hogan and Casey S Greene

April 2020

University of Pennsylvania

Clinical and environmental strains of Pseudomonas aeruginosa (P. aeruginosa), an opportunistic pathogen that causes difficult-to-treat infections, show significant genomic heterogeneity, including diverse accessory genes that are present only in some strains or clades. Both core genes, which are conserved across strains, and accessory genes have been associated with traits such as biofilm formation and virulence. Much of what we know about core and accessory gene content comes from genome analyses. Here, we use a newly assembled transcriptome compendium to analyze the transcriptional patterns of core and accessory gene expression in PAO1 and PA14 strains across thousands of samples from hundreds of distinct experiments. We found that a subset of core genes was transcriptionally stable across the PAO1 and PA14 strain types and that these genes had fewer accessory genes with correlated expression patterns than did less stable core genes.

Directory Structure

| Folder | Description |
| --- | --- |
| 0_explore_data | This folder contains analysis notebooks to visualize the expression data to get a sense for the variation contained. |
| 1_processing | This folder contains analysis notebooks to determine what threshold to use to partition the gene expression data into PAO1 and PA14 compendia. |
| 2_correlation_analysis | This folder contains analysis notebooks to detect gene co-expression modules starting with gene expression data, applying Pearson correlation and then clustering on this correlation matrix to obtain gene modules. |
| 3_core_core_analysis | This folder contains analysis notebooks to examine the stability of core genes across strains. |
| 4_acc_acc_analysis | This folder contains analysis notebooks to examine accessory-accessory gene modules. |
| 5_core_acc_analysis | This folder contains analysis notebooks to examine the relationship between core genes and accessory genes. |
| 6_common_genes_analysis | This folder contains analysis notebooks to compare common DEGs found in prior work to core and accessory genes. |
| scripts | This folder contains supporting functions that other notebooks in this repository will use. |
| data | This folder contains metadata used for different analyses. |
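
The 2_correlation_analysis description above (Pearson correlation, then clustering on the correlation matrix) can be illustrated with a minimal sketch. This is not the code from the notebooks: the function name `coexpression_modules`, the use of average-linkage hierarchical clustering, the `1 - correlation` distance, and the toy data are all assumptions chosen for illustration, and the repository may use a different clustering method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def coexpression_modules(expression, n_modules):
    """Group genes into modules from a samples x genes expression matrix.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Pearson correlation between genes (columns): a genes x genes matrix.
    corr = np.corrcoef(expression, rowvar=False)
    # Turn correlation into a distance so strongly correlated genes are close.
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    # Enforce exact symmetry and clip tiny negatives from floating-point error.
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    # Average-linkage hierarchical clustering on the condensed distances.
    tree = linkage(squareform(dist, checks=False), method="average")
    # Cut the tree into at most n_modules clusters; labels start at 1.
    return fcluster(tree, t=n_modules, criterion="maxclust")


# Toy data (assumed): 50 samples, two groups of three genes,
# each group driven by its own latent signal plus a little noise.
rng = np.random.default_rng(0)
signal_a = rng.normal(size=(50, 1))
signal_b = rng.normal(size=(50, 1))
expression = np.hstack([
    signal_a + 0.1 * rng.normal(size=(50, 3)),
    signal_b + 0.1 * rng.normal(size=(50, 3)),
])
labels = coexpression_modules(expression, n_modules=2)
print(labels)
```

In this toy case the first three genes should share one module label and the last three the other. On real compendium data the number of modules and the linkage method would need to be tuned.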

Usage

Operating Systems: macOS, Linux (Note: bioconda libraries are not available on Windows)

To run this analysis on your own gene expression data, perform the following steps:

First, set up your local repository:

  1. Download and install Git Large File Storage (Git LFS). Once it is installed, set up Git LFS by running the following command in the terminal:
git lfs install
  2. Install miniconda.
  3. Navigate to the location where you'd like the code to live and clone the core-accessory-interactome repository by running the following command in the terminal:
git clone https://github.com/greenelab/core-accessory-interactome.git

Note: Git automatically detects the LFS-tracked files and clones them via HTTP.

  4. Navigate into the cloned repo by running the following command in the terminal:
cd core-accessory-interactome
  5. Set up your conda environment by running the following command in the terminal:
bash install.sh
  6. Navigate to any of the analysis directories listed in the table above to see the code for how analyses were performed. To reproduce the results and figures of the paper, run the notebooks in the analysis directories in order.

Acknowledgements

We would like to thank Jake Crawford for very insightful discussions about methods and interpretation of gene correlation analyses. We would also like to thank all other members of the Greene lab (Natalie Davidson, Ben Heil, Ariel Hippen, David Nicholson, Milton Pividori, Halie Rando, Taylor Reiter) for helpful comments and code review.

core-accessory-interactome's Issues

Quantify difference in likelihood

But you could approximate this integral in this case, right? Since your density plot is just a continuous approximation of your discrete data (gene correlations), you could just count the number of gene correlations > 0.5 and divide it by the total number of gene correlations.

Originally posted by @jjc2718 in #5
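
The counting approximation suggested in this comment can be sketched as follows. This is a minimal illustration, assuming the gene correlations are stored as a symmetric gene x gene matrix; the function name `fraction_above`, the matrix layout, and the toy values are hypothetical. The upper triangle is used so that self-correlations are excluded and each gene pair is counted once.

```python
import numpy as np


def fraction_above(corr, threshold=0.5):
    """Fraction of unique gene-gene correlations above `threshold`.

    Hypothetical helper for illustration; not taken from the repository.
    """
    corr = np.asarray(corr, dtype=float)
    # Upper triangle only: skips the diagonal (self-correlation)
    # and avoids counting each symmetric pair twice.
    rows, cols = np.triu_indices_from(corr, k=1)
    pairwise = corr[rows, cols]
    return np.count_nonzero(pairwise > threshold) / pairwise.size


# Toy correlation matrix (assumed values) for three genes.
toy_corr = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
])
fraction = fraction_above(toy_corr)
print(fraction)  # 2 of the 3 gene pairs exceed 0.5
```

Computing this fraction separately for two groups of genes gives two numbers that can be compared directly, without integrating the density curve.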

Is accessory shift consistent across other groups of samples?

This is interesting!

If you wanted to, you could probably do a permutation test to determine whether or not this skew is significant: take, say, 1000 random samples of genes from the whole gene set having the same size as the accessory gene set, then compare the mean (or other summary statistic) with the distribution of mean correlations from the 1000 random samples. If the mean in the accessory genes is larger than most of the random samples, that provides pretty solid evidence that accessory genes have a higher mean correlation than you would expect by chance.

(This would be different than your current shuffled dataset because it would be a much smaller sample, which may lead to more variability between samples)

If you decide to go this route, it might be good to run it by Casey first; he might have some thoughts on whether or not this is the right approach.

Originally posted by @jjc2718 in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMjU3NzU1OTcwOnYy/pull_request_review_threads/discussion
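
The permutation test described in this comment could look roughly like the sketch below. It assumes each gene has already been reduced to one summary score (for example, its mean correlation with other genes). The function name `permutation_pvalue`, the per-gene `scores` array, the add-one p-value correction, and the toy data are assumptions for illustration, not the repository's implementation.

```python
import numpy as np


def permutation_pvalue(scores, is_accessory, n_permutations=1000, seed=0):
    """One-sided test: is the accessory mean larger than that of
    random gene sets of the same size?

    Hypothetical helper for illustration; not taken from the repository.
    """
    scores = np.asarray(scores, dtype=float)
    is_accessory = np.asarray(is_accessory, dtype=bool)
    rng = np.random.default_rng(seed)
    n_accessory = int(is_accessory.sum())
    observed = scores[is_accessory].mean()
    null_means = np.empty(n_permutations)
    for i in range(n_permutations):
        # Random gene set drawn without replacement from the whole gene set,
        # with the same size as the accessory gene set.
        sample = rng.choice(scores.size, size=n_accessory, replace=False)
        null_means[i] = scores[sample].mean()
    # Add-one correction so the estimated p-value is never exactly zero.
    exceed = np.count_nonzero(null_means >= observed)
    p_value = (1 + exceed) / (1 + n_permutations)
    return observed, null_means, p_value


# Toy data (assumed): 200 genes, the first 20 are "accessory"
# and are given higher scores on purpose.
rng = np.random.default_rng(1)
scores = rng.normal(loc=0.1, scale=0.05, size=200)
is_accessory = np.zeros(200, dtype=bool)
is_accessory[:20] = True
scores[is_accessory] += 0.2
observed, null_means, p_value = permutation_pvalue(scores, is_accessory)
print(observed, null_means.mean(), p_value)
```

A small p-value means that few random gene sets reach a mean as high as the accessory genes. The choice of summary statistic (mean, median, or another) is left open in the comment, so the mean here is only one option.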

Considerations for tuning network

Some considerations raised in #21 for tuning network:

  1. Whether to use binary or continuous edge weights.
  2. Whether to use soft thresholding, which WGCNA seems to recommend.
  3. How genes are distributed among modules, and how to split up large modules.
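
To make the first two considerations concrete, here is a minimal sketch contrasting binary (hard-thresholded) edge weights with continuous weights from WGCNA-style soft thresholding, where the unsigned adjacency is the absolute correlation raised to a power. The function names, the cutoff of 0.5, the power of 6, and the zeroed diagonal are illustrative assumptions, not settings taken from the repository.

```python
import numpy as np


def hard_threshold_adjacency(corr, cutoff=0.5):
    """Binary edges: 1 where |correlation| >= cutoff, else 0 (illustrative)."""
    corr = np.asarray(corr, dtype=float)
    adjacency = (np.abs(corr) >= cutoff).astype(float)
    # Drop self-edges; this is a choice made for this sketch.
    np.fill_diagonal(adjacency, 0.0)
    return adjacency


def soft_threshold_adjacency(corr, power=6):
    """Continuous edges: |correlation| ** power, as in unsigned soft thresholding."""
    corr = np.asarray(corr, dtype=float)
    adjacency = np.abs(corr) ** power
    # Drop self-edges; this is a choice made for this sketch.
    np.fill_diagonal(adjacency, 0.0)
    return adjacency


# Toy correlation matrix (assumed values) for three genes.
toy_corr = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
])
hard = hard_threshold_adjacency(toy_corr)
soft = soft_threshold_adjacency(toy_corr)
print(hard)
print(soft)
```

The hard threshold treats correlations of 0.6 and 0.8 identically, while the soft threshold keeps their ordering and pushes weak correlations toward zero. That trade-off is the core of the binary versus continuous question.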

Try calculating correlations using only PAO1 samples and only PA14 samples

These are correlations between genes in PAO1 and PA14 expression data combined, correct? Just want to make sure my understanding of what data you're using here is right.

If so, it might also be interesting to see what happens in the coexpression analysis if you use just PAO1 or just PA14 data (but still using both strains to determine core/accessory genes)...no need to try it now, just an idea for the future.

Originally posted by @jjc2718 in #1
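
The idea of computing correlations within each strain separately can be sketched as below. This assumes a samples x genes expression matrix and one strain label per sample; the function name `per_strain_correlations` and the toy data are hypothetical. The toy example is built so that two genes are positively correlated in one strain and negatively correlated in the other, a pattern that a combined analysis would blur.

```python
import numpy as np


def per_strain_correlations(expression, strain_labels):
    """Gene-gene Pearson correlation computed separately within each strain.

    Hypothetical helper for illustration; not taken from the repository.
    """
    expression = np.asarray(expression, dtype=float)
    strain_labels = np.asarray(strain_labels)
    result = {}
    for strain in np.unique(strain_labels):
        # Keep only the samples (rows) that belong to this strain.
        subset = expression[strain_labels == strain]
        result[str(strain)] = np.corrcoef(subset, rowvar=False)
    return result


# Toy data (assumed): 40 samples, 2 genes.
# gene_b follows gene_a in the first 20 samples and opposes it in the last 20.
rng = np.random.default_rng(0)
x = rng.normal(size=40)
noise = 0.1 * rng.normal(size=40)
gene_a = x
gene_b = np.concatenate([x[:20], -x[20:]]) + noise
expression = np.column_stack([gene_a, gene_b])
strain_labels = np.array(["PAO1"] * 20 + ["PA14"] * 20)

by_strain = per_strain_correlations(expression, strain_labels)
combined = np.corrcoef(expression, rowvar=False)
print(by_strain["PAO1"][0, 1], by_strain["PA14"][0, 1], combined[0, 1])
```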
