Growth-rate-analysis

Growth-rate-analysis is a Python pipeline that calculates the peak-to-trough ratio (PTR, the ratio of copy number at the origin of replication to copy number at the terminus) from mapped read coverage.

The pipeline follows the coverage smoothing algorithm as described in the publication:

Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples

Installation

The pipeline can be set up in two easy steps:

  1. Clone the GitHub repository onto your system.
git clone https://github.com/alipirani88/Growth-rate-analysis.git

  2. Use the Growth-rate-analysis_env.yml file provided with the repository to create a conda environment named Growth-rate-analysis.
conda env create -f Growth-rate-analysis/Growth-rate-analysis_env.yml -n Growth-rate-analysis

Check installation

conda activate Growth-rate-analysis

python Growth-rate-analysis/growth_rate.py -h

Usage

Suppose you want to estimate PTR from the sequencing reads in CFT073_condition_R1.fastq.gz. The pipeline will map the reads against the CFT073 reference genome and calculate PTR from the smoothed coverage.

python Growth-rate-analysis/growth_rate.py -type SE -PE1 /Path-to-forward-end-fastq-file/CFT073_condition_R1.fastq.gz -o /path-to-output-directory/ -analysis CFT073_condition -index CFT073 -steps All

  • The above command runs the pipeline on the FASTQ reads provided with the -PE1 argument.
  • The results are saved in the output directory with the prefix CFT073_condition.
  • The reference genome and its path are read from the CFT073 settings in the config file.
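To run the same command over several samples, the argument list can be assembled programmatically. The following is a minimal sketch, not part of the pipeline: the helper name build_command, the example paths, and the rule that derives the -analysis prefix by dropping a trailing "_R1.fastq.gz" are all assumptions made for illustration. Only the flags shown in the usage example are used.

```python
import shlex
from pathlib import Path

# Path to the pipeline script, as used in the installation check.
SCRIPT = "Growth-rate-analysis/growth_rate.py"


def build_command(fastq, out_dir, index):
    """Return the argument list for one single-end run (hypothetical helper).

    The analysis prefix is taken from the FASTQ file name by dropping a
    trailing "_R1.fastq.gz"; this naming rule is an assumption.
    """
    name = Path(fastq).name
    suffix = "_R1.fastq.gz"
    prefix = name[: -len(suffix)] if name.endswith(suffix) else name.split(".")[0]
    return [
        "python", SCRIPT,
        "-type", "SE",
        "-PE1", str(fastq),
        "-o", str(out_dir),
        "-analysis", prefix,
        "-index", index,
        "-steps", "All",
    ]


if __name__ == "__main__":
    # Example paths are placeholders; replace them with your own files.
    for fq in ["/data/CFT073_condition_R1.fastq.gz"]:
        print(shlex.join(build_command(fq, "/data/results/", "CFT073")))
```

Each returned list can be passed directly to subprocess.run, which avoids shell quoting problems when paths contain spaces.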

config

The config file is a high-level, easy-to-write YAML-format configuration file that lets you configure system-wide runs and specify analysis parameters, paths to installed tools, data locations, and system-wide information.

  • The config file contains high-level information such as the locations of installed programs, core and memory usage for running on an HPC compute cluster, the path to a reference genome, and parameters used by the different tools. These settings apply across multiple runs and samples.

  • The config file stores data as KEY: VALUE pairs.

  • An example config file with default parameters is included in the installation folder. You can edit this file in place, or customize a copy and provide it with the -config argument.

  • If you wish to run the pipeline in an HPC compute environment such as PBS or SLURM, change the number of nodes and cores and the memory requirements to suit your needs; otherwise the pipeline will run with default settings.

index

  • Add the reference genome index name and its path to the config file. For example, if you have set the reference genome path in the config file as shown below, then the required command line argument would be -index CFT073.
#index name
[CFT073]
# path to the reference genome fasta file.
Ref_Path: /nfs/esnitkin/bin_group/variant_calling_bin/reference/CFT073/
# Name of reference genome fasta file.
Ref_Name: CFT073.fasta
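The snippet above uses a bracketed section name followed by KEY: VALUE lines, a layout that Python's standard configparser module can read. The sketch below shows how the reference FASTA path could be resolved from an index name under that assumption. How the pipeline itself parses the file is not stated here, and the function name reference_fasta is hypothetical.

```python
import configparser
import os

# The example section from above, embedded so the sketch is self-contained.
EXAMPLE = """\
#index name
[CFT073]
# path to the reference genome fasta file.
Ref_Path: /nfs/esnitkin/bin_group/variant_calling_bin/reference/CFT073/
# Name of reference genome fasta file.
Ref_Name: CFT073.fasta
"""


def reference_fasta(config_text, index):
    """Join Ref_Path and Ref_Name for the section named by -index."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names case-sensitive, as written
    parser.read_string(config_text)
    section = parser[index]  # raises KeyError if the index is not defined
    return os.path.join(section["Ref_Path"], section["Ref_Name"])


if __name__ == "__main__":
    print(reference_fasta(EXAMPLE, "CFT073"))
```

A missing index name raises KeyError, which is a convenient place to report that the value passed to -index has no matching section in the config file.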

An updated Snakemake workflow of this algorithm is available at

This repository will soon be archived.

The pipeline runs sequentially as follows:

  1. Pre-processing raw reads using Trimmomatic
  2. Read alignment using Bowtie2
  3. Post-alignment steps using SAMtools, and removal of duplicate reads using Picard
  4. Coverage graph analysis using BEDTools
  5. Binning the sequencing coverage and calculating PTR using an in-house script
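The last step can be illustrated with a simplified sketch. This is not the in-house script: the bin size, the window size, and the use of a circular median filter are assumptions chosen to show the idea of binning per-base depth, smoothing it around a circular genome, and taking the ratio of the highest to the lowest smoothed value. The function names are hypothetical.

```python
from statistics import mean, median


def bin_coverage(depths, bin_size):
    """Average per-base depths in consecutive, non-overlapping bins."""
    return [mean(depths[i:i + bin_size]) for i in range(0, len(depths), bin_size)]


def smooth_circular(values, window):
    """Median-smooth values with a window that wraps around the genome ends."""
    n = len(values)
    half = window // 2
    return [
        median(values[(i + k) % n] for k in range(-half, half + 1))
        for i in range(n)
    ]


def peak_to_trough(smoothed):
    """Ratio of the highest to the lowest smoothed coverage value."""
    peak, trough = max(smoothed), min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage must be positive")
    return peak / trough


if __name__ == "__main__":
    # Toy per-base depths: ten bins of four bases each, peaking mid-genome.
    toy_bins = [10, 12, 14, 16, 18, 20, 18, 16, 14, 12]
    depths = [d for d in toy_bins for _ in range(4)]
    smoothed = smooth_circular(bin_coverage(depths, 4), window=3)
    print(peak_to_trough(smoothed))  # prints 1.5
```

Smoothing before taking the ratio keeps a single noisy bin from setting the peak or the trough; in the toy data the raw maximum of 20 is reduced to 18 and the raw minimum of 10 is raised to 12.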

Output:

The pipeline generates alignment and BED output files from the different tools at each step, which can be used for manual inspection. The final PTR results can be found in:

  • prefix_PTR.txt: estimated PTR value for the coverage graph
  • prefix_perc_coverage_graph.R: R script to plot the sequence coverage graph with PTR values
