jierui-cell / sedimix_v0

Sedimix_v0

Sedimix: An Automated Pipeline for Genomic Analysis
This project was submitted in partial fulfillment of the MCB honors program.

Overview

Sedimix is under active development and is expected to change significantly over the next few months. This version, v0, serves primarily as a demonstration.

Key Features:

  • Uses Snakemake, a Python-based workflow management system
  • Identifies ancient hominin reads in the samples through mapping and taxonomic classification
  • Generates a report file with summary statistics (number of reads, deamination percentage, lineage site information, etc.)
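
The README does not say how the deamination percentage in the report is computed. As a rough illustration only, one common ancient-DNA damage measure is the share of reads showing a C→T substitution at the 5' terminal position, among reads whose reference base at that position is C. The sketch below assumes gap-free (reference, read) string pairs already oriented 5' to 3'; the function name and input format are hypothetical and not taken from Sedimix.

```python
def terminal_c_to_t_percentage(pairs):
    """Percentage of reads with a 5' terminal C->T substitution.

    `pairs` is an iterable of (reference, read) strings that are aligned
    without gaps and oriented 5'->3' (an assumption of this sketch).
    Only pairs whose reference starts with C are counted in the denominator.
    """
    ref_c = 0    # reads whose reference base at position 1 is C
    c_to_t = 0   # of those, reads that carry T at position 1
    for reference, read in pairs:
        if not reference or not read:
            continue  # skip empty records
        if reference[0].upper() == "C":
            ref_c += 1
            if read[0].upper() == "T":
                c_to_t += 1
    return 100.0 * c_to_t / ref_c if ref_c else 0.0
```

Elevated values at read termini are typically taken as a sign of authentic ancient DNA, but the threshold and exact definition used by the pipeline may differ.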

Requirements

Ensure the following tools are installed and configured before running the pipeline:

Centrifuge

git clone https://github.com/DaehwanKimLab/centrifuge
make -C centrifuge
echo "export PATH=\$PATH:$(pwd)/centrifuge" >> ~/.bashrc
source ~/.bashrc

Kraken2

git clone https://github.com/DerrickWood/kraken2.git
./kraken2/install_kraken2.sh kraken2
echo "export PATH=\$PATH:$(pwd)/kraken2" >> ~/.bashrc
source ~/.bashrc

Seqtk

git clone https://github.com/lh3/seqtk.git
make -C seqtk
echo "export PATH=\$PATH:$(pwd)/seqtk" >> ~/.bashrc
source ~/.bashrc

BWA

git clone https://github.com/lh3/bwa.git
make -C bwa
echo "export PATH=\$PATH:$(pwd)/bwa" >> ~/.bashrc
source ~/.bashrc

Samtools

wget https://github.com/samtools/samtools/releases/download/1.20/samtools-1.20.tar.bz2
tar -xvjf samtools-1.20.tar.bz2
rm samtools-1.20.tar.bz2
(cd samtools-1.20 && ./configure)
make -C samtools-1.20
echo "export PATH=\$PATH:$(pwd)/samtools-1.20" >> ~/.bashrc
source ~/.bashrc

If you're using zsh as your shell, replace ~/.bashrc with ~/.zshrc in the commands above.
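
Before running the pipeline, it can help to confirm that each tool is actually reachable on your PATH. The following is a minimal sketch using only the Python standard library; the executable names are assumed to match the tool names listed above, and `find_missing_tools` is a hypothetical helper, not part of Sedimix.

```python
import shutil

# Executable names assumed from the tools listed in this README.
REQUIRED_TOOLS = ["centrifuge", "kraken2", "seqtk", "bwa", "samtools"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If a tool is reported missing right after installation, open a new shell or re-run `source ~/.bashrc` so the updated PATH takes effect.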

Index Files
Download index files for Centrifuge and Kraken2 from the following:

  • AWS Indexes for Centrifuge
    We recommend RefSeq: bacteria, archaea, viral, human (7.9GB) for Centrifuge.

    wget https://genome-idx.s3.amazonaws.com/centrifuge/p%2Bh%2Bv.tar.gz
  • AWS Indexes for Kraken2
    We recommend RefSeq: archaea, bacteria, viral, plasmid, human, UniVec_Core (60GB) for Kraken2.

    wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz

Alternatively, you can build Centrifuge and Kraken2 indexes yourself by following the instructions provided on their respective GitHub repositories.
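
After downloading and extracting a Kraken2 index, a quick check that the archive unpacked completely can save a failed run later. A Kraken2 database directory is expected to contain `hash.k2d`, `opts.k2d`, and `taxo.k2d`. The sketch below checks for those three files; the helper name and the directory path in the usage comment are hypothetical.

```python
from pathlib import Path

# Core files that make up a Kraken2 database.
KRAKEN2_DB_FILES = ["hash.k2d", "opts.k2d", "taxo.k2d"]


def missing_kraken2_files(db_dir):
    """Return the core Kraken2 database files absent from `db_dir`."""
    db_path = Path(db_dir)
    return [name for name in KRAKEN2_DB_FILES if not (db_path / name).is_file()]


# Example usage (the path is an assumption; use wherever you extracted the index):
# print(missing_kraken2_files("k2_standard"))
```

An empty list means all three core files are present; it does not verify that the files are uncorrupted.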

Human Reference Genome
Download the human reference genome hg19.fq.gz from the following:

Python and Other Dependencies

To ensure all necessary dependencies are installed, create a conda environment using the provided environment.yaml file.

Conda Environment

Create a conda environment with the following command:

conda env create -f environment.yaml

Alternatively, you can use mamba for faster environment creation:

mamba env create -f environment.yaml

Activate the environment with:

conda activate sedimix

Pipeline Functionality

  1. Takes raw FASTQ files as input.
  2. Processes these through BWA to identify reads of interest.
  3. Classifies Homo sapiens reads using Centrifuge.
  4. Outputs a comprehensive taxonomic report and a folder containing classified hominin reads.
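
To make step 3 more concrete, here is a minimal sketch of selecting Homo sapiens reads from a Centrifuge classification file. It assumes Centrifuge's default tab-delimited output with a header row containing `readID` and `taxID` columns, and uses NCBI taxonomy ID 9606 for Homo sapiens. The actual selection criteria used inside Sedimix are not documented here and may be broader; `human_read_ids` is a hypothetical helper.

```python
import csv

HOMO_SAPIENS_TAXID = "9606"  # NCBI Taxonomy ID for Homo sapiens


def human_read_ids(lines, taxid=HOMO_SAPIENS_TAXID):
    """Return the set of read IDs assigned to `taxid`.

    `lines` is any iterable of text lines (for example an open file)
    in Centrifuge's tab-delimited classification format with a header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["readID"] for row in reader if row["taxID"] == taxid}


# Example usage (the file name is an assumption):
# with open("classification.tsv", newline="") as handle:
#     ids = human_read_ids(handle)
```

The resulting read IDs could then be used to extract the matching records from the FASTQ file, for instance with `seqtk subseq`.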

Usage Instructions

  1. Place your input FASTQ files in the data folder.

  2. Run the pipeline with the following command, replacing {n_cores} with the number of CPU cores to use:

    snakemake -s ../rules/centrifuge.smk --cores {n_cores}
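
If you prefer to launch the run from a script, the command above can be assembled programmatically. This is a small sketch that fills in the core count from the machine when none is given; `build_snakemake_command` is a hypothetical helper, and the Snakefile path is simply the one shown in the command above.

```python
import os


def build_snakemake_command(snakefile="../rules/centrifuge.smk", n_cores=None):
    """Build the snakemake invocation as an argument list.

    If `n_cores` is None, fall back to the number of CPUs reported
    by the operating system (or 1 if that is unavailable).
    """
    cores = n_cores if n_cores is not None else (os.cpu_count() or 1)
    return ["snakemake", "-s", snakefile, "--cores", str(cores)]


# Example usage:
# import subprocess
# subprocess.run(build_snakemake_command(n_cores=8), check=True)
```

Passing the command as a list avoids shell quoting issues if the Snakefile path contains spaces.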

Retrieve Your Results

  • Classified Reads: Located in the final_reads folder
  • Data Summary Report: Located in the final_report folder
  • Example Folder: An example folder can be found in SIM_Set3_centrifuge.
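
For a quick sanity check of the output, you can count the reads in a file from final_reads. The sketch below assumes the classified reads are stored as standard four-line FASTQ records, optionally gzip-compressed; the README does not state the output format, so treat this as an assumption. The helper name and the path in the usage comment are hypothetical.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record.

    Files ending in .gz are read through gzip; everything else is
    opened as plain text.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


# Example usage (the file name is an assumption):
# print(count_fastq_reads("final_reads/sample.fq.gz"))
```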
