
abubakariabdulwasid / snp_pipeline


This project forked from papos92/snp_pipeline


Pipeline for reference-based variant (SNP/indels) calling from raw data to phylogeny.

License: GNU General Public License v3.0

Python 100.00%


snp_pipeline

Licence: GNU General Public License v3.0 (copy provided in directory)
Author: Tom van Wijk
Contact: [email protected]

DESCRIPTION

Pipeline to perform SNP calling on raw sequence data. Currently only Illumina paired-end read data is supported, and the pipeline is supplied with a Salmonella Dublin (NC_011205) reference genome.

REQUIREMENTS

  • Linux operating system. This software was developed on Linux Ubuntu 16.04.
    Results on other operating systems may vary.
  • Python 2.7.x
  • Python libraries as listed in the import section of snp_pipeline.py
  • ERNE v2.1.1 or newer
  • Burrows-Wheeler Aligner (BWA) v0.7.12 or newer
  • Samtools v0.1.19 or newer
  • Picard Tools v2.18.7 or newer
  • VarScan v2.3.9 or newer
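A quick way to confirm the command-line tools are reachable before starting a run is a small PATH check. The sketch below is written in Python 3 (the pipeline itself targets Python 2.7) and is not part of the pipeline. It assumes the executables are named bwa, samtools and java (java being needed to run the Picard and VarScan .jar files); ERNE is left out because this README does not name its executable.

```python
import shutil

# Assumed executable names; the README lists the tools but not the exact
# binaries the pipeline calls. "java" runs the Picard and VarScan .jar files.
REQUIRED_EXECUTABLES = ["bwa", "samtools", "java"]


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables found.")
```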

INSTALLATION

  • Clone the SNP_pipeline repository to the desired location on your system.
    git clone https://github.com/Papos92/SNP_pipeline.git
  • Add the location of the SNP_pipeline directory to the PATH variable:
    export PATH=$PATH:/path/to/SNP_pipeline
    (It is recommended to add this command to your ~/.bashrc file)
  • Set the environment variable SNP_REF to the SNP_pipeline directory:
    export SNP_REF=/path/to/SNP_pipeline
    (It is recommended to add this command to your ~/.bashrc file)
  • Set the environment variable PICARD to the location of picard.jar:
    export PICARD=/path/to/picard.jar
    (It is recommended to add this command to your ~/.bashrc file)
  • Set the environment variable VARSCAN to the location of VarScan.vX.X.X.jar:
    export VARSCAN=/path/to/VarScan.vX.X.X.jar
    (It is recommended to add this command to your ~/.bashrc file)
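The environment variables from the installation steps can be checked in the same spirit. This Python 3 sketch assumes only what those steps state: SNP_REF should be a directory, and PICARD and VARSCAN should point to existing .jar files. The function name check_environment is hypothetical and not part of the pipeline.

```python
import os

# Each variable from the installation steps, paired with the kind of path
# it is expected to point to.
EXPECTED_ENV = {
    "SNP_REF": os.path.isdir,   # the SNP_pipeline directory
    "PICARD": os.path.isfile,   # picard.jar
    "VARSCAN": os.path.isfile,  # VarScan.vX.X.X.jar
}


def check_environment(env=None):
    """Return a list of human-readable problems with the pipeline variables."""
    env = os.environ if env is None else env
    problems = []
    for name, exists in EXPECTED_ENV.items():
        value = env.get(name)
        if not value:
            problems.append(name + " is not set")
        elif not exists(value):
            problems.append(name + " points to a missing path: " + value)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Accepting a plain dict instead of always reading os.environ keeps the check easy to test in isolation.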

USAGE

Start the pipeline with the following command:

snp_pipeline.py -i 'inputdir' -o 'outputdir' -t 'threads' -x 'savetemp' -r 'reference'

  • 'inputdir': location of input directory. (required)
    Should only contain either the uncompressed (.fastq) or compressed (.fastq.gz) sequence files with the raw forward and reverse reads. For each sample, the two fastq files must be tagged '_R1' (forward) and '_R2' (reverse) respectively and otherwise have identical names. The data is expected to be free of primer, barcode and adapter sequences. Quality trimming is performed by the pipeline.

  • 'outputdir': location of output directory.
    Default = subdirectory in inputdir

  • 'threads': Number of threads (virtual cpu cores) to be used.
    Default = 8.

  • 'savetemp': Set to true to save the temporary files and directories generated by the pipeline.
    Default = false.

  • 'reference': The reference genome used for creating alignments and calling SNPs.
    Default = NC_011205 (Salmonella Dublin).
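The '_R1'/'_R2' naming rule for 'inputdir' can be illustrated with a pairing sketch in Python 3. It assumes the sample name is everything before the last '_R1'/'_R2' tag and that everything after the tag (including the extension) must be identical between the two files. The pipeline's own matching logic may differ, and pair_fastq_files is a hypothetical name.

```python
import re

# Hypothetical naming rule: <sample>_R1<rest> and <sample>_R2<rest>, where
# <rest> (including the .fastq or .fastq.gz extension) must be identical.
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<read>[12])(?P<rest>.*\.fastq(?:\.gz)?)$"
)


def pair_fastq_files(filenames):
    """Return {sample: (forward_file, reverse_file)} for paired-end input.

    Files that are not .fastq/.fastq.gz with an _R1/_R2 tag are ignored.
    Raises ValueError when a file has no matching partner.
    """
    groups = {}
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue
        # Files belong together only if sample name and suffix both match.
        key = (match.group("sample"), match.group("rest"))
        groups.setdefault(key, {})[match.group("read")] = name

    pairs = {}
    for (sample, _rest), reads in sorted(groups.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError("no partner for " + ", ".join(sorted(reads.values())))
        if sample in pairs:
            raise ValueError("sample " + sample + " appears more than once")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs
```

For a real run directory, the call would be pair_fastq_files(os.listdir(inputdir)). A file without a partner raises ValueError instead of being silently skipped, which surfaces naming mistakes before any alignment starts.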

ADDING NEW REFERENCE GENOMES

You can add your own reference genomes by placing them in the reference_file directory in .fasta format and running the following commands:

  • Index the reference genomes using BWA:
    bwa index 'genome'.fasta
  • Create the FASTA index (.fai) of the reference using samtools:
    samtools faidx 'genome'.fasta
  • Add 'genome'.csv containing gene annotations; these are used to filter SNPs in the accessory genome.
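The two indexing commands above can be scripted. This Python 3 sketch builds the same bwa index and samtools faidx calls and runs them with subprocess. The function names and the reference_dir argument (the reference_file directory) are hypothetical, and the annotation file still has to be added by hand.

```python
import os
import subprocess


def reference_prep_commands(genome, reference_dir):
    """Build the indexing commands for a new reference (no execution)."""
    fasta = os.path.join(reference_dir, genome + ".fasta")
    return [
        ["bwa", "index", fasta],       # BWA index files (.amb, .ann, .bwt, .pac, .sa)
        ["samtools", "faidx", fasta],  # FASTA index (.fai)
    ]


def prepare_reference(genome, reference_dir):
    """Run the indexing commands; raises CalledProcessError on failure."""
    for command in reference_prep_commands(genome, reference_dir):
        subprocess.check_call(command)
```

A call would look like prepare_reference("my_genome", "/path/to/SNP_pipeline/reference_file"). Building the command lists separately from running them lets the commands be inspected or logged before anything executes.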

You can now run the pipeline with the -r 'genome' parameter, where 'genome' is the name of the reference files (without extension).
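Before passing -r 'genome', it can help to confirm that the indexing steps left their files in place: bwa index writes 'genome'.fasta.amb, .ann, .bwt, .pac and .sa next to the FASTA, and samtools faidx writes 'genome'.fasta.fai. This Python 3 sketch checks only those files (not the annotation file); missing_reference_files is a hypothetical helper, not part of the pipeline.

```python
import os

# Suffixes appended to '<genome>.fasta' by `bwa index` and `samtools faidx`.
INDEX_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".sa", ".fai"]


def missing_reference_files(genome, reference_dir):
    """List expected reference files for '-r genome' that do not exist."""
    fasta = os.path.join(reference_dir, genome + ".fasta")
    expected = [fasta] + [fasta + suffix for suffix in INDEX_SUFFIXES]
    return [path for path in expected if not os.path.isfile(path)]
```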
