This project forked from papos92/assembly_pipeline

Pipeline for processing, assembly and quality assessment of bacterial WGS data.

License: GNU General Public License v3.0

Python 100.00%

Assembly pipeline

Licence: GNU General Public License v3.0 (copy provided in directory)
Author: Tom van Wijk
Contact: [email protected]

DESCRIPTION

This script was developed for the assembly of whole genome sequencing (WGS) data from bacterial isolates. It was developed specifically for paired-end Illumina sequencing data, but it might work, or be easily modified to work, with other sequencing data formats.

Quality reports of the raw data are generated using FastQC and MultiQC. The reads are quality trimmed from both ends using ERNE-filter and assembled de novo into contigs and scaffolds using SPAdes. The assemblies are assessed using QUAST, which generates a quality report.
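
As a rough illustration of the order of the steps described above, the sketch below builds the external command lines for one isolate. It is not taken from assembly.py: the function name, the output sub-directory names and the choice of scaffolds.fasta as QUAST input are assumptions made for this example, and the trimming command is left out, so the trimmed read files are passed in as if they already exist.

```python
from pathlib import Path


def build_commands(r1, r2, trimmed_r1, trimmed_r2, outdir, threads=4, memory_gb=13):
    """Return the external commands for one isolate as argument lists.

    Hypothetical helper: the trimming step itself is not included, so
    trimmed_r1/trimmed_r2 are assumed to be files it already produced.
    """
    out = Path(outdir)
    qc_dir = out / "fastqc"    # assumed directory name
    asm_dir = out / "spades"   # assumed directory name
    return [
        # Per-file quality reports of the raw reads.
        ["fastqc", "-t", str(threads), "-o", str(qc_dir), str(r1), str(r2)],
        # Combine the FastQC reports into a single MultiQC report.
        ["multiqc", str(qc_dir), "-o", str(out / "multiqc")],
        # De novo assembly of the trimmed paired-end reads.
        ["spades.py", "-1", str(trimmed_r1), "-2", str(trimmed_r2),
         "-o", str(asm_dir), "-t", str(threads), "-m", str(memory_gb)],
        # Quality assessment of the assembled scaffolds.
        ["quast.py", str(asm_dir / "scaffolds.fasta"),
         "-o", str(out / "quast"), "-t", str(threads)],
    ]
```

Each list could then be passed to `subprocess.run(cmd, check=True)` in order. Note that FastQC expects its output directory to exist already, so a real run would need to create it first.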

REQUIREMENTS

INSTALLATION

  • Clone the assembly_pipeline repository to the desired location on your system.
    git clone https://github.com/Papos92/assembly_pipeline.git
  • Add the directory that contains assembly.py to the PATH variable:
    export PATH=$PATH:/path/to/assembly_pipeline
    (It is recommended to add this command to your ~/.bashrc file)

USAGE

The script can be run with the following command:

assembly.py -i 'inputdir' -o 'outputdir' -t 'threads' -m 'memory' -x 'savetemp'

  • 'inputdir': location of input directory. (required)
    Should contain only the uncompressed (.fastq) or compressed (.fastq.gz) sequence files with the raw forward and reverse reads. The file names must include an _R1 tag for the forward reads and an _R2 tag for the reverse reads.
    Each sample (a pair of forward and reverse files) is treated as a separate isolate. It is not (yet) possible to process isolates that are split across multiple samples; this will be added when required.

  • 'outputdir': location of output directory. (Default = inputdir)
    The output will be stored in multiple directories inside this directory.

  • 'threads': Number of threads (virtual cpu cores) to be used. (Default = 4)

  • 'memory': Maximum amount of RAM (GB) to be used. (Default = 13)
    If the machine runs out of RAM, SPAdes will crash, so adjust this parameter to suit your machine.

  • 'savetemp': Set to true to save the temporary files and directories generated by the pipeline. (Default = false)
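
The options and the file naming rule above can be sketched as follows. This is only an illustration under assumptions, not the code of assembly.py: the function names, the use of argparse, and the exact regular expression for the _R1/_R2 tags are all hypothetical. The defaults mirror the ones listed above.

```python
import argparse
import re


def build_parser():
    """Command-line options mirroring the documented flags (assumed layout)."""
    parser = argparse.ArgumentParser(prog="assembly.py")
    parser.add_argument("-i", dest="inputdir", required=True)
    parser.add_argument("-o", dest="outputdir", default=None)
    parser.add_argument("-t", dest="threads", type=int, default=4)
    parser.add_argument("-m", dest="memory", type=int, default=13)
    parser.add_argument("-x", dest="savetemp", default="false")
    return parser


def resolve_options(argv):
    """Parse argv and apply the documented defaults."""
    args = build_parser().parse_args(argv)
    if args.outputdir is None:
        # Default = inputdir
        args.outputdir = args.inputdir
    args.savetemp = str(args.savetemp).lower() == "true"
    return args


# The _R1/_R2 tag must be followed by "_" or "." so that sample names
# which merely contain "_R1" or "_R2" are not split in the wrong place.
READ_TAG = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?=[_.]).*\.fastq(\.gz)?$"
)


def pair_reads(filenames):
    """Group file names into {sample: (forward, reverse)} pairs.

    Files without an _R1/_R2 tag and samples missing one of the two
    reads are skipped.
    """
    found = {}
    for name in sorted(filenames):
        match = READ_TAG.match(name)
        if match is None:
            continue
        found.setdefault(match.group("sample"), {})[match.group("read")] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in found.items()
        if "1" in reads and "2" in reads
    }
```

For example, `pair_reads(os.listdir(opts.inputdir))` would give one entry per isolate, each of which is then processed independently.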
