GithubHelp home page GithubHelp logo

iarcbioinfo / draftpolisher Goto Github PK

View Code? Open in Web Editor NEW

This project forked from biorb/draftpolisher

0.0 2.0 0.0 847 KB

A tool for the polishing of draft sequences

License: GNU General Public License v3.0

Python 100.00%

draftpolisher's Introduction

DraftPolisher

DraftPolisher is a tool for fast polishing of draft sequences.

Description

DraftPolisher can produce an improved consensus sequence for a draft genome assembly (or any draft sequence assembly) by using a reference genome and a reference database of sequences (by which the frequences of a defined k-mer, representative of each mismatch, will be evaluated) both in FASTA format.

Prerequisites

DraftPolisher is designed for the polishing of draft sequences. Two different releases of the tool are present: DraftPolisher.py and DraftPolisher_cov.py. The "cov" release (i.e. coverage release) takes into account also the coverage of the sequences from the reference database used for the polishing. The "cov" release was designed based on the SPAdes assembling output file format, where the coverage value is reported in the last part of the sequences IDs, so if using this version of the tool take care of the reference sequences file format (see SPAdes output format for more details).Three files are required by the tool: the query sequence to polish, a reference sequence, a reference database of sequences and the size of the k-mer to use to perform the polishing (we suggest k-mer size = 8). In "Test" folder data for testing the tool are available.

Installation

This tool uses MUSCLE Sequence alignment tool to perform the alignment thus the installation is required (We used MUSCLE v3.8.1551 to test the tool). The last version can be installed as follows:

conda install muscle

Or you can download it from this site and install it manually: http://drive5.com/muscle/downloads.htm

No other installation steps are required. You need just to run the python script of the program. The tool works well with Python 3.6.0 (https://www.python.org/downloads/release/python-360/) but others Python3 versions should work.

The following python packages are required:

  • os
  • re
  • subprocess
  • sys
  • itertools
  • pandas
  • numpy
  • Bio
  • argparse

For an easier installation we suggest to use conda like this:

1- create a conda environment:

conda create --name envname

2- activate the environment:

conda activate envname

3- install the following packadges:

conda config --add channels defaults

conda config --add channels bioconda

conda config --add channels conda-forge

conda install pandas

conda install -c anaconda numpy

conda install -c conda-forge biopython

conda install -c anaconda argparse python=3.6

Parameters

  • Mandatory

Name Example value Description
-q
--query
query.fa draft sequence in Fasta format
-s
--subject
subject.fa reference genome in Fasta format
-f
--sequences_database
reference_contigs.fa a Fasta file containing all the reads or sequences to use as reference database for the polishing of the draft sequence
-k
--kmer_size
8 k-mer size to use for the polishing
  • Flags

Flags are special parameters without value.

Name Description
-h Display help

Usage

python DraftPolisher.py -q query.fa -s subject.fa -f reads.fa -k 8
python DraftPolisher_cov.py -q query.fa -s subject.fa -f reads.fa -k 8

Main Output

Type Description
consensus.fa the polished sequence in Fasta format

Other outputs

Several files are generated during the process and they can be used to evaluate the error rate, error correction rate and in general to have different checkpoints of the process. At the end of the process will be asked if you want to discard these process files or not.

Contributions

Name Email Description
Rosario Nicola Brancaccio [email protected] Developer to contact for support
Massimo Tommasino [email protected]
Tarik Gheit [email protected]

Versioning

Versions: 1.0

Authors

Rosario Nicola Brancaccio - (https://github.com/BioRB/)

License

GNU General Public License, version 3

Acknowledgments

I would like to acknowledge Vincenzo Minissale and Sergey Senkin for their help.

References

draftpolisher's People

Contributors

biorb avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.