enformatik / quorum

This project forked from gmarcais/quorum

QUality Optimized Reads from the University of Maryland

License: GNU General Public License v3.0

Platform

Quorum has been tested on Linux with gcc 4.4 to gcc 4.7.

Installation

You should download the latest release tarball from the releases section. If compiling from the GitHub tree, you will need autoconf, automake and yaggo. You can download yaggo from its GitHub release page and copy the yaggo program into your PATH.

Quorum requires Jellyfish to be installed, and pkg-config must be able to find Jellyfish for Quorum to compile. The following command must print "OK":

pkg-config --exists jellyfish-2.0 && echo OK

If it does not, set the PKG_CONFIG_PATH environment variable so that it includes the directory containing jellyfish-2.0.pc.

If installing from the github tree, first run autoreconf -fi. Then install the usual way:

./configure --prefix=/path/where/to/install
make
make install

The last command may need to be run as root, if installing in a system directory.

Usage

Only one switch (-s) is required to run Quorum. It specifies the size of the Jellyfish hash, which must be large enough to hold all the k-mers in memory. With Illumina reads, a good estimate for this size is:

(G + k * n) / 0.8

where G is the estimated genome size, k is the k-mer length (24 by default) and n is the number of reads. If the chosen size is too small, quorum will stop with the error message: "Failed: Increase the size parameter".
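As a quick way to apply the rule of thumb above, here is a minimal Python sketch. The helper names estimate_hash_size and as_size_switch are hypothetical (they are not part of Quorum), and the "M" formatting simply mirrors the -s 50M style used in the example; the result is only the estimate given by the formula, not a guarantee that the hash will be large enough.

```python
# Sketch of the hash-size rule of thumb (G + k * n) / 0.8 from the README.
# estimate_hash_size and as_size_switch are hypothetical helpers, not Quorum code.


def estimate_hash_size(genome_size: int, n_reads: int, k: int = 24) -> int:
    """Return (G + k * n) / 0.8, rounded up to a whole number of k-mers."""
    total = genome_size + k * n_reads
    # Dividing by 0.8 equals multiplying by 5/4; ceiling integer division
    # avoids floating-point rounding surprises.
    return -(-total * 5 // 4)


def as_size_switch(size: int) -> str:
    """Round a size up to whole millions and format it like '50M'."""
    return f"{-(-size // 1_000_000)}M"


if __name__ == "__main__":
    # Hypothetical inputs: a 100 Mbp genome and 10 million reads, default k = 24.
    size = estimate_hash_size(genome_size=100_000_000, n_reads=10_000_000)
    print(size, as_size_switch(size))  # prints: 425000000 425M
```

Rounding up rather than down keeps the estimate on the safe side of the "Failed: Increase the size parameter" error.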

For example, for a bacterium with 2 million Illumina reads in the files read1.fastq and read2.fastq, the command would be:

quorum -s 50M read1.fastq read2.fastq

By default, the corrected reads are written to quorum_corrected.fa.

Output format

The corrections made are appended to the header line of each read in the FASTA output. For example, the following 101-base read:

  @1204
  GACCGGGCATGGGCTGAGCCTGTTCGGGAAGCTGACGGAGCCGGAAGAGGCCGGGATCGACCCTTCCGCCCCGCCCGCCGACTGGGTCGACCGGCCGGGCG

is corrected to:

  >1204 86:sub:T-C 91:3_trunc 62:5_trunc
  CTTCCGCCCCGCCCGCCGACTGGGCCGAC

The coordinate system is 0-based in the original read (like a C or Perl array). Here, a substitution from T to C was made at base 86. 5_trunc is the index of the first base kept (0 if not specified) and 3_trunc is the index just past the last base kept (the read length if not specified). Hence, the length of the corrected read is 3_trunc - 5_trunc (91 - 62 = 29 in this example). The uncorrected and corrected reads align as follows:

0                                                            62                      86   91        101
|                                                             |                       |    |         |
GACCGGGCATGGGCTGAGCCTGTTCGGGAAGCTGACGGAGCCGGAAGAGGCCGGGATCGACCCTTCCGCCCCGCCCGCCGACTGGGTCGACCGGCCGGGCG
                                                              CTTCCGCCCCGCCCGCCGACTGGGCCGAC
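To make the header convention concrete, the sketch below re-derives the corrected read of the example from the original read and its header annotations. It is a minimal illustration, assuming only the three annotation kinds shown here (sub, 5_trunc, 3_trunc) and the single-line header layout of the example; parse_header and apply_corrections are hypothetical helpers, not part of Quorum. Because 5_trunc and 3_trunc form a 0-based, half-open range, they map directly onto a Python slice.

```python
# Sketch: rebuild a corrected read from the original read plus the annotations
# Quorum appends to the FASTA header. Only 'sub', '5_trunc' and '3_trunc' are
# handled (the kinds shown in the README); the helper names are hypothetical.
from typing import List, Optional, Tuple

Substitution = Tuple[int, str, str]  # (0-based position, original base, new base)


def parse_header(header: str) -> Tuple[str, List[Substitution], int, Optional[int]]:
    """Split a header such as '>1204 86:sub:T-C 91:3_trunc 62:5_trunc'."""
    read_id, *annotations = header.lstrip(">").split()
    subs: List[Substitution] = []
    start, end = 0, None  # defaults: keep from base 0 to the end of the read
    for annotation in annotations:
        pos_str, kind, *rest = annotation.split(":")
        pos = int(pos_str)
        if kind == "sub":
            old, new = rest[0].split("-")
            subs.append((pos, old, new))
        elif kind == "5_trunc":
            start = pos  # index of the first base kept
        elif kind == "3_trunc":
            end = pos  # index just past the last base kept
        else:
            raise ValueError(f"unhandled annotation: {annotation}")
    return read_id, subs, start, end


def apply_corrections(original: str, subs: List[Substitution],
                      start: int, end: Optional[int]) -> str:
    """Apply substitutions, then keep the half-open range [start, end)."""
    bases = list(original)
    for pos, old, new in subs:
        if bases[pos] != old:
            raise ValueError(f"expected {old} at {pos}, found {bases[pos]}")
        bases[pos] = new
    if end is None:
        end = len(bases)
    return "".join(bases[start:end])


if __name__ == "__main__":
    original = ("GACCGGGCATGGGCTGAGCCTGTTCGGGAAGCTGACGGAGCCGGAAGAGGCCGGGATCGA"
                "CCCTTCCGCCCCGCCCGCCGACTGGGTCGACCGGCCGGGCG")
    _, subs, start, end = parse_header(">1204 86:sub:T-C 91:3_trunc 62:5_trunc")
    corrected = apply_corrections(original, subs, start, end)
    print(corrected, len(corrected))  # prints: CTTCCGCCCCGCCCGCCGACTGGGCCGAC 29
```

Checking that the original base matches the "from" side of each substitution (T at position 86 here) is a cheap sanity check that the header and the original read belong together.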

Switches

Other useful switches include the following (run quorum --help for a short description of all of them):

  • --threads NUMBER

Number of threads to use.

  • --kmer-len LENGTH

Length of the k-mers to use. Defaults to 24; the maximum is 31.

  • --contaminant FILE

Pass in a FASTA or FASTQ file of contaminant sequences. The error correction program truncates any read that contains a k-mer present in the contaminant sequences.

  • --prefix NAME

By default, all output files are named quorum_*. This switch changes the prefix.

  • --min-q-char ASCII

The ASCII value of the base of the quality encoding. If it is not specified, it is auto-detected: the first 1,000 reads of the first file are read, and the minimum quality value seen in these reads is used as min-q-char. An error is raised if this auto-detected base is not one of the standard values (33, 59 or 64).
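The --min-q-char auto-detection described above can be sketched as follows. This is a minimal illustration that follows the description literally, assuming plain 4-line FASTQ records (no wrapped sequence or quality lines); detect_min_q_char and quality_lines are hypothetical helpers, and Quorum's own implementation may differ in details. For reference, the standard bases 33, 59 and 64 are the characters '!', ';' and '@'.

```python
# Sketch of min-q-char auto-detection: scan the quality lines of the first
# reads, take the smallest character code, and require a standard base.
# Assumes 4-line FASTQ records; the helper names are hypothetical.
from itertools import islice
from typing import Iterable, Iterator

STANDARD_BASES = (33, 59, 64)  # '!', ';' and '@'


def quality_lines(lines: Iterable[str], max_reads: int) -> Iterator[str]:
    """Yield the quality line (4th line) of at most max_reads FASTQ records."""
    it = iter(lines)
    for _ in range(max_reads):
        record = list(islice(it, 4))
        if not record:
            return  # end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record[3].rstrip("\r\n")


def detect_min_q_char(lines: Iterable[str], max_reads: int = 1000) -> int:
    """Return the smallest quality character code seen in the first reads."""
    codes = [ord(c) for qual in quality_lines(lines, max_reads) for c in qual]
    if not codes:
        raise ValueError("no quality values found")
    base = min(codes)
    if base not in STANDARD_BASES:
        raise ValueError(f"auto-detected base {base} is not one of {STANDARD_BASES}")
    return base


if __name__ == "__main__":
    fastq = ["@r1\n", "ACGT\n", "+\n", "!#II\n",
             "@r2\n", "ACGT\n", "+\n", "IIII\n"]
    print(detect_min_q_char(fastq))  # prints: 33
```

Any iterable of lines works, so an open file handle can be passed directly; only the first max_reads records are consumed.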

Contributors

gmarcais
