This project forked from cole-trapnell-lab/cufflinks

License: Boost Software License 1.0

CUFFLINKS
----------------------------
Cufflinks is a reference-guided assembler for RNA-Seq experiments. It
simultaneously assembles transcripts from reads and estimates their relative
abundances, without using a reference annotation.  The software expects as 
input RNA-Seq read alignments in SAM format (http://samtools.sourceforge.net).

Here's an example spliced read alignment record:
s6.25mer.txt-913508	16	chr1	4482736	255	14M431N11M	*	0	0	CAAGATGCTAGGCAAGTCTTGGAAG	IIIIIIIIIIIIIIIIIIIIIIIII	NM:i:0	XS:A:-

This record includes a custom tag used by Cufflinks to determine the strand 
of the mRNA from which this read originated.  Often, RNA-Seq experiments lose
strand information, and so reads will map to either strand of the genome.  
However, strandedness of spliced alignments can often be inferred from the 
orientation of consensus splice sites, and Cufflinks requires that spliced 
alignments have the custom strand tag XS, which has SAM attribute type "A", and
can take values of "+" or "-".  If your RNA-Seq protocol is strand specific,
including this tag for all alignments, including unspliced alignments, will 
improve the assembly quality.
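
As a rough illustration of the tag described above, the following Python sketch
pulls the XS value out of a single SAM record and checks whether the alignment
is spliced (its CIGAR string contains an "N" operation).  The function names
xs_strand and is_spliced are hypothetical helpers written for this example;
they are not part of Cufflinks.

```python
from typing import Optional


def xs_strand(sam_line: str) -> Optional[str]:
    """Return '+' or '-' from the XS:A tag of one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM fields are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not TAG:TYPE:VALUE
        tag, sam_type, value = parts
        if tag == "XS" and sam_type == "A" and value in ("+", "-"):
            return value
    return None


def is_spliced(sam_line: str) -> bool:
    """True if the CIGAR string (6th field) contains an 'N' (skipped region)."""
    cigar = sam_line.split("\t")[5]
    return "N" in cigar


# The example record shown earlier, written with explicit tab separators.
record = (
    "s6.25mer.txt-913508\t16\tchr1\t4482736\t255\t14M431N11M\t*\t0\t0\t"
    "CAAGATGCTAGGCAAGTCTTGGAAG\tIIIIIIIIIIIIIIIIIIIIIIIII\tNM:i:0\tXS:A:-"
)
print(is_spliced(record), xs_strand(record))
```

A check like this can be run over a file before assembly to flag spliced
alignments whose XS tag is missing.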

The SAM records MUST BE SORTED by reference name and then by coordinate, like so:

sort -k 3,3 -k 4,4n hits.sam 
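
To confirm that a file is in the order the sort command above produces, a small
check along these lines may help.  This is a minimal sketch, not a Cufflinks
utility: is_coordinate_sorted is a hypothetical name, and it assumes reference
names compare as plain byte strings (as sort does under LC_ALL=C) and that SAM
header lines begin with "@".

```python
from typing import Iterable


def is_coordinate_sorted(sam_lines: Iterable[str]) -> bool:
    """Check that records are ordered by reference name, then by position."""
    previous = None
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        key = (fields[2], int(fields[3]))  # (reference name, 1-based position)
        if previous is not None and key < previous:
            return False
        previous = key
    return True


sorted_records = [
    "r1\t0\tchr1\t100\t255\t25M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t250\t255\t25M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr2\t50\t255\t25M\t*\t0\t0\tACGT\tIIII",
]
print(is_coordinate_sorted(sorted_records))
print(is_coordinate_sorted(reversed(sorted_records)))
```

With a real file, the lines can be passed straight from an open file handle,
for example is_coordinate_sorted(open("hits.sam")).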

The program is fully threaded, and when running with multiple threads, should
be run on a machine with plenty of RAM. 4 GB per thread is probably reasonable
for most experiments.  Since many experiments feature a handful of genes that
are very abundantly transcribed, Cufflinks will spend much of its time 
assembling a few genes.  When using more than one thread, Cufflinks may 
appear to "hang" while these genes are being assembled.

Cufflinks assumes that library fragment lengths are size selected and normally
distributed. When using paired end RNA-Seq reads, you must take care to supply
Cufflinks with the mean and variance of the inner distances between mate 
pairs. For the moment, Cufflinks doesn't support assembling mixtures of paired
end reads from different fragment size distributions.  Mixtures of single 
ended reads (of varying lengths) with paired ends are supported.
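
One way to arrive at the mean and variance mentioned above is sketched below in
Python.  It assumes that the inner distance of a pair is the fragment length
minus the lengths of both mates, that both mates have the same read length, and
that fragment lengths are already known (for example from a size-selection
trace); inner_distance_stats is a hypothetical helper, not a Cufflinks
function.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def inner_distance_stats(
    fragment_lengths: Sequence[int], read_length: int
) -> Tuple[float, float]:
    """Return (mean, sample variance) of the inner distances between mates.

    Assumes inner distance = fragment length - 2 * read length, and needs
    at least two fragments so the sample variance is defined.
    """
    inner = [frag - 2 * read_length for frag in fragment_lengths]
    return mean(inner), variance(inner)


# Illustrative numbers only: three fragments sequenced with 50 bp mates.
m, v = inner_distance_stats([290, 300, 310], read_length=50)
print(m, v)
```

The standard deviation, if that is what is needed, is the square root of the
variance returned here.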

Cufflinks also assumes that the donor does not contain major structural 
variations with respect to the reference.  Tumor genomes are often highly 
rearranged, and while Cufflinks may eventually identify gene fusions and 
gracefully handle genome breakpoints, users are encouraged to be careful when
using Cufflinks on tumor RNA-Seq experiments. 

The full manual may be found at http://cole-trapnell-lab.github.io/cufflinks/


REQUIREMENTS
---------------------------

Cufflinks is a standalone tool that requires gcc 4.0 or greater, and runs on
Linux and OS X.  It depends on Boost (http://www.boost.org) version 1.38 or 
higher.


REFERENCES
---------------------------
Cufflinks builds on many ideas, including some
proposed in the following papers:

Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer and Barbara 
Wold, "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nature 
Methods, volume 5, 621 - 628 (2008)

Hui Jiang and Wing Hung Wong, "Statistical Inferences for isoform expression", 
Bioinformatics, 2009 25(8):1026-1032

Nicholas Eriksson, Lior Pachter, Yumi Mitsuya, Soo-Yon Rhee, Chunlin Wang, 
Baback Gharizadeh, Mostafa Ronaghi, Robert W. Shafer, Niko Beerenwinkel, "Viral 
population estimation using pyrosequencing", PLoS Computational Biology, 
4(5):e1000074
