
Comments (5)

gpertea commented on September 28, 2024

This sounds like it could be related to the bug that was fixed in v1.1.2 (the latest version). Could you try the latest version on the same data and report back whether you still get the error?

from stringtie.

ls233 commented on September 28, 2024

Unfortunately, I'm getting the same error with v1.1.2 as well:

stringtie --version
1.1.2

gpertea commented on September 28, 2024

Any chance of sharing with us the data that triggers this error? If you run with the -v option (and no multi-threading), it should be possible to identify the bundle and the genomic location where the error happens. That in turn should make it possible to extract only the reads and the reference transcripts from the location that triggers the problem. If you agree to share those data for debugging, please contact me at [email protected] to discuss the data transfer details (unless you can already extract the data and upload it somewhere for me to retrieve).
Thanks!
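
For anyone following the suggestion above, here is a rough sketch of how the failing bundle could be located and cut out. It is not taken from the thread: the helper function names, all file names and the example region are placeholders, and it assumes samtools is installed and the BAM is coordinate-sorted. The stringtie options used (-G, -o, -p, -v) are the standard ones.

```shell
# Sketch only: function names, file names and regions are placeholders.

# Run StringTie verbosely on a single thread and keep the log, so the last
# bundle mentioned before the crash points to the problem location.
run_stringtie_verbose() {
    # $1 = input BAM, $2 = reference GTF, $3 = output GTF, $4 = log file
    stringtie "$1" -G "$2" -o "$3" -p 1 -v > "$4" 2>&1
}

# Extract only the alignments overlapping one region, e.g. chr1:1000000-1200000.
# Region queries need an index, hence the "samtools index" step.
extract_region() {
    # $1 = input BAM, $2 = region, $3 = output BAM
    samtools index "$1" &&
    samtools view -b -o "$3" "$1" "$2"
}

# Keep only the reference GTF records that overlap the same region.
extract_gtf_region() {
    # $1 = GTF file, $2 = chromosome, $3 = start, $4 = end (1-based, inclusive)
    awk -F'\t' -v c="$2" -v s="$3" -v e="$4" \
        '$1 == c && $5 >= s && $4 <= e' "$1"
}

# Example usage (placeholders):
#   run_stringtie_verbose aligned.sorted.bam annotation.gtf assembled.gtf stringtie.log
#   extract_region aligned.sorted.bam chr1:1000000-1200000 problem_region.bam
#   extract_gtf_region annotation.gtf chr1 1000000 1200000 > problem_region.gtf
```

The resulting small BAM and GTF should be much easier to share than the full data set, provided the crash still reproduces on them.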

gpertea commented on September 28, 2024

Thank you for providing the data. We were able to reproduce the crash in that particular region, which has very high coverage and a huge number of splice sites; StringTie crashed there after taking more than 32GB of RAM, which is very unusual.

Upon investigating that particularly dense bundle, we noticed that the STAR alignments in that region were very messy, with many (probably false) splicing events, seemingly caused by STAR forcing reads to align by allowing mismatches plus soft clipping.
The same reads aligned with HISAT2 (with the --dta option, of course) gave much better alignments in that region, without losing coverage depth, and StringTie assembled the region from the HISAT2 output without problems, using only a few hundred megabytes of RAM in the process.
We are not sure whether this comes from the particular version of STAR used to generate those alignments or from the options used, but we had a hard time finding ways to filter out the "bad" alignments there because the SAM records were somewhat incomplete (missing MD tags, number of mismatches reported per pair rather than per read, etc.).
Anyway, the latest version of StringTie, which we just released (v1.2.0), is able to finish processing the BAM file you provided (with the STAR alignments for the whole genome) in about 12 minutes, using about 13GB of RAM (with 8 CPUs). Again, using the alignments produced by HISAT2 would drastically reduce both the RAM usage and the running time.

So please give the new version a try on your existing STAR alignments. For the future, though, we strongly recommend mapping the reads with HISAT2 --dta. If you really have to use STAR, please at least use more stringent alignment options, e.g. lower the maximum number of mismatches per read to no more than 3, and perhaps also limit soft clipping somehow, to reduce the rate of false-positive spliced alignments.
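
As a hedged illustration of the two routes recommended above, the commands might look roughly like this. Only `--dta` and the idea of at most 3 mismatches come from the thread; the helper function names, thread counts, paths and the specific STAR options chosen here (`--alignEndsType EndToEnd` to switch off soft clipping, `--outSAMattributes ... MD` to get MD tags) are assumptions, so please check them against the manual for your STAR version. In particular, verify whether `--outFilterMismatchNmax` counts mismatches per read or per pair for paired-end data.

```shell
# Sketch only: function names, paths and thread counts are placeholders,
# and the FASTQ files are assumed to be uncompressed paired-end reads.

# Preferred route: HISAT2 with --dta, piped into a coordinate-sorted BAM
# that StringTie can read directly.
align_hisat2_dta() {
    # $1 = HISAT2 index prefix, $2 = mate 1 FASTQ, $3 = mate 2 FASTQ, $4 = output BAM
    hisat2 --dta -p 8 -x "$1" -1 "$2" -2 "$3" |
        samtools sort -@ 4 -o "$4" -
}

# Fallback route: STAR with stricter settings (an assumed option set, see above).
align_star_strict() {
    # $1 = STAR genome directory, $2 = mate 1 FASTQ, $3 = mate 2 FASTQ, $4 = output prefix
    STAR --runThreadN 8 \
         --genomeDir "$1" \
         --readFilesIn "$2" "$3" \
         --outFilterMismatchNmax 3 \
         --alignEndsType EndToEnd \
         --outSAMattributes NH HI AS nM NM MD \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix "$4"
}

# Example usage (placeholders):
#   align_hisat2_dta genome_index reads_1.fq reads_2.fq hisat2.sorted.bam
#   align_star_strict star_genome_dir reads_1.fq reads_2.fq star_strict_
```

End-to-end alignment is a blunt way to limit soft clipping and may discard reads that would otherwise align, so it is worth comparing mapping rates before and after.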

gpertea commented on September 28, 2024

This should've been fixed in v1.2.0.
