Stringtie error (stringtie issue, closed)

gpertea commented on September 28, 2024
Stringtie error

from stringtie.

Comments (6)

gpertea commented on September 28, 2024

It is quite unusual to see such bad memory usage in StringTie. I am wondering whether it's just some overexpressed gene on chrM or whether genes on other chromosomes are involved here.
To pinpoint where it gets stuck or crashes, you should run with a single thread, use the -v option, and capture the stderr messages. That way you can tell where it gets stuck (the last genomic region being processed). Hopefully you can then extract only the read alignments from that region and send them to us for further analysis.
I just uploaded a pre-release version, v1.0.4, with the latest fixes: http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.0.4.Linux_x86_64.tar.gz
Could you give this version a try with only one CPU (i.e. no -p option) and with -v as suggested above, and let me know at which bundle it gets stuck or crashes?
(The source is at http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.0.4.tar.gz if you need to build from source.)
This version also has an option to ignore specific chromosomes, so you can add "-x chrM" to skip processing of alignments on chrM (these are generally troublesome due to oversampling and rarely useful for gene expression analysis, unless one really cares about mitochondrial genes). Be careful with the spelling of the chromosome name for this option: the name must match exactly, so "chrM" and "ChrM" are not interchangeable.
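
The debugging run suggested above could be scripted roughly as follows. This is only a sketch: the helper names (sequence_names, debug_command), the file names, and the example header are hypothetical, and it assumes the -o output option alongside the -v and -x options mentioned in this comment. It checks the names given to -x against the @SQ lines of the BAM header, since a misspelled name such as "ChrM" would not match "chrM".

```python
import shlex


def sequence_names(sam_header):
    """Collect reference sequence names from the @SQ lines of a SAM header."""
    names = set()
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def debug_command(bam, out_gtf, skip, known):
    """Build a verbose StringTie command; -p is omitted so a single thread is used."""
    unknown = [name for name in skip if name not in known]
    if unknown:
        raise ValueError(f"not in BAM header (check spelling/case): {unknown}")
    cmd = ["stringtie", bam, "-v", "-o", out_gtf]
    if skip:
        cmd += ["-x", ",".join(skip)]
    return cmd


# Illustrative header text; in practice it would come from `samtools view -H`.
header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr21\tLN:46709983\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
cmd = debug_command("sample.bam", "sample.gtf", ["chrM"], sequence_names(header))
# Redirect stderr to a file so the last bundle message can be inspected later.
print(shlex.join(cmd) + " 2> stringtie.log")
```

Passing "ChrM" instead of "chrM" with this header raises a ValueError before StringTie is ever started.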


srithegreat commented on September 28, 2024

Hi,

Thanks for the quick response. I have tried with the newer version on one
of the files. It gets stuck at the bundle chr21:8182911-8595722. The last
few lines are as shown:
[05/18 19:44:51]^bundle chr21:8161340-8161490(3) done (0 processed potential transcripts).
[05/18 19:44:51]>bundle chr21:8162457-8162596(3) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8162457-8162596(3) done (0 processed potential transcripts).
[05/18 19:44:51]>bundle chr21:8167909-8168134(5) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8167909-8168134(5) done (0 processed potential transcripts).
[05/18 19:44:51]>bundle chr21:8170441-8170607(3) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8170441-8170607(3) done (0 processed potential transcripts).
[05/18 19:46:21]>bundle chr21:8182911-8595722(12097263) (4774js, 39 guides) loaded, begins processing...
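
A log like this can also be scanned automatically for the bundle that never finished. The sketch below assumes only the message shapes visible in the lines above (">bundle" when a bundle is loaded, "^bundle" when it is done). The function name unfinished_bundles is hypothetical, and reading the number in parentheses as the bundle's alignment count is an assumption.

```python
import re

# Matches both ">bundle chr:start-end(count)" and "^bundle chr:start-end(count)".
BUNDLE = re.compile(r"\]([>^])bundle (\S+?):(\d+)-(\d+)\((\d+)\)")


def unfinished_bundles(log_text):
    """Return (region, count) for bundles that were loaded but never reported done."""
    pending = {}
    for match in BUNDLE.finditer(log_text):
        marker, seq, start, end, count = match.groups()
        region = f"{seq}:{start}-{end}"
        if marker == ">":
            pending[region] = int(count)  # bundle loaded, processing started
        else:
            pending.pop(region, None)  # bundle reported as done
    return list(pending.items())


log = """\
[05/18 19:44:51]>bundle chr21:8170441-8170607(3) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8170441-8170607(3) done (0 processed potential transcripts).
[05/18 19:46:21]>bundle chr21:8182911-8595722(12097263) (4774js, 39 guides) loaded, begins processing...
"""
print(unfinished_bundles(log))
# prints [('chr21:8182911-8595722', 12097263)]
```

With a single thread there should be at most one unfinished bundle, which is the region to extract and inspect.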

Also, I have extracted the reads for this region only. Let me know your
thoughts. I do see that another sample gets stuck on chromosome M, which
makes sense, but I have no idea what is happening with this one.
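
For anyone wanting to reproduce the extraction step, one way is samtools view with a region argument, which needs a coordinate-sorted and indexed BAM. The sketch below only builds the command line. The function name region_extract_command and the file names are hypothetical, and the region check is a simplification that assumes sequence names without colons.

```python
import re
import shlex


def region_extract_command(bam, region, out_bam):
    """Build a samtools command that writes only the alignments in `region` as BAM."""
    if not re.fullmatch(r"[^:\s]+:\d+-\d+", region):
        raise ValueError(f"expected a region like chr21:8182911-8595722, got {region!r}")
    # -b writes BAM output; the input BAM must be sorted and indexed (samtools index).
    return ["samtools", "view", "-b", "-o", out_bam, bam, region]


cmd = region_extract_command("sample.bam", "chr21:8182911-8595722", "stuck_bundle.bam")
print(shlex.join(cmd))
```

The region string is the one reported for the last bundle in the verbose log.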

Files accessed here:
https://jh.box.com/s/cfxpfxrrb6q6y1yo2crgl0vjsm8w0kwl

Regards,
Srikanth


Srikanth S. Manda
Research Scholar
Pandey Lab
McKusick-Nathans Institute of Genetic Medicine
Johns Hopkins University School of Medicine
Miller Research Building, Room 560
733 North Broadway
Baltimore, Maryland 21205


gpertea commented on September 28, 2024

Ah, I see, this is another monster cluster courtesy of HISAT. It's so dense that IGV started to complain about memory when I tried to visualize the read alignments. StringTie gets bogged down badly on this bundle, using about 33GB of RAM.
Interestingly, Cufflinks simply gives up: after filtering out a lot of alignments, it ends up producing no transcript assemblies at all (I haven't tried it with -g because Cufflinks cheats badly with that option).
We should probably implement a new "aggressive alignment filtering" option in StringTie, to be enabled on user request for exuberant aligners like HISAT. HISAT is an excellent aligner, great speed etc., but at this point I have to recommend TopHat instead, because HISAT sometimes produces too many spurious alignments, creating clusters like this one that StringTie cannot handle properly.
So for now I am sorry to say that we cannot provide a fix for this situation (besides recommending TopHat), but thank you for sharing this example data set. We'll use it to devise a better alignment filtering strategy in a future version.


srithegreat commented on September 28, 2024

Thanks for the feedback. I will see if TopHat solves my issue.

Do you think it's a good idea to mix alignments from HISAT and TopHat?


mpertea commented on September 28, 2024

This should be fixed in StringTie version 1.1.0 that was just released.


Liuy12 commented on September 28, 2024

Hi,

I encountered a similar problem when running StringTie version 1.3.3. StringTie basically got stuck at a certain bundle for days. I looked at this region in IGV, and it seems to contain a huge number of duplicate and multi-mapped reads (over 15000). I guess this is what causes StringTie to get stuck at this position. I tried using the -M option with a relatively low value (0.1), but that didn't help. Do you have any suggestions regarding this issue? Do I need to pre-filter the BAM files to remove multi-mapped reads? Any suggestion is appreciated. Thanks
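
As an illustration of what such a pre-filter might look like (not a recommendation from the StringTie authors), the sketch below drops SAM records that are flagged as duplicates, are secondary alignments, or carry an NH tag above a threshold. The function names and the example records are hypothetical. It assumes the aligner writes the NH:i tag and that duplicates were already marked by a separate tool. Removing multi-mapped reads would also change expression estimates for repetitive genes.

```python
def keep_alignment(sam_line, max_hits=1):
    """Decide whether one SAM alignment record passes the pre-filter."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x400:  # marked as a PCR or optical duplicate
        return False
    if flag & 0x100:  # secondary alignment
        return False
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):  # number of reported alignments for the read
            return int(tag[5:]) <= max_hits
    return True  # no NH tag present: keep the record


def filter_sam(lines, max_hits=1):
    """Yield header lines unchanged and only the alignment records that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, max_hits):
            yield line


# Made-up records: one unique hit, one multi-mapped read, one marked duplicate.
records = [
    "@SQ\tSN:chr21\tLN:46709983",
    "r1\t0\tchr21\t8182911\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr21\t8182920\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:12",
    "r3\t1024\tchr21\t8182930\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
]
kept = list(filter_sam(records))
print(len(kept))
```

In practice the input lines would come from a stream such as `samtools view -h`, with the filtered output converted back to BAM.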

Yuanhang(Leo) Liu
Informatics Specialist
Division of Biomedical Statistics and Informatics

Mayo Clinic
200 First Street SW
Rochester, MN 55905
