Comments (6)
It is quite unusual to see such high memory usage in StringTie. I am wondering whether it's just some overexpressed gene on chrM or whether genes on other chromosomes are involved here.
In order to pinpoint where it gets stuck or crashes, you should run with a single thread, use the -v option, and capture the stderr messages. That way you can tell where it gets stuck (the last genomic region being processed). Hopefully you can then extract only the read alignments from that region and send them to us for further analysis.
I just uploaded a pre-release version v1.0.4 with the latest fixes: http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.0.4.Linux_x86_64.tar.gz
Could you give this version a try with only one CPU (i.e. no -p option) and with -v as suggested above, and let me know which bundle it gets stuck or crashes on?
(The source is at http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.0.4.tar.gz if you need to build from source.)
There is also an option to ignore specific chromosome(s) in this version, so you can add "-x chrM" to skip processing of alignments on chrM (which are generally troublesome due to oversampling but rarely useful for gene expression analysis, unless one really cares about mitochondrial genes). Be careful with the spelling of the chromosome names for this one: it matters whether it is "chrM" or "ChrM".
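The suggested run (single thread, -v, stderr captured, optionally -x chrM) can be sketched in Python as follows. This is only an illustration: the helper names, file names, and the idea of checking the -x name against the @SQ lines of the SAM header are assumptions, not part of StringTie. The header text could come from something like "samtools view -H sample.bam".

```python
import subprocess


def seqids_from_sam_header(header_text):
    """Return reference names (SN fields of @SQ lines) from SAM header text."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def build_stringtie_command(bam, out_gtf, guide_gff=None,
                            skip_seqids=(), known_seqids=None):
    """Build a verbose, single-threaded StringTie command as an argument list.

    If known_seqids is given, every name passed to -x must match one of them
    exactly (so "ChrM" is rejected when the header says "chrM").
    """
    if known_seqids is not None:
        for name in skip_seqids:
            if name not in known_seqids:
                raise ValueError(f"{name!r} is not a reference name in the header")
    cmd = ["stringtie", bam, "-v", "-o", out_gtf]  # no -p, so one thread
    if guide_gff:
        cmd += ["-G", guide_gff]
    if skip_seqids:
        cmd += ["-x", ",".join(skip_seqids)]
    return cmd


def run_and_capture(cmd, log_path):
    """Run the command and write its stderr (the -v messages) to log_path."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stderr=log).returncode
```

For example, build_stringtie_command("sample.bam", "out.gtf", skip_seqids=["chrM"]) followed by run_and_capture(cmd, "stringtie.log") would leave the verbose messages in stringtie.log, where the last bundle reported can be inspected.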
from stringtie.
Hi,
Thanks for the quick response. I have tried the newer version on one of the files. It gets stuck at the bundle chr21:8182911-8595722. The last few lines of the log are shown below:
[05/18 19:44:51]^bundle chr21:8161340-8161490(3) done (0 processed potential transcripts).
[05/18 19:44:51]>bundle chr21:8162457-8162596(3) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8162457-8162596(3) done (0 processed potential transcripts).
[05/18 19:44:51]>bundle chr21:8167909-8168134(5) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8167909-8168134(5) done (0 processed potential transcripts).
[05/18 19:44:51]>bundle chr21:8170441-8170607(3) (0js, 0 guides) loaded, begins processing...
[05/18 19:44:51]^bundle chr21:8170441-8170607(3) done (0 processed potential transcripts).
[05/18 19:46:21]>bundle chr21:8182911-8595722(12097263) (4774js, 39 guides) loaded, begins processing...
I have also extracted the reads for this region only. Let me know your thoughts. Another sample gets stuck at chromosome M, which makes sense, but I have no idea what is causing this one.
Files accessed here:
https://jh.box.com/s/cfxpfxrrb6q6y1yo2crgl0vjsm8w0kwl
Regards,
Srikanth
(In reply to Geo Pertea's comment of Sun, May 17, 2015 at 2:20 PM.)
Srikanth S. Manda
Research Scholar
Pandey Lab
McKusick-Nathans Institute of Genetic Medicine
Johns Hopkins University School of Medicine
Miller Research Building, Room 560
733 North Broadway
Baltimore, Maryland 21205
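Finding the stuck bundle in a long -v log can be automated. The sketch below assumes the log format seen in the excerpt above (">bundle ... loaded" when a bundle starts and "^bundle ... done" when it finishes) and reports bundles that started but never finished. The function names are hypothetical, and the number in parentheses is simply carried along as a count without interpreting it. The second helper builds a samtools command to extract that region, which assumes a coordinate-sorted, indexed BAM.

```python
import re

# Matches e.g. ">bundle chr21:8182911-8595722(12097263)" or "^bundle ...".
BUNDLE_RE = re.compile(r"([>^])bundle (\S+):(\d+)-(\d+)\((\d+)\)")


def unfinished_bundles(log_text):
    """Return (chrom, start, end, count) for bundles loaded but not done."""
    open_bundles = {}
    for match in BUNDLE_RE.finditer(log_text):
        marker, chrom, start, end, count = match.groups()
        key = (chrom, int(start), int(end))
        if marker == ">":
            open_bundles[key] = int(count)   # bundle started
        else:
            open_bundles.pop(key, None)      # bundle finished
    return [(c, s, e, n) for (c, s, e), n in open_bundles.items()]


def samtools_region_command(bam, chrom, start, end, out_bam):
    """Build a samtools command that extracts one region into a new BAM."""
    return ["samtools", "view", "-b", "-o", out_bam, bam,
            f"{chrom}:{start}-{end}"]
```

With a single-threaded run the list should normally contain one entry, the bundle being processed when the run stalled or crashed.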
from stringtie.
Ah, I see, this is another monster cluster courtesy of HISAT. It's so dense that IGV started to complain about memory when I tried to visualize the read alignments. StringTie gets bogged down badly on this bundle, using about 33GB RAM.
Interestingly, Cufflinks simply gives up: after filtering a lot of alignments, it ends up producing no transcript assemblies at all (I haven't tried it with -g because Cufflinks cheats badly with that option).
We should probably implement a new "aggressive alignment filtering" option in StringTie, to be enabled at the user's request for exuberant aligners like HISAT. HISAT is an excellent aligner with great speed, but at this point I have to recommend TopHat instead, because we sometimes get too many spurious alignments from HISAT, creating clusters like these which StringTie cannot handle properly.
So for now I am sorry to say we cannot provide a fix for this situation (besides recommending TopHat), but thank you for sharing this example data set. We'll use it to devise a better alignment filtering strategy in a future version.
from stringtie.
Thanks for the feedback. I will see if TopHat solves my issue.
Do you think it's a good idea to mix alignments from HISAT and TopHat?
(In reply to Geo Pertea's comment of Fri, May 22, 2015 at 5:50 PM.)
from stringtie.
This should be fixed in StringTie version 1.1.0 that was just released.
from stringtie.
Hi,
I encountered similar problems when running StringTie version 1.3.3. StringTie was basically stuck at a certain bundle for days. I looked at this region in IGV, and it seems to contain a very large number of duplicate and multi-mapped reads (over 15000). I guess this is what causes StringTie to get stuck at this position. I tried the -M option with a relatively low value (0.1), but that didn't help. Do you have any suggestions regarding this issue? Do I need to pre-filter the BAM files to remove multi-mapped reads? Any suggestion is appreciated. Thanks
Yuanhang(Leo) Liu
Informatics Specialist
Division of Biomedical Statistics and Informatics
Mayo Clinic
200 First Street SW
Rochester, MN 55905
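For readers who want to try the pre-filtering idea raised above, here is a minimal sketch that drops multi-mapped alignments from SAM text. It assumes the aligner records the number of reported alignments in the NH:i tag; alignments without that tag are kept because their multiplicity is unknown. Nothing in this thread confirms that such filtering resolves the stall, and it discards data, so treat it as an experiment rather than a recommended fix. The function names are hypothetical.

```python
def is_unique_alignment(sam_line):
    """Return True if the alignment has NH:i:1 or carries no NH tag at all."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:          # optional tags start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return True                      # multiplicity unknown, so keep the read


def filter_multimapped(sam_lines):
    """Yield header lines and uniquely mapped alignments, in input order."""
    for line in sam_lines:
        if line.startswith("@") or is_unique_alignment(line):
            yield line
```

One possible way to use it is to pipe "samtools view -h in.bam" into a small script that writes the output of filter_multimapped(sys.stdin) to stdout, and convert the result back to BAM with "samtools view -b".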
from stringtie.