Indeed it's strange; I have sometimes noticed that too on fast machines. I assume the CPU usage may be very low because most of the time the threads are just waiting for data (alignment bundles): the I/O is not keeping up with the fast processing of those small bundles. This is the case when there are a lot of very small bundles to process (which unfortunately happens often). On the other hand, it's also often the case that a single high-coverage/large-span bundle keeps just 1 thread busy for a long time while all the others have finished; in our experience, the many alignments on chrM seem to do that quite often. I suspect you do have such a high-coverage (and perhaps large-span) bundle in your relatively small data set.
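To illustrate the straggler effect described above, here is a toy simulation. It is only a sketch, not StringTie's actual scheduler: bundles are handed out in order to whichever thread frees up first, and each bundle's cost is assumed to be known. The workload (10,000 tiny bundles plus one 60-second, chrM-like bundle) is made up for illustration.

```python
import heapq


def simulate_makespan(bundle_costs, n_threads):
    """Greedy list scheduling: each bundle, in input order, goes to the
    worker thread that becomes free first. Returns (makespan, busy_times)."""
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_threads)]
    heapq.heapify(free_at)
    busy = [0.0] * n_threads  # total processing time per worker
    for cost in bundle_costs:
        t, w = heapq.heappop(free_at)
        busy[w] += cost
        heapq.heappush(free_at, (t + cost, w))
    makespan = max(t for t, _ in free_at)
    return makespan, busy


# Hypothetical workload: 10,000 tiny bundles with one huge bundle in the middle.
small = [0.001] * 10_000
costs = small[:5000] + [60.0] + small[5000:]

for p in (1, 4, 16, 48):
    makespan, busy = simulate_makespan(costs, p)
    utilization = sum(busy) / (p * makespan)  # fraction of thread-time spent working
    print(f"-p {p:>2}: makespan {makespan:6.2f} s, utilization {utilization:5.1%}")
```

However many threads are added, the makespan cannot drop below the cost of the single largest bundle, so average utilization falls as -p grows.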
I am actually curious about the kind of alignments/coverage you have there that would make StringTie so slow for just 4 million reads. If you could share that .bam file, I would be interested in taking a look (let me know if you want an FTP location to upload the file; in that case, please write to me directly at gpertea at jhu.edu).
from stringtie.
Unfortunately I cannot share this particular dataset, but I can say that it is 2x300 bp MiSeq data, prepared using an rRNA-reduction method rather than poly(A) selection, and it does indeed have a large number of reads mapping to chrM. So your suspicion is probably correct!
Thanks for the reply. I'll try further testing with a 'more appropriate' dataset. If there's any other information I could provide (without revealing the data itself), please let me know!
Also, in case it helps with diagnostics: the genome the reads were aligned to consists of ~100k contigs, and the existing annotation is at an early stage compared to those of model organisms.
I took another look at this and found that there were indeed some mutex usage issues in the code that caused the threads to idle much more than needed. I fixed some of them and can now see a substantial increase in the efficiency of the multi-threading code. This fix should be in v1.0.4.
I also have this issue with StringTie v1.3.3b. I'm using the '-p 48' flag on a machine with 54 cores. However, I see only one core running while StringTie is processing: CPU usage is about 100-200%, whereas it is usually about 4000% with the same parameter in HISAT2.
Unfortunately StringTie's -p option does not scale well at all; there is rarely a need to run with more than 4 CPUs, and 48 is definitely overkill. There are too many small bundles and StringTie blows through those very quickly. Because the current implementation takes one bundle per worker thread, most threads spend their time waiting for each other, either to grab the next bundle or to write the results (large, complex bundles are quite rare).
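A back-of-the-envelope model of that contention, assuming each bundle costs `serial_s` of work that must happen under a shared lock (taking the next bundle, writing its output) plus `parallel_s` of work that threads can do concurrently. The function names and the 1:3 cost ratio are hypothetical, picked only to show the shape of the curve; they are not measured StringTie numbers.

```python
def estimated_wall_time(n_bundles, serial_s, parallel_s, n_threads):
    """Lower-bound wall time when every bundle needs `serial_s` seconds of
    work inside a shared lock and `parallel_s` seconds of lock-free work."""
    serialized = n_bundles * serial_s                 # the lock is a single lane
    total_work = n_bundles * (serial_s + parallel_s)  # all work, spread over threads
    return max(serialized, total_work / n_threads)    # whichever bottleneck binds


def speedup(n_bundles, serial_s, parallel_s, n_threads):
    """Speedup relative to a single thread under the same cost model."""
    one = estimated_wall_time(n_bundles, serial_s, parallel_s, 1)
    return one / estimated_wall_time(n_bundles, serial_s, parallel_s, n_threads)


# Hypothetical small bundles: lock-free work is only 3x the serialized overhead.
for p in (1, 2, 4, 8, 48):
    print(f"-p {p:>2}: speedup {speedup(1000, 1.0, 3.0, p):.1f}x")
```

With these numbers the speedup flattens at (serial_s + parallel_s) / serial_s = 4, no matter how many threads are used.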
This situation could be improved with some involved rewriting of the multi-threading code (allowing threads to grab many small bundles at a time), but I still think the benefit for most pipelines would be minimal: it is still a better (more efficient) use of computing cores to run multiple stringtie processes (i.e. for multiple samples) with a small number of cores each than to use many threads on a single sample.
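A minimal sketch of the batching idea, assuming a shared source of bundles guarded by a lock. `BundleSource`, `run` and `batch` are hypothetical names, not StringTie code. Each worker takes `batch` bundles per trip through the source lock and writes its results once per batch, so the number of lock trips drops by roughly a factor of `batch`. Because of CPython's GIL, this demonstrates the locking pattern rather than a real CPU speedup.

```python
import threading


class BundleSource:
    """Hypothetical stand-in for the shared alignment reader: hands out
    bundles under a lock, up to `batch` at a time."""

    def __init__(self, bundles, batch):
        self._bundles = list(bundles)
        self._next = 0
        self._batch = batch
        self._lock = threading.Lock()
        self.grabs = 0  # number of non-empty batches handed out

    def take(self):
        with self._lock:
            start = self._next
            chunk = self._bundles[start:start + self._batch]
            self._next = start + len(chunk)
            if chunk:
                self.grabs += 1
            return chunk


def run(bundles, n_threads, batch, process):
    source = BundleSource(bundles, batch)
    results = []
    out_lock = threading.Lock()

    def worker():
        while True:
            chunk = source.take()
            if not chunk:
                return  # source exhausted
            done = [process(b) for b in chunk]  # no lock held while processing
            with out_lock:  # one write per batch, not per bundle
                results.extend(done)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, source.grabs


for batch in (1, 64):
    results, grabs = run(range(10_000), n_threads=4, batch=batch, process=lambda b: b * 2)
    print(f"batch={batch:>2}: {grabs} batches handed out for {len(results)} bundles")
```

Results come back in arbitrary order here; an implementation that must write output in genomic order would need to reorder them before writing.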
If this is the case, the "-p 8" used in the documentation (Pertea et al. 2016) is also overkill. I am experiencing the same issue with v1.3.4.