Multi-threading not working? (stringtie issue, closed)

gpertea commented on September 28, 2024
Multi-threading not working?

from stringtie.

Comments (7)

gpertea commented on September 28, 2024

Indeed it's strange, and I have noticed it too, sometimes, on fast machines. I assume CPU usage can be very low because the threads spend most of their time waiting for data (alignment bundles): the I/O cannot keep up with how quickly those small bundles are processed. This happens when there are a lot of very small bundles to process (which unfortunately is often). On the other hand, it is also often the case that a single high-coverage or large-span bundle keeps just 1 thread busy for a long time after all the others have finished; in our experience the many alignments on chrM do that quite often. I suspect you have such a high-coverage (and perhaps large-span) bundle in your relatively small data set.
I am actually curious about the kind of alignments/coverage you have there that would make StringTie so slow for just 4 million reads. If you could share that .bam file I would be interested to take a look (let me know if you want an FTP location to upload the file; in that case please write to me directly at gpertea at jhu.edu).
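One rough way to check the chrM hypothesis without sharing the data is to see which reference sequences hold most of the mapped reads. The sketch below parses the tab-separated output of `samtools idxstats` (reference name, length, mapped reads, unmapped reads) and ranks references by mapped-read count. The function names are made up for illustration, and reads per reference is only a proxy for a heavy bundle, not a measure of what StringTie actually sees.

```python
import subprocess


def parse_idxstats(text):
    """Parse `samtools idxstats` output into {reference_name: mapped_reads}."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] == "*":
            continue  # skip malformed lines and the row for unplaced reads
        counts[fields[0]] = int(fields[2])
    return counts


def top_references(counts, n=5):
    """Return the n references with the most mapped reads as (name, count, fraction)."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(name, c, c / total if total else 0.0) for name, c in ranked]


def idxstats_for(bam_path):
    """Run samtools on an indexed BAM file (assumes samtools is on PATH)."""
    result = subprocess.run(
        ["samtools", "idxstats", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `top_references(parse_idxstats(idxstats_for("sample.bam")))` on an indexed BAM (the file name here is a placeholder) would show whether a single reference such as chrM accounts for a large share of the alignments.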

cmonger commented on September 28, 2024

Unfortunately I cannot share this particular dataset, but I can say that it is 2x 300bp data from a MiSeq, was prepared using a ribo-reduction method rather than poly(A) selection, and does indeed have a large number of reads mapping to chrM. So your suspicion is probably correct!

Thanks for the reply, I'll try further testing with a 'more appropriate' dataset. If there's any other information I could provide (without revealing the data itself), please let me know!

cmonger commented on September 28, 2024

Also, if it helps with any diagnostics: the genome the reads were aligned to is in ~100k contigs, and the existing annotation is in its early stages compared to those of model organisms.

gpertea commented on September 28, 2024

I took another look at this and found that there were indeed some mutex usage issues in the code that caused the threads to idle much more than needed. I fixed some of those and can now see a serious increase in the efficiency of the multi-threading code. This fix should be in v1.0.4.

zhanghao-njmu commented on September 28, 2024

I also have this issue with stringtie v1.3.3b. I'm using the '-p 48' flag and my machine has 54 cores. However, I saw only one core running while stringtie was processing. CPU usage is about 100-200%, whereas it is usually around 4000% when I use the same parameter in hisat2.

gpertea commented on September 28, 2024

Unfortunately StringTie's -p option does not scale well at all. There is rarely a need to run with more than 4 CPUs, and 48 is definitely overkill. There are too many small bundles and StringTie blows through those very quickly. Because the current implementation takes one bundle per worker thread, and large, complex bundles are quite rare, most threads spend their time waiting for each other, either to grab the next bundle or to write the results.
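To see why extra threads stop helping, here is a toy model (not StringTie's actual scheduler, and the timings are invented): assume each bundle needs `work_time` of fully parallel processing plus `serial_time` during which a thread holds a shared lock to fetch the bundle or write results. The lock can then hand out at most one bundle per `serial_time`, which caps the speedup no matter how many threads are running.

```python
def speedup_bound(threads, work_time, serial_time):
    """Upper bound on speedup over one thread in a simple lock-contention model.

    Each bundle costs `work_time` of parallel work plus `serial_time` spent
    holding a shared lock. One thread processes a bundle every
    (work_time + serial_time); the lock allows at most one bundle per
    serial_time, so speedup cannot exceed (work_time + serial_time) / serial_time.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if work_time < 0 or serial_time <= 0:
        raise ValueError("work_time must be >= 0 and serial_time must be > 0")
    lock_limit = (work_time + serial_time) / serial_time
    return min(float(threads), lock_limit)
```

Under these assumptions, if a small bundle takes twice as long to process as to hand off (`work_time=1.0`, `serial_time=0.5`), the bound is 3x whether 4 or 48 threads are used; only larger bundles (a bigger `work_time`) raise the cap.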

This situation could be improved with some involved rewriting of the multi-threading code (to allow threads to grab many small bundles at a time), but I still think the benefit for most pipelines would be minimal: it is a more efficient use of computing cores to run multiple stringtie processes (i.e. for multiple samples) with a small number of cores each than to use many threads on a single sample.
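A minimal sketch of that approach, assuming one BAM file per sample: split the available cores into several concurrent stringtie processes with a small `-p` each. The helper names and file paths are hypothetical; only the `stringtie` flags `-p`, `-o` and `-G` are taken from the tool's documented command line.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def stringtie_command(bam, out_gtf, threads=4, annotation=None):
    """Build the argument list for one stringtie run on a single sample."""
    cmd = ["stringtie", bam, "-p", str(threads), "-o", out_gtf]
    if annotation:
        cmd += ["-G", annotation]  # optional reference annotation (GTF/GFF)
    return cmd


def concurrent_jobs(total_cores, threads_per_job=4):
    """How many stringtie processes fit on the machine at once (at least 1)."""
    return max(1, total_cores // threads_per_job)


def run_samples(samples, total_cores, threads_per_job=4, annotation=None):
    """Run stringtie on (bam, out_gtf) pairs, several samples at a time.

    Returns the exit code of each process, in the order of `samples`.
    """
    def run_one(sample):
        bam, out_gtf = sample
        cmd = stringtie_command(bam, out_gtf, threads_per_job, annotation)
        return subprocess.run(cmd, check=False).returncode

    # Threads here only wait on child processes; the real work runs in stringtie.
    with ThreadPoolExecutor(max_workers=concurrent_jobs(total_cores, threads_per_job)) as pool:
        return list(pool.map(run_one, samples))
```

With 48 cores and 4 threads per job, `concurrent_jobs(48)` gives 12 samples running side by side, rather than one sample with `-p 48`.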

cryptic0 commented on September 28, 2024

If this is the case, the "-p 8" used in the documentation (Pertea et al 2016) is also overkill. I am experiencing this same issue with v1.3.4.
