
Comments (5)

sea-shunned commented on July 18, 2024

Hi Jake,

In your args, do you have use_parallel_startpoints = True? If so, this itself is doing some multiprocessing in the maximum likelihood part, which would explain the exploding threads! Disabling this (use_parallel_startpoints = False) should keep things consistent for your parallelization across folds. This may cause the individual runs to be slower, but doing 10 (in this example) simultaneously should be faster overall than internal parallelization.
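As a minimal sketch of that pattern (the `build_sustain_model` helper and its arguments are hypothetical placeholders here, not the actual pySuStaIn API):

```python
from multiprocessing import Pool

def run_fold(fold_idx):
    # Hypothetical helper that builds the SuStaIn model for one fold with the
    # internal start-point parallelism disabled, so the only multiprocessing
    # is the fold-level Pool below.
    sustain = build_sustain_model(fold_idx, use_parallel_startpoints=False)
    return sustain.run_sustain_algorithm()

if __name__ == "__main__":
    with Pool(processes=10) as pool:  # one worker per fold
        results = pool.map(run_fold, range(10))
```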

If you did set use_parallel_startpoints = False, then...well, I'll have to have a think 😄

A possible route for the future may be, as you say, to expose finer control (e.g. n_cpus) for each run rather than a binary serial/parallel flag, or simply to better expose the options of pathos (which is used underneath), so thank you for bringing this up!
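For context on that last point, pathos pools already take an explicit worker count, which is roughly the kind of control being suggested; a toy example:

```python
from pathos.pools import ProcessPool

def square(x):
    return x * x

# An explicit worker count (nodes=4) rather than a binary serial/parallel flag.
pool = ProcessPool(nodes=4)
print(pool.map(square, range(8)))
pool.close()
pool.join()
```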

illdopejake commented on July 18, 2024

Hi Cameron,

Thanks so much for looking into this. I appreciate you bringing the use_parallel_startpoints argument to my attention; I hadn't really paid much attention to it, and I'm glad to know about it now. But in the instance where I encountered the error described in this issue, I actually had use_parallel_startpoints set to False. So something else in the code must be spawning all these processes.

sea-shunned commented on July 18, 2024

That is interesting!

Underneath, numpy does some parallelization of its own that is controlled externally and depends on which linear algebra libraries the cluster is using (i.e. BLAS/OpenBLAS, MKL, etc.). This StackOverflow post explains a possible way to address this, which may be easy or hard to do depending on the cluster setup!
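For reference, the usual fix is to cap the BLAS/OpenMP thread pools via environment variables before numpy is first imported (set them at the very top of the script, or in the cluster job script):

```python
import os

# These must be set before numpy is first imported to take effect.
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP (used by OpenBLAS and MKL)
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL
os.environ["NUMEXPR_NUM_THREADS"] = "1"   # numexpr, if installed

import numpy as np  # imported only after the limits are in place
```

Alternatively, the threadpoolctl package can limit these thread pools at runtime, without relying on import order.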

illdopejake commented on July 18, 2024

Great, I had no idea numpy was doing that sort of thing; I guess it only becomes relevant for operations on really large arrays? I will give this a try next time and report back on whether it resolves the issue. Thanks again!

sea-shunned commented on July 18, 2024

Yep, and sometimes parallelization has more overhead cost than the time you save (if the arrays aren't big enough), so it can pay to adjust the settings.
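If you want to see that trade-off on your own machine, here is a rough timing sketch (it uses the threadpoolctl package; exact numbers will vary with hardware and BLAS build):

```python
import time
import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.rand(100, 100)  # small matrices, where thread overhead can dominate

for n_threads in (1, 4):
    with threadpool_limits(limits=n_threads, user_api="blas"):
        start = time.perf_counter()
        for _ in range(10_000):
            a @ a
        print(f"{n_threads} thread(s): {time.perf_counter() - start:.3f}s")
```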

I'll keep this issue open for now — if setting those environment variables or anything else does fix the issue please let us know!
