Hiya Leon et al., I ran into an interesting issue when the sys admin

Cross validation -- option to control # of threads about pysustain HOT 5 OPEN

ucl-pond commented on July 18, 2024

Cross validation -- option to control # of threads

Comments (5)

sea-shunned commented on July 18, 2024

Hi Jake,

In your args, do you have use_parallel_startpoints = True? If so, this itself is doing some multiprocessing in the maximum likelihood part, which would explain the exploding threads! Disabling this (use_parallel_startpoints = False) should keep things consistent for your parallelization across folds. This may cause the individual runs to be slower, but doing 10 (in this example) simultaneously should be faster overall than internal parallelization.

If you did set use_parallel_startpoints = False, then...well, I'll have to have a think 😄

A possible route for the future may be, as you say, to embed finer detail (e.g. n_cpus) for each run, rather than a binary serial/parallelize, or just better exposing the options of pathos (which is used underneath), so thank you for bringing this up!
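To make the per-run `n_cpus` idea concrete, here is a minimal sketch of splitting a fixed core budget across folds that run simultaneously. This is not part of pySuStaIn or pathos; `threads_per_worker` is a hypothetical helper, and the numbers are only illustrative.

```python
import os


def threads_per_worker(n_workers, total_cores=None):
    """Split a core budget evenly across workers, never going below one thread each.

    Hypothetical helper for illustration; not a pySuStaIn or pathos function.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if total_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // n_workers)


# 10 folds sharing a 40-core allocation leaves 4 threads per fold.
print(threads_per_worker(10, total_cores=40))  # 4
```

If the folds outnumber the available cores, the helper still returns 1, so each fold runs serially rather than oversubscribing the node.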

illdopejake commented on July 18, 2024

Hi Cameron,

Thanks so much for looking into this. I appreciate you bringing the use_parallel_startpoints argument to my attention. I hadn't really paid much attention to it, and I'm glad to know about it now. But in the instance where I encountered the error described in this issue, I actually had use_parallel_startpoints set to False. So there must be somewhere else in the code that is causing all these processes to start.

sea-shunned commented on July 18, 2024

That is interesting!

Underneath, numpy does some parallelization that is controlled externally and depends on the libraries the cluster is using, e.g. BLAS/OpenBLAS, MKL, etc. This StackOverflow post explains a possible way to address this, which may be easy or hard to do, depending on the cluster setup!
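For reference, one common way to cap these backend threads is to set the relevant environment variables before numpy is first imported. The sketch below assumes the cluster's numpy is linked against OpenMP, OpenBLAS, or MKL; which variable actually takes effect depends on the build, and `limit_numeric_threads` is a hypothetical helper name.

```python
import os

# Environment variables read by common OpenMP/BLAS backends when they load.
# Which one matters depends on how numpy was built on the cluster (assumption).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_numeric_threads(n_threads=1):
    """Set the thread-count variables and return the values that were applied."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Call this at the very top of the script, before numpy is imported anywhere;
# the backends typically read these variables only once, at load time.
limit_numeric_threads(1)
# import numpy as np  # import numpy (and anything that uses it) only after this
```

Worker processes started afterwards inherit the same environment, so each fold should stay within the cap. The variables can also be exported in the job script instead of being set from Python.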

illdopejake commented on July 18, 2024

Great, I had no idea numpy was doing that sort of thing. I guess that's because it only becomes relevant for operations on really large arrays? I will give this a try next time and report back on whether it resolves the issue. Thanks again!

sea-shunned commented on July 18, 2024

Yep, and sometimes parallelization has more overhead cost than the time you save (if the arrays aren't big enough), so it can pay to adjust the settings.

I'll keep this issue open for now — if setting those environment variables or anything else does fix the issue please let us know!
