Comments (5)
Hi Jake,
In your `args`, do you have `use_parallel_startpoints = True`? If so, this itself is doing some multiprocessing in the maximum likelihood part, which would explain the exploding threads! Disabling this (`use_parallel_startpoints = False`) should keep things consistent for your parallelization across folds. This may cause the individual runs to be slower, but doing 10 (in this example) simultaneously should be faster overall than internal parallelization.
If you did set `use_parallel_startpoints = False`, then...well, I'll have to have a think 😄
A possible route for the future may be, as you say, to embed finer detail (e.g. n_cpus) for each run, rather than a binary serial/parallelize, or just better exposing the options of pathos (which is used underneath), so thank you for bringing this up!
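For anyone reading later, here is a rough sketch of why nested parallelism makes the thread count explode: each layer multiplies the one above it. The function name and the numbers are illustrative assumptions only, not anything taken from pySuStaIn or pathos.

```python
def total_workers(n_folds, inner_procs_per_fold=1, blas_threads_per_proc=1):
    """Rough upper bound on simultaneously busy threads across all layers.

    n_folds:               CV folds launched in parallel (outer layer)
    inner_procs_per_fold:  processes spawned inside each fold (hypothetical)
    blas_threads_per_proc: threads the linear algebra backend uses per process
    """
    return n_folds * inner_procs_per_fold * blas_threads_per_proc


# 10 folds, no inner pool, backend pinned to a single thread
print(total_workers(10))          # 10

# Same 10 folds, but each fold opens a 16-process pool and
# each of those processes lets its backend use 16 threads
print(total_workers(10, 16, 16))  # 2560
```

Keeping every layer except the outer one at 1 is what makes the per-fold parallelization predictable.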
from pysustain.
Hi Cameron,
Thanks so much for looking into this. I appreciate you bringing the `use_parallel_startpoints` argument to my attention. I hadn't really paid much attention to it, and I'm glad to know about it now. But in the instance where I encountered the error described in this issue, I actually had `use_parallel_startpoints` set to `False`. So there must be somewhere else in the code that is leading to all these processes initiating.
That is interesting!
Underneath, `numpy` does some parallelization that is externally controlled depending on the libraries that the cluster is using, i.e. if it is using BLAS/OpenBLAS, MKL etc. This StackOverflow post explains a possible way to address this, which may be easy or hard to do, depending on the cluster setup!
Great, I had no idea numpy was doing that sort of thing. I guess that's because it only seems to become relevant for operations on really large arrays? I will give this a try next time and will report back on whether it resolves the issues. Thanks again!
Yep, and sometimes parallelization has more overhead cost than the time you save (if the arrays aren't big enough), so it can pay to adjust the settings.
I'll keep this issue open for now — if setting those environment variables or anything else does fix the issue please let us know!