Comments (7)
Thanks Lukas. I haven't checked with the latest version, but I found that the issue was caused by setting thread affinity via the KMP_AFFINITY env variable. For me, it worked when I set the value to an empty string.
from ml-workspace.
I have the same bug.
You can reproduce it with:
- !stress -c 10
- import torch
- !stress -c 10

Step 1 shows 100% usage on all 10 CPUs; step 3 shows only 10%. I see the bug only in Jupyter notebooks. The Docker version is 20 and the ml-workspace version is the latest. It is fixed by:

import os
os.environ['KMP_AFFINITY'] = ''
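One detail worth noting about the fix above: OpenMP runtimes generally read KMP_AFFINITY once at initialization, so the variable has to be cleared before importing the library that loads the runtime. A minimal sketch (the torch import is commented out so the snippet stands alone):

```python
import os

# Clear KMP_AFFINITY *before* importing torch (or any other library that
# loads the Intel OpenMP runtime): the runtime reads the variable once
# when it initializes, so setting it after the import has no effect.
os.environ['KMP_AFFINITY'] = ''

# import torch  # safe to import at this point

print(repr(os.environ['KMP_AFFINITY']))
```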
@andr0idsensei I just did some evaluation on different machines with your code, but was not able to replicate the issue (at least with the current build version of the workspace: mltooling/ml-workspace:0.9.0-SNAPSHOT).
For example, here I tried your code within JupyterLab with two different Pool sizes, and both were able to use the available CPUs.
Have you checked whether other Docker containers have access to the full CPU count (maybe the Docker daemon has some limiting configuration)? Have you set a CPU limit on the workspace container?
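The original reproduction snippet is not included in this thread, but the kind of Pool test being discussed could be sketched like this (the worker function and pool sizes are assumptions, not the reporter's original code):

```python
import multiprocessing as mp

def burn(n):
    # CPU-bound loop; with working affinity, each Pool worker should be
    # able to saturate its own core while this runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    for pool_size in (2, mp.cpu_count()):
        with mp.Pool(pool_size) as pool:
            results = pool.map(burn, [10**6] * pool_size)
        print(f'pool size {pool_size}: {len(results)} results')
```

Watching per-core utilization in htop/top while this runs shows whether the workers are actually spread across the available CPUs.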
Closing this issue for now since it does not seem to be happening with the latest workspace version. Feel free to reopen the issue if you can still replicate it with the latest version.
@andr0idsensei Thanks, that's very valuable feedback. The KMP settings were adapted from Intel's recommendations: https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference
Unfortunately, I don't have a deep understanding of how KMP_AFFINITY works, but it seems that for some hardware setups this setting does not actually improve performance.
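For context, the Intel guide linked above suggests settings along these lines (the exact thread count depends on the hardware, so treat the values here as an illustrative example rather than the workspace's actual configuration):

```shell
# Example KMP/OMP tuning in the spirit of Intel's TensorFlow-on-CPU
# guide; values are illustrative, not the ml-workspace defaults.
export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"
export KMP_BLOCKTIME=1
export OMP_NUM_THREADS=4  # hypothetical: set to the number of physical cores
```

The `compact` policy pins OpenMP threads to nearby cores, which helps cache locality for a single OpenMP workload but, as this thread shows, can restrict other workloads running in the same container.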
I'm a bit confused that KMP_AFFINITY has an effect on the Python multiprocessing module at all. I don't think the multiprocessing module uses any OpenMP code.
Actually, I think we might have messed up the initial multiprocessing test somehow, and multiprocessing may not be affected by this (we haven't re-tested it, though). The fact that you couldn't reproduce the bug made me think about this a bit more. What we actually noticed was that when we used FastAI and Pytorch to train models that we usually train on a multi-CPU, multi-GPU rig, training in ml-workspace was slower and not all CPUs were used. We tried to replicate that with the Pool example in the initial bug report, and at first it seemed to reproduce. However, I think I tried it once more after finding out about KMP_AFFINITY and couldn't reproduce it. Since I don't think FastAI and Pytorch use multiprocessing (we haven't verified that either), it would make sense that they were affected by KMP_AFFINITY.