Comments (7)
Thanks Lukas. I haven't checked with the latest version, but I found that the issue was caused by setting thread affinity via the KMP_AFFINITY env variable. For me, it worked when I set the value to an empty string.
from ml-workspace.
I have the same bug.
You can reproduce it with:
- !stress -c 10
- import torch
- !stress -c 10

Step 1 shows 100% usage on all 10 CPUs; step 3 shows only 10%. I see the bug only in Jupyter notebooks. The Docker version is 20 and the ml-workspace version is the latest. It is fixed by:

import os
os.environ['KMP_AFFINITY'] = ''
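One detail worth noting about the fix above: OpenMP runtimes generally read KMP_AFFINITY once at initialization, so the variable has to be cleared before importing the library that loads the runtime. A minimal sketch (the torch import is commented out so the snippet stands alone):

```python
import os

# Clear KMP_AFFINITY *before* importing torch (or any other library that
# loads the Intel OpenMP runtime): the runtime reads the variable once
# when it initializes, so setting it after the import has no effect.
os.environ['KMP_AFFINITY'] = ''

# import torch  # safe to import at this point

print(repr(os.environ['KMP_AFFINITY']))
```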
@andr0idsensei I just did some evaluation on different machines with your code, but was not able to replicate the issue (at least with the current build version of the workspace: mltooling/ml-workspace:0.9.0-SNAPSHOT).
For example, here I tried your code within JupyterLab with two different Pool sizes, and both were able to use the available CPUs.
Have you checked whether other Docker containers have access to the full CPU count (maybe the Docker daemon has some limiting configuration)? Have you set a CPU limit on the workspace container?
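The original reproduction snippet is not included in this thread, but the kind of Pool test being discussed could be sketched like this (the worker function and pool sizes are assumptions, not the reporter's original code):

```python
import multiprocessing as mp

def burn(n):
    # CPU-bound loop; with working affinity, each Pool worker should be
    # able to saturate its own core while this runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    for pool_size in (2, mp.cpu_count()):
        with mp.Pool(pool_size) as pool:
            results = pool.map(burn, [10**6] * pool_size)
        print(f'pool size {pool_size}: {len(results)} results')
```

Watching per-core utilization in htop/top while this runs shows whether the workers are actually spread across the available CPUs.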
Closing this issue for now since it does not seem to be happening with the latest workspace version. Feel free to reopen the issue if you can still replicate it with the latest version.
@andr0idsensei Thanks, that's very valuable feedback. The KMP settings were adapted from Intel's recommendations: https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference
Unfortunately, I don't have a deep understanding of how KMP_AFFINITY works, but it seems that for some hardware setups this setting does not actually improve performance.
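For context, the Intel guide linked above suggests settings along these lines (the exact thread count depends on the hardware, so treat the values here as an illustrative example rather than the workspace's actual configuration):

```shell
# Example KMP/OMP tuning in the spirit of Intel's TensorFlow-on-CPU
# guide; values are illustrative, not the ml-workspace defaults.
export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"
export KMP_BLOCKTIME=1
export OMP_NUM_THREADS=4  # hypothetical: set to the number of physical cores
```

The `compact` policy pins OpenMP threads to nearby cores, which helps cache locality for a single OpenMP workload but, as this thread shows, can restrict other workloads running in the same container.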
I'm a bit confused that KMP_AFFINITY has an effect on the Python multiprocessing module at all. I don't think the multiprocessing module uses any OpenMP code.
Actually, I think we might have messed up the initial multiprocessing test somehow, and multiprocessing may not be affected by this (we haven't re-tested it, though). The fact that you couldn't reproduce the bug made me think about this a bit more. What we actually noticed was that when we used FastAI and Pytorch to train models that we usually train on a multi-CPU, multi-GPU rig, training in ml-workspace was slower and not all CPUs were used. We tried to replicate that with the Pool example in the initial bug report, and at first it seemed to reproduce. However, I think I tried it once more after finding out about KMP_AFFINITY and couldn't reproduce it. Since I don't think FastAI and Pytorch use multiprocessing (we haven't verified that either), it would make sense that they were affected by KMP_AFFINITY.