
Comments (7)

andr0idsensei commented on May 22, 2024

Thanks Lukas. I haven't checked with the latest version, but I found that it was caused by setting thread affinity via the KMP_AFFINITY env variable. For me, it worked when I set the value to an empty string.
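In case it helps anyone else, this is roughly what the workaround looks like (a sketch; the key point seems to be clearing the variable before anything pulls in the OpenMP runtime):

import os

# Clear the affinity setting before torch (or anything else that loads the
# Intel OpenMP runtime) is imported; an empty value seems to disable the pinning.
os.environ['KMP_AFFINITY'] = ''

import torch  # imported after the env var on purpose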


qixing-ai commented on May 22, 2024

I have the same bug. You can reproduce it with:

  1. !stress -c 10
  2. import torch
  3. !stress -c 10

Step 1 shows 100% usage on all 10 CPUs, while step 3 shows only about 10%.

I only see this bug in Jupyter notebooks.

Docker version is 20; the ml-workspace version is the latest.

It works fine after:

import os
os.environ['KMP_AFFINITY'] = ''   # clearing the affinity setting restores full CPU usage
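One way to see the effect directly (a sketch; it assumes a Linux container, where os.sched_getaffinity reports which CPUs the current process is allowed to run on):

import os

print(len(os.sched_getaffinity(0)))   # e.g. 10 before importing torch

import torch

print(len(os.sched_getaffinity(0)))   # shrinks if KMP_AFFINITY pinned the process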


LukasMasuch commented on May 22, 2024

@andr0idsensei I just did some evaluation on different machines with your code, but was not able to replicate the issue (at least with the current build version of the workspace: mltooling/ml-workspace:0.9.0-SNAPSHOT).

For example, here I tried your code within JupyterLab with two different Pool sizes, and both seemed able to use all of the available CPUs.

[Screenshot: Jan 29 10-19-27, Pool test output in JupyterLab]

Have you checked whether other Docker containers have access to all of the CPUs (maybe the Docker daemon has some limiting configuration)? And did you set a CPU limit on the workspace container?
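For reference, something like this shows what the container actually sees (a sketch; os.sched_getaffinity is Linux-only, and a --cpus quota would not show up in either number, only a --cpuset-cpus restriction would):

import multiprocessing
import os

print('os.cpu_count():  ', os.cpu_count())                  # CPUs visible to the OS
print('affinity mask:   ', len(os.sched_getaffinity(0)))    # CPUs this process may run on
print('multiprocessing: ', multiprocessing.cpu_count())     # the number Pool defaults to when no size is given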


LukasMasuch commented on May 22, 2024

Closing this issue for now, since it does not seem to happen with the latest workspace version. Feel free to reopen it if you can still replicate the problem with the latest version.


LukasMasuch commented on May 22, 2024

@andr0idsensei Thanks, that's very valuable feedback. The KMP settings were adapted from Intel's recommendations: https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference

Unfortunately, I don't have a deep understanding of how KMP_AFFINITY works, but it seems that for some hardware setups this setting does not actually improve performance.
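For context, the settings in that article are roughly the following (a sketch with illustrative values; the exact values baked into the workspace image may differ):

import os

# Intel's CPU tuning knobs, roughly as described in the article linked above
# (values here are illustrative, not copied from the workspace image):
os.environ['OMP_NUM_THREADS'] = '8'                           # article suggests the number of physical cores
os.environ['KMP_BLOCKTIME'] = '0'                             # OpenMP threads sleep right after a parallel region instead of spinning
os.environ['KMP_AFFINITY'] = 'granularity=fine,compact,1,0'   # pin OpenMP threads to cores -- the setting in question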


LukasMasuch commented on May 22, 2024

I'm a bit confused that KMP_AFFINITY has any effect on the Python multiprocessing module; I don't think multiprocessing uses any OpenMP code.
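One thing that might explain it (just a guess, with a quick way to check): Pool workers are forked from the kernel process, so if importing torch narrows the parent's CPU affinity mask, the children inherit that narrowed mask even though they never run any OpenMP code themselves:

import multiprocessing
import os

import torch  # imported only for its possible side effect on the affinity mask

def worker_affinity(_):
    # forked children inherit the parent's CPU affinity mask
    return len(os.sched_getaffinity(0))

if __name__ == '__main__':
    print('parent :', len(os.sched_getaffinity(0)))
    with multiprocessing.Pool(4) as pool:
        print('workers:', pool.map(worker_affinity, range(4)))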


andr0idsensei commented on May 22, 2024

Actually, I think we may have messed up the initial multiprocessing test somehow, and multiprocessing may not be affected by this (we haven't re-tested it, though). The fact that you couldn't reproduce the bug made me think about it a bit more. What we actually noticed was that when we used fastai and PyTorch to train models that we usually train on a multi-CPU, multi-GPU rig, training inside ml-workspace was slower and not all CPUs were used. We tried to replicate that with the Pool example in the initial bug report, and at first it seemed to reproduce; however, I think I also tried it once before finding out about KMP_AFFINITY and couldn't reproduce it then. Since I don't think fastai and PyTorch use multiprocessing (we haven't verified that either), it would make sense that they were the ones affected by KMP_AFFINITY.

