Multi-GPU support (atari) · 9 comments · CLOSED

kaixhin commented on August 15, 2024
Multi-GPU support.

Comments (9)

Kaixhin commented on August 15, 2024

Have you tried DataParallelTable at all (with NCCL)? With the default minibatch size of 32 I doubt it'll work that well, but it's worth double-checking. If not, an optional 2-GPU switch for the target network might be decent.

As for a cluster, there's always DistLearn, though I would rather focus on integrating single-machine async Q-learning first.
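Kaixhin's doubt about the default minibatch of 32 comes from how data parallelism divides the work: the network is replicated on every GPU, each minibatch is split along the batch dimension, and the gradients are summed across GPUs after every backward pass (the communication NCCL accelerates). The Python sketch below only does the bookkeeping for that split. It is not Torch's `DataParallelTable`, the function names are hypothetical, and giving any remainder to the first replicas is an assumed convention.

```python
from typing import List, Tuple


def shard_sizes(batch_size: int, num_gpus: int) -> List[int]:
    """Split a minibatch as evenly as possible across GPUs.

    Assumption: any remainder goes to the first replicas; a real
    data-parallel wrapper may distribute it differently.
    """
    if batch_size < 1 or num_gpus < 1:
        raise ValueError("batch_size and num_gpus must be positive")
    base, extra = divmod(batch_size, num_gpus)
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


def shard_slices(batch_size: int, num_gpus: int) -> List[Tuple[int, int]]:
    """Contiguous (start, stop) index ranges of the batch, one per GPU."""
    slices = []
    start = 0
    for size in shard_sizes(batch_size, num_gpus):
        slices.append((start, start + size))
        start += size
    return slices
```

With 2 GPUs each replica processes only 16 of the 32 samples per step, which for a small convnet is often too little work to outweigh the fixed per-step cost of synchronizing gradients between the cards.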

lake4790k commented on August 15, 2024

@Kaixhin I haven't tried DataParallelTable with DQN, as I thought it would not speed up the small convnet in DQN much (in supervised learning it only helps beyond a certain network size). But I will give it a try and compare it with the policy/target net split.

Kaixhin commented on August 15, 2024

@lake4790k Worth seeing just in case (with NCCL for sure). How does this code interact with your async code? As this is GPU-only and the other is CPU-only, I can imagine that supporting both will add a lot of complexity.

lake4790k commented on August 15, 2024

@Kaixhin I think multi-GPU support is not that complicated to add to the existing Atari (master) code. The async mode needs more refactoring, but it doesn't need any of the GPU-related functionality, as it should be CPU-only. I would do that in the separate async branch for now.

It's definitely a challenge to support all the modes in a single codebase (but it makes sense). Maybe I'll add some basic test cases (e.g. Catch) that run fast, to check that nothing breaks when new features are added on top...
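A fast regression check of the kind suggested here could look something like the following: a tiny Catch game plus a smoke test that runs a random policy and a scripted policy through it. This is a hypothetical Python sketch; the `Catch` class, its grid size, action encoding and reward scheme are assumptions for illustration, not the environment the Atari codebase actually uses.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[int]]


class Catch:
    """Hypothetical minimal Catch game for fast smoke tests.

    A ball starts in a random column of the top row and falls one row per
    step; the agent moves a one-cell paddle along the bottom row.
    Actions: 0 = left, 1 = stay, 2 = right.
    Reward: +1 if the paddle is under the ball when it lands, -1 if not,
    0 on every earlier step.
    """

    def __init__(self, size: int = 5, seed: int = 0) -> None:
        if size < 2:
            raise ValueError("size must be at least 2")
        self.size = size
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> Grid:
        self.ball_row = 0
        self.ball_col = self.rng.randrange(self.size)
        self.paddle_col = self.size // 2
        return self.observe()

    def observe(self) -> Grid:
        grid = [[0] * self.size for _ in range(self.size)]
        grid[self.ball_row][self.ball_col] = 1
        grid[self.size - 1][self.paddle_col] = 1
        return grid

    def step(self, action: int) -> Tuple[Grid, int, bool]:
        if action not in (0, 1, 2):
            raise ValueError(f"invalid action {action}")
        # Move the paddle (clamped to the board), then drop the ball one row.
        self.paddle_col = min(max(self.paddle_col + action - 1, 0), self.size - 1)
        self.ball_row += 1
        done = self.ball_row == self.size - 1
        reward = 0
        if done:
            reward = 1 if self.paddle_col == self.ball_col else -1
        return self.observe(), reward, done


Policy = Callable[[Catch], int]


def run_episode(env: Catch, policy: Policy) -> Tuple[int, int]:
    """Play one episode; return (number of steps, final reward)."""
    env.reset()
    steps, reward, done = 0, 0, False
    while not done:
        _, reward, done = env.step(policy(env))
        steps += 1
    return steps, reward


def tracking_policy(env: Catch) -> int:
    """Scripted policy that moves the paddle towards the ball's column."""
    if env.ball_col < env.paddle_col:
        return 0
    if env.ball_col > env.paddle_col:
        return 2
    return 1


def smoke_test(episodes: int = 50, size: int = 5, seed: int = 0) -> float:
    """Check episode mechanics with a random policy, then check that the
    scripted policy always catches the ball. Returns the random policy's
    mean reward, a baseline a learning agent should clearly beat."""
    env = Catch(size, seed)
    agent_rng = random.Random(seed + 1)
    total = 0
    for _ in range(episodes):
        steps, reward = run_episode(env, lambda _env: agent_rng.randrange(3))
        assert steps == size - 1 and reward in (-1, 1)
        total += reward
    for _ in range(episodes):
        assert run_episode(env, tracking_policy) == (size - 1, 1)
    return total / episodes
```

Because an episode lasts only `size - 1` steps and the paddle starts in the centre, the scripted policy can always reach the ball, so a trained agent that fails to approach its score is a quick signal that a change broke something.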

lake4790k commented on August 15, 2024

@Kaixhin Strange: when I first set up multi-GPU, I compared with and without NCCL and saw no speed difference (I used the Torch blog's CIFAR-10 code with R4). I have NCCL installed, so I will test with that.

lake4790k commented on August 15, 2024

I had a quick look at the speed of running the policy and target nets in parallel, but didn't see much of a speed difference in the Atari code. This could be because I had tried a bigger network before, or because in Atari memory access is also a dominant factor, not only the network forward passes.

I'll try the DataParallelTable approach later, but it could be that the Atari convnet is not big enough to gain much, in which case there's no point in complicating the code. One can also just run multiple separate experiments on multiple GPUs, which scales perfectly...
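Running separate experiments needs no changes to the training code: each process is simply restricted to one card with the standard CUDA `CUDA_VISIBLE_DEVICES` environment variable. Below is a minimal Python sketch of such a launcher, assuming round-robin assignment of runs to GPUs; the `./train.sh` script and its `--seed` flag are hypothetical placeholders, not part of the Atari repo.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple

Run = Tuple[int, Dict[str, str], List[str]]


def plan_runs(commands: Sequence[Sequence[str]], num_gpus: int) -> List[Run]:
    """Assign each independent experiment to one GPU, round-robin."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    plan = []
    for i, cmd in enumerate(commands):
        gpu = i % num_gpus
        env = dict(os.environ)
        # Hide every other card: the child process sees only this GPU,
        # which CUDA exposes to it as its first device.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        plan.append((gpu, env, list(cmd)))
    return plan


def launch(plan: List[Run]) -> List[int]:
    """Start every run concurrently, wait for all, return exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for _, env, cmd in plan]
    return [p.wait() for p in procs]


# Hypothetical training script and flag; substitute the real entry point.
commands = [["./train.sh", "--seed", str(seed)] for seed in range(4)]
plan = plan_runs(commands, num_gpus=2)
for gpu, env, cmd in plan:
    print(gpu, env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
# launch(plan)  # uncomment to actually start the runs
```

Round-robin puts two runs on each GPU when there are twice as many experiments as cards; a work queue that starts the next run whenever a GPU frees up would be the natural alternative if concurrent runs contend for GPU memory.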

Kaixhin commented on August 15, 2024

I also think that in this setup there's a lot of overhead from other sources. Unless DataParallelTable produces significant gains (unlikely), go ahead and close this issue. It's probably not worth the extra complication of implementing this.

Kaixhin commented on August 15, 2024

@lake4790k Any update? I think we can close this unless you want to try more experiments.

lake4790k commented on August 15, 2024

I haven't looked at this further, but agreed, I'll close this: if one has multiple GPUs, the best use of the resources is to just run multiple experiments. It makes more sense to work on algorithmic improvements than on this.
