
Comments (13)

kleingeo commented on July 19, 2024

Thank you for this. I believe this was the issue. I have been using nn.DataParallel and should upgrade to the distributed method.


jeffra commented on July 19, 2024

Thanks for reporting this. We recently changed to auto-initialize the distributed backend but forgot to update this tutorial. You should be able to get around this by setting dist_init_required=False as you mention.
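For reference, here is a minimal sketch of passing that flag to deepspeed.initialize; `args` and `model` are placeholders standing in for whatever your own script already builds:

```python
import deepspeed

# Sketch only: `args` and `model` come from your own argparse setup and model code.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
    dist_init_required=False,  # ask DeepSpeed not to auto-initialize torch.distributed
)
```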


kleingeo commented on July 19, 2024

> Thanks for reporting this. We recently changed to auto-initialize the distributed backend but forgot to update this tutorial. You should be able to get around this by setting dist_init_required=False as you mention.

I tried that, but there are parts that don't work, specifically the _initialize_parameter_parallel_groups step during initialization.


jeffra commented on July 19, 2024

Are you running the cifar example with the deepspeed launcher, e.g., deepspeed cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json? I seem to be able to recreate your issue if I run with python cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json, but it works if I use deepspeed. Can you try that?

Since DeepSpeed and ZeRO are intended to run with more than one GPU, a lot of our focus has been on those environments. However, we should probably support running in non-distributed mode, without our deepspeed launcher, for single-GPU debugging.


kleingeo commented on July 19, 2024

I am using DeepSpeed within Python with just an import, so not using the DeepSpeed launcher. My intent is to use it with multiple GPUs, but on a single node rather than across a distributed network.


jeffra commented on July 19, 2024

Gotcha, you can still use the deepspeed launcher even if you are not running on multiple nodes. By default it will attempt to launch on all local GPUs (it will discover how many are available). You can also specify the number of GPUs you want to launch on your local node via --num_gpus, for example as shown below.
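For example (a hypothetical invocation, assuming two local GPUs and the same cifar script and config as above): deepspeed --num_gpus=2 cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json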


kleingeo commented on July 19, 2024

That won't be the easiest route, as I'm trying to use it in a modelling pipeline I have already developed.


jeffra commented on July 19, 2024

Does your existing modelling pipeline handle launching processes across multiple GPUs? If so, you'll need to satisfy the requirements of torch.distributed launching to get this to work. We did this recently to support mpirun launching (instead of our deepspeed launcher); you can see the variables that are needed here: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/pt/deepspeed_light.py#L209-L213
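For orientation, a minimal sketch of the standard per-process environment that torch.distributed's env:// initialization expects (the exact set DeepSpeed reads is in the linked code; the values below assume a single-node, single-process run):

```python
import os
import torch.distributed as dist

# Sketch only: each process your pipeline launches would set these before initializing.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # rendezvous address (localhost for one node)
os.environ.setdefault("MASTER_PORT", "29500")      # any free port
os.environ.setdefault("RANK", "0")                 # global rank of this process
os.environ.setdefault("WORLD_SIZE", "1")           # total number of processes
os.environ.setdefault("LOCAL_RANK", "0")           # GPU index on this node

dist.init_process_group(backend="nccl", init_method="env://")
```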


kleingeo commented on July 19, 2024

I thought that torch.distributed was meant more for running a model across multiple network-connected devices, rather than across multiple cards in a single box. Going through the documentation, I see that this may be a misconception (is that correct?). I will play around and look at the different environment variables necessary to make torch.distributed work within a single machine. Maybe this was the problem I was having.

Edit: I'm still unclear why MPI would be better than the NCCL backend. Also, from the documentation, I thought that DeepSpeed should be able to work with a single GPU (i.e., for someone who wants the benefits of Apex or other tools in place). Are they still required to set up a distributed process, even for a single-GPU task?


jeffra commented on July 19, 2024

We can support running on 1 GPU without the DeepSpeed launcher; it's on our roadmap now. I'll be sure to update this thread once this support is added.

However, if you want to run multi-GPU (single node), I highly recommend using torch.distributed. The old way of running multi-GPU on a single node was nn.DataParallel; however, we have found significant performance benefits from using torch.distributed instead. One reason is that torch.distributed uses a separate process per GPU instead of sharing a single process across GPUs.

DeepSpeed uses torch.distributed with an NCCL backend for communication collectives. We have recently added support for using MPI simply for launching processes, but in that case it still uses the NCCL torch.distributed backend for all communication during training.
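To make the contrast concrete, here is a minimal, DeepSpeed-agnostic sketch of the one-process-per-GPU pattern that torch.distributed uses (it assumes the process was launched with the env:// variables sketched earlier; the Linear layer is just a placeholder model):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Sketch only: each GPU gets its own process; LOCAL_RANK picks the device for this one.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])       # gradients are averaged across processes
```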


jeffra commented on July 19, 2024

Feel free to re-open if needed. Otherwise I'll update this thread when we have 1-GPU support; it's probably most useful for testing in certain scenarios, though.


stas00 commented on July 19, 2024

@jeffra, at the very least, if 1 GPU is not supported, could you please bail out with a user-friendly error saying that non-multi-GPU runs are not supported?

Currently it fails with:

AssertionError: DeepSpeed requires integer command line parameter --local_rank

which is not documented anywhere as a user-side parameter.

$ CUDA_VISIBLE_DEVICES=1 deepspeed ... --deepspeed --deepspeed_config ds_config.json
  [...]
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/deepspeed/__init__.py", line 109, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 150, in __init__
    self._do_args_sanity_check(args)
  File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 527, in _do_args_sanity_check
    assert hasattr(args, 'local_rank') and type(args.local_rank) == int, \
AssertionError: DeepSpeed requires integer command line parameter --local_rank

My first card is an RTX 3090, which doesn't seem to work with DeepSpeed (it bails with an NCCL error), so I tried the second, older card on its own as a sanity check and then had to hunt down why it was failing with this error.

Thank you!


stas00 commented on July 19, 2024

Well, this proved to be unrelated to this issue: one needs to forward --local_rank to deepspeed.initialize's args. In the application I am trying to integrate DeepSpeed into, it was gobbled up by another consumer of argparse.
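For anyone hitting the same assertion, a minimal sketch of keeping --local_rank visible to deepspeed.initialize (the Linear layer is a placeholder; your real script supplies its own model and arguments):

```python
import argparse
import torch
import deepspeed

# Sketch only: --local_rank is injected by the deepspeed/torch.distributed launchers
# and must survive argument parsing so deepspeed.initialize can see it.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed, --deepspeed_config

args = parser.parse_args()

model = torch.nn.Linear(10, 10)  # placeholder model
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,                   # args.local_rank must reach initialize un-consumed
    model=model,
    model_parameters=model.parameters(),
)
```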

