Comments (13)
Thank you for this. I believe this was the issue. I have been using nn.DataParallel and should upgrade to the distributed method.
from deepspeed.
Thanks for reporting this. We recently changed to auto-initialize the distributed backend but forgot to update this tutorial. You should be able to get around this by setting dist_init_required=False like you mention.
I tried that, but there are parts that don't work, specifically the _initialize_parameter_parallel_groups step during initialization.
Are you running the cifar example with the deepspeed launcher, e.g., deepspeed cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json? I seem to be able to recreate your issue if I run with python cifar10_deepspeed.py --deepspeed --deepspeed_config ds_config.json, but it works if I use deepspeed. Can you try that?
Since DeepSpeed and ZeRO are intended to run with >1 GPUs, a lot of our focus has been on those environments. However, we should probably support running in non-distributed mode without our deepspeed launcher for single-GPU debugging.
I am using DeepSpeed within Python with just an import, so not using the DeepSpeed launcher. My intent is to use it with multiple GPUs, but not on a distributed network; rather, on a single node.
Gotcha, you can still use the deepspeed launcher even if you are not running on multiple nodes. It will attempt to launch on all local GPUs (it will discover how many are available) by default in this case. You can also specify the number of GPUs you want to launch on your local node via --num_gpus.
That won't be the easiest option, as I'm trying to use it in a pre-existing modelling pipeline I have already developed.
Does your existing modelling pipeline handle launching processes across multiple GPUs? If so, you'll need to satisfy the requirements of torch.distributed launching to get this to work. We did this recently to support mpirun launching (instead of our deepspeed launcher); you can see the variables that are needed here: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/pt/deepspeed_light.py#L209-L213
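For reference, a torch.distributed-style launch typically expects a few environment variables to be set in each process before initialization. This is a minimal sketch only: the exact set used by DeepSpeed is the one in the linked code, and the MASTER_ADDR/MASTER_PORT values below are placeholders for a single-process, single-node run.

```python
import os

# Variables a torch.distributed-style launcher typically sets per process.
# Values here are placeholders for a one-process, single-node setup.
env = {
    "RANK": "0",                 # global rank of this process
    "LOCAL_RANK": "0",           # rank of this process on its node
    "WORLD_SIZE": "1",           # total number of processes
    "MASTER_ADDR": "127.0.0.1",  # rendezvous host (placeholder)
    "MASTER_PORT": "29500",      # rendezvous port (placeholder)
}
os.environ.update(env)

print(os.environ["WORLD_SIZE"])  # -> 1
```

A custom launcher would set these per spawned process (with RANK/LOCAL_RANK varying) before any distributed initialization runs.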
I thought that torch.distributed was meant more for running a model across multiple network-connected devices, rather than running the model on multiple cards in a single box. Going through the documentation, I see that this may be a misconception (is that correct?). I will play around and look at the different environment variables necessary to have torch.distributed work within a single machine. Maybe this was the problem I was having.
Edit: I'm still unclear why MPI would be better than the NCCL backend. Also, from the documentation, I thought that DeepSpeed should be able to work with a single GPU (i.e., someone wants the benefits of APEX or other tools in place). Are they still required to set up a distributed process, even for a single-GPU task?
We can support running 1-gpu without the DeepSpeed launcher, it's on our roadmap now. I'll be sure to update this thread once this support is added.
However, if you're going to want to run multi-GPU (single node), I highly recommend using torch.distributed. The old way of running multi-GPU single node was nn.DataParallel; however, we have found significant performance benefits from using torch.distributed instead. One reason is that torch.distributed uses separate processes per GPU instead of sharing a single process across GPUs.
DeepSpeed uses torch.distributed with an NCCL back-end for comm collectives. We have recently added support for using MPI simply for launching processes, but in this case it still uses the NCCL torch.distributed back-end for all communication during training.
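The process-per-GPU model described above can be illustrated with plain Python multiprocessing. This is only a hedged sketch of the launching pattern, not DeepSpeed's actual code: in a real torch.distributed job, each spawned process would call init_process_group and drive exactly one GPU; here each worker simply reports its rank. The "fork" start method is assumed (POSIX).

```python
import multiprocessing as mp

def worker(rank):
    # In a real torch.distributed job, each process would call
    # dist.init_process_group(backend="nccl", rank=rank, world_size=N)
    # and drive exactly one GPU; here the worker just reports its rank.
    return rank

def launch(world_size):
    # Mimics a launcher: one OS process per GPU rank ("fork" assumes POSIX).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=world_size) as pool:
        return sorted(pool.map(worker, range(world_size)))

print(launch(2))  # -> [0, 1]
```

The key design point is isolation: each rank owns its process (and its GPU), so there is no Python-level contention of the kind nn.DataParallel suffers from when one process drives all GPUs.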
Feel free to re-open if needed. Otherwise I'll update this thread when we have 1-GPU support; it's probably more useful for testing in certain scenarios, though.
@jeffra, at the very least if 1 gpu is not supported, could you please bail with a user-friendly error saying that non-multi-gpu is not supported?
Currently it fails with:
AssertionError: DeepSpeed requires integer command line parameter --local_rank
which is not documented anywhere as a user-side parameter.
$ CUDA_VISIBLE_DEVICES=1 deepspeed ... -deepspeed --deepspeed_config ds_config.json
[...]
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/deepspeed/__init__.py", line 109, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 150, in __init__
self._do_args_sanity_check(args)
File "/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 527, in _do_args_sanity_check
assert hasattr(args, 'local_rank') and type(args.local_rank) == int, \
AssertionError: DeepSpeed requires integer command line parameter --local_rank
My first card is an RTX 3090, which doesn't seem to work with deepspeed (it bails on an NCCL error), so I tried the second, older card only as a sanity check, and then had to hunt down why it was failing with this error.
Thank you!
Well, this proved to be unrelated to this issue. One needs to forward --local_rank to deepspeed.initialize's args; in the application I am trying to integrate deepspeed into, it was gobbled up by another consumer of argparse.
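One way to avoid that kind of collision is to pull out the DeepSpeed-relevant flags with parse_known_args, so another argparse consumer in the pipeline cannot swallow --local_rank. This is a minimal sketch using plain argparse; the helper name parse_deepspeed_args is hypothetical, not a DeepSpeed API.

```python
import argparse

def parse_deepspeed_args(argv):
    # parse_known_args extracts only the flags this parser knows about
    # and returns everything else untouched for the application's own parser.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0,
                        help="set automatically by the deepspeed launcher")
    args, remaining = parser.parse_known_args(argv)
    return args, remaining

args, rest = parse_deepspeed_args(["--local_rank", "1", "--my_app_flag", "x"])
print(args.local_rank, rest)  # -> 1 ['--my_app_flag', 'x']
```

The returned args (with its integer local_rank) can then be handed to DeepSpeed initialization, while the remaining argv goes to the pipeline's existing parser.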