Comments (6)
I'm having the same issue. Would like to know the answer too.
I can confirm that I'm able to reproduce this on an A6000 as well. With MII, VRAM usage is ~18GB; with transformers.pipeline, it is ~12GB.

The difference is unexpectedly large, and we are investigating the cause. I'll also note that this is not the case for all models: I tested a few others, and many show the same memory usage in both setups.
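
As a side note for anyone reproducing the comparison: the thread does not say how the figures were measured, but a device-level query like the pynvml sketch below is one option (pynvml is an assumption here). Since MII serves the model from its own process, a device-level reading captures its usage where torch.cuda.memory_allocated() in the client would not.

```python
import pynvml

# Query total VRAM in use on GPU 0 (e.g. the A6000), regardless of
# which process allocated it.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {info.used / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()
```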

@marshmellow77 it appears that the OOM you are seeing when using MII is due to the extra VRAM needed when injecting kernels with DeepSpeed-Inference. This can be avoided by loading the model into system memory, rather than GPU memory, before DeepSpeed-Inference runs. #105 adds an option that lets users do this. Could you give it a try and let me know the results:

Install this version of MII:

pip install git+https://github.com/microsoft/deepspeed-mii@mrwyattii/address-poor-vram-usage

and add the following to your mii_configs: "load_with_sys_mem": True
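
A minimal sketch of what that looks like end to end, assuming the legacy mii.deploy API (the model and deployment names are illustrative, not from this thread):

```python
import mii

# "load_with_sys_mem": True stages the checkpoint in CPU RAM first, so
# DeepSpeed-Inference kernel injection does not need the extra GPU headroom.
mii_configs = {
    "tensor_parallel": 1,
    "dtype": "fp16",
    "load_with_sys_mem": True,
}

mii.deploy(
    task="text-generation",
    model="bigscience/bloom-560m",  # illustrative model
    deployment_name="bloom560m_deployment",
    mii_config=mii_configs,
)
```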

I can confirm that the model now loads onto the GPU. I can't test text generation because of #102, but I believe this issue can be closed.

@mrwyattii When load_with_sys_mem is False, can we not release the extra GPU memory later? For smaller models that would at least leave more memory free.

> @mrwyattii When load_with_sys_mem is False, can we not release the extra GPU memory later? For smaller models that would at least leave more memory free.

DeepSpeed-Inference will release the extra memory after kernel injection happens.
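
For context, "kernel injection" is the step where DeepSpeed-Inference replaces the model's modules with its optimized kernels. A rough standalone sketch of that step, assuming the classic deepspeed.init_inference API (model choice is illustrative):

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM

# Load to CPU first (what load_with_sys_mem does inside MII), then let
# DeepSpeed-Inference inject its optimized kernels and move weights to GPU.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```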