Describe the bug: I can only use PyTorch to train the model with Qw


RuntimeError: Error(s) in loading state_dict about deepspeed (OPEN)

lxd551326 commented on July 22, 2024 1

RuntimeError: Error(s) in loading state_dict

from deepspeed.

DavidYanAnDe commented on July 22, 2024 1

I have the same problem with DeepSpeed stage 3, except that the shape in the current model is torch.Size([0]). Could someone please help us? T_T

tjruwase commented on July 22, 2024

@lxd551326, it seems you are seeing two different issues.

1. CUDA OOM when using DeepSpeed for a model that works with pure PyTorch is very strange and should be investigated. Can you provide more repro details for that?
2. The checkpoint loading problem seems to be due to a mismatch between the checkpoint and the model definition. Can you check that it works with PyTorch only?

For both cases above, it would be very helpful if you could provide repro steps.
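
For anyone trying to narrow down the checkpoint/model mismatch, here is a minimal sketch of a shape comparison. It is not part of DeepSpeed or PyTorch: `diff_state_dict_shapes` is a hypothetical helper, and the parameter names and shapes in the example are made up for illustration. It only assumes you can produce two mappings from parameter name to shape tuple.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_state_dict_shapes(
    model_shapes: Mapping[str, Shape],
    ckpt_shapes: Mapping[str, Shape],
) -> Dict[str, List]:
    """Compare parameter names and shapes between a model and a checkpoint.

    Hypothetical helper for illustration only.
    """
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    # Keys the checkpoint provides but the model does not define.
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    # Keys present on both sides whose shapes differ:
    # (name, checkpoint shape, model shape).
    mismatched = sorted(
        (k, tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in model_shapes
        if k in ckpt_shapes and tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Made-up example: the model side reports an empty shape for one parameter.
model_shapes = {"embed.weight": (0,), "head.weight": (8, 4)}
ckpt_shapes = {"embed.weight": (16, 4), "head.weight": (8, 4), "extra.bias": (8,)}
report = diff_state_dict_shapes(model_shapes, ckpt_shapes)
print(report)
```

With PyTorch, the two mappings can be built with something like `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and the same comprehension over the dict returned by `torch.load(path, map_location="cpu")`. If a model-side shape shows up as `(0,)`, as reported above with stage 3, the mismatch may come from how the parameter is held at load time rather than from the model definition itself, so it is worth running the same comparison once without DeepSpeed.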

lhyscau commented on July 22, 2024

"... DeepSpeed for a model that works with pure pytorch is very strange ..."

Have you solved the problem? I'm running into it too. The shapes are correct in my program when I don't use DeepSpeed.
