Hi, do you have any ideas for running the code for PyTorch v1.5 with data parallel?

Any progress for pytorch 1.5? about deq HOT 4 CLOSED

locuslab commented on July 21, 2024

Any progress for pytorch 1.5?

from deq.

Comments (4)

jerrybai1995 commented on July 21, 2024

Yes, that is a design choice due to PyTorch's nn.DataParallel. If you use only 1 GPU (i.e., no nn.DataParallel), then you are able to do the actual implicit differentiation all in the backward() in deq.py and without func_copy. You can simply do it through one layer, as we hoped.

However, the weird thing we found was that once nn.DataParallel is invoked, the parameter gradients on the replicas all vanish. In other words, the gradients computed in the backward() disappear. This happened in PyTorch 1.4; I'm not so sure about 1.5. But anyway, that was the rationale behind this design choice: we found no good option but to leave a func_copy there for computing the Jacobian-vector product.

Indeed, once we are able to solve this issue, we will have much better memory efficiency than the ones reported in our paper.
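To make the "implicit differentiation in the backward" idea concrete, here is a minimal scalar sketch in plain Python. It is not the code in deq.py: the layer `f(z, x, w) = tanh(w*z + x)` and all function names are assumptions chosen for illustration. The forward pass finds the fixed point z* = f(z*, x, w); the backward pass solves the linear fixed-point equation g = (∂f/∂z)·g + grad_out and multiplies by ∂f/∂w, so only one layer evaluation at z* is needed instead of backpropagating through every forward iteration.

```python
import math


def f(z, x, w):
    """Toy one-parameter layer (an assumption for illustration)."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w, tol=1e-12, max_iter=1000):
    """Forward pass: iterate z <- f(z, x, w) until it stops changing."""
    z = 0.0
    for _ in range(max_iter):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_grad_w(x, w, grad_out=1.0, tol=1e-12, max_iter=1000):
    """Backward pass via the implicit function theorem.

    Uses only local derivatives of f at the fixed point; no forward
    iterates are stored.
    """
    z = solve_fixed_point(x, w)
    s = 1.0 - math.tanh(w * z + x) ** 2  # derivative of tanh at the fixed point
    df_dz = w * s
    df_dw = z * s
    # Solve g = df_dz * g + grad_out by fixed-point iteration
    # (the scalar analogue of the Jacobian-vector product solve).
    g = 0.0
    for _ in range(max_iter):
        g_next = df_dz * g + grad_out
        if abs(g_next - g) < tol:
            g = g_next
            break
        g = g_next
    return g * df_dw


def finite_diff_grad_w(x, w, eps=1e-6):
    """Central finite difference of z*(w), used only as a sanity check."""
    return (solve_fixed_point(x, w + eps) - solve_fixed_point(x, w - eps)) / (2 * eps)


if __name__ == "__main__":
    x, w = 0.3, 0.5  # |df/dz| <= |w| < 1, so both iterations converge
    print("implicit:   ", implicit_grad_w(x, w))
    print("finite diff:", finite_diff_grad_w(x, w))
```

The two printed values should agree closely. In the real model the scalar `df_dz * g` becomes a vector-Jacobian product through the layer, which is the part that currently relies on func_copy.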


jerrybai1995 commented on July 21, 2024

Yup, I have pushed a branch named "pytorch-1.5" for the repo. Please pull the repo, do git checkout pytorch-1.5 and train the model there. Also, see the updated README on what's been changed.

Let me know if it works!


LuChengTHU commented on July 21, 2024

Thanks! And I'm confused about the 'func_copy' model. It seems that we need to use 2x GPU memory because of this implementation. Is there a more efficient way of implementing the backward method?


LuChengTHU commented on July 21, 2024

Thanks! I'm waiting for the better implementation!

