Hi, I wanted to know more about the codebase. Why does it have cloning and copying of


Updated code with load_state_dict about deq HOT 4 CLOSED

locuslab commented on July 21, 2024

Updated code with load_state_dict

from deq.

Comments (4)

jerrybai1995 commented on July 21, 2024

Hi, the cloning and copying were a workaround for older versions of PyTorch, which did not allow calling backward inside a custom backward function (recall that a DEQ's backward pass requires vector-Jacobian products).

With the latest versions of PyTorch, this code can be simplified significantly: files such as deq.py and deq_transformer_module.py are no longer needed, and neither is the cloning or copying. Instead, we can use a backward hook together with autograd.grad. I have been planning a major renovation of the repo but never got the chance to do it. Maybe I'll do that in the next few days ;-)

That being said, if you are interested in what the new implementation would be like (with hook and autograd.grad), you can take a look at the tutorial code from NeurIPS 2020: http://implicit-layers-tutorial.org/deep_equilibrium_models/
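For readers following along, here is a minimal sketch of the hook-plus-autograd.grad pattern described above. It is modeled on the approach in the linked tutorial, not on the code in this repo or its beta branch. The names DEQFixedPoint and fixed_point_iter, the naive solver, and the toy tanh cell are assumptions made for illustration; a real model would use a stronger solver (e.g. Broyden or Anderson). The forward pass finds z* = f(z*, x) without recording the solver's steps, and the backward hook solves g = (df/dz)^T g + grad using only vector-Jacobian products.

```python
import torch
import torch.nn as nn
from torch import autograd


def fixed_point_iter(g, z0, max_iter=200, tol=1e-6):
    """Naive fixed-point iteration z <- g(z); a stand-in for a real solver."""
    z = z0
    for _ in range(max_iter):
        z_next = g(z)
        if (z_next - z).norm() < tol:
            return z_next
        z = z_next
    return z


class DEQFixedPoint(nn.Module):
    """Hypothetical wrapper: implicit layer whose output is z* = f(z*, x)."""

    def __init__(self, f, solver=fixed_point_iter):
        super().__init__()
        self.f = f
        self.solver = solver

    def forward(self, x):
        # Forward: find the fixed point without building an autograd graph.
        with torch.no_grad():
            z = self.solver(lambda z: self.f(z, x), torch.zeros_like(x))
        # One tracked call re-attaches z* to the graph (parameters and x).
        z = self.f(z, x)
        if not z.requires_grad:  # e.g. inference under torch.no_grad()
            return z

        # Separate graph at z*, used only for vector-Jacobian products.
        z0 = z.clone().detach().requires_grad_()
        f0 = self.f(z0, x)

        def backward_hook(grad):
            # Solve g = (df/dz)^T g + grad by repeated VJPs via autograd.grad.
            return self.solver(
                lambda y: autograd.grad(f0, z0, y, retain_graph=True)[0] + grad,
                grad,
            )

        z.register_hook(backward_hook)
        return z


# Toy usage: the cell f(z, x) = tanh(W z + x), with W scaled down so that
# the plain iteration converges (an assumption of this sketch).
torch.manual_seed(0)
lin = nn.Linear(4, 4, bias=False).double()
with torch.no_grad():
    lin.weight.mul_(0.3)
cell = lambda z, x: torch.tanh(lin(z) + x)

deq = DEQFixedPoint(cell)
x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)
deq(x).sum().backward()  # gradients land in x.grad and lin.weight.grad
```

No intermediate solver iterates are stored and nothing is cloned or copied between modules; the only graphs kept are the two single calls to f at the fixed point.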


sarthmit commented on July 21, 2024

Thanks for the quick response! I will wait for the updated code then. I want to use the WikiText-103 Transformer code, and the tutorial code only has some basic image classification examples. Looking forward to the major renovation; it would be super helpful :)

Thank you!


jerrybai1995 commented on July 21, 2024

Hi @sarthmit,

I have pushed the updated code to the beta branch of this repo. Only the Transformer instantiation is implemented there for now, and I haven't been able to fully test the cleaner implementation on all experimental settings, so it will probably be merged into the master branch later this year.

After you check out the beta branch of the repo (i.e., git pull followed by git checkout beta), download the pretrained DEQ-Transformer model (use the link in the beta branch README!) and run:

bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0

It should give you a perplexity of roughly 23.2 on WT103.

Please let me know if you have any issues running this new implementation!


jerrybai1995 commented on July 21, 2024

@sarthmit I'm closing this issue, but if you have trouble with the cleaner version of the code, feel free to re-open it!
