Updated code with load_state_dict (deq, CLOSED)

locuslab commented on July 21, 2024
Updated code with load_state_dict

from deq.

Comments (4)

jerrybai1995 commented on July 21, 2024

Hi, the cloning and copying were actually a workaround for older versions of PyTorch, which did not allow calling backward within a custom backward function (recall that a DEQ's backward pass requires vector-Jacobian products).
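
For readers who want the reason spelled out, here is the standard implicit-differentiation argument, written as a sketch. The symbols (f_theta for the layer, z* for the equilibrium, J for its Jacobian, ell for the loss, u for the backward vector) are notation chosen for this note, not names from the repo.

```latex
% Equilibrium condition defining the DEQ output z^* for input x:
z^\star = f_\theta(z^\star, x), \qquad
J \;=\; \left.\frac{\partial f_\theta(z, x)}{\partial z}\right|_{z = z^\star}.

% Differentiating the equilibrium condition gives the gradient of a loss \ell:
\frac{\partial \ell}{\partial \theta}
  \;=\; \frac{\partial \ell}{\partial z^\star}\,(I - J)^{-1}\,
        \frac{\partial f_\theta(z^\star, x)}{\partial \theta}.

% The row vector u = (\partial \ell / \partial z^\star)(I - J)^{-1}
% is itself the solution of a linear fixed-point equation:
u \;=\; u\,J \;+\; \frac{\partial \ell}{\partial z^\star}.
```

Each evaluation of u J is one vector-Jacobian product through f, which is why solving for u inside a backward pass means calling autograd from within backward.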

With the latest versions of PyTorch, this code can be significantly simplified: we no longer need files such as deq.py or deq_transformer_module.py, and there is no need to clone or copy. Instead, we can use a backward hook and autograd.grad. I have been planning a major renovation of the repo but never got the chance to do it. Maybe I'll do that in the next few days ;-)

That being said, if you are interested in what the new implementation would look like (with a hook and autograd.grad), you can take a look at the tutorial code from NeurIPS 2020: http://implicit-layers-tutorial.org/deep_equilibrium_models/
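
To make the hook-plus-autograd.grad idea concrete, here is a minimal sketch in the spirit of that tutorial. It is not the code from this repo or its beta branch: `forward_iteration`, `TanhCell`, and `DEQFixedPoint` are illustrative names, the solver is plain fixed-point iteration rather than the root-finding solver a real DEQ would use, and the toy layer is rescaled so that the iteration is guaranteed to converge.

```python
import torch
import torch.nn as nn
from torch import autograd


def forward_iteration(g, z0, max_iter=100, tol=1e-6):
    """Naive solver (assumption for this sketch): iterate z <- g(z) until it settles."""
    z = z0
    for _ in range(max_iter):
        z_next = g(z)
        if (z_next - z).norm() <= tol * (z_next.norm() + 1e-8):
            return z_next
        z = z_next
    return z


class TanhCell(nn.Module):
    """Hypothetical layer f(z, x) = tanh(W z + x), rescaled to be a contraction in z."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim, bias=False)
        with torch.no_grad():
            # Frobenius norm bounds the spectral norm, so ||W||_2 <= 0.5 afterwards.
            self.linear.weight.mul_(0.5 / self.linear.weight.norm())

    def forward(self, z, x):
        return torch.tanh(self.linear(z) + x)


class DEQFixedPoint(nn.Module):
    """Finds z* = f(z*, x) without tracking the solver, then patches the gradient."""

    def __init__(self, f, solver, **solver_kwargs):
        super().__init__()
        self.f = f
        self.solver = solver
        self.solver_kwargs = solver_kwargs

    def forward(self, x):
        # 1) Solve for the equilibrium with autograd switched off.
        with torch.no_grad():
            z_star = self.solver(lambda z: self.f(z, x),
                                 torch.zeros_like(x), **self.solver_kwargs)
        # 2) One tracked call of f, so parameters and x receive gradients.
        z = self.f(z_star, x)

        if z.requires_grad:
            # 3) Separate graph through f at z*, used only for vector-Jacobian products.
            z0 = z.clone().detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                # Solve u = u J + grad, where u J is a VJP computed by autograd.grad.
                return self.solver(
                    lambda u: autograd.grad(f0, z0, u, retain_graph=True)[0] + grad,
                    grad, **self.solver_kwargs)

            z.register_hook(backward_hook)
        return z


torch.manual_seed(0)
cell = TanhCell(dim=4)
deq = DEQFixedPoint(cell, forward_iteration, max_iter=100, tol=1e-6)
x = torch.randn(3, 4, requires_grad=True)
z = deq(x)
z.sum().backward()  # the hook runs here and replaces the gradient flowing into f
print(z.shape, x.grad.shape)
```

Note that nothing is cloned or copied between modules here: the hook on z replaces the incoming gradient with the solution of the backward linear system before it propagates into the single tracked call of f.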

sarthmit commented on July 21, 2024

Thanks for the quick response! I will wait for the updated code then. I want to use the WikiText-103 Transformer code, and the tutorial code only has some basic image classification examples. Looking forward to the major renovation, it would be super helpful :)

Thank you!

jerrybai1995 commented on July 21, 2024

Hi @sarthmit,

I have pushed the updated code to the beta branch of this repo. Since only the Transformer instantiation is implemented so far, and I haven't been able to fully test the cleaner implementation on all experimental settings, it'll probably be merged into the master branch later this year.

After you check out the beta branch of the repo (i.e., git pull followed by git checkout beta), download the pretrained DEQ-Transformer model (use the link in the beta branch README!) and run:

bash run_wt103_deq_transformer.sh train --debug --data ../data/wikitext-103 --f_thres 30 --eval --load [PRETRAINED_FILE].pkl --mem_len 300 --pretrain_step 0

It should give you a perplexity of about 23.2 on WT103.

Please let me know if you have any issues running this new implementation!

jerrybai1995 commented on July 21, 2024

@sarthmit I'm closing this issue, but if you have trouble with the cleaner version of the code, feel free to re-open it!
