
uds-lsv / bert-stable-fine-tuning

129 stars · 21 forks · 3.07 MB

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

Home Page: https://arxiv.org/abs/2006.04884

License: Apache License 2.0

Languages: Python 97.82%, Jupyter Notebook 2.01%, Dockerfile 0.13%, Makefile 0.02%, Shell 0.02%
Topics: bert, fine-tuning, nlp

bert-stable-fine-tuning's People

Contributors

mmarius


bert-stable-fine-tuning's Issues

Was RoBERTa trained for longer?

Thank you for the great work and paper!

[1] Did you train RoBERTa for longer than 3 epochs?
[2] Are there any logs other than the ones included in the paper?

Adam Epsilon Choice

Hi authors,

Thank you for your excellent work!
I noticed a discrepancy in the Adam epsilon value: the paper states 1e-6, while the example scripts in this repo default to 1e-8, and all the instructions in the examples dir use that default of 1e-8.

Do you have any recommendation on which value to use?

Thank you so much!

(Screenshots attached.)
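For intuition on why the choice can matter, here is a toy numerical sketch (not taken from the repo) of how epsilon enters the Adam update, step ≈ lr · grad / (sqrt(v_hat) + eps): when the second-moment estimate is very small, a larger epsilon noticeably damps the update.

```python
import torch

# Toy illustration: effect of Adam's epsilon when the second-moment
# estimate is tiny (values are invented for illustration).
lr = 2e-5
grad = torch.tensor(1e-7)   # a very small gradient
v_hat = grad ** 2           # bias-corrected second-moment estimate

for eps in (1e-8, 1e-6):
    step = lr * grad / (v_hat.sqrt() + eps)
    print(f"eps={eps:.0e}: step size = {step.item():.2e}")
```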

The std of what, and the mean of what?

I have read the paper and it is really cool, but I have a question about Table 1: what do the std, mean, and max refer to? That is, the std of what, and the mean of what?
Which of them is better when lower and which is better when higher, and why?
Thank you very much.
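Assuming those statistics are computed over repeated fine-tuning runs with different random seeds (which is how the paper frames fine-tuning stability), here is a minimal sketch of how they would be obtained; the scores below are made up:

```python
import numpy as np

# Hypothetical dev-set scores from fine-tuning the same model with
# several random seeds (numbers are invented for illustration).
dev_scores = np.array([0.89, 0.88, 0.52, 0.90, 0.87])

print(f"mean: {dev_scores.mean():.3f}")  # higher is better: average performance
print(f"std:  {dev_scores.std():.3f}")   # lower is better: run-to-run stability
print(f"max:  {dev_scores.max():.3f}")   # higher is better: best single run
```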

Loss surface axes

Hi Authors,

Thanks for your impressive paper. I'm very interested in your implementation of the loss surfaces. I have checked the original loss surface paper, Li et al., 2018, and I was wondering why you set the axes to θf−θp and θs−θp in Figure 7.
In my understanding, you are using them as two fixed directions instead of random directions. But why does θf sit at θf−θp=1 and θs at θs−θp=1?
Could you explain this in more detail and, if possible, share your code for generating the surface?
Also, the paper says that there is a barrier between θf and θs. However, it looks like a similar barrier also exists between θf and θp. If so, how does θp gradually reach θf?

Looking forward to your reply.
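For what it's worth, here is a small sketch (not the authors' code; all names are illustrative) of how such a surface can be computed when the two axes are the fixed directions θf−θp and θs−θp. With θ(α, β) = θp + α(θf−θp) + β(θs−θp), θp sits at (0, 0), θf at (1, 0), and θs at (0, 1), which would explain why those checkpoints appear at coordinate 1:

```python
import torch

def interpolate_loss_surface(model, theta_p, theta_f, theta_s, loss_fn, alphas, betas):
    """Loss on the plane spanned by (theta_f - theta_p) and (theta_s - theta_p).

    With theta(a, b) = theta_p + a * (theta_f - theta_p) + b * (theta_s - theta_p),
    theta_p lies at (0, 0), theta_f at (1, 0), and theta_s at (0, 1).
    Sketch in the spirit of Li et al. (2018), but with fixed directions.
    """
    d1 = {k: theta_f[k] - theta_p[k] for k in theta_p}
    d2 = {k: theta_s[k] - theta_p[k] for k in theta_p}
    surface = torch.zeros(len(alphas), len(betas))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            model.load_state_dict({k: theta_p[k] + a * d1[k] + b * d2[k] for k in theta_p})
            with torch.no_grad():
                surface[i, j] = loss_fn(model)
    return surface

# Toy usage with a linear model and random "checkpoints" (purely illustrative).
model = torch.nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
checkpoint = lambda: {k: torch.randn_like(v) for k, v in model.state_dict().items()}
theta_p, theta_f, theta_s = checkpoint(), checkpoint(), checkpoint()
loss_fn = lambda m: torch.nn.functional.cross_entropy(m(x), y)
grid = torch.linspace(-0.5, 1.5, 5)
print(interpolate_loss_surface(model, theta_p, theta_f, theta_s, loss_fn, grid, grid))
```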

madamW

Thanks for your great work. I was wondering: can I simply use AdamW from the transformers library and set bias correction to True, or should I use madamW from your repo?
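For reference, a minimal sketch of what the transformers option would look like, assuming the `transformers.AdamW` class and its `correct_bias` argument (present in older transformers releases, deprecated in newer ones). This is a sketch, not the repo's madamW implementation:

```python
import torch
from transformers import AdamW  # deprecated in newer transformers releases

model = torch.nn.Linear(768, 2)  # placeholder for a BERT model being fine-tuned

# transformers' AdamW exposes Adam's bias correction via `correct_bias`
# (it defaults to True in that class).
optimizer = AdamW(
    model.parameters(),
    lr=2e-5,
    eps=1e-6,           # epsilon value stated in the paper
    correct_bias=True,  # keep bias correction enabled
)
```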
