
uds-lsv / bert-stable-fine-tuning

129 stars · 21 forks · 3.07 MB

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

Home Page: https://arxiv.org/abs/2006.04884

License: Apache License 2.0

Languages: Python 97.82%, Jupyter Notebook 2.01%, Dockerfile 0.13%, Makefile 0.02%, Shell 0.02%
Topics: bert, fine-tuning, nlp

bert-stable-fine-tuning's People

Contributors

mmarius


bert-stable-fine-tuning's Issues

Was RoBERTa trained for longer?

Thank you for the great work and paper!

[1] Did you train RoBERTa for longer than 3 epochs?
[2] Are there any logs other than the ones included in the paper?

Adam Epsilon Choice

Hi authors,

Thank you for your excellent work!
I noticed a discrepancy in the Adam epsilon value: the paper states 1e-6, while the example scripts in this repo default to 1e-8, and all the instructions in the examples dir use that default of 1e-8.

Do you have any recommendation on which value to use?

Thank you so much!

(Screenshots attached.)
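For intuition on why the choice can matter, here is a toy numerical sketch (not taken from the repo) of how epsilon enters the Adam update, step ≈ lr · grad / (sqrt(v_hat) + eps): when the second-moment estimate is very small, a larger epsilon noticeably damps the update.

```python
import torch

# Toy illustration: effect of Adam's epsilon when the second-moment
# estimate is tiny (values are invented for illustration).
lr = 2e-5
grad = torch.tensor(1e-7)   # a very small gradient
v_hat = grad ** 2           # bias-corrected second-moment estimate

for eps in (1e-8, 1e-6):
    step = lr * grad / (v_hat.sqrt() + eps)
    print(f"eps={eps:.0e}: step size = {step.item():.2e}")
```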

The std of what, and the mean of what?

I have read the paper and it is really cool, but I have a question about Table 1: what do the std, mean, and max refer to? That is, the std of what, and the mean of what?
Which of them is better when lower and which is better when higher, and why?
Thank you very much.
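Assuming those statistics are computed over repeated fine-tuning runs with different random seeds (which is how the paper frames fine-tuning stability), here is a minimal sketch of how they would be obtained; the scores below are made up:

```python
import numpy as np

# Hypothetical dev-set scores from fine-tuning the same model with
# several random seeds (numbers are invented for illustration).
dev_scores = np.array([0.89, 0.88, 0.52, 0.90, 0.87])

print(f"mean: {dev_scores.mean():.3f}")  # higher is better: average performance
print(f"std:  {dev_scores.std():.3f}")   # lower is better: run-to-run stability
print(f"max:  {dev_scores.max():.3f}")   # higher is better: best single run
```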

Loss surface axes

Hi Authors,

Thanks for your impressive paper. I'm very interested in your implementation of the loss surfaces. I have checked the original loss surface paper, Li et al., 2018, and I was wondering why you set the axes to θf−θp and θs−θp in Figure 7.
In my understanding, you are using them as two fixed directions instead of random directions. But why does θf sit at θf−θp=1 and θs at θs−θp=1?
Could you explain this in more detail and, if possible, share your code for generating the surface?
Also, the paper says that there is a barrier between θf and θs. However, it looks like a similar barrier also exists between θf and θp. If so, how does θp gradually reach θf?

Looking forward to your reply.
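For what it's worth, here is a small sketch (not the authors' code; all names are illustrative) of how such a surface can be computed when the two axes are the fixed directions θf−θp and θs−θp. With θ(α, β) = θp + α(θf−θp) + β(θs−θp), θp sits at (0, 0), θf at (1, 0), and θs at (0, 1), which would explain why those checkpoints appear at coordinate 1:

```python
import torch

def interpolate_loss_surface(model, theta_p, theta_f, theta_s, loss_fn, alphas, betas):
    """Loss on the plane spanned by (theta_f - theta_p) and (theta_s - theta_p).

    With theta(a, b) = theta_p + a * (theta_f - theta_p) + b * (theta_s - theta_p),
    theta_p lies at (0, 0), theta_f at (1, 0), and theta_s at (0, 1).
    Sketch in the spirit of Li et al. (2018), but with fixed directions.
    """
    d1 = {k: theta_f[k] - theta_p[k] for k in theta_p}
    d2 = {k: theta_s[k] - theta_p[k] for k in theta_p}
    surface = torch.zeros(len(alphas), len(betas))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            model.load_state_dict({k: theta_p[k] + a * d1[k] + b * d2[k] for k in theta_p})
            with torch.no_grad():
                surface[i, j] = loss_fn(model)
    return surface

# Toy usage with a linear model and random "checkpoints" (purely illustrative).
model = torch.nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
checkpoint = lambda: {k: torch.randn_like(v) for k, v in model.state_dict().items()}
theta_p, theta_f, theta_s = checkpoint(), checkpoint(), checkpoint()
loss_fn = lambda m: torch.nn.functional.cross_entropy(m(x), y)
grid = torch.linspace(-0.5, 1.5, 5)
print(interpolate_loss_surface(model, theta_p, theta_f, theta_s, loss_fn, grid, grid))
```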

madamW

Thanks for your great work. I was wondering: can I simply use AdamW from the transformers library and set bias correction to True, or should I use madamW from your repo?
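For reference, a minimal sketch of what the transformers option would look like, assuming the `transformers.AdamW` class and its `correct_bias` argument (present in older transformers releases, deprecated in newer ones). This is a sketch, not the repo's madamW implementation:

```python
import torch
from transformers import AdamW  # deprecated in newer transformers releases

model = torch.nn.Linear(768, 2)  # placeholder for a BERT model being fine-tuned

# transformers' AdamW exposes Adam's bias correction via `correct_bias`
# (it defaults to True in that class).
optimizer = AdamW(
    model.parameters(),
    lr=2e-5,
    eps=1e-6,           # epsilon value stated in the paper
    correct_bias=True,  # keep bias correction enabled
)
```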
