uds-lsv / bert-stable-fine-tuning
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Home Page: https://arxiv.org/abs/2006.04884
License: Apache License 2.0
Thank you for the great work and paper!
[1] I wonder whether you trained RoBERTa for longer than 3 epochs?
[2] Are there any logs other than the one included in the paper?
Hi authors,
Thank you for your excellent work!
I just found a discrepancy in the Adam epsilon: the paper states it as 1e-6,
while the example scripts in this repo default to 1e-8,
and all the instructions in the examples dir use the default value of 1e-8.
Do you have any instructions or recommendations for this value?
Thank you so much!
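To make the question above concrete, here is a minimal pure-Python sketch of how the epsilon term enters a single bias-corrected Adam update for one scalar parameter. It is not taken from the repo or the paper; the function name `adam_first_step`, the learning rate, and the gradient values are assumptions chosen only for illustration.

```python
import math

def adam_first_step(grad, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the first bias-corrected Adam update for a scalar gradient.

    Hypothetical helper for illustration only; not the repo's optimizer.
    """
    m = (1 - beta1) * grad            # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2       # second-moment estimate after one step
    m_hat = m / (1 - beta1)           # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

# Compare the two epsilon values mentioned in the question
# for a "large" and a "tiny" gradient (both values are made up).
for g in (1e-2, 1e-7):
    step_8 = adam_first_step(g, eps=1e-8)
    step_6 = adam_first_step(g, eps=1e-6)
    print(f"grad={g:.0e}  eps=1e-8: {step_8:.3e}  eps=1e-6: {step_6:.3e}")
```

In this toy setting the two epsilon values give nearly the same update when the gradient is much larger than epsilon, and noticeably different updates when the gradient magnitude is comparable to or smaller than epsilon. Which value the authors recommend is still a question for them.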
I have read the paper and it is really cool,
but I have a question: in Table 1, what do std, mean, and max refer to? That is, the std of what, and the mean of what?
For which of them is lower better, and for which is higher better? Why?
Thank you very much.
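One plausible reading, assuming the statistics are computed over the final dev-set scores of several fine-tuning runs that differ only in their random seed, can be sketched as follows. The scores below are invented for illustration, the helper `summarize_runs` is hypothetical, and whether the paper uses the sample or the population standard deviation is not something this sketch can confirm.

```python
import statistics

def summarize_runs(scores):
    """Summarize scores from repeated fine-tuning runs (one score per seed)."""
    return {
        "std": statistics.stdev(scores),   # sample std: spread across seeds
        "mean": statistics.mean(scores),   # average performance across seeds
        "max": max(scores),                # best single run
    }

# Hypothetical dev-set scores from five seeds; one run has clearly failed.
scores = [0.71, 0.69, 0.53, 0.70, 0.72]
summary = summarize_runs(scores)
print(summary)
```

Under this reading, a lower std means more stable fine-tuning (runs agree with each other), while higher mean and max mean better average and best-case performance.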
Hi Authors,
Thanks for your impressive paper. I'm very interested in your implementation of the loss surfaces. I have checked the original loss surface paper (Li et al., 2018), and I was wondering why you set the axes to θf−θp and θs−θp in Figure 7.
In my understanding, you are using them as the two directions instead of random directions. But why is θf located at θf−θp=1 and θs located at θs−θp=1?
Could you explain more about this, and hopefully share your code for generating this surface?
Also, in your paper, you said that there is a barrier between θf and θs. However, it looks like a similar barrier also exists between θf and θp. If so, how does θp gradually reach θf?
Looking forward to your reply.
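For readers following this question, one way to interpret such axes is as coefficients on two direction vectors: a point on the plane is θp + α(θf−θp) + β(θs−θp), so (α, β) = (0, 0) gives θp, (1, 0) gives θf, and (0, 1) gives θs. The sketch below illustrates only that interpretation with toy two-dimensional parameters and a placeholder loss; it is an assumption about the figure, not the authors' plotting code, and all names and values are hypothetical.

```python
def point_on_plane(theta_p, theta_f, theta_s, alpha, beta):
    """Return theta_p + alpha*(theta_f - theta_p) + beta*(theta_s - theta_p)."""
    return [p + alpha * (f - p) + beta * (s - p)
            for p, f, s in zip(theta_p, theta_f, theta_s)]

def toy_loss(theta):
    # Placeholder loss; a real plot would evaluate the model's loss on data.
    return sum(x * x for x in theta)

# Hypothetical parameter vectors (real ones would be flattened model weights).
theta_p = [1.0, 2.0]    # stands in for the pre-trained model
theta_f = [3.0, 0.0]    # stands in for a failed fine-tuning run
theta_s = [-1.0, 4.0]   # stands in for a successful fine-tuning run

# Evaluate the loss on a grid of coefficients from -0.5 to 1.5.
coords = [i / 4 for i in range(-2, 7)]
surface = [[toy_loss(point_on_plane(theta_p, theta_f, theta_s, a, b))
            for a in coords]
           for b in coords]

print(point_on_plane(theta_p, theta_f, theta_s, 1, 0))  # coincides with theta_f
print(point_on_plane(theta_p, theta_f, theta_s, 0, 1))  # coincides with theta_s
```

With this parameterization each fine-tuned model sits at coefficient 1 along its own direction by construction, which is consistent with the placement the question describes.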
Thanks for your great work. I was wondering whether I can simply use adamW from the transformers library and set bias_correction to True, or whether I should use madamW from your repo?
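As background for this question, the following pure-Python sketch shows what Adam's bias correction does to the size of the first few updates for a single scalar parameter. It is a toy re-implementation under stated assumptions, not the optimizer from the transformers library or from this repo; `adam_update_sizes` and its `bias_correction` argument are local, hypothetical names, and the constant gradient is made up.

```python
import math

def adam_update_sizes(grads, lr=2e-5, beta1=0.9, beta2=0.999,
                      eps=1e-6, bias_correction=True):
    """Return the magnitude of each Adam update for a sequence of gradients."""
    m = v = 0.0
    sizes = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        if bias_correction:
            # Undo the bias toward zero caused by initializing m and v at 0.
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
        else:
            m_hat, v_hat = m, v
        sizes.append(lr * abs(m_hat) / (math.sqrt(v_hat) + eps))
    return sizes

grads = [1.0] * 5  # hypothetical constant gradient
with_bc = adam_update_sizes(grads, bias_correction=True)
without_bc = adam_update_sizes(grads, bias_correction=False)
for t, (a, b) in enumerate(zip(with_bc, without_bc), start=1):
    print(f"step {t}: with correction {a:.2e}, without {b:.2e}")
```

With these default betas, the uncorrected first update is roughly three times larger than the corrected one, since (1 − β1)/√(1 − β2) ≈ 3.16. Whether a particular library optimizer with bias correction enabled matches the repo's optimizer in every other detail is something only the authors can confirm.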