Comments (6)
Hi @liwei9719,
Thanks for your interest in this repo. Does this happen all the time? Would you be able to load the previous checkpoint and resume training with the expected performance? In my experience, training is largely stable if the gradients are clipped properly (e.g., set clip_grad_val to 1.0).
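For reference, here is a minimal sketch of what value-based gradient clipping looks like in a generic PyTorch training step. The model, optimizer, and data below are placeholders, and I'm assuming clip_grad_val=1.0 corresponds to element-wise value clipping (it may instead map to norm clipping in the repo's config):

```python
import torch

# Placeholder model and optimizer, not AlphaNet code
torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 8)
loss = (model(x) ** 2).mean() * 100.0  # inflate loss to get large gradients

optimizer.zero_grad()
loss.backward()
# Clip every gradient element into [-1.0, 1.0] before the update,
# analogous to setting clip_grad_val=1.0
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
optimizer.step()
```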
Regarding the negative training loss, I must apologize for the confusion here. The reason is that we don't log the actual alpha-divergence. Instead, we first compute the gradient we need, and then construct a surrogate loss that produces this gradient. The surrogate loss itself can be negative. Thanks.
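A toy illustration of why a surrogate loss can go negative (this is not the repo's actual loss; the weights and shapes are made up): if the gradient you want is `w * d(log p)/d(theta)` with a stop-gradient weight `w`, you can backprop through `mean(w.detach() * log p)`. Its gradient is the desired one, but since `log p <= 0`, the logged value is negative even though the true divergence is non-negative.

```python
import torch

# Toy example only, not AlphaNet's implementation
torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)
log_p = torch.log_softmax(logits, dim=-1)
w = torch.rand(4, 3)                  # stand-in per-element weights, >= 0

# Surrogate whose gradient is w * d(log p)/d(theta); its value is what
# gets logged, and it is negative here because log_p <= 0
surrogate = (w.detach() * log_p).mean()
surrogate.backward()
print(surrogate.item())               # negative logged value
```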
from alphanet.
I only trained once because of the huge cost. I have set clip_grad_val to 1.0. Could this parameter setting cause the phenomenon above?
Maybe try to resume from a previously saved checkpoint with a different random seed?
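A resume along those lines might look like the sketch below. The checkpoint keys ("state_dict", "optimizer", "epoch") are assumptions for illustration, not necessarily the repo's actual checkpoint format:

```python
import random

import numpy as np
import torch


def resume(model, optimizer, path, seed):
    """Load a checkpoint and re-seed RNGs so data order and
    augmentation differ from the crashed run."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["state_dict"])
    optimizer.load_state_dict(ckpt["optimizer"])
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    return ckpt.get("epoch", 0)


# Minimal demo: save a checkpoint, then resume with a new seed
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
torch.save({"state_dict": model.state_dict(),
            "optimizer": opt.state_dict(),
            "epoch": 60}, "/tmp/ckpt.pt")
epoch = resume(model, opt, "/tmp/ckpt.pt", seed=1234)
print(epoch)  # 60
```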
Hi @liwei9719 @dilinwang820,
I ran into the same problem at around epoch 60, and I tried multiple times to resume from a saved checkpoint with a different random seed, but it did not solve the issue.
@liwei9719 did you find a solution for this?
@dilinwang820 do you have an idea why this happens and what could be a good way to avoid this phenomenon?
Thanks!
@jun-fang In my experience, training is always stable with the default settings; maybe try warming up with less regularization and data augmentation, and then resume with the default settings?
I ran into the same problem as well.
Related Issues (12)
- is AlphaNet a0 ~ a6 exactly the same as the a0 ~ a6 in Attentive NAS? HOT 1
- How to modify the loss function to apply to multi-label classification tasks HOT 6
- The problem of increasing memory usage and learning rate HOT 3
- there are some files missing or I can't find them HOT 1
- How were the final architectures selected? HOT 4
- Why use the training dataset in the test stage? HOT 3
- How can I preserve the search architecture? HOT 6
- evolutionary search in a single gpu HOT 4
- The AdaptiveLossSoft become NAN HOT 4
- Can Adaptive-KD use with additional attentive sampling at the same time ? HOT 3
- Re-training code is available? HOT 2