
dilinwang820 commented on August 15, 2024

Hi @liwei9719,

Thanks for your interest in this repo. Does this happen every time? Are you able to load the previous checkpoint and resume training with the expected performance? In my experience, training is largely stable as long as the gradients are clipped properly (e.g., by setting clip_grad_val to 1.0).
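For readers unfamiliar with the option: clipping by value clamps every gradient entry into [-c, c] before the optimizer step. In PyTorch this is what `torch.nn.utils.clip_grad_value_` does. Whether the repo's clip_grad_val maps onto that call (rather than norm-based clipping) is an assumption here, not something confirmed in this thread. A dependency-free sketch of the operation, with a hypothetical helper name:

```python
def clip_grad_value(grads, clip_value=1.0):
    """Clamp each gradient entry to [-clip_value, clip_value].

    Mirrors the element-wise behaviour of torch.nn.utils.clip_grad_value_;
    `grads` is a hypothetical flat list of floats standing in for the
    parameter gradients of a model.
    """
    if clip_value <= 0:
        raise ValueError("clip_value must be positive")
    return [max(-clip_value, min(clip_value, g)) for g in grads]


grads = [0.3, -4.2, 12.0, -0.9]
print(clip_grad_value(grads, clip_value=1.0))  # [0.3, -1.0, 1.0, -0.9]
```

Unlike norm clipping (`clip_grad_norm_`), value clipping treats each coordinate independently, so it can change the direction of the gradient as well as its size.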

Regarding the negative training loss, I apologize for the confusion. We don't log the actual alpha-divergence. Instead, we first compute the gradient we need and then construct a surrogate loss that produces this gradient. The surrogate loss can be negative. Thanks.
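A toy illustration of why such a surrogate can be negative: if g is the gradient you want (computed separately and treated as a constant, i.e. detached), then the surrogate L(θ) = g·θ has ∇L = g, but its value depends on θ and can take either sign, so it says nothing about the divergence itself. This is a generic sketch, not AlphaNet's actual alpha-divergence loss; the function names are hypothetical. In an autograd framework the same idea is usually written with the target gradient detached, e.g. `(g.detach() * theta).sum()`.

```python
def surrogate_loss(theta, target_grad):
    """Linear surrogate whose gradient w.r.t. theta is exactly target_grad.

    target_grad plays the role of a detached (constant) tensor, so only
    theta is treated as a variable when differentiating.
    """
    return sum(g * t for g, t in zip(target_grad, theta))


def numerical_grad(f, theta, eps=1e-6):
    """Central finite differences, used only to check the surrogate."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


theta = [1.0, 2.0, -0.5]
g = [-0.8, 0.1, 0.4]  # the gradient we want the optimizer to follow
loss = surrogate_loss(theta, g)
print(loss)  # negative, yet the gradient below still matches g
print(numerical_grad(lambda th: surrogate_loss(th, g), theta))
```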

from alphanet.

liwei109 commented on August 15, 2024

I only trained once because of the huge cost. I have set clip_grad_val to 1.0. Could this setting cause the above phenomenon?


dilinwang820 commented on August 15, 2024

Maybe try to resume from a previously saved checkpoint with a different random seed?
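A minimal sketch of "resume with a different seed" using only the standard library; the checkpoint layout and function names are hypothetical. In a PyTorch training loop the corresponding real calls would be `torch.save` / `torch.load` and `torch.manual_seed` (plus seeding `random`, NumPy and any data-loader workers).

```python
import os
import pickle
import random
import tempfile


def save_checkpoint(path, state):
    """Persist training state (weights, optimizer state, epoch, ...)."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def resume_with_new_seed(path, new_seed):
    """Reload a saved state and reseed the RNG so the resumed run sees a
    different sampling/augmentation order than the original run did."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    random.seed(new_seed)
    state["seed"] = new_seed  # record which seed the resumed run used
    return state


# Hypothetical state; a real run would also store optimizer and LR-scheduler state.
state = {"epoch": 55, "weights": [0.12, -0.34], "seed": 0}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
save_checkpoint(path, state)

resumed = resume_with_new_seed(path, new_seed=1234)
print(resumed["epoch"])  # 55
```

Restoring the optimizer and learning-rate scheduler state along with the weights matters here; resuming with fresh optimizer state changes more than just the random seed.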


jun-fang commented on August 15, 2024

Hi @liwei9719 @dilinwang820,

I ran into the same problem at around epoch 60. I tried several times to resume from a saved checkpoint with a different random seed, but it did not solve the issue.

@liwei9719 did you find a solution for this?

@dilinwang820 do you have an idea why this happens and what could be a good way to avoid this phenomenon?

Thanks!


dilinwang820 commented on August 15, 2024

@jun-fang In my experience, training is always stable with the default settings. Maybe try warming up for a little while with less regularization and data augmentation, and then resume with the default settings?
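One way to express that suggestion is a per-epoch schedule that scales regularization down and turns augmentation off for a short warmup, then restores the defaults. The keys and numbers in `DEFAULTS` are placeholders, not AlphaNet's actual config, and the helper name is hypothetical; it only illustrates the shape of such a schedule.

```python
# Placeholder default regularization settings (NOT AlphaNet's real values or keys).
DEFAULTS = {"weight_decay": 1e-5, "dropout": 0.2, "drop_connect": 0.2, "augment": True}


def regularization_for_epoch(epoch, warmup_epochs=5, scale=0.0):
    """Return the regularization settings to use at `epoch`.

    During the first `warmup_epochs` epochs, weight decay, dropout and
    drop-connect are multiplied by `scale` and data augmentation is
    disabled; afterwards a copy of the defaults is returned unchanged.
    """
    if epoch >= warmup_epochs:
        return dict(DEFAULTS)  # copy so callers cannot mutate the defaults
    return {
        "weight_decay": DEFAULTS["weight_decay"] * scale,
        "dropout": DEFAULTS["dropout"] * scale,
        "drop_connect": DEFAULTS["drop_connect"] * scale,
        "augment": False,
    }


for epoch in (0, 4, 5):
    print(epoch, regularization_for_epoch(epoch))
```

When resuming a collapsed run, `epoch` would count from the resume point rather than from the start of training.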


pprp commented on August 15, 2024

I ran into the same problem.
