
May I use CLR for Adam optimizer? (clr, open)

bckenstler commented on August 15, 2024
May I use CLR for Adam optimizer?

from clr.

Comments (10)

christianversloot commented on August 15, 2024

I may be a bit late here, but I'll add my two cents in the hope that it's useful to future readers.

Besides the practical side of things ("I haven't had any issues with it", see above), I would also argue conceptually that using CLR with Adam is perfectly fine.

At a very high level, Adam differs from classic SGD in two ways: (1) it adapts the step size per parameter, dividing each parameter's update by a running estimate of the magnitude of that parameter's own gradients, whereas classic SGD applies one global learning rate to every parameter; and (2) it keeps a momentum-like running average of the gradients, whereas classic SGD uses no momentum.
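Points (1) and (2) can be sketched in a few lines of pure Python. This is a minimal illustration of the standard Adam update next to plain SGD, on a list of scalar parameters; the function names are hypothetical and not part of the clr repo or any framework.

```python
import math


def sgd_step(params, grads, lr):
    """Plain SGD: every parameter is moved using the same global learning rate."""
    return [p - lr * g for p, g in zip(params, grads)]


def adam_step(params, grads, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step over a list of scalar parameters.

    m and v are the running first- and second-moment estimates (one per
    parameter); t is the 1-based step count used for bias correction.
    Returns the updated parameters and moment estimates.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # momentum-like average of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g  # running average of squared gradients
        m_hat = m_i / (1 - beta1 ** t)           # bias-corrected estimates
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step: lr is divided by this parameter's own gradient scale.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

With one large and one tiny gradient, SGD moves the two parameters by very different amounts, while Adam's first step moves each of them by roughly `lr`, because each update is divided by that parameter's own gradient scale.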

Now, cyclical learning rates do nothing but move the learning rate back and forth between a lower and a higher value, with the goal of escaping saddle points and, as a consequence of the design, local minima as well.
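The "back and forth" can be made concrete with the triangular policy described in Smith's CLR paper (arXiv 1506.01186, linked later in this thread). The sketch below is a minimal pure-Python version; the function name and the example values are illustrative assumptions.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The LR rises linearly from base_lr to max_lr over step_size iterations,
    falls back to base_lr over the next step_size iterations, and repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is 1 at the start/end of a cycle and 0 at its midpoint (the peak).
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

For example, with `base_lr=0.001`, `max_lr=0.006` and `step_size=2000`, the LR is 0.001 at iteration 0, 0.006 at iteration 2000, and back to 0.001 at iteration 4000.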

Does this undermine the conceptual improvements of Adam over SGD? Not in my opinion. Per-parameter optimization still takes place with respect to the current loss (irrespective of future learning rates), and CLR simply makes Adam's momentum-driven updates smaller during one part of the cycle and larger during the other (depending on where in the cycle you are).
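The claim that CLR only rescales Adam's updates can be tried on a toy problem. The sketch below (hypothetical helper names, pure Python, a 1-D quadratic standing in for a real loss) keeps Adam's moment estimates exactly as usual and feeds a triangular cyclical LR in as Adam's step size. As far as I know, this is also how CLR callbacks and schedulers typically combine with Adam, by overwriting the optimizer's learning rate each iteration, but treat that as an assumption.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    # Triangular CLR: linear ramp base_lr -> max_lr -> base_lr every 2 * step_size iterations.
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


def adam_with_clr(grad_fn, x0, steps, base_lr, max_lr, step_size,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a 1-D function with Adam, using a triangular CLR as Adam's lr.

    The moment estimates m and v are updated exactly as in plain Adam; the
    schedule only rescales the final step. Returns the final x and a list of
    (lr, x) pairs, one per step.
    """
    x, m, v = x0, 0.0, 0.0
    history = []
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        lr = triangular_clr(t - 1, base_lr, max_lr, step_size)  # schedule sets Adam's step size
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append((lr, x))
    return x, history
```

Minimising `(x - 3)**2` (gradient `2 * (x - 3)`) from `x = 0` with `base_lr=0.001`, `max_lr=0.05` and `step_size=100` ends close to 3 after 2000 steps; the recorded `history` shows the step size following the triangular schedule while the moment estimates carry over from one cycle to the next.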

Perhaps CLR thus even extends the conceptual improvements of the Adam optimizer, making it better still.

Now, this should all be verified empirically and at scale, but I hope this answers your question from a conceptual point of view as well. And if not yours, then perhaps those of others who find this issue in the future 😄


mdhimes commented on August 15, 2024

Quoting @robert-giaquinto: "Just a warning about using Adam with CLR: I wouldn’t do the LR range test with Adam; the momentum will throw off the results and not give the best max and base LR."

Can you comment more on this? Whenever I've done an LR range test with Adam, the results have been fairly consistent. I can understand why the determined max LR might not be the best due to the momentum, but I'm having a hard time seeing why the base LR would not be the best. On a plot of validation loss vs. learning rate it has always been quite clear, and the result is consistent as long as the determined base LR is within the explored range (e.g., exploring 1e-15 to 1e-1 and 1e-5 to 1e-3 both find the same base LR of, e.g., 1e-4). Perhaps you can share an example demonstrating that it doesn't find the best LR range?


mdhimes commented on August 15, 2024

@robert-giaquinto these are good points... Someone should do a thorough investigation of it, I think it'd make for a good paper. I'm sure there is some way to do the LR range test with Adam.

One thing I want to mention: in my experience, the LR range test discussed in Smith (2015), where accuracy is plotted against LR to determine the LR range, isn't very telling when using Adam. Plotting validation loss against LR, however, is usually quite clear, because you can see when things become unstable: the loss stays constant for tiny LRs, starts decreasing at some base LR, decreases smoothly until some LR slightly above the max LR, and then goes crazy. I haven't done a thorough enough investigation to conclude that the LR range test with Adam works when done this way, but I have observed that it leads to better results than using a constant LR with Adam. YMMV.
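The reading of the validation-loss curve described above can be turned into a rough rule of thumb. The sketch below is hypothetical: the function name and both thresholds (`drop_frac`, `blowup_factor`) are my own assumptions, not something taken from Smith (2015) or the clr repo, and in practice you would still want to look at the plot itself.

```python
def pick_lr_bounds(lrs, losses, drop_frac=0.02, blowup_factor=1.5):
    """Heuristically read (base_lr, max_lr) off a loss-vs-LR sweep.

    Hypothetical rule of thumb:
    - base_lr: first LR at which the loss has fallen more than drop_frac
      below its initial plateau value (losses[0]);
    - max_lr: the last LR before the loss first exceeds blowup_factor times
      the best loss seen so far, i.e. just before training "goes crazy".
    lrs must be sorted in increasing order and aligned with (positive) losses.
    """
    plateau = losses[0]
    base_lr = None
    for lr, loss in zip(lrs, losses):
        if loss < plateau * (1 - drop_frac):
            base_lr = lr
            break

    best = float("inf")
    max_lr = lrs[-1]  # default: no blow-up observed within the sweep
    for i, loss in enumerate(losses):
        if loss > blowup_factor * best:
            max_lr = lrs[i - 1]
            break
        best = min(best, loss)
    return base_lr, max_lr
```

Feed it the learning rates and validation losses recorded during a range test, sorted by increasing LR; it returns `base_lr=None` if the loss never leaves its initial plateau.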


mdhimes commented on August 15, 2024

I use CLR with Adam. I haven't had any issues with it.


ahmadmughees commented on August 15, 2024

@mdhimes which framework?


mdhimes commented on August 15, 2024

@MugheesAhmad Keras/TensorFlow. It should also work with PyTorch, though I haven't implemented it there.


robert-giaquinto commented on August 15, 2024

Just a warning about using Adam with CLR: I wouldn’t do the LR range test with Adam; the momentum will throw off the results and not give the best max and base LR.


christianversloot commented on August 15, 2024

Fair enough Robert. Any tips?


robert-giaquinto commented on August 15, 2024

@mdhimes That's a good point, there isn't any reason the base LR would differ too significantly when testing with Adam.

I've had bad results running my LR range test with SGD and then using those learning rates with an Adam optimizer. In particular, I've had an LR of 0.001 work with plain Adam, and an SGD-based LR range test also conclude max_lr=0.001, but then seen very unstable training with CLR + Adam using max_lr=0.001.

I haven't seen this looked at rigorously in papers (only blog posts doing a single run of CLR with Adam on one dataset, as opposed to Smith's work, which focused on SGD + CLR and some forms of regularization: https://arxiv.org/pdf/1708.07120, https://arxiv.org/pdf/1803.09820 and https://arxiv.org/pdf/1506.01186), so I'm not sure there is a consensus on combining CLR and Adam. In the meantime, the simple solution may be to use Adam during the range test if you're set on using Adam + CLR during training.
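A minimal sketch of running the range test with Adam itself, assuming a scalar toy parameter and a user-supplied `loss_and_grad` function (both hypothetical stand-ins for a real model and its validation loss). Smith (2015) ramps the LR linearly; this version instead grows it geometrically so that each decade of LR gets the same number of steps. That choice is an assumption, not part of the original test.

```python
import math


def lr_range_test_adam(loss_and_grad, w0, min_lr, max_lr, num_steps,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """LR range test driven by Adam instead of SGD.

    The LR grows geometrically from min_lr to max_lr over num_steps. At each
    step the current loss is recorded, then one Adam update is taken with the
    current LR. loss_and_grad(w) must return (loss, gradient) for scalar w.
    Returns the lists of LRs and losses.
    """
    w, m, v = w0, 0.0, 0.0
    ratio = (max_lr / min_lr) ** (1 / (num_steps - 1))
    lrs, losses = [], []
    for t in range(1, num_steps + 1):
        lr = min_lr * ratio ** (t - 1)
        loss, g = loss_and_grad(w)
        lrs.append(lr)
        losses.append(loss)
        if not math.isfinite(loss):
            break  # stop once training has clearly diverged
        # Standard Adam update; m and v carry over from one LR to the next.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return lrs, losses
```

Because Adam's moment estimates (`m`, `v`) carry over from one LR to the next, the loss recorded at a given LR still partly reflects earlier steps, which is the momentum effect warned about in this thread; a slower sweep (larger `num_steps`) should reduce that lag. The resulting `(lrs, losses)` pairs can then be plotted and read off as described earlier in the thread.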


robert-giaquinto commented on August 15, 2024

I was also surprised that accuracy is often shown in LR range test plots. Accuracy isn’t a proper scoring rule; validation loss should be much more stable and informative.
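To illustrate why accuracy is a coarse signal in a range test: two sets of predictions with identical accuracy can have very different cross-entropy, so the loss keeps responding to changes in the model that accuracy cannot see. The labels and probabilities in the usage below are made-up toy values, and the helper names are hypothetical.

```python
import math


def accuracy(labels, probs, threshold=0.5):
    # Fraction of examples whose thresholded prediction matches the 0/1 label.
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)


def log_loss(labels, probs):
    # Mean binary cross-entropy, a strictly proper scoring rule.
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / len(labels)
```

With labels `[1, 1, 0]`, predictions `[0.6, 0.6, 0.4]` and `[0.95, 0.95, 0.05]` are both 100% accurate, yet their cross-entropy is roughly 0.51 vs. 0.05, so a loss-based range-test curve can keep moving where an accuracy curve stays flat.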
