Comments (10)
I may be a bit late here, but I'll add my two cents for the sake of a hopefully valuable addition for future readers.
Besides the practical side of things - "I haven't had any issues with it", see above - I would also conceptually argue that usage is perfectly fine when using Adam.
At a very high level, Adam differs from classic SGD in two ways: (1) it adapts the learning rate per parameter (each weight gets its own effective step size), whereas SGD applies a single global learning rate, and (2) it incorporates momentum-like terms, which classic SGD lacks.
Now, cyclical learning rates do nothing but move the learning rate back and forth between a higher and a lower value with the goal of escaping saddle points and, by consequence of the design, local minima as well.
Does this violate Adam's conceptual improvements over SGD? Not in my opinion. Local optimization still takes place with respect to the current loss (irrespective of future learning rates), and CLR merely slows the effective step down in one part of the cycle and speeds it up in the other.
Perhaps CLR thus even extends the conceptual improvements of the Adam optimizer, making it better still.
Now, this should all be verified empirically and at scale, but I hope this answers your question from a conceptual point of view as well. And if not yours, then those of others who find this issue in the future 😄
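For readers who haven't seen the schedule written out: the "move the learning rate back and forth" part is just the triangular policy from Smith (2015), which fits in a few lines. A minimal sketch (the function name and parameter names are my own, not from any particular implementation):

```python
import math

def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate (Smith, 2015).

    The LR ramps linearly from base_lr up to max_lr over step_size
    iterations, then back down over the next step_size, and repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

For example, with `base_lr=1e-4`, `max_lr=1e-3`, and `step_size=100`, the LR starts at 1e-4, peaks at 1e-3 on iteration 100, and returns to 1e-4 on iteration 200.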
from clr.
Just a warning about using Adam with CLR: I wouldn’t do the LR range test with Adam, the momentum will throw off the results and not give the best max and base LR.
Can you comment more on this? Whenever I've done an LR range test with Adam, the results have been fairly consistent. I can understand why the determined max LR might not be the best due to the momentum, but I'm having a hard time seeing why the base LR would suffer. From a plot of validation loss vs. learning rate, it has always been quite clear, and the result is consistent as long as the determined base LR lies within the explored range (e.g., sweeping 1e-15 -- 1e-1 and sweeping 1e-5 -- 1e-3 both find the same base LR of, say, 1e-4). Perhaps you can share an example demonstrating that it doesn't find the best LR range?
@robert-giaquinto these are good points... Someone should do a thorough investigation of it, I think it'd make for a good paper. I'm sure there is some way to do the LR range test with Adam.
One thing I want to mention is that the LR range test discussed in Smith (2015), where accuracy vs. LR is plotted to determine the LR range, isn't very telling when using Adam in my experience. However, using val loss vs. LR is usually quite clear, because you can see when things become unstable: the loss stays flat for tiny LRs, starts decreasing at some base LR, decreases smoothly, and then blows up at an LR slightly above the usable max. I haven't done nearly a thorough enough investigation to conclude that the LR range test with Adam works when done in this manner, but I have observed that it leads to better results than a constant LR with Adam. YMMV.
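To make that procedure concrete, an LR range test is two pieces: an exponential LR sweep over the batches, and a read-off of base/max LR from the recorded (LR, val loss) curve. A rough sketch; the helper names and the `drop`/`blowup` thresholds are illustrative assumptions, not a reference implementation:

```python
def range_test_lr(step, num_steps, min_lr=1e-7, max_lr=1e-1):
    """LR for batch `step` of the range test: exponential sweep
    from min_lr to max_lr over num_steps batches."""
    t = step / max(num_steps - 1, 1)
    return min_lr * (max_lr / min_lr) ** t

def pick_lr_range(lrs, losses, drop=0.99, blowup=4.0):
    """Read base/max LR off the (LR, val loss) curve: base_lr is the
    first LR where the loss clearly starts to fall below its initial
    plateau; max_lr is the last LR before the loss blows up relative
    to its starting value."""
    base_lr = next(lr for lr, l in zip(lrs, losses) if l < drop * losses[0])
    max_lr = lrs[0]
    for lr, l in zip(lrs, losses):
        if l > blowup * losses[0]:
            break  # loss has diverged; stop before this LR
        max_lr = lr
    return base_lr, max_lr
```

For instance, a curve that is flat at loss 1.0 up to 1e-4, falls to 0.2 by 1e-2, and explodes at 1e-1 yields `base_lr=1e-3`, `max_lr=1e-2` under these thresholds.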
I use CLR with Adam. I haven't had any issues with it.
@mdhimes which framework?
@MugheesAhmad Keras/TensorFlow. It should also work with PyTorch, though I haven't implemented it there.
Fair enough Robert. Any tips?
@mdhimes That's a good point, there isn't any reason the base LR would differ too significantly when testing with Adam.
I've had bad results running my LR range test with SGD and then trying those learning rates with an Adam optimizer. In particular, I've had an LR of 0.001 work with plain Adam, and an SGD-based LR range test also conclude max_lr=0.001, but then seen very unstable training with CLR + Adam using max_lr=0.001.
I haven't seen this looked at rigorously in papers (only blog posts doing a single run of CLR with Adam on one dataset, as opposed to Smith's work, which focused on SGD + CLR and some forms of regularization: https://arxiv.org/pdf/1708.07120 https://arxiv.org/pdf/1803.09820 and https://arxiv.org/pdf/1506.01186), so I'm not sure there is a consensus on combining CLR and Adam. In the meantime, the simple solution may be to use Adam during the range test if you're set on using Adam + CLR during training.
I was also surprised that accuracy is so often shown in LR range test plots. Accuracy isn't a proper scoring rule; validation loss should be much more stable and informative.