ildoonet / pytorch-gradual-warmup-lr
Gradually-Warmup Learning Rate Scheduler for PyTorch
License: MIT License
Hi, ildoonet!
Thanks for your code. However, I found that it does not work when the PyTorch version is <= 1.2.0.
To save people time, I think it would be better to specify the required PyTorch version.
Hello!
Within each epoch, shouldn't we first zero the optimizer's gradients? I think we should call optim.zero_grad() at the start of each loop iteration; is that right? (See the sketch below.)
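For reference, a minimal sketch of the conventional per-iteration order in a PyTorch training loop; the model, data, and loss below are placeholders, not from this repo:

import torch

model = torch.nn.Linear(4, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(10):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    optim.zero_grad()                # clear stale gradients first
    loss = loss_fn(model(x), y)
    loss.backward()                  # accumulate fresh gradients
    optim.step()                     # apply the parameter update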
File "train_mesh.py", line 266, in main scheduler_warmup.step() File "/home/liuziming/anaconda3/lib/python3.6/site-packages/warmup_scheduler/scheduler.py", line 39, in step return super(GradualWarmupScheduler, self).step(epoch) File "/home/liuziming/anaconda3/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 52, in step for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()): File "/home/liuziming/anaconda3/lib/python3.6/site-packages/warmup_scheduler/scheduler.py", line 30, in get_lr return self.after_scheduler.get_lr() AttributeError: 'ReduceLROnPlateau' object has no attribute 'get_lr'
I get this error when the warm-up epochs end. I used the example exactly as shown in the README, running a simple torchvision ResNet-50. (A possible workaround is sketched below.)
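ReduceLROnPlateau is not a plain _LRScheduler and has no get_lr(); it has to be stepped with a validation metric. A sketch of how the wrapper might be driven instead, assuming the installed warmup_scheduler version's step() accepts a metrics argument (check your copy's source):

import torch
from warmup_scheduler import GradualWarmupScheduler

model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optim = torch.optim.SGD(model, lr=0.01)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, mode='min')
scheduler_warmup = GradualWarmupScheduler(optim, multiplier=8, total_epoch=5,
                                          after_scheduler=plateau)
for epoch in range(1, 20):
    optim.step()
    val_loss = 1.0  # placeholder validation metric
    scheduler_warmup.step(epoch, metrics=val_loss)  # plateau needs the metric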
I can't find the initialization of base_lrs. Is it initialized to 0?
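For what it's worth, base_lrs is not initialized to 0. It is set by PyTorch's _LRScheduler base constructor, which records each param group's initial lr; GradualWarmupScheduler inherits that behavior. A quick check with a built-in scheduler:

import torch

v = torch.nn.Parameter(torch.zeros(3))
optim = torch.optim.SGD([v], lr=0.01)
sched = torch.optim.lr_scheduler.StepLR(optim, step_size=10)
print(sched.base_lrs)  # [0.01], taken from the optimizer's initial lr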
When loading the GradualWarmupScheduler from a state dict to resume a training run, the optimizer attribute of the nested after_scheduler is loaded from the state_dict. This causes a static learning rate after resuming, because the after_scheduler tries to update the learning rate of an optimizer that doesn't match the one used by the resumed training. Setting self.after_scheduler.optimizer = self.optimizer as part of the load_state_dict() method should probably suffice to fix this (a sketch of the fix follows below).
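A sketch of that suggested fix as a subclass override (hypothetical, not part of the library):

from warmup_scheduler import GradualWarmupScheduler

class PatchedGradualWarmupScheduler(GradualWarmupScheduler):
    def load_state_dict(self, state_dict):
        super().load_state_dict(state_dict)
        # re-bind the nested scheduler to the live optimizer so LR updates
        # after resuming hit the optimizer actually used by the training
        if self.after_scheduler is not None:
            self.after_scheduler.optimizer = self.optimizer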
AttributeError: 'StepLR' object has no attribute 'get_last_lr'
This is not an issue per se, more a proposed modification/extension.
I feel there should be an argument to set the learning rate at epoch zero, then gradually increase it to the target learning rate over some number of epochs (5 in the paper).
Let me know what you think (a sketch follows below).
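Until such an argument exists, a minimal sketch of a 0-to-target linear warm-up using PyTorch's built-in LambdaLR; the parameter and the 5-epoch span (as in the paper) are illustrative:

import torch
from torch.optim.lr_scheduler import LambdaLR

v = torch.nn.Parameter(torch.zeros(10))
optim = torch.optim.SGD([v], lr=0.1)  # 0.1 is the target LR
warmup_epochs = 5
scheduler = LambdaLR(optim, lr_lambda=lambda e: min(1.0, e / warmup_epochs))
# epoch 0 -> lr 0.0; epoch 5 and later -> lr 0.1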
Should scheduler.step() be called after each batch or after each epoch?
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, nesterov=True, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, max_epoch, eta_min=0, last_epoch=-1)
scheduler_warmup = GradualWarmupScheduler(optimizer, multiplier=8, total_epoch=5, after_scheduler=scheduler)
AttributeError: 'CosineAnnealingLR' object has no attribute 'get_last_lr'
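A hedged workaround: get_last_lr() was only added to PyTorch's schedulers in 1.4, so on older installs (or with code that calls it unconditionally) you can guard the call; scheduler here stands for whichever scheduler you are querying:

lrs = (scheduler.get_last_lr() if hasattr(scheduler, 'get_last_lr')
       else scheduler.get_lr())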
UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
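In other words, since PyTorch 1.1.0 the expected per-epoch order is (a two-line reminder, using the names from the warning):

optimizer.step()       # parameter update first
lr_scheduler.step()    # then advance the LR schedule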
Is a metrics parameter mandatory in the usage example?
https://github.com/ildoonet/pytorch-gradual-warmup-lr#usage
When I modify the example code like this:
import torch
from torch.optim.lr_scheduler import StepLR, ExponentialLR
from torch.optim.sgd import SGD
from warmup_scheduler import GradualWarmupScheduler
if __name__ == '__main__':
    model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
    optim = SGD(model, 0.0001)

    # scheduler_warmup is chained with scheduler_steplr
    scheduler_steplr = StepLR(optim, step_size=10, gamma=0.1)
    scheduler_warmup = GradualWarmupScheduler(optim, multiplier=10, total_epoch=5, after_scheduler=scheduler_steplr)

    # this zero gradient update is needed to avoid a warning message, issue #8
    optim.zero_grad()
    optim.step()

    for epoch in range(1, 20):
        scheduler_warmup.step(epoch)
        print(epoch, optim.param_groups[0]['lr'])
        optim.step()    # backward pass (update network)
I get an unexpected result; the learning rate at the sixth epoch is strange (a quick check of the warm-up formula follows after the output):
1 0.00028
2 0.00045999999999999996
3 0.00064
4 0.00082
5 0.001
6 0.0001
7 0.001
8 0.001
9 0.001
10 0.001
11 0.001
12 0.001
13 0.001
14 0.001
15 0.0001
16 0.0001
17 0.0001
18 0.0001
19 0.0001
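For context, epochs 1-5 above do match the warm-up formula (assuming warmup_lr = base_lr * ((multiplier - 1) * epoch / total_epoch + 1), as in the scheduler source), so only the dip at epoch 6 is anomalous:

base_lr, multiplier, total_epoch = 0.0001, 10, 5
for e in range(1, 6):
    print(e, base_lr * ((multiplier - 1.) * e / total_epoch + 1.))
# prints 0.00028, 0.00046, 0.00064, 0.00082, 0.001, matching epochs 1-5,
# which points at the hand-off to StepLR at epoch 6 as the problem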
If I read your code correctly, the learning rate is computed based on the number of epochs, not the number of iterations. According to the paper, it seems the learning rate should be computed based on the number of iterations.
I wonder whether you forgot a modification like the diff shown below:
- warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
+ warmup_lr = self.get_lr()
Here are the details:
- With ReduceLROnPlateau as the after_scheduler of GradualWarmupScheduler, the warm-up failed. The way I read the learning rate is optim.param_groups[0]['lr']. When I instead used get_lr() to read the learning rate, I found the value was correct.
- With StepLR as the after_scheduler, there was no exception and no error.
Therefore, I think the learning rate of the optimizer hadn't been warmed up correctly (see the diagnostic sketch below).
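A small diagnostic, reusing the ReduceLROnPlateau setup sketched earlier in this thread, to check whether the computed warm-up LR actually reaches the optimizer:

for epoch in range(1, 6):                         # warm-up epochs only
    optim.step()
    scheduler_warmup.step(epoch, metrics=1.0)     # placeholder metric
    print(epoch,
          scheduler_warmup.get_lr(),              # LR the warm-up logic computes
          optim.param_groups[0]['lr'])            # LR the optimizer actually uses
# a mismatch between the two printed values confirms this report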
I want to use this linear warm-up to gradually increase the learning rate from 0 to base_lr, which means using a multiplier of 1.0.
However, the code forces us to use a multiplier greater than 1.0.
Do we really need this restriction?
Thank you for sharing this great work!
The initial LR value seems to be larger than I expected.
v = torch.zeros(10)
optim = torch.optim.SGD([v], lr=1e-2)
scheduler = GradualWarmupScheduler(optim, multiplier=8, total_epoch=10)
for epoch in range(1, 20):
    scheduler.step(epoch)
    print(epoch, optim.param_groups[0]['lr'])
1 0.017
2 0.024
3 0.031000000000000003
4 0.038
5 0.045
6 0.052000000000000005
7 0.059000000000000004
8 0.066
9 0.073
10 0.08
11 0.08
12 0.08
13 0.08
14 0.08
15 0.08
16 0.08
17 0.08
18 0.08
19 0.08
As you can see, the first printed LR value (0.017) is already higher than the base LR (0.01).
Is this result correct?
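For what it's worth, 0.017 matches the warm-up formula at epoch 1 (assuming warmup_lr = base_lr * ((multiplier - 1) * epoch / total_epoch + 1), as in the scheduler source); the base LR 0.01 would only appear at epoch 0, which this loop never prints:

base_lr, multiplier, total_epoch = 0.01, 8, 10
print(base_lr * ((multiplier - 1.) * 1 / total_epoch + 1.))  # 0.017 at epoch 1
print(base_lr * ((multiplier - 1.) * 0 / total_epoch + 1.))  # 0.01 at epoch 0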
Here it should have the same special case as a few lines above:

if self.multiplier == 1.0:
    warmup_lr = [base_lr * (float(self.last_epoch) / self.total_epoch) for base_lr in self.base_lrs]
else:
    warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
Otherwise, when multiplier == 1.0, the calculation will always reduce to a constant:

warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
# <=>  (self.multiplier - 1. == 0)
warmup_lr = [base_lr * (0 * self.last_epoch / self.total_epoch + 1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr * (1.) for base_lr in self.base_lrs]
# <=>
warmup_lr = [base_lr for base_lr in self.base_lrs]
# <=>
warmup_lr = self.base_lrs
Hello, thank you very much for your code, but I am currently encountering a minor issue. While using the Anaconda prompt to install pytorch-gradual-warmup-lr via git, I got: WARNING: Did not find branch or tag '08f7d5e', assuming revision or ref. I am not sure whether the address or the module name changed, and I hope you can help clarify this.