
mpyrozhok / adamwr

143 stars · 4 watchers · 24 forks · 19 KB

Implements the AdamW optimizer (https://arxiv.org/abs/1711.05101), a cosine annealing learning rate scheduler with restarts, and "Cyclical Learning Rates for Training Neural Networks" (https://arxiv.org/abs/1506.01186) for the PyTorch framework

License: MIT License

Python 100.00%
clr scheduler pytorch adamw adamw-optimizer restarts triangular optimizer cyclical-learning-rate cosine-annealing

adamwr's People

Contributors: mpyrozhok

adamwr's Issues

scheduler.batch_step() AttributeError: 'CosineLRWithRestarts' object has no attribute 'batch_increment'

Z:\sp2\nhdeblur_pytorch>python "train.py" 1>"train_log.txt"
Traceback (most recent call last):
  File "train.py", line 140, in <module>
    train(train_gen=trainloader, model=model, criterion=criterion, optimizer=optimizer, epoch=epoch)
  File "train.py", line 115, in train
    scheduler.batch_step()
  File "Z:\sp2\nhdeblur_pytorch\cosine_scheduler.py", line 110, in batch_step
    t_cur = self.t_epoch + next(self.batch_increment)
AttributeError: 'CosineLRWithRestarts' object has no attribute 'batch_increment'

optimizer = adamw.AdamW(model.parameters(), lr=opt.lr, weight_decay=0)
scheduler = cosine_scheduler.CosineLRWithRestarts(optimizer, batch_size=opt.batch_size, epoch_size=len(src_set), restart_period=5, t_mult=1.2)

def train(train_gen, model, criterion, optimizer, epoch):
    epoch_loss = 0
    for iteration, batch in enumerate(train_gen, 1):
        nr = batch[0].to(device)
        hr = batch[1].to(device)
        
        optimizer.zero_grad()
        loss = criterion(model(nr), hr)
        epoch_loss += loss.item()
        loss.backward()
        optimizer.step()
        scheduler.batch_step()
    
        if iteration % 1000 == 0:
            print('===> Epoch[{e}]({it}/{dl}): Loss: {l:.4f};'.format(e=epoch, it=iteration, dl=len(train_gen), l=loss.item()))
            
    Current_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
    epoch_loss_average = epoch_loss / len(train_gen)
    print('===> {ct} Epoch {e} Complete: Avg Loss: {avg_loss:.4f}, Sum Loss: {sum_loss:.4f}'
          .format(e=epoch, avg_loss=epoch_loss_average, sum_loss=epoch_loss, ct=Current_time))
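A likely cause (my reading of the scheduler, not something confirmed in this thread): CosineLRWithRestarts appears to create its batch_increment iterator inside the epoch-level step() method, so step() has to be called at the start of every epoch, before any batch_step() calls. A minimal sketch of the adjusted outer loop, reusing the names from the snippet above (num_epochs is a placeholder):

    for epoch in range(num_epochs):
        scheduler.step()   # per-epoch update; (re)creates the batch_increment iterator
        train(train_gen=trainloader, model=model, criterion=criterion,
              optimizer=optimizer, epoch=epoch)   # train() then calls scheduler.batch_step() per batch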

LR Scheduler help

Can you please help me write my own learning rate scheduler? I couldn't find much documentation on how to write one in PyTorch. I went through this MXNet guide and came to the conclusion that I should do the following:

lrs = [scheduler(i + 1) for i in range(epochs * len(train))]  # one lr per iteration; assumes train has a length
iters = 0
for i in range(epochs):
    for data, label in train:
        ...  # forward pass, loss, backward
        for group in optimizer.param_groups:
            group['lr'] = lrs[iters]
        optimizer.step()
        iters += 1

What would be a more elegant way of doing this?
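One option (a sketch, not code from this repository): PyTorch's built-in schedulers subclass torch.optim.lr_scheduler._LRScheduler and override get_lr(); each call to scheduler.step() then writes the new learning rate into optimizer.param_groups for you. For many custom schedules LambdaLR is already enough. Below, the schedule (linear warmup over 500 iterations) is made up for illustration, and epochs/train are assumed to be the same objects as in the snippet above:

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    model = torch.nn.Linear(10, 1)                       # stand-in model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    warmup = 500
    scheduler = LambdaLR(optimizer, lr_lambda=lambda it: min(1.0, (it + 1) / warmup))

    for i in range(epochs):
        for data, label in train:
            ...                                          # forward, loss, loss.backward()
            optimizer.step()
            scheduler.step()                             # scales the base lr by lr_lambda(iteration)

Calling step() once per batch turns LambdaLR's internal counter into an iteration counter, which matches the per-iteration indexing in the loop above.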

Persisting CosineAnnealingLRWithRestarts

Hi there,

Up to now all my schedulers inherited from _LRScheduler, so I didn't need to care too much about how they would be persisted.

For my checkpoints I define my state like this

    state = {
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "scheduler_state": scheduler.state_dict(),
    }

However, CosineAnnealingLRWithRestarts doesn't have this state_dict() method.

I checked the implementation of state_dict() in the documentation:
https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#LambdaLR

and tried to extend your code myself, but I probably missed something.
Could you take a look?

Diffs are:

I inherit the class from _LRScheduler:

import types

from torch.optim.lr_scheduler import _LRScheduler

class CosineAnnealingLRWithRestarts(_LRScheduler):

And rewrote state_dict() and load_state_dict():



    def state_dict(self):
        """Returns the state of the scheduler as a :class:`dict`.

        It contains an entry for every variable in self.__dict__ which
        is not the optimizer.
        The learning rate lambda functions will only be saved if they are callable objects
        and not if they are functions or lambdas.
        """
        state_dict = {key: value for key, value in self.__dict__.items() if key not in ('optimizer', 'base_lrs', 'base_weight_decays')}
        state_dict['base_lrs'] = [None] * len(self.base_lrs)
        state_dict['base_weight_decays'] = [None] * len(self.base_weight_decays)

        for idx, fn in enumerate(self.base_weight_decays):
            if not isinstance(fn, types.FunctionType):
                # state_dict['base_weight_decays'][idx] = fn.__dict__.copy()
                state_dict['base_weight_decays'][idx] = fn

        for idx, fn in enumerate(self.base_lrs):
            if not isinstance(fn, types.FunctionType):
                # state_dict['base_lrs'][idx] = fn.__dict__.copy()
                state_dict['base_lrs'][idx] = fn


        return state_dict

    def load_state_dict(self, state_dict):
        """Loads the schedulers state.

        Arguments:
            state_dict (dict): scheduler state. Should be an object returned
                from a call to :meth:`state_dict`.
        """
        base_lrs = state_dict.pop('base_lrs')
        base_weight_decays = state_dict.pop('base_weight_decays')

        self.__dict__.update(state_dict)

        for idx, fn in enumerate(base_lrs):
            if fn is not None:
                self.base_lrs[idx] = fn        

        for idx, fn in enumerate(base_weight_decays):
            if fn is not None:
                self.base_weight_decays[idx] = fn

However, I still get: AttributeError: Can't pickle local object 'Tensor.__iter__.<locals>.<lambda>'
It would be terrific to be able to persist the state of this scheduler :-)
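A possible workaround (a sketch built on the assumption that the failure comes from non-picklable members such as the per-batch iterator or captured lambdas; I haven't verified this against the repository): filter the scheduler's __dict__ down to entries that pickle cleanly and let the dropped members be rebuilt by the next call to step().

    import pickle

    def _is_picklable(value):
        try:
            pickle.dumps(value)
            return True
        except Exception:
            return False

    def state_dict(self):
        # Keep only entries that survive pickling; skip the optimizer reference
        # and transient objects such as the per-batch iterator.
        return {key: value for key, value in self.__dict__.items()
                if key != 'optimizer' and _is_picklable(value)}

    def load_state_dict(self, state_dict):
        # Dropped members (e.g. the batch iterator) are recreated on the next step().
        self.__dict__.update(state_dict)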

Getting StopIteration when running training


StopIteration                             Traceback (most recent call last)
in <module>()
      1 training(model=model, epoch=20, eval_every=500,
      2          loss_func=loss_function, optimizer=optimizer, train_iter=train_iter,
----> 3          val_iter=val_iter, scheduler=scheduler, warmup_epoch=3, early_stop=2)

in training(epoch, model, eval_every, loss_func, optimizer, train_iter, val_iter, scheduler, early_stop, warmup_epoch)
     37     loss.backward()
     38     optimizer.step()
---> 39     scheduler.batch_step()
     40     if step % eval_every == 0:
     41         model.eval()

in batch_step(self)
    274
    275     def batch_step(self):
--> 276         t_cur = self.t_epoch + next(self.batch_increment)
    277         for param_group, (lr, weight_decay) in zip(self.optimizer.param_groups,
    278                                                    self.get_lr(t_cur)):

StopIteration:

StopIteration

Hi, thank you for sharing your work. Following your description, I tried to use your code in my project, but I get this error in scheduler.batch_step(); it happens on the line 't_cur = self.t_epoch + next(self.batch_increment)'.
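One common explanation (an assumption, not confirmed in this thread): the scheduler precomputes one increment per expected batch from the epoch_size and batch_size it was constructed with, so if the loop actually runs more batches per epoch than ceil(epoch_size / batch_size), the iterator is exhausted and next() raises StopIteration. A sketch that derives both values from the DataLoader driving the loop (loader and num_epochs are placeholder names):

    scheduler = CosineLRWithRestarts(optimizer,
                                     batch_size=loader.batch_size,
                                     epoch_size=len(loader.dataset),
                                     restart_period=5, t_mult=1.2)

    for epoch in range(num_epochs):
        scheduler.step()              # re-initializes the per-batch iterator each epoch
        for batch in loader:
            ...                       # forward / backward
            optimizer.step()
            scheduler.batch_step()    # exactly as many calls as the scheduler expects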

Lower/Upper Bound for LR and Upper Bound decay

Hey there,

Nice update of the scheduler! It's really useful!

It would also be nice to be able to set the following parameters: base_lr, max_lr and scale_fn

The scale_fn would be a function that decreases the max_lr:

  • by half after each period, while keeping the base lr constant,
  • by a factor of gamma**(iterations),
  • or by whatever lambda function is given.

Here is an example implementation in Keras: https://github.com/bckenstler/CLR
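For reference, in that Keras implementation these policies boil down to small scaling functions applied to the lr range, roughly like the following (a sketch of the idea, not code from this repository; gamma is a user-chosen decay factor):

    # "triangular2": halve the lr range every cycle, keeping base_lr fixed
    scale_fn_triangular2 = lambda cycle: 1.0 / (2.0 ** (cycle - 1))

    # "exp_range": shrink the range by gamma per iteration
    gamma = 0.99994
    scale_fn_exp_range = lambda iterations: gamma ** iterations

    # or any user-supplied lambda, e.g. keep the range constant
    scale_fn_constant = lambda x: 1.0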

I tried to hack this in myself but I'm stuck. I'm not entirely sure which eta you use (is it the one from weight decay?). And even if I'm right, I can't persist my hack because of the lambda function -.-

Also, I'm not sure why, but in my case (super-resolution), when using cosine/arccosine my model diverges each time after a restart. (AdamW, wd=1e-6)
It happens with triangular too, but not right at the start of the second cycle.
Do you maybe have an idea where this could come from?

Thanks for your time!

Hypergradient Descent

Thank you for sharing this. Would it be possible for you to also integrate the Hypergradient Descent technique into your AdamW implementation? It reduces the need to tune the initial learning rate. https://github.com/gbaydin/hypergradient-descent

                if state['step'] > 1:
                    prev_bias_correction1 = 1 - beta1 ** (state['step'] - 1)
                    prev_bias_correction2 = 1 - beta2 ** (state['step'] - 1)
                    # Hypergradient for Adam:
                    h = torch.dot(grad.view(-1), torch.div(exp_avg, exp_avg_sq.sqrt().add_(group['eps'])).view(-1)) * math.sqrt(prev_bias_correction2) / prev_bias_correction1
                    # Hypergradient descent of the learning rate:
                    group['lr'] += group['hypergrad_lr'] * h
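For context, here is a rough sketch of how the adjusted learning rate could then feed a decoupled-weight-decay (AdamW-style) parameter update. This is an assumption about how the two pieces would combine, not code from either repository; p, group, state, exp_avg, exp_avg_sq, beta1, beta2 and the math import are taken from the surrounding optimizer loop, as in the excerpt above:

                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1
                denom = exp_avg_sq.sqrt().add_(group['eps'])
                # Adam step, using the (possibly hypergradient-adjusted) learning rate
                p.data.addcdiv_(exp_avg, denom, value=-step_size)
                # Decoupled weight decay applied directly to the weights (the AdamW part)
                if group['weight_decay'] != 0:
                    p.data.mul_(1 - group['lr'] * group['weight_decay'])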

I have also read a lot of criticism of AMSGrad and haven't yet been able to get any improvement with that variant. Could you share your thoughts on that? FYI, two other techniques I am currently experimenting with are Padam and QHAdam.

Add License

Could you add a license to this project so that people can copy, modify, and redistribute? Thanks!
