davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch

License: MIT License

Python 100.00%

learning-rate pytorch

pytorch-lr-finder's Issues

Support pytorch's native amp

Hey hey 👋

PyTorch has had its own native amp module for a while now:

https://pytorch.org/docs/stable/amp.html

It would be great to move to that, or at least to prefer it when it is available.

Multiple Input Support

Hey,
is there an elegant way to use a multiple input model with lr_finder?
Given forward looks like this:

def forward(self, x, x_embds):
    ...

and my DataLoader looks like this:

train_loader = DataLoader(TensorDataset(xtrain, xtrain_emb, ytrain), batch_size=BATCHSIZE, shuffle=True)

I want to separate numerical inputs from variables that will be used as embeddings in the model. Therefore the DataLoader yields the numerical variables (xtrain), the variable to be embedded (xtrain_emb) separately, and of course the labels y.
In this case, calling lr_finder like this:

lrf = LRFinder(net, optim, criterion)
lrf.range_test(train_loader, val_loader, start_lr=0.00001, end_lr=1)
lrf.plot()
lrf.reset()

gives this stack trace because it does not pass the "additional" component from the DataLoader (xtrain_emb) to the forward method:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/TestProject/pytorch/torch_net.py in 
      200 if 1:
      201     lrf = LRFinder(net, optim, criterion)
----> 202     lrf.range_test(train_loader, val_loader, start_lr=0.00001, end_lr=1)
      203     lrf.plot()
      204     lrf.reset()

~/miniconda3/envs/py38/lib/python3.8/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    315         for iteration in tqdm(range(num_iter)):
    316             # Train on batch and retrieve loss
--> 317             loss = self._train_batch(
    318                 train_iter,
    319                 accumulation_steps,

~/miniconda3/envs/py38/lib/python3.8/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    375 
    376             # Forward pass
--> 377             outputs = self.model(inputs)
    378             loss = self.criterion(outputs, labels)
    379 

~/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

TypeError: forward() missing 1 required positional argument: 'x_embds'

Issue #18 mentions a scenario in which the DataLoader yields additional outputs, but it seems like the additional inputs are still not used. As a workaround I might be able to pass a single design matrix X through the loader and forward and then, within the forward method, extract the column to be embedded. This seems like a not-so-nice workaround though.
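One possible workaround, sketched here with plain Python values instead of tensors: regroup each batch so that everything except the last element travels as a single inputs tuple, then unpack that tuple when calling the model. The names regroup_batch and UnpackInputs are hypothetical. The regrouping step would presumably live in an inputs_labels_from_batch override of a TrainDataLoaderIter subclass (both names appear in other issues on this page), and with a real network the adapter would need to be a torch.nn.Module subclass.

```python
def regroup_batch(batch):
    """Treat the last element of a batch as labels and the rest as model inputs."""
    *inputs, labels = batch
    return tuple(inputs), labels


class UnpackInputs:
    """Hypothetical adapter: call the wrapped model with the inputs tuple unpacked.

    With a real network this would be a torch.nn.Module subclass so that
    LRFinder can still reach the parameters and the state dict.
    """

    def __init__(self, model):
        self.model = model

    def __call__(self, inputs):
        return self.model(*inputs)


# Stand-in for forward(self, x, x_embds), using numbers instead of tensors
def toy_forward(x, x_embds):
    return x + 10 * x_embds


# A batch shaped like (xtrain, xtrain_emb, ytrain)
inputs, labels = regroup_batch((1, 2, 3))
prediction = UnpackInputs(toy_forward)(inputs)
```

The design keeps the DataLoader and the original forward signature untouched; only the glue between them changes.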

Suggested LR not returned when min_grad_idx is 0 in plot()

When plot() is called in a situation where min_grad_idx is 0, so that the first value in lrs is the suggested learning rate, the suggested learning rate is printed but not returned.

Expected behavior: return ax, lrs[min_grad_idx]
Observed behavior: return ax

These seem to be the relevant lines. Seems that ax is returned due to min_grad_idx evaluating to False (because it is 0):

pytorch-lr-finder/torch_lr_finder/lr_finder.py, lines 535 to 538 in 9cfcbec:

if suggest_lr and min_grad_idx:
    return ax, lrs[min_grad_idx]
else:
    return ax
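The difference between a truthiness check and an explicit None check can be shown in isolation. This is a minimal sketch: buggy_result mirrors the quoted lines, while fixed_result is a hypothetical correction and not necessarily the change the maintainers made.

```python
def buggy_result(ax, lrs, min_grad_idx, suggest_lr=True):
    # Mirrors the quoted lines: index 0 is falsy, so the LR is dropped
    if suggest_lr and min_grad_idx:
        return ax, lrs[min_grad_idx]
    return ax


def fixed_result(ax, lrs, min_grad_idx, suggest_lr=True):
    # Only a missing index (None) suppresses the suggestion
    if suggest_lr and min_grad_idx is not None:
        return ax, lrs[min_grad_idx]
    return ax


lrs = [1e-5, 1e-4, 1e-3]
with_bug = buggy_result("ax", lrs, 0)
with_fix = fixed_result("ax", lrs, 0)
```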

Plot not showing up

I'm seeing a suggested learning rate, but no plot when calling .plot() with all default arguments.

New release where .plot accepts ax

Hey,

I really enjoy using this handy package. Could you please make a new pip-installable version where the plot method accepts the ax argument? This would be super helpful!

Thanks and best regards,

Fabio

Cannot determine `batch_size` from a list of string while running `range_test()` with `val_loader`

Hey @davidtvs, I found this issue while writing an example of using this package with huggingface/transformers for #55.

Condition

Input data: list of strings (the Dataset returns strings)
Running range_test() with val_loader

Error message

---> 10 lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear')

1 frames

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    288             if val_loader:
    289                 loss = self._validate(
--> 290                     val_iter, non_blocking_transfer=non_blocking_transfer
    291                 )
    292 

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in _validate(self, val_iter, non_blocking_transfer)
    398 
    399                 if isinstance(inputs, tuple) or isinstance(inputs, list):
--> 400                     batch_size = inputs[0].size(0)
    401                 else:
    402                     batch_size = inputs.size(0)

AttributeError: 'str' object has no attribute 'size'

Description

In the current implementation (v0.2.0), batch_size is determined dynamically from the shape of inputs in LRFinder._validate(). Lines L399-L402 work only when inputs is a torch.Tensor or a sequence of tensors, which is why they fail when inputs is a list of strings.

It may be unusual for a Dataset to return non-tensor values, but I think it would be easier to read the value from DataLoader.batch_size, since LRFinder._validate() iterates over val_loader anyway.

So I proposed a fix in that notebook: simply add the line batch_size = val_iter.data_loader.batch_size before entering the loop and remove the if-else statement. You can check it out here.

But I am unsure about adding a batch_size property to DataLoaderIter, e.g.

class DataLoaderIter(object):
    # ...
    @property
    def batch_size(self):
        return self.data_loader.batch_size

With this property, the proposed fix can be simplified a little into this:

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # Set model to evaluation mode and disable gradient computation
        running_loss = 0
        self.model.eval()

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * val_iter.batch_size

        return running_loss / len(val_iter.dataset)

What do you think of it?
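One caveat with reading the size from the loader is that the final batch can be smaller than batch_size unless incomplete batches are dropped. An alternative, sketched below, is to infer the size from the batch itself while tolerating non-tensor inputs. The helper infer_batch_size and the stand-in class _FakeTensor are hypothetical and only illustrate the idea; they are not part of the package.

```python
def infer_batch_size(inputs):
    """Best-effort batch size for tensors, dicts, and sequences."""
    if isinstance(inputs, str):
        raise TypeError("a single string is a sample, not a batch")
    if hasattr(inputs, "size") and callable(inputs.size):
        return inputs.size(0)  # tensor-like object
    if isinstance(inputs, dict):
        return infer_batch_size(next(iter(inputs.values())))
    if isinstance(inputs, (list, tuple)):
        first = inputs[0]
        if hasattr(first, "size") and callable(first.size):
            return first.size(0)  # sequence of tensors, one per model input
        return len(inputs)  # e.g. a list of raw strings
    raise TypeError(f"cannot infer batch size from {type(inputs).__name__}")


class _FakeTensor:
    """Stand-in exposing only size(dim), so the sketch runs without torch."""

    def __init__(self, n):
        self.n = n

    def size(self, dim):
        return self.n


from_strings = infer_batch_size(["a", "b", "c"])
from_tensor = infer_batch_size(_FakeTensor(4))
from_tuple = infer_batch_size((_FakeTensor(5), _FakeTensor(5)))
from_dict = infer_batch_size({"ids": _FakeTensor(2)})
```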

Validation loader flat loss

I copied your example notebook to colab and ran the code without changing anything. But the validation loss I get goes flat, which is clearly a mistake when compared to your example. I also experienced this with my other networks which do the same, the loss just goes flat.

You can see my results from colab and your example in the figures below.

EDIT: If I replace val_iter with val_loader inside loss = self._validate(...) it does seem to "work" as I'd expect. So somewhere there seems to be a mistake in how the val_iter is iterated.

[Figures: Colab result | example notebook result]
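The EDIT above hints that the validation iterator may be exhausted after its first full pass. That is an assumption about the cause, not a confirmed diagnosis, but the effect is easy to reproduce with plain Python iterators. The class names below are hypothetical.

```python
class OneShotIter:
    """Wraps a loader but creates its underlying iterator only once."""

    def __init__(self, loader):
        self.loader = loader
        self._iterator = iter(loader)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)


class ResettingIter(OneShotIter):
    """Starts a fresh pass over the loader every time a for loop begins."""

    def __iter__(self):
        self._iterator = iter(self.loader)
        return self


def mean_loss(val_iter):
    losses = list(val_iter)
    return sum(losses) / len(losses) if losses else 0.0


batches = [1.0, 0.5, 0.0]  # stand-in for per-batch validation losses

one_shot = OneShotIter(batches)
first_pass, second_pass = mean_loss(one_shot), mean_loss(one_shot)

resetting = ResettingIter(batches)
reset_first, reset_second = mean_loss(resetting), mean_loss(resetting)
```

With the one-shot wrapper, every validation pass after the first sees no batches, so the reported loss stops changing; the resetting variant sees the full loader each time.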

Is apex a must have or not?

In your README file, apex seems to be an optional requirement.

However, at import time, an annoying message warning me that I don't have this module installed keeps polluting my log.

I tried to reinstall the package using the recommended command and nothing changed. I also tried to use the warnings library to suppress the message, but it is emitted through the Python logging library, so it can't be silenced that way.

Help with lr-finder working with transformers?

I am in need of a tool like this for a particular problem that is very sensitive to the LR. I am, however, unable to get this package to work with any transformer model unfortunately.

My error is as below and I am wondering if you have any insight!

from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    284                 train_iter,
    285                 accumulation_steps,
--> 286                 non_blocking_transfer=non_blocking_transfer,
    287             )
    288             if val_loader:

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    342             # Forward pass
    343             outputs = self.model(inputs)
--> 344             loss = self.criterion(outputs, labels)
    345 
    346             # Loss should be averaged in each step

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    724             result = self._slow_forward(*input, **kwargs)
    725         else:
--> 726             result = self.forward(*input, **kwargs)
    727         for hook in itertools.chain(
    728                 _global_forward_hooks.values(),

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    947         return F.cross_entropy(input, target, weight=self.weight,
--> 948                                ignore_index=self.ignore_index, reduction=self.reduction)
    949 
    950 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2423 
   2424 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in log_softmax(input, dim, _stacklevel, dtype)
   1589         dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
   1590     if dtype is None:
-> 1591         ret = input.log_softmax(dim)
   1592     else:
   1593         ret = input.log_softmax(dim, dtype=dtype)

AttributeError: 'tuple' object has no attribute 'log_softmax'

Should you add compatibility with unusual models, like GANs for example?

In general, I need some wrappers to control how a batch is processed and how data flows into the model.

Device issue: the error says it is bool, not torch.device, but I have printed the device and it is torch.device("cuda:0")

Traceback (most recent call last):
  File "example_copy.py", line 31, in <module>
    lr_finder = LRFinder(model, optimizer, criterion)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py", line 166, in __init__
    self.state_cacher.store("model", self.model.state_dict())
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py", line 624, in store
    self.cached.update({key: copy.deepcopy(state_dict)})
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 306, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch/nn/parameter.py", line 32, in __deepcopy__
    result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch/nn/parameter.py", line 153, in __new__
    data = torch.tensor([], **factory_kwargs)
TypeError: tensor(): argument 'device' must be torch.device, not bool

How to get the best learning rate without inspecting the plot?

Flat loss

Hey guys,
I'm trying to find the optimal range of learning rates for facial expression classification, but the problem is that I'm getting a flat loss from 1e-6 to 1e-2, and then it just shoots up and diverges. The flat curve occurs at a loss of 1.4, which doesn't seem too low. So does it mean there's a problem with my dataset, or is it due to something else?
Thanks

Steepest gradient value

I want to be able to pull the value of the steepest gradient and use it in my code as an integer value.
I see in the code that it is computed under lr_finder.plot(), but I am not able to assign min_grad to anything.
Can you please help me?
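A rough way to get this value without plotting is to compute it from the recorded learning rates and losses. The sketch below assumes those are available as two lists (for example from a history dictionary with "lr" and "loss" keys, which is an assumption not shown in this thread) and uses simple forward differences, which may differ from how plot() computes its suggestion. The function name and the skip parameters are hypothetical.

```python
def steepest_descent_lr(lrs, losses, skip_start=0, skip_end=0):
    """Return (index, lr) where the loss drops fastest between neighbouring points."""
    end = len(losses) - skip_end
    lrs, losses = lrs[skip_start:end], losses[skip_start:end]
    if len(losses) < 2:
        raise ValueError("need at least two recorded points")
    # Forward differences; the most negative one marks the steepest descent
    diffs = [b - a for a, b in zip(losses, losses[1:])]
    idx = min(range(len(diffs)), key=diffs.__getitem__)
    return idx, lrs[idx]


# Made-up numbers purely for illustration
history = {"lr": [1e-4, 1e-3, 1e-2, 1e-1], "loss": [2.0, 1.9, 1.2, 3.0]}
idx, suggested_lr = steepest_descent_lr(history["lr"], history["loss"])
```

The returned index is relative to the trimmed lists, so add skip_start to map it back to the full history.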

Lr-finder with multiple inputs, outputs and losses

Hello,

Firstly, thank you for this wonderful library.
I have a model which expects 2 inputs. I am working with 2 kinds of images, one of size (512, 1536) and the other of size (128, 384). Therefore, my train_loader contains 2 inputs and one target of shape (128, 384, 16). My model has 4 prediction heads and hence is trained using 4 losses for different purposes.

So my collate_fn for the data loader looks like this:

def detection_collate(batch):
    """Custom collate fn for dealing with batches of images that have a different
    number of associated object annotations (bounding boxes).
    Arguments:
        batch: (tuple) A tuple of tensor images and lists of annotations
    Return:
        A tuple containing:
            1) (tensor) batch of images stacked on their 0 dim
            2) (list of tensors) annotations for a given image are stacked on
                                 0 dim
    """
    targets = []
    imgs = []
    deps = []
    for sample in batch:
        imgs.append(sample[0])
        deps.append(sample[1])
        targets.append(sample[2])
    return torch.stack(imgs, 0), torch.stack(deps, 0), torch.stack(targets, 0)

As mentioned, there are 4 different losses: Custom Heatmap (Focal) loss, SmoothL1, SmoothL1, BCE loss.

The forward method of the model expects 2 inputs. A small snippet is shown below:

 def forward(self, x, dep=None, target=None):
        # Backbone: ResNet18, x is image size: (512, 1536)

Here, targets are the labels so to say.

In this case, how do I go about finding the best learning rate using lr-finder?
Notably, I can only use batch_size=2 because of the computational limitations.

ValueError: too many values to unpack (expected 2)

Code:
lr_finder.range_test(dataloader, end_lr=100, num_iter=100)
Response:
ValueError: too many values to unpack (expected 2)

Latest Pytorch and data loader

LR Finder for RNN network

Hi,

I want to find the learning rate for my RNN network, which takes a sequence as input. When I try torch_lr_finder, it throws this error:

File "/usr/local/lib/python3.5/dist-packages/torch_lr_finder/lr_finder.py", line 125, in range_test
    inputs, labels = next(iterator)
ValueError: too many values to unpack (expected 2)

Can you help me get past this error?
@davidtvs

ValueError ValDataLoaderIter next() call missing

In the _validate function, you try to iterate through the elements of ValDataLoaderIter with a simple for loop:

pytorch-lr-finder/torch_lr_finder/lr_finder.py

Line 427 in acc5e7e

for inputs, labels in val_iter:

This throws a 'ValueError: too many values to unpack' because it is attempting to unpack the entire dataloader, which has way more than two elements. I think what you want here is just the next element in val_iter, similar to your _train_batch function: a loop over the length of val_iter with a call to next(val_iter) on every iteration, or an enumerate(val_iter), no?

plot not showing anything

I tried to use the package, but when plotting the learning curve it doesn't show anything: just a plot with the labels but no curve.

ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you'll have torch 1.2.0 which is incompatible.

Executing in Google Colab the command,

!pip install https://download.pytorch.org/whl/cu100/torch-1.2.0-cp36-cp36m-manylinux1_x86_64.whl && pip install https://download.pytorch.org/whl/cu100/torchvision-0.4.0-cp36-cp36m-manylinux1_x86_64.whl

found in https://colab.research.google.com/drive/1BhWYtLFOa24wisNckt9i6rQhBKurVWWV

gives the error

ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you'll have torch 1.2.0 which is incompatible..

By the way, executing !cat /usr/local/cuda/version.txt gives
CUDA Version 10.0.130

Please advise.

Thanks,

Vassilis

TypeError: DataLoaderIterWrapper object is not an iterator ?

class DataLoaderIterWrapper(object):
    def __init__(self, data_loader, auto_reset=True):
        self.data_loader = data_loader
        self.auto_reset = auto_reset
        self._iterator = iter(data_loader)

    def __next__(self):
        # Get a new set of inputs and labels
        try:
            # inputs, labels, *_ = next(self._iterator)
            inputs, labels = next(self._iterator)
        except StopIteration:
            if not self.auto_reset:
                raise
            self._iterator = iter(self.data_loader)
            # inputs, labels, *_ = next(self._iterator)
            inputs, labels = next(self._iterator)

        return inputs, labels
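The class above defines __next__ but no __iter__, so it does not satisfy the full iterator protocol; in addition, Python 2 looks for a method named next rather than __next__, which may be relevant depending on the interpreter in use. A minimal sketch of a wrapper that supports both next() and for loops on Python 3 is shown below. The class name IterWrapper is hypothetical, and the sketch also tolerates extra values in each batch.

```python
class IterWrapper:
    """Iterator over a loader that optionally restarts when exhausted."""

    def __init__(self, data_loader, auto_reset=True):
        self.data_loader = data_loader
        self.auto_reset = auto_reset
        self._iterator = iter(data_loader)

    def __iter__(self):
        # Returning self makes the object usable in for loops
        return self

    def __next__(self):
        try:
            batch = next(self._iterator)
        except StopIteration:
            if not self.auto_reset:
                raise
            # Start over from the beginning of the loader
            self._iterator = iter(self.data_loader)
            batch = next(self._iterator)
        # Keep the first two values and ignore anything extra
        inputs, labels, *_ = batch
        return inputs, labels


wrapper = IterWrapper([(1, "a"), (2, "b", "extra")])
first_three = [next(wrapper) for _ in range(3)]
```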

Make this work for TripletMarginLoss

Can we make this work for nn.TripletMarginLoss?

where the dataset object returns query, positive_image, negative_image which are passed to the model one-by-one and the three resultant embeddings are passed to the loss function (https://medium.com/@akarshzingade/image-similarity-using-deep-ranking-c1bd83855978).

Can we plot the learning rate vs accuracy and get LR at max accuracy using your library - need for SuperConvergence

Hi David @davidtvs,

Is there a way we can plot the learning rate vs accuracy and get LR at max accuracy using your library?

I am trying to use super-convergence (https://arxiv.org/pdf/1708.07120.pdf) by Leslie N. Smith, so I am using PyTorch's OneCycleLR scheduler, which expects a max_lr value.

I used your lr-finder, but it plots loss against learning rate and suggests the LR at the steepest descent. I am looking for a plot of learning rate vs. accuracy and the LR at maximum accuracy.

Please advise.

Thanks in advance,
Naga Pavan

How to use from the command line, not a notebook?

Thanks for the great tool!

I want to find the best lr or save the figure.

I will run this wonderful library from the Python command line, not a Jupyter notebook.

How can I save the figure or find the best lr?

thanks.

LR Finder doesn't restore original model weights?

Hey! I love this repo, thanks for making it 💯

Everything works well except for one thing, after some digging around/experimenting, here's what I've found:

Below are some figures for the training loss and training accuracy (on MNIST, using a resnet18).

Problem:

1) Using LRFinder on a model and then training with it afterwards appears to hurt the model's learning (see pink curve below).

Solution:

2) Using LRFinder on a model and manually restoring the weights appears to train the model optimally (see green curve below).
3) Using LRFinder on a clone of the model, and then using the original model for training, appears to train the model optimally (see green curve below).

Regarding the figure/graphs below, both models used the same hyperparameters.

An in-code example of option 1) would be similar to what was given in the README.md:

import torch.nn as nn
import torch.optim as optim
from torch_lr_finder import LRFinder

model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()

# Then use "model" for training

An in-code example of option 3) would be:

import torch.nn as nn
import torch.optim as optim
from torch_lr_finder import LRFinder

model = ...
temp_model = ...  # create a model with the same architecture
# copy weights over
temp_model.load_state_dict(model.state_dict())
criterion = nn.CrossEntropyLoss()
# the optimizer must hold the parameters of the model passed to LRFinder
optimizer = optim.Adam(temp_model.parameters(), lr=1e-7, weight_decay=1e-2)
# use temp model in lr_finder
lr_finder = LRFinder(temp_model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
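Option 2), restoring the weights manually, can be sketched as follows. A few SGD steps stand in for the range test, and the toy nn.Linear model is an assumption made only to keep the example self-contained. Other issues on this page call lr_finder.reset() after plot(), which is meant to serve the same purpose.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Snapshot the weights before anything touches the model
initial_state = copy.deepcopy(model.state_dict())

# Stand-in for lr_finder.range_test(...): a few optimizer steps
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()

changed = not torch.equal(model.weight, initial_state["weight"])

# Restore the snapshot; a fresh optimizer should be created for real training
model.load_state_dict(initial_state)
restored = torch.equal(model.weight, initial_state["weight"])
```

The deepcopy matters here: state_dict() returns references to the live tensors, so a shallow copy would change along with the model.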

I'm getting a blank graph

I am running a semantic segmentation model, Deeplabv3+, with a modified CrossEntropyLoss and either the SGD or Adam optimizer.
When I run the LRFinder, I get a blank graph with no losses shown, even though I printed the losses and the criterion is definitely returning valid values.

Sweeping across start_lr = 1e-07 and end_lr = 0.0001
  0%|                                                                                                                          | 0/10 [00:00<?, ?it/s]
loss:  tensor(89984., device='cuda:0', grad_fn=<DivBackward0>)
 10%|███████████▍                                                                                                      | 1/10 [00:06<00:54,  6.01s/it]
loss:  tensor(1588043.6250, device='cuda:0', grad_fn=<DivBackward0>)
 20%|██████████████████████▊                                                                                           | 2/10 [00:09<00:40,  5.12s/it]
loss:  tensor(420687.0938, device='cuda:0', grad_fn=<DivBackward0>)
 30%|██████████████████████████████████▏                                                                               | 3/10 [00:12<00:31,  4.50s/it]
loss:  tensor(653955.4375, device='cuda:0', grad_fn=<DivBackward0>)
 40%|█████████████████████████████████████████████▌                                                                    | 4/10 [00:15<00:24,  4.07s/it]
loss:  tensor(141592.6875, device='cuda:0', grad_fn=<DivBackward0>)
 50%|█████████████████████████████████████████████████████████                                                         | 5/10 [00:18<00:18,  3.76s/it]
loss:  tensor(97450.2891, device='cuda:0', grad_fn=<DivBackward0>)
 60%|████████████████████████████████████████████████████████████████████▍                                             | 6/10 [00:21<00:14,  3.55s/it]
loss:  tensor(160497.9375, device='cuda:0', grad_fn=<DivBackward0>)
 70%|███████████████████████████████████████████████████████████████████████████████▊                                  | 7/10 [00:24<00:10,  3.44s/it]
loss:  tensor(151121.3594, device='cuda:0', grad_fn=<DivBackward0>)
 80%|███████████████████████████████████████████████████████████████████████████████████████████▏                      | 8/10 [00:27<00:06,  3.38s/it]
loss:  tensor(123211.6484, device='cuda:0', grad_fn=<DivBackward0>)
 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████▌           | 9/10 [00:31<00:03,  3.40s/it]
loss:  tensor(98576.7578, device='cuda:0', grad_fn=<DivBackward0>)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:34<00:00,  3.43s/it]
Learning rate search finished. See the graph with {finder_name}.plot()

Lemme know what other details I can attach.

My criterion:

def cross_entropy2d(logit, target, ignore_index=255, weight=None, batch_average=True):
    """
    The loss is

    .. math::
        \sum_{i=1}^{\\infty} x_{i}

        `(minibatch, C, d_1, d_2, ..., d_K)`

    Args:
        logit (Tensor): Output of network
        target (Tensor): Ground Truth
        ignore_index (int, optional): Defaults to 255. The pixels with this labels do not contribute to loss
        weight (List, optional): Defaults to None. Weight assigned to each class
        batch_average (bool, optional): Defaults to True. Whether to consider the loss of each element in the batch.

    Returns:
        Float: The value of loss.
    """

    n, c, h, w = logit.shape
    target = target.squeeze(1)

    if weight is None:
        criterion = nn.CrossEntropyLoss(weight=weight, ignore_index=ignore_index, reduction='sum')
    else:
        criterion = nn.CrossEntropyLoss(weight=torch.tensor(weight, dtype=torch.float32),
                                        ignore_index=ignore_index,
                                        reduction='sum')

    loss = criterion(logit, target.long())

    if batch_average:
        loss /= n

    return loss

TrainDataLoaderIter post-process network prediction

Hello. Currently the *DataLoaderIter classes allow us to do some custom pre-processing of the (x, y) pairs with the "inputs_labels_from_batch" method.

I have a network where I do some post-processing on the output of the network, e.g. (simplified):

x, y = next(train_sampler)

Y_hat = model(x)
y_hat = custom_func(Y_hat)

loss = mse(y_hat, y)

Could/should this be an option of the data loader classes, to have an "output_labels_from_batch" method so that we can post-process the model forward() output?

Thanks.
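Until such a hook exists, one possible approach is to fold the post-processing into the criterion. This sketch assumes the finder only ever calls the criterion as criterion(outputs, labels), as the tracebacks on this page suggest. The helper with_postprocessing and the toy functions are hypothetical, and plain lists stand in for tensors.

```python
def with_postprocessing(loss_fn, post_process):
    """Build a criterion that post-processes model outputs before the loss."""

    def criterion(outputs, labels):
        return loss_fn(post_process(outputs), labels)

    return criterion


# Toy stand-ins for mse and custom_func, operating on lists of floats
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def custom_func(raw_outputs):
    return [value / 2 for value in raw_outputs]


criterion = with_postprocessing(mse, custom_func)
loss = criterion([1.0, 2.0], [0.5, 2.0])
```

The wrapped criterion can then be passed wherever the original loss function was used.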

LR finder for regression problems

Is this code usable on regression problems, or only classification problems? I've been trying to get it to work for some time with no success.

Please make it installable using pip

Currently it can't be installed with pip install git+https://github.com/davidtvs/pytorch-lr-finder

Collecting git+https://github.com/davidtvs/pytorch-lr-finder
  Cloning https://github.com/davidtvs/pytorch-lr-finder to /tmp/pip-req-build-bies1fy1
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/a_yaroshevich/anaconda3/envs/rnd/lib/python3.6/tokenize.py", line 452, in open
        buffer = _builtin_open(filename, 'rb')
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-req-build-bies1fy1/setup.py'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-bies1fy1/

For it to work, you need to create a setup.py and list your dependencies there. The modules also need to be correct.

Update class DataLoaderIterWrapper to accommodate extra returned values

At https://github.com/davidtvs/pytorch-lr-finder/blob/master/torch_lr_finder/lr_finder.py#L453

inputs, labels = next(self._iterator)

Could it be changed to

inputs, labels, *rest = next(self._iterator)

to accommodate cases where more values are returned?

For example, if a weighted loss is used, then the weights also need to be returned to calculate the loss. This weighted loss can be found in the original U-Net paper. Another example is returning the file name of the training data being used, which is useful for debugging.
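Simply discarding the extra values would not help the weighted-loss case, because the criterion still needs the weights. One option, sketched here with plain lists instead of tensors, is to pack the extras together with the labels so they reach the criterion. Both function names are hypothetical, and whether a tuple of labels survives the device transfer depends on the installed version of the package.

```python
def pack_extras_with_labels(batch):
    """Keep inputs first and bundle labels with any extra values."""
    inputs, labels, *rest = batch
    if rest:
        return inputs, (labels, *rest)
    return inputs, labels


def weighted_criterion(outputs, targets):
    """Weighted squared error; targets is a (labels, weights) pair."""
    labels, weights = targets
    total = sum(w * (o - y) ** 2 for o, y, w in zip(outputs, labels, weights))
    return total / sum(weights)


# A batch shaped like (inputs, labels, weights)
inputs, targets = pack_extras_with_labels(([1.0, 2.0], [1.0, 0.0], [1.0, 3.0]))
loss = weighted_criterion(inputs, targets)  # inputs double as toy model outputs
```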

Support torchtext dataloaders

See e.g. https://github.com/pytorch/text.

Because these classes inherit from DataLoader, I think they should work out of the box were it not for the type check.

Kind regards, and thanks for the awesome package!

Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

I sent the model to cuda and froze certain layers:

  model = prep_model(args)
  model.cuda()
  freeze_layers(model, [True, True, False])

Then I did:

lr_finder = AccumulationLRFinder(
    model, run.optimizer, criterion,
    accumulation_steps=accumulation_steps
)
lr_finder.range_test(train_loader, end_lr=10, num_iter=100, step_mode="exp")
lr_finder.plot()
lr_finder.reset()

Note that when I don't send the model to the GPU and instead pass the device parameter to lr_finder, it works. My question is: why isn't the input being sent to the GPU, given that this is the device the model is on?
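A module does not carry a single device attribute, so code that moves batches usually has to be told the device or has to look it up from the parameters. A minimal sketch of the lookup is below; the helper model_device is hypothetical, and passing its result as the device argument of the finder is an assumption about usage rather than documented behaviour.

```python
import torch
import torch.nn as nn


def model_device(model):
    """Return the device of the first parameter, or CPU if there are none."""
    try:
        return next(model.parameters()).device
    except StopIteration:
        return torch.device("cpu")


model = nn.Linear(3, 1)
device = model_device(model)

# Move each batch to the same device as the weights before the forward pass
batch = torch.randn(2, 3).to(device)
output = model(batch)
```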

Multi-output regression problem

Hi,

My model has several outputs from the forward method:

def forward(self, x):
    ...  # code omitted
    return ClCd, angle

This returns a tuple, which LR finder does not like. I get the following error message:

if not (target.size() == input.size()):
AttributeError: 'tuple' object has no attribute 'size'

Is there a way for LR finder to work with tuples?
Alternatively, should I be structuring the output from my forward method differently (i.e. using a single output tensor)? I tried outputting a single tensor with two columns from my forward method (each column representing an output), but this gave significantly worse results in training.
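One way to keep the tuple output is to give the finder a criterion that understands tuples. This is a sketch assuming the criterion is only ever called as criterion(outputs, labels) and that the labels arrive as a matching tuple. The helper multi_output_criterion is hypothetical, and floats stand in for tensors.

```python
def multi_output_criterion(loss_fns):
    """Combine one loss function per model output into a single criterion."""

    def criterion(outputs, labels):
        if len(outputs) != len(loss_fns) or len(labels) != len(loss_fns):
            raise ValueError("expected one output and one label per loss function")
        # Sum the per-head losses into a single scalar
        return sum(fn(o, y) for fn, o, y in zip(loss_fns, outputs, labels))

    return criterion


def abs_error(prediction, target):
    return abs(prediction - target)


# Two heads, e.g. (ClCd, angle), each with its own loss
criterion = multi_output_criterion([abs_error, abs_error])
loss = criterion((1.0, 5.0), (0.5, 3.0))
```

Per-head weights could be added inside the sum if the losses live on different scales.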

AttributeError: 'tuple' object has no attribute 'log_softmax'

'dict' object has no attribute 'param_groups'

I am facing the following error, any suggestions?

py3.8.egg/torch_lr_finder/lr_finder.py", line 361, in _check_for_scheduler
AttributeError: 'dict' object has no attribute 'param_groups'

The code is simple:

        lr_finder = LRFinder(models, optimizers, criterion, device="cuda")
        lr_finder.range_test(train_loader, end_lr=100, num_iter=100, step_mode='exp')
        lr_finder.plot(log_lr=False) # to inspect the loss-learning rate graph
        lr_finder.reset()

Obtaining ValueError on m-BERT even after using TrainDataLoaderIter

I am training a multilingual-bert model for a sentiment classification task. My torch dataset returns a dictionary. I tried to run lr_finder.range_test(....) with and without TrainDataLoaderIter but I get the same ValueError both times.

Torch Dataset

class JigsawDataset:
    def __init__(self, df, train_transforms = None):
        self.comment_text = df["comment_text"].values
        self.target = df["toxic"].values
        self.tokenizer = config.BERT_TOKENIZER
        self.max_len = config.MAX_LEN
        self.langs = df["lang"].values
        self.train_transforms = train_transforms

    def __len__(self):
        return len(self.comment_text)

    def __getitem__(self, item):
        comment_text = str(self.comment_text[item])
        comment_text = " ".join(comment_text.split())
        lang = self.langs[item]
        
        if self.train_transforms:
            comment_text, _ = self.train_transforms(data=(comment_text, lang))['data']

        inputs = self.tokenizer.encode_plus(
            comment_text,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            pad_to_max_length=True,
            truncation=True
        )

        ids = inputs["input_ids"]
        mask = inputs["attention_mask"]
        token_type_ids = inputs["token_type_ids"]

        data_loader_dict = {}
        data_loader_dict["ids"] = torch.tensor(ids, dtype=torch.long)
        data_loader_dict["mask"] = torch.tensor(mask, dtype=torch.long)
        data_loader_dict["token_type_ids"] = torch.tensor(token_type_ids, dtype=torch.long)
        data_loader_dict["targets"] = torch.tensor(self.target[item], dtype=torch.float)
        
        return data_loader_dict

Run Function

%%time

def run():

    class CustomTrainIter(TrainDataLoaderIter):
        def input_labels_from_batch(self, batch_data):
            return batch_data["ids"], batch_data["mask"], batch_data["token_type_ids"], batch_data["targets"]
    
    def loss_fn(outputs, targets):
        return nn.BCEWithLogitsLoss()(outputs, targets.view(-1, 1))

    def train_fn(data_loader, model, optimizer, device,):
        
        model, optimizer, data_loader = accelerator.prepare(model, optimizer, data_loader)
        model.train()

        for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
            ids = d["ids"]
            token_type_ids = d["token_type_ids"]
            mask = d["mask"]
            targets = d["targets"]

            ids = ids.to(device, dtype=torch.long)
            token_type_ids = token_type_ids.to(device, dtype=torch.long)
            mask = mask.to(device, dtype=torch.long)
            targets = targets.to(device, dtype=torch.float)
            
            optimizer.zero_grad()
            outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)

            loss = loss_fn(outputs, targets)
            
            if bi % 1000 == 0:
                print(f"bi={bi}, loss={loss}")

            accelerator.backward(loss)
            optimizer.step()

    def eval_fn(data_loader, model, device):
        model.eval()
        fin_targets = []
        fin_outputs = []

        with torch.no_grad():
            for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
                ids = d["ids"]
                token_type_ids = d["token_type_ids"]
                mask = d["mask"]
                targets = d["targets"]

                ids = ids.to(device, dtype=torch.long)
                token_type_ids = token_type_ids.to(device, dtype=torch.long)
                mask = mask.to(device, dtype=torch.long)
                targets = targets.to(device, dtype=torch.float)

                outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)
                fin_targets.extend(targets.cpu().detach().numpy().tolist())
                fin_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())
        return fin_outputs, fin_targets

    df1 = pd.read_csv(
        "/workspace/data/jigsaw-multilingual/input/jigsaw-data/jigsaw-toxic-comment-train.csv", 
        usecols = ["comment_text", "toxic"]    
    )
    
    df1 = df1.head(1000)

    df2 = pd.read_csv(
        "/workspace/data/jigsaw-multilingual/input/jigsaw-data/jigsaw-unintended-bias-train.csv",
        usecols = ["comment_text", "toxic"]
    )
    
    df2 = df2.head(1000)

    df_train = pd.concat([df1, df2], axis = 0).reset_index(drop = True)
    df_train["comment_text"] = df_train["comment_text"].apply(clean_text)

    df_valid = pd.read_csv("/workspace/data/jigsaw-multilingual/input/jigsaw-data/Translated Datasets/jigsaw_miltilingual_valid_translated.csv")
    df_valid["comment_text"] = df_valid["translated"]
    df_valid.drop("translated", axis = 1, inplace = True)
    df_valid["comment_text"] = df_valid["comment_text"].apply(clean_text)


    nlp_transform = NLPTransform()

    df_train['lang'] = 'en'
    non_toxic_sentences = set()
    for comment_text in tqdm(df_train['comment_text'], total=df_train.shape[0]):
        non_toxic_sentences.update(nlp_transform.get_sentences(comment_text), 'en')

    transform = AddNonToxicSentencesTransform(non_toxic_sentences=list(non_toxic_sentences), p=1.0, sentence_range=(1,2))
           
    train_dataset = JigsawDataset(
       df =  df_train,
       train_transforms = get_train_transforms()
    )

    train_data_loader = torch.utils.data.DataLoader(
        train_dataset, 
        batch_size=config.TRAIN_BATCH_SIZE, 
        num_workers=4
    )

    valid_dataset = JigsawDataset(
        df = df_valid,
    )

    valid_data_loader = torch.utils.data.DataLoader(
        valid_dataset, 
        batch_size=config.VALID_BATCH_SIZE, 
        num_workers=1
    )

    device = torch.device(config.DEVICE)
    model = BERTModel()

    param_optimizer = list(model.named_parameters())
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    optimizer_parameters = [
        {
            "params": [
                p for n, p in param_optimizer if not any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.001,
        },
        {
            "params": [
                p for n, p in param_optimizer if any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.0,
        },
    ]

    num_train_steps = int(len(df_train) / config.TRAIN_BATCH_SIZE * config.EPOCHS)
    optimizer = AdamW(optimizer_parameters, lr=config.LEARNING_RATE)
    
    criterion = nn.BCEWithLogitsLoss()
    lr_finder = LRFinder(
        model, 
        optimizer, 
        criterion, 
        device = config.DEVICE
    )
    
    custom_train_iter = CustomTrainIter(train_data_loader)
    
    lr_finder.range_test(
        custom_train_iter, 
        end_lr = 10, 
        num_iter = 100, 
        step_mode = "exp"
    )

    best_accuracy = 0
    for epoch in range(config.EPOCHS):
        
        print(f"----------EPOCH: {epoch}----------")
        train_fn(train_data_loader, model, optimizer, device)
        outputs, targets = eval_fn(valid_data_loader, model, device)
        targets = np.array(targets) >= 0.5
        accuracy = metrics.roc_auc_score(targets, outputs)
        print(f"----------ROC AUC Score = {accuracy}----------")
        print()
        if accuracy > best_accuracy:
            torch.save(model.state_dict(), config.MODEL_PATH)
            best_accuracy = accuracy

if __name__ == "__main__":
    run()

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<timed exec> in <module>

<timed exec> in run()

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    369         self.optimizer.zero_grad()
    370         for i in range(accumulation_steps):
--> 371             inputs, labels = next(train_iter)
    372             inputs, labels = self._move_to_device(
    373                 inputs, labels, non_blocking=non_blocking_transfer

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in __next__(self)
     57         try:
     58             batch = next(self._iterator)
---> 59             inputs, labels = self.inputs_labels_from_batch(batch)
     60         except StopIteration:
     61             if not self.auto_reset:

/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in inputs_labels_from_batch(self, batch_data)
     34                 "Your batch type is not supported: {}. Please inherit from "
     35                 "`TrainDataLoaderIter` or `ValDataLoaderIter` and override the "
---> 36                 "`inputs_labels_from_batch` method.".format(type(batch_data))
     37             )
     38 

ValueError: Your batch type is not supported: <class 'dict'>. Please inherit from `TrainDataLoaderIter` or `ValDataLoaderIter` and override the `inputs_labels_from_batch` method.
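
A likely cause, judging only from the code posted above: the override is named `input_labels_from_batch`, while the traceback shows the finder calling `inputs_labels_from_batch` (with an "s"). The misspelled method is never used, so the default handler still sees the dict and raises. The method also returns four values where the finder unpacks two (`inputs, labels = ...`). The sketch below reproduces both points with a stand-in base class; `BaseIterSketch` and `next_pair` are assumptions made for illustration, not the real `TrainDataLoaderIter`.

```python
class BaseIterSketch:
    """Stand-in for TrainDataLoaderIter (assumed behaviour, not the real class)."""

    def inputs_labels_from_batch(self, batch_data):
        if not isinstance(batch_data, (list, tuple)):
            raise ValueError(
                "Your batch type is not supported: {}".format(type(batch_data))
            )
        inputs, labels, *_ = batch_data
        return inputs, labels

    def next_pair(self, batch_data):
        # The caller always uses this exact hook name.
        return self.inputs_labels_from_batch(batch_data)


class MisspelledIter(BaseIterSketch):
    # "input_..." defines a brand-new method; the base hook is NOT overridden.
    def input_labels_from_batch(self, batch_data):
        return batch_data["ids"], batch_data["targets"]


class FixedIter(BaseIterSketch):
    def inputs_labels_from_batch(self, batch_data):
        # Bundle all model inputs into one object and return exactly two values.
        inputs = {key: batch_data[key] for key in ("ids", "mask", "token_type_ids")}
        return inputs, batch_data["targets"]


batch = {"ids": [101, 102], "mask": [1, 1], "token_type_ids": [0, 0], "targets": 1.0}

try:
    MisspelledIter().next_pair(batch)
    misspelled_error = None
except ValueError as exc:
    misspelled_error = str(exc)

inputs, labels = FixedIter().next_pair(batch)
```

Because the finder calls `self.model(inputs)` with a single argument, the model would then need a thin wrapper whose forward unpacks the dict into `ids`, `mask` and `token_type_ids`. The targets may also need the same `view(-1, 1)` reshape that `loss_fn` applies before they reach `nn.BCEWithLogitsLoss`.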

How do I get the lr_finder to run multiple batches for each "iteration", as defined by `num_iter` in `lr_finder.range_test()`?

How do I get the lr_finder to run multiple batches for each "iteration"? Logic being that running multiple batches would give a more precise result.
Based on the naming, I'd assumed that num_iter in lr_finder.range_test() would control the number of batches/iterations for each value of lr in the given range. However, num_iter controls the number of unique lr values to test within the given interval, running only 1 batch through the network.
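
A possible answer, hedged: the `range_test` signature visible in the tracebacks on this page includes `accumulation_steps`, and `_train_batch` loops `for i in range(accumulation_steps)` while pulling a new batch each time. So passing something like `accumulation_steps=4` to `range_test` appears to run four batches per learning rate. The toy loop below only imitates that structure; `range_test_sketch` and `batch_loss` are made-up names, and the per-step averaging is an assumption about how the losses are combined.

```python
import itertools


def range_test_sketch(batches, lrs, accumulation_steps, batch_loss):
    """Toy model of the loop: one learning rate per iteration, with
    `accumulation_steps` batches averaged inside each iteration."""
    stream = itertools.cycle(batches)  # restart the data when it runs out
    history = []
    for lr in lrs:
        loss = 0.0
        for _ in range(accumulation_steps):
            loss += batch_loss(next(stream), lr) / accumulation_steps
        history.append((lr, loss))
    return history


# Hypothetical per-batch loss: simply the batch value, ignoring the rate.
batches = [1.0, 3.0, 5.0, 7.0]
single = range_test_sketch(batches, [0.1, 0.2], 1, lambda batch, lr: batch)
averaged = range_test_sketch(batches, [0.1, 0.2], 2, lambda batch, lr: batch)
```

With one step per iteration each recorded loss comes from a single batch; with two steps each recorded loss is the mean over two batches, which is the smoothing effect the question is after.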

TypeError: forward() missing 1 required positional argument: 'labels'

I've been following and making all the necessary changes required to run the lr_finder.range_test(). However, I'm still facing this error!
Here's my code defining the Dataset class:


class HappyWhaleDataset(Dataset):
    def __init__(self, df, transforms=None):
        self.df = df
        self.file_names = df['file_path'].values
        self.labels = df['individual_id'].values
        self.transforms = transforms
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        img_path = self.file_names[index]
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        label = self.labels[index]
        
        if self.transforms:
            img = self.transforms(image=img)["image"]
            
        return {
            'image': img,
            'label': torch.tensor(label, dtype=torch.long)
        }


def prepare_loaders(df, fold):
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    
    train_dataset = HappyWhaleDataset(df_train, transforms=data_transforms["train"])
    valid_dataset = HappyWhaleDataset(df_valid, transforms=data_transforms["valid"])

    train_loader = DataLoader(train_dataset, batch_size=CONFIG['train_batch_size'], 
                              num_workers=2, shuffle=True, pin_memory=True, drop_last=True)
    valid_loader = DataLoader(valid_dataset, batch_size=CONFIG['valid_batch_size'], 
                              num_workers=2, shuffle=False, pin_memory=True)
    
    return train_loader, valid_loader

train_loader, valid_loader = prepare_loaders(df, fold=0)

Note: Model training goes without error when I'm just creating a usual train_loader with the above code.

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["image"], batch_data["label"]
    
custom_loader = CustomTrainIter(train_loader)

lr_finder = LRFinder(model, optimizer, criterion, device=CONFIG['device'])
lr_finder.range_test(custom_loader, end_lr=1, num_iter=100, step_mode="linear")
lr_finder.plot(log_lr=False)
lr_finder.reset()

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_34/1446799792.py in <module>
      6 
      7 lr_finder = LRFinder(model, optimizer, criterion, device=CONFIG['device'])
----> 8 lr_finder.range_test(custom_loader, end_lr=1, num_iter=100, step_mode="linear")
      9 lr_finder.plot(log_lr=False)
     10 lr_finder.reset()

/opt/conda/lib/python3.7/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/opt/conda/lib/python3.7/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    375 
    376             # Forward pass
--> 377             outputs = self.model(inputs)
    378             loss = self.criterion(outputs, labels)
    379 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() missing 1 required positional argument: 'labels'
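
The traceback shows the finder calling `self.model(inputs)` with a single argument, while this model's `forward` evidently expects the labels as well. One way around that, sketched here under assumptions, is to pack the labels into the inputs and unpack them in a thin adapter. `LabelAwareWrapper` is a hypothetical helper; it is written as a plain class so the sketch runs without PyTorch, but for real use it should subclass `torch.nn.Module` so the finder can move it to the device and restore its state.

```python
class LabelAwareWrapper:
    """Adapter so a model with forward(images, labels) can be called with
    one argument. Hypothetical helper, not part of torch_lr_finder."""

    def __init__(self, model):
        self.model = model

    def __call__(self, inputs):
        images, labels = inputs
        return self.model(images, labels)


def inputs_labels_from_batch(batch_data):
    # Intended as the body of the TrainDataLoaderIter override: the labels
    # go into the inputs too, since the model needs them in its forward pass.
    return (batch_data["image"], batch_data["label"]), batch_data["label"]


# Toy stand-in for a model that needs labels in forward().
toy_model = lambda images, labels: [img + lab for img, lab in zip(images, labels)]
inputs, labels = inputs_labels_from_batch({"image": [1, 2], "label": [10, 20]})
outputs = LabelAwareWrapper(toy_model)(inputs)
```

Whether the finder moves a tuple of tensors to the device correctly depends on the installed version, so that is worth checking before relying on this pattern.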

how to define num_iter?

Hi davidtvs,
When I use the LR finder, the loss curve looks a little different depending on num_iter.
For example, with 6700 images and a batch size of 252: with num_iter=27 the loss decreases noticeably from 1e-4 to 1e-3, with num_iter=270 it decreases noticeably from 1e-4 to 5e-4, and with num_iter=540 it decreases noticeably from 1e-4 to 8e-4. So I am not sure which curve is correct.
@davidtvs
Thanks! It is a really good tool.
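
A rough way to read those three settings, assuming each iteration consumes one batch: with 6700 images and a batch size of 252 there are about 27 batches per pass, so the three runs correspond to roughly 1, 10 and 20 passes over the data. The arithmetic is below; `drop_last=False` is an assumption.

```python
import math

num_images = 6700
batch_size = 252
# Assumes the last, smaller batch is kept (drop_last=False).
batches_per_epoch = math.ceil(num_images / batch_size)

# Number of passes over the data for each num_iter from the question.
passes = {num_iter: num_iter / batches_per_epoch for num_iter in (27, 270, 540)}

for num_iter, epochs in passes.items():
    print(f"num_iter={num_iter}: about {epochs:.0f} pass(es) over the data")
```

A longer test gives the model more updates before it reaches any given learning rate, so some difference between the curves is expected; none of them is necessarily the "wrong" one.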

No Data Loader?

Is it possible to make it also compatible when there's no dataloader? My dataset is fully loaded on memory.

Question: Number of iterations

I tried the LR finder with 100 and with 1000 iterations (all other parameters staying the same) and got very different recommendations for the LR - for 100 iterations it was 1.2e-3, and for 1000 iterations 3.3e-5. I tried training with both of these, and they don't produce optimal results compared to a more aggressive learning rate - 5e-2 (which is actually found by the 100 iterations to be the maximum LR).

What would be the ideal number of iterations that you would recommend? Would this number be calculated using model / dataset size in any way? I get the feeling that doing more iterations in general affects the finder, because the maximum learning rate is found faster. I did not look at the code, but I guess that the weights are not reset after each iteration - do you think it would be good to reset them, such that the previous iterations don't affect subsequent ones?
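
To make the effect of the iteration count concrete, here is a small sketch of an exponential schedule. As far as I understand, the finder keeps training the same weights from one learning rate to the next, so a longer test performs many more updates at small rates before it reaches the larger ones. The formula and the range from 1e-7 to 10 are illustrative assumptions; the library's own scheduler may differ in detail.

```python
def exp_lr_schedule(start_lr, end_lr, num_iter):
    """Learning rates spaced evenly on a log scale from start_lr to end_lr.
    Illustrative formula, not copied from the library."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]


def steps_before(schedule, lr_threshold):
    """Count the weight updates made before the rate first reaches lr_threshold."""
    return sum(1 for lr in schedule if lr < lr_threshold)


short_run = exp_lr_schedule(1e-7, 10, 100)
long_run = exp_lr_schedule(1e-7, 10, 1000)
short_steps = steps_before(short_run, 1e-3)
long_steps = steps_before(long_run, 1e-3)
```

In this sketch the 1000-iteration run has made ten times as many updates as the 100-iteration run by the time both reach 1e-3, which is one plausible reason the two recommendations differ.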

LR finder for optimizing a single input tensor?

I have an optimization task that optimizes a single tensor by passing it through a set of transforms and then into the model. Losses are then calculated by using hooks attached to various model layers.

Is it possible to use this project for finding the optimal LR for my optimization task? The code looks like it requires a DataLoader instance.

How to use w/ LSTM

Hi,

I would like to use the lr-finder with an LSTM. In my training loop I do:

for epoch in range(100):
    model.train()
    hidden = model.init_hidden(batch_size)
    total_loss = 0

    for data, target in dataloader:
        hidden = repackage_hidden(hidden)
        output, hidden = model(data, hidden)
        loss = loss_fn(output, target.view(-1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

My dataloader yields x, y where x is a sequence and y is the next step in the sequence (think language model), e.g.

data = [1, 2, 3, 4]
target = [2, 3, 4, 5]

Now when I try to do:

lr_finder.range_test(dataloader, end_lr=100, num_iter=100)

... I get the following error:

TypeError: forward() missing 1 required positional argument: 'hidden'

How can I pass hidden to the model using lr-finder?

Issue with DataLoader with lr_finder.range_test

I try to use:

class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["img"], batch_data["target"]

to work with DataLoader for the lr_finder.range_test() but still got the error:
TypeError: list indices must be integers or slices, not str

TypeError                                 Traceback (most recent call last)
<ipython-input-60-b2a8b27d6c88> in <module>()
      3 optim = torch.optim.Adam(model_ft.parameters(), lr=1e-7, weight_decay=1e-2)
      4 lr_finder = LRFinder(model_ft,optim, criterion, device='cuda')
----> 5 lr_finder.range_test( custom_train_iter ,end_lr=100,num_iter=100)
      6 lr_finder.plot()
      7 lr_finder.reset()

3 frames
/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    318                 train_iter,
    319                 accumulation_steps,
--> 320                 non_blocking_transfer=non_blocking_transfer,
    321             )
    322             if val_loader:

/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    369         self.optimizer.zero_grad()
    370         for i in range(accumulation_steps):
--> 371             inputs, labels = next(train_iter)
    372             inputs, labels = self._move_to_device(
    373                 inputs, labels, non_blocking=non_blocking_transfer

/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in __next__(self)
     57         try:
     58             batch = next(self._iterator)
---> 59             inputs, labels = self.inputs_labels_from_batch(batch)
     60         except StopIteration:
     61             if not self.auto_reset:

<ipython-input-58-f89d28995874> in inputs_labels_from_batch(self, batch_data)
      4 
      5 
----> 6         return batch_data["img"], batch_data["target"]
      7 
      8 custom_train_iter = CustomTrainIter(train_dl)

TypeError: list indices must be integers or slices, not str

Any suggestions? Thanks!
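
The error message suggests that `train_dl` yields a list (for example `[images, targets]`) rather than a dict, so string keys cannot be used on it. A defensive sketch that accepts either layout is below; `split_batch` is a hypothetical helper whose body could go into `inputs_labels_from_batch`, and the key names are the ones from the snippet above.

```python
def split_batch(batch_data, input_key="img", label_key="target"):
    """Return (inputs, labels) whether the loader yields a dict or a
    list/tuple. Hypothetical helper for illustration."""
    if isinstance(batch_data, dict):
        return batch_data[input_key], batch_data[label_key]
    if isinstance(batch_data, (list, tuple)) and len(batch_data) >= 2:
        # Positional layout: first element inputs, second element labels.
        return batch_data[0], batch_data[1]
    raise TypeError("unsupported batch type: {}".format(type(batch_data).__name__))


from_list = split_batch([["img0", "img1"], [0, 1]])
from_dict = split_batch({"img": ["img0", "img1"], "target": [0, 1]})
```

If the loader already yields `[inputs, labels]`, passing `train_dl` to `range_test` directly, without the custom iterator, may be enough.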

How to find the best lr?

Hi, I am new to this package. I have done the things here. It gives me a set of loss values with their corresponding learning rates, but I don't know how to pick the best one. Is there a method in this package that automatically suggests the best lr?
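
A common heuristic is to pick the learning rate at which the loss is falling fastest. The sketch below applies it to two plain lists; in practice these would presumably come from `lr_finder.history["lr"]` and `lr_finder.history["loss"]`. `suggest_lr` is a hypothetical helper and the curve is synthetic. Some versions of the package may also offer a suggestion option on `plot()`, so it is worth checking the installed version first.

```python
def suggest_lr(lrs, losses, skip_start=10, skip_end=5):
    """Return the learning rate where the loss falls fastest.
    Heuristic sketch; trims noisy points at both ends first."""
    lrs = lrs[skip_start:len(lrs) - skip_end]
    losses = losses[skip_start:len(losses) - skip_end]
    if len(losses) < 2:
        raise ValueError("not enough points left after trimming")
    # Finite-difference slope between consecutive recorded points.
    slopes = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[steepest]


# Synthetic curve: flat, then a sharp drop, then divergence.
lrs = [10 ** (-5 + 0.5 * i) for i in range(10)]
losses = [2.0, 2.0, 1.9, 1.7, 1.2, 0.9, 0.8, 0.9, 1.5, 4.0]
best = suggest_lr(lrs, losses, skip_start=0, skip_end=0)
```

The result is a starting point rather than a guarantee; it is usually worth looking at the plotted curve as well.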

LRFinder w/ Gradient Accumulation

Great package! Thank you for sharing :)

I was wondering if you plan on adding gradient accumulation support for using LRFinder with a larger batch size.
Will you be adding mixed precision support?

Distributed training with ddp

Thanks for the work! I'd like to know whether distributed training is supported. Is the finder compatible with DataParallel and DistributedDataParallel?

How can use this lib for auto-encoders?

The examples in this library's manual use a single unified model, but an auto-encoder is made of two parts: an encoder and a decoder. How can I use this library with an auto-encoder?

Feature Req: Option to disable showing graph and enable saving graph to disk, for purposes of running on headless server

As mentioned in title, I'm running experiments on a headless server. Can't view matplotlib graphs. It'd be nice to be able to disable the graphical pop-up (which hangs system till it's closed) and instead save the generated plot to disk.
The plot could be named by user given string and/or time of generation.

If you prefer/are busy, I can generate a pull request for the same.
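
Until something like this exists, a dependency-free stopgap is to write the recorded values to disk and plot them elsewhere. Note that this swaps the requested feature (saving the figure) for saving the underlying data; with matplotlib itself, the usual route on a headless machine is a non-interactive backend plus `savefig`. `save_history_csv` is a hypothetical helper, and the `history` layout (a dict with "lr" and "loss" lists) is an assumption.

```python
import csv
import os
import time


def save_history_csv(history, directory=".", prefix="lr_finder"):
    """Write (lr, loss) pairs to a timestamped CSV file and return its path.
    `history` is assumed to hold equal-length "lr" and "loss" lists."""
    name = "{}_{}.csv".format(prefix, time.strftime("%Y%m%d-%H%M%S"))
    path = os.path.join(directory, name)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["lr", "loss"])
        writer.writerows(zip(history["lr"], history["loss"]))
    return path
```

The file name combines a user-given prefix with the time of generation, as suggested in the request above.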

RuntimeError: Expected object of scalar type Long but got scalar type Byte for argument #2 'target'

With optimizer = torch.optim.Adam( model.parameters(), lr = learning_rate, weight_decay = weight_decay)

criterion = nn.CrossEntropyLoss( weight = None, ignore_index = ignore_index, reduce = False)

and then executing

lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(dataLoader['train'], end_lr=100, num_iter=100)
lr_finder.plot()  # to inspect the loss-learning rate graph
lr_finder.reset()  # to reset the model and optimizer to their initial state

I am getting the error,