davidtvs / pytorch-lr-finder
A learning rate range test implementation in PyTorch
License: MIT License
Hey hey
PyTorch has had its own native amp module for a while now:
https://pytorch.org/docs/stable/amp.html
It would be great to move to that, or at least prefer using it if it's available.
Hey,
is there an elegant way to use a multiple input model with lr_finder?
Given that forward looks like this:
def forward(self, x, x_embds):
    ...
and my DataLoader looks like this:
train_loader = DataLoader(TensorDataset(xtrain, xtrain_emb, ytrain), batch_size=BATCHSIZE, shuffle=True)
I want to separate numerical inputs from variables that will be used as embeddings in the model. Therefore the DataLoader yields the numerical variables (xtrain), the variable to be embedded (xtrain_emb) separately, and of course the labels y.
In this case, calling lr_finder like this:
lrf = LRFinder(net, optim, criterion)
lrf.range_test(train_loader, val_loader, start_lr=0.00001, end_lr=1)
lrf.plot()
lrf.reset()
gives this stack trace because it does not pass the "additional" component from the DataLoader (xtrain_emb) to the forward method:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/TestProject/pytorch/torch_net.py in <module>
200 if 1:
201 lrf = LRFinder(net, optim, criterion)
----> 202 lrf.range_test(train_loader, val_loader, start_lr=0.00001, end_lr=1)
203 lrf.plot()
204 lrf.reset()
~/miniconda3/envs/py38/lib/python3.8/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
315 for iteration in tqdm(range(num_iter)):
316 # Train on batch and retrieve loss
--> 317 loss = self._train_batch(
318 train_iter,
319 accumulation_steps,
~/miniconda3/envs/py38/lib/python3.8/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
375
376 # Forward pass
--> 377 outputs = self.model(inputs)
378 loss = self.criterion(outputs, labels)
379
~/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
TypeError: forward() missing 1 required positional argument: 'x_embds'
Issue #18 mentions a scenario in which the DataLoader yields additional outputs, but it seems like the additional inputs are still not used. As a workaround I might be able to pass a single design matrix X through the loader and forward and then, within the forward method, extract the column to be embedded. This seems like a not-so-nice workaround though.
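One way this could be handled without a single design matrix, sketched here without PyTorch so it runs anywhere: regroup each batch so that everything except the last element becomes one inputs tuple, and wrap the model so that it unpacks that tuple into positional arguments. The names regroup_batch and UnpackInputs are hypothetical and not part of the library. In real use the wrapper would subclass torch.nn.Module, and the regrouping would have to happen wherever the library lets you customise how a batch is split into inputs and labels.

```python
def regroup_batch(batch):
    """Turn (x, x_embds, y) into ((x, x_embds), y)."""
    *inputs, labels = batch
    return tuple(inputs), labels


class UnpackInputs:
    """Hypothetical wrapper: forwards a tuple of inputs as positional args."""

    def __init__(self, model):
        self.model = model

    def __call__(self, inputs):
        return self.model(*inputs)


def two_input_model(x, x_embds):
    # Stand-in for a model whose forward takes two arguments.
    return [a + b for a, b in zip(x, x_embds)]


batch = ([1.0, 2.0], [10.0, 20.0], [0, 1])  # (x, x_embds, y)
inputs, labels = regroup_batch(batch)
outputs = UnpackInputs(two_input_model)(inputs)
```

The finder would then be given the wrapped model, while normal training keeps using the original one.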
When using the plot function in a situation where the first value in lrs is the suggested learning rate (that is, lrs[min_grad_idx] with min_grad_idx equal to 0), the suggested learning rate is printed but not returned.
Expected behavior: return ax, lrs[min_grad_idx]
Observed behavior: return ax
These seem to be the relevant lines. It seems that only ax is returned because min_grad_idx evaluates to False (it is 0):
pytorch-lr-finder/torch_lr_finder/lr_finder.py
Lines 535 to 538 in 9cfcbec
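The behaviour described above is the usual truthiness pitfall with index 0. The sketch below does not copy the library's lines; both function names are hypothetical and only contrast a truthiness check with an explicit None check.

```python
def pick_return_buggy(ax, lrs, min_grad_idx):
    # Truthiness check: index 0 is falsy, so the learning rate is dropped.
    if min_grad_idx:
        return ax, lrs[min_grad_idx]
    return ax


def pick_return_fixed(ax, lrs, min_grad_idx):
    # Explicit None check: index 0 is a valid result.
    if min_grad_idx is not None:
        return ax, lrs[min_grad_idx]
    return ax


lrs = [1e-3, 1e-2, 1e-1]
buggy = pick_return_buggy("ax", lrs, 0)
fixed = pick_return_fixed("ax", lrs, 0)
```

With min_grad_idx equal to 0, the first variant returns only the axes object and the second returns the pair.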
I'm seeing a suggested learning rate, but no plot when calling .plot() with all default arguments.
Hey,
I really enjoy using this handy package. Could you please make a new pip installable version where the plot method accepts the ax argument? This would be super helpful!
Thanks and best regards,
Fabio
Hey @davidtvs, this issue was found while I was writing an example of using this package with huggingface/transformers for #55.
It happens when Dataset returns strings and range_test() is called with val_loader:
---> 10 lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear')
1 frames
/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
288 if val_loader:
289 loss = self._validate(
--> 290 val_iter, non_blocking_transfer=non_blocking_transfer
291 )
292
/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in _validate(self, val_iter, non_blocking_transfer)
398
399 if isinstance(inputs, tuple) or isinstance(inputs, list):
--> 400 batch_size = inputs[0].size(0)
401 else:
402 batch_size = inputs.size(0)
AttributeError: 'str' object has no attribute 'size'
In the current implementation, batch_size is determined dynamically from the shape of inputs in LRFinder._validate(). (v0.2.0) L399-L402 work normally only when the given inputs is a torch.tensor, and that's why it failed when inputs is a list of strings.
Maybe it's not a usual case that Dataset returns non-torch.tensor values, but I think it would be easier to read it from DataLoader.batch_size, since LRFinder._validate() is going to iterate over a val_loader anyway.
Hence I proposed a fix for this in that notebook: simply add a line batch_size = val_iter.data_loader.batch_size before entering the loop and remove the if-else statement. You can check it out here.
But I'm having doubts about adding a property batch_size in DataLoaderIter, e.g.
class DataLoaderIter(object):
    # ...
    @property
    def batch_size(self):
        return self.data_loader.batch_size
With this property, proposed fix can be simplified a little into this:
class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # Set model to evaluation mode and disable gradient computation
        running_loss = 0
        self.model.eval()
        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )
                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * val_iter.batch_size
        return running_loss / len(val_iter.dataset)
What do you think of it?
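One thing that may be worth weighing in this proposal: DataLoader.batch_size is the nominal size, and the last batch can be smaller unless drop_last=True. The toy numbers below are made up and use no PyTorch; they only show how weighting every batch by the nominal size shifts the average when the final batch is short.

```python
def mean_loss(batch_losses, batch_sizes, weights):
    """Weighted sum of per-batch mean losses divided by the sample count."""
    total = sum(loss * w for loss, w in zip(batch_losses, weights))
    return total / sum(batch_sizes)


batch_sizes = [4, 4, 2]          # 10 samples, nominal batch size 4
batch_losses = [1.0, 1.0, 4.0]   # mean loss reported for each batch
nominal = 4

# Weight each batch by its real size.
exact = mean_loss(batch_losses, batch_sizes, batch_sizes)
# Weight each batch by the nominal size, as the property would.
approx = mean_loss(batch_losses, batch_sizes, [nominal] * len(batch_sizes))
```

Here exact is 1.6 and approx is 2.4. Taking the size from the labels of each batch would avoid the gap, assuming the labels are tensors or sequences with a length.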
I copied your example notebook to colab and ran the code without changing anything. But the validation loss I get goes flat, which is clearly a mistake when compared to your example. I also experienced this with my other networks which do the same, the loss just goes flat.
You can see my results from colab and your example in the figures below.
EDIT: If I replace val_iter with val_loader inside loss = self._validate(...) it does seem to "work" as I'd expect. So somewhere there seems to be a mistake in how the val_iter is iterated.
[Figures: Colab result vs. example notebook result]
In your README file apex seems to be an optional requirement.
However, at import time an annoying message warning me that I don't have this module installed keeps polluting my log.
I tried to reinstall the package using the recommended command and nothing changed at all. I also tried to use the warnings library to ignore the warning, but the message is emitted through the Python logging library, so I can't suppress it that way.
I am in need of a tool like this for a particular problem that is very sensitive to the LR. I am, however, unable to get this package to work with any transformer model unfortunately.
My error is as below and I am wondering if you have any insight!
from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")
~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
284 train_iter,
285 accumulation_steps,
--> 286 non_blocking_transfer=non_blocking_transfer,
287 )
288 if val_loader:
~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
342 # Forward pass
343 outputs = self.model(inputs)
--> 344 loss = self.criterion(outputs, labels)
345
346 # Loss should be averaged in each step
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
724 result = self._slow_forward(*input, **kwargs)
725 else:
--> 726 result = self.forward(*input, **kwargs)
727 for hook in itertools.chain(
728 _global_forward_hooks.values(),
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
946 def forward(self, input: Tensor, target: Tensor) -> Tensor:
947 return F.cross_entropy(input, target, weight=self.weight,
--> 948 ignore_index=self.ignore_index, reduction=self.reduction)
949
950
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2420 if size_average is not None or reduce is not None:
2421 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2423
2424
~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in log_softmax(input, dim, _stacklevel, dtype)
1589 dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
1590 if dtype is None:
-> 1591 ret = input.log_softmax(dim)
1592 else:
1593 ret = input.log_softmax(dim, dtype=dtype)
AttributeError: 'tuple' object has no attribute 'log_softmax'
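The traceback suggests that the model handed a tuple to the criterion rather than a tensor of logits. A possible workaround, sketched without PyTorch or transformers, is to wrap the model so that only the first element of a tuple output is passed on. FirstOutput is a hypothetical name. That the first element holds the logits is an assumption to check against the model's documentation, and a real wrapper would subclass torch.nn.Module.

```python
class FirstOutput:
    """Hypothetical wrapper returning only the first element of a tuple output."""

    def __init__(self, model):
        self.model = model

    def __call__(self, *args, **kwargs):
        out = self.model(*args, **kwargs)
        # Leave non-tuple outputs untouched.
        return out[0] if isinstance(out, tuple) else out


def tuple_model(x):
    # Stand-in for a model that returns (logits, extra_outputs).
    return [v * 2 for v in x], "extra"


wrapped = FirstOutput(tuple_model)
logits = wrapped([1, 2, 3])
```

The finder and the criterion would then see a single output, as they do with a plain classifier.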
In general, I need some wrappers to control how batches are processed and how data flows into the model.
Traceback (most recent call last):
  File "example_copy.py", line 31, in <module>
    lr_finder = LRFinder(model, optimizer, criterion)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py", line 166, in __init__
    self.state_cacher.store("model", self.model.state_dict())
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py", line 624, in store
    self.cached.update({key: copy.deepcopy(state_dict)})
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 306, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch/nn/parameter.py", line 32, in __deepcopy__
    result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)
  File "/home/snehaverma/anaconda3/envs/gmesh/lib/python3.6/site-packages/torch/nn/parameter.py", line 153, in __new__
    data = torch.tensor([], **factory_kwargs)
TypeError: tensor(): argument 'device' must be torch.device, not bool
Hey guys,
I'm trying to find the optimal range of learning rates for Facial Expression classification, but the problem is that I'm getting a flat loss curve from 1e-6 to 1e-2 and then it just shoots up and diverges. The flat part sits at a loss of 1.4, which doesn't seem too low. Does this mean there's a problem with my dataset, or is it due to something else?
Thanks
I want to be able to pull the value of the steepest gradient and use it in my code as an integer value.
I see in the code that it is computed inside lr_finder.plot(), but I am not able to assign min_grad to anything.
Can you please help me?
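Another issue in this thread expects plot() to return ax, lrs[min_grad_idx], so on versions that do this the value can be read from the return value. Independently of that, the same number can be recomputed from the recorded learning rates and losses. The sketch below is plain Python and assumes they are available as two equal-length lists (for example from lr_finder.history["lr"] and lr_finder.history["loss"], which is my reading of the library and should be checked). steepest_lr is a hypothetical helper, and the trimming defaults are assumptions.

```python
def steepest_lr(lrs, losses, skip_start=10, skip_end=5):
    """Return the learning rate at which the loss falls fastest."""
    end = len(lrs) - skip_end
    lrs, losses = lrs[skip_start:end], losses[skip_start:end]
    if len(losses) < 2:
        raise ValueError("not enough points left after trimming")
    # Central differences in the interior, one-sided at both ends.
    grads = []
    last = len(losses) - 1
    for i in range(len(losses)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        grads.append((losses[hi] - losses[lo]) / (hi - lo))
    idx = min(range(len(grads)), key=grads.__getitem__)
    return lrs[idx]


lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.0, 1.9, 1.2, 1.0, 5.0]
best = steepest_lr(lrs, losses, skip_start=0, skip_end=0)
```

The result is a float (here 1e-2), since learning rates are rarely integers.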
Hello,
Firstly, thank you for this wonderful library.
I have a model which expects 2 inputs. I am working with 2 kinds of images, one of size (512, 1536) and the other of size (128, 384). Therefore, my train_loader contains 2 inputs and one target of shape (128, 384, 16). My model has 4 prediction heads and hence is trained using 4 losses for different purposes.
So my collate_fn for the data loader looks like this:
def detection_collate(batch):
    """Custom collate fn for dealing with batches of images that have a different
    number of associated object annotations (bounding boxes).
    Arguments:
        batch: (tuple) A tuple of tensor images and lists of annotations
    Return:
        A tuple containing:
            1) (tensor) batch of images stacked on their 0 dim
            2) (list of tensors) annotations for a given image are stacked on
               0 dim
    """
    targets = []
    imgs = []
    deps = []
    for sample in batch:
        imgs.append(sample[0])
        deps.append(sample[1])
        targets.append(sample[2])
    return torch.stack(imgs, 0), torch.stack(deps, 0), torch.stack(targets, 0)
As mentioned, there are 4 different losses: Custom Heatmap (Focal) loss, SmoothL1, SmoothL1, BCE loss.
The forward method of the model expects 2 inputs. A small snippet is shown below:
def forward(self, x, dep=None, target=None):
# Backbone: ResNet18, x is image size: (512, 1536)
Here, targets are the labels, so to speak.
In this case, how do I go about finding the best learning rate using lr-finder?
Notably, I can only use batch_size=2 because of the computational limitations.
Code:
lr_finder.range_test(dataloader, end_lr=100, num_iter=100)
Response:
ValueError: too many values to unpack (expected 2)
Latest Pytorch and data loader
Hi,
I want to find the lr for my RNN network, which takes a sequence as input. When I try torch_lr_finder it throws this error:
File "/usr/local/lib/python3.5/dist-packages/torch_lr_finder/lr_finder.py", line 125, in range_test
inputs, labels = next(iterator)
ValueError: too many values to unpack (expected 2)
Can you help me get past this error?
@davidtvs
In the _validate function, you try to iterate through the elements of ValDataLoaderIter with a simple for loop:
This throws a 'ValueError: too many values to unpack' because it is attempting to unpack the entire dataloader, which has way more than two elements. I think what you want here is just the next element in val_iter, similar to your _train_batch function: a loop over the length of val_iter with a call to next(val_iter) on every pass, or an enumerate(val_iter), no?
I tried to use the package, but when plotting the learning curve it doesn't show anything: just a plot with the labels but no curve.
Executing in Google Colab the command,
!pip install https://download.pytorch.org/whl/cu100/torch-1.2.0-cp36-cp36m-manylinux1_x86_64.whl && pip install https://download.pytorch.org/whl/cu100/torchvision-0.4.0-cp36-cp36m-manylinux1_x86_64.whl
found in https://colab.research.google.com/drive/1BhWYtLFOa24wisNckt9i6rQhBKurVWWV
gives the error
ERROR: torchvision 0.4.2 has requirement torch==1.3.1, but you'll have torch 1.2.0 which is incompatible..
By the way, executing !cat /usr/local/cuda/version.txt
gives
CUDA Version 10.0.130
Please advise.
Thanks,
Vassilis
class DataLoaderIterWrapper(object):
    def __init__(self, data_loader, auto_reset=True):
        self.data_loader = data_loader
        self.auto_reset = auto_reset
        self._iterator = iter(data_loader)
    def __next__(self):
        # Get a new set of inputs and labels
        try:
            # inputs, labels, *_ = next(self._iterator)
            inputs, labels = next(self._iterator)
        except StopIteration:
            if not self.auto_reset:
                raise
            self._iterator = iter(self.data_loader)
            # inputs, labels, *_ = next(self._iterator)
            inputs, labels = next(self._iterator)
        return inputs, labels
Can we make this work for nn.TripletMarginLoss?
where the dataset object returns query, positive_image, negative_image
which are passed to the model one-by-one and the three resultant embeddings are passed to the loss function (https://medium.com/@akarshzingade/image-similarity-using-deep-ranking-c1bd83855978).
Hi David @davidtvs,
Is there a way we can plot the learning rate vs accuracy and get LR at max accuracy using your library?
I am trying to use SuperConvergence (https://arxiv.org/pdf/1708.07120.pdf) by Leslie N. Smith, so I am using PyTorch's OneCycleLR scheduler, which expects a max_lr value.
I used your lr-finder, but it plots loss against learning rate and suggests the LR at the steepest descent. I am looking for a plot of learning rate vs accuracy and the LR at maximum accuracy.
Please advise.
Thanks in advance,
Naga Pavan
Thanks for the great tool!
I want to find the best lr or save the figure.
I will run this wonderful library from the Python command line, not in a Jupyter notebook.
How can I save the figure or find the best lr?
Thanks.
Hey! I love this repo, thanks for making it!
Everything works well except for one thing, after some digging around/experimenting, here's what I've found:
Below are some figures for the training loss and training accuracy (on MNIST, using a resnet18).
Problem:
Solution:
Regarding the figure/graphs below, both models used the same hyperparameters.
An in-code example of option 1) would be similar to what was given in the README.md:
import torch.nn as nn
import torch.optim as optim
from torch_lr_finder import LRFinder
model = ...
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
# Then use "model" for training
An in-code example of option 3) would be:
import torch.nn as nn
import torch.optim as optim
from torch_lr_finder import LRFinder
model = ...
temp_model = ...  # create a model with the same architecture
# copy weights over
temp_model.load_state_dict(model.state_dict())
criterion = nn.CrossEntropyLoss()
# the optimizer must hold the parameters of the model being tested
optimizer = optim.Adam(temp_model.parameters(), lr=1e-7, weight_decay=1e-2)
# use temp model in lr_finder
lr_finder = LRFinder(temp_model, optimizer, criterion, device="cuda")
lr_finder.range_test(trainloader, end_lr=100, num_iter=100)
lr_finder.plot()
I'm running a semantic segmentation model, Deeplabv3+, with a modified CrossEntropyLoss and either the SGD or the Adam optimizer.
When I run the LRFinder, I get a blank graph with no losses shown, even though I printed the losses and the criterion is definitely returning valid values.
Sweeping across start_lr = 1e-07 and end_lr = 0.0001
  0%|          | 0/10 [00:00<?, ?it/s]
loss: tensor(89984., device='cuda:0', grad_fn=<DivBackward0>)
 10%|█         | 1/10 [00:06<00:54, 6.01s/it]
loss: tensor(1588043.6250, device='cuda:0', grad_fn=<DivBackward0>)
 20%|██        | 2/10 [00:09<00:40, 5.12s/it]
loss: tensor(420687.0938, device='cuda:0', grad_fn=<DivBackward0>)
 30%|███       | 3/10 [00:12<00:31, 4.50s/it]
loss: tensor(653955.4375, device='cuda:0', grad_fn=<DivBackward0>)
 40%|████      | 4/10 [00:15<00:24, 4.07s/it]
loss: tensor(141592.6875, device='cuda:0', grad_fn=<DivBackward0>)
 50%|█████     | 5/10 [00:18<00:18, 3.76s/it]
loss: tensor(97450.2891, device='cuda:0', grad_fn=<DivBackward0>)
 60%|██████    | 6/10 [00:21<00:14, 3.55s/it]
loss: tensor(160497.9375, device='cuda:0', grad_fn=<DivBackward0>)
 70%|███████   | 7/10 [00:24<00:10, 3.44s/it]
loss: tensor(151121.3594, device='cuda:0', grad_fn=<DivBackward0>)
 80%|████████  | 8/10 [00:27<00:06, 3.38s/it]
loss: tensor(123211.6484, device='cuda:0', grad_fn=<DivBackward0>)
 90%|█████████ | 9/10 [00:31<00:03, 3.40s/it]
loss: tensor(98576.7578, device='cuda:0', grad_fn=<DivBackward0>)
100%|██████████| 10/10 [00:34<00:00, 3.43s/it]
Learning rate search finished. See the graph with {finder_name}.plot()
Lemme know what other details I can attach.
My criterion:
def cross_entropy2d(logit, target, ignore_index=255, weight=None, batch_average=True):
    """
    The loss is
    .. math::
        \sum_{i=1}^{\\infty} x_{i}
    `(minibatch, C, d_1, d_2, ..., d_K)`
    Args:
        logit (Tensor): Output of network
        target (Tensor): Ground Truth
        ignore_index (int, optional): Defaults to 255. The pixels with this labels do not contribute to loss
        weight (List, optional): Defaults to None. Weight assigned to each class
        batch_average (bool, optional): Defaults to True. Whether to consider the loss of each element in the batch.
    Returns:
        Float: The value of loss.
    """
    n, c, h, w = logit.shape
    target = target.squeeze(1)
    if weight is None:
        criterion = nn.CrossEntropyLoss(weight=weight, ignore_index=ignore_index, reduction='sum')
    else:
        criterion = nn.CrossEntropyLoss(weight=torch.tensor(weight, dtype=torch.float32),
                                        ignore_index=ignore_index,
                                        reduction='sum')
    loss = criterion(logit, target.long())
    if batch_average:
        loss /= n
    return loss
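A possible explanation for the blank graph, offered as an assumption rather than a diagnosis: as far as I know, plot() trims the first 10 and the last 5 recorded points by default (skip_start=10, skip_end=5), and the run above records only 10 losses. The plain-Python sketch below applies that trimming to the ten loss values from the log; visible_points is a hypothetical helper, not a library function.

```python
def visible_points(values, skip_start=10, skip_end=5):
    """Points left after dropping skip_start from the front and skip_end from the back."""
    return values[skip_start:len(values) - skip_end]


# The ten losses printed during the sweep above.
losses = [
    89984.0, 1588043.625, 420687.0938, 653955.4375, 141592.6875,
    97450.2891, 160497.9375, 151121.3594, 123211.6484, 98576.7578,
]

trimmed_default = visible_points(losses)
trimmed_none = visible_points(losses, skip_start=0, skip_end=0)
```

With the assumed defaults nothing is left to draw, while disabling the trimming keeps all ten points. If that is indeed the cause, calling lr_finder.plot(skip_start=0, skip_end=0) or using a larger num_iter should make the curve appear.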
Hello. Currently the *DataLoaderIter classes allow us to do some custom pre-processing of the (x, y) pairs with the "inputs_labels_from_batch" method.
I have a network where I do some post-processing on the output of the network, e.g. (simplified):
x, y = next(train_sampler)
Y_hat = model(x)
y_hat = custom_func(Y_hat)
loss = mse(y_hat, y)
Could/should this be an option of the data loader classes, to have a "output_labels_from_batch" such that we can post-process the model forward() output?
Thanks.
Is this code usable on regression problems, or only classification problems? I've been trying to get it to work for some time with no success.
Currently it can't be installed with pip install git+https://github.com/davidtvs/pytorch-lr-finder
Collecting git+https://github.com/davidtvs/pytorch-lr-finder
Cloning https://github.com/davidtvs/pytorch-lr-finder to /tmp/pip-req-build-bies1fy1
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/a_yaroshevich/anaconda3/envs/rnd/lib/python3.6/tokenize.py", line 452, in open
buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-req-build-bies1fy1/setup.py'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-bies1fy1/
For it to work you need to create a setup.py and list your dependencies there. The modules also need to be correct.
At https://github.com/davidtvs/pytorch-lr-finder/blob/master/torch_lr_finder/lr_finder.py#L453
inputs, labels = next(self._iterator)
Could it be changed to
inputs, labels, *rest = next(self._iterator)
to accommodate cases where more values are returned?
For example, if a weighted loss is used, then also the weights need to be returned to calculate the loss. This weighted loss can be found in U-Net original paper. Another example is to return the training data file name being used; this is useful for debugging.
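For reference, this is how the proposed starred assignment behaves in plain Python. split_batch is a hypothetical helper used only to show the unpacking; it is not part of the library.

```python
def split_batch(batch):
    """Take the first two items as inputs and labels, collect the rest in a list."""
    inputs, labels, *rest = batch
    return inputs, labels, rest


plain = split_batch(("x", "y"))
extended = split_batch(("x", "y", "weights", "file_001.png"))
```

A two-element batch yields an empty rest list, so existing loaders would keep working, and a batch with fewer than two elements still raises ValueError.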
See e.g. https://github.com/pytorch/text.
Because these classes inherit from DataLoader, I think they should work out of the box were it not for the type check.
Kind regards, and thanks for the awesome package!
I sent the model to cuda and froze certain layers:
model = prep_model(args)
model.cuda()
freeze_layers(model, [True, True, False])
Then I did:
lr_finder = AccumulationLRFinder(
model, run.optimizer, criterion,
accumulation_steps=accumulation_steps
)
lr_finder.range_test(train_loader,end_lr=10, num_iter=100, step_mode="exp")
lr_finder.plot()
lr_finder.reset()
Note that when I don't send the model to the GPU and instead pass the device parameter to lr_finder, it works. My question is: why isn't the input being sent to the GPU, given that that is the device the model is on?
Hi,
My model has several outputs from the forward method:
def forward(self, x):
    # ---code---
    return ClCd, angle
This returns a tuple, which LR finder does not like. I get the following error message:
if not (target.size() == input.size()):
AttributeError: 'tuple' object has no attribute 'size'
Is there a way for LR finder to work with tuples?
Alternatively, should I be structuring the output from my forward method differently (i.e. using a single output tensor)? I tried outputting a single tensor with two columns from my forward method (each column representing an output), but this gave significantly worse results in training.
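One option that keeps the two outputs separate is to pass a criterion that accepts the tuple and combines one loss per output. The sketch below is plain Python with a toy squared error; MultiOutputLoss is a hypothetical name. It assumes the labels also arrive as a tuple with one target per output, which depends on how the data loader and the library's device transfer handle tuples, and I have not checked that.

```python
class MultiOutputLoss:
    """Hypothetical criterion: sums one loss per (output, target) pair."""

    def __init__(self, *loss_fns):
        self.loss_fns = loss_fns

    def __call__(self, outputs, targets):
        if not (len(outputs) == len(targets) == len(self.loss_fns)):
            raise ValueError("need one loss function and one target per output")
        return sum(fn(o, t) for fn, o, t in zip(self.loss_fns, outputs, targets))


def squared_error(pred, target):
    # Toy stand-in for a real loss such as nn.MSELoss.
    return (pred - target) ** 2


criterion = MultiOutputLoss(squared_error, squared_error)
loss = criterion((1.0, 3.0), (0.0, 1.0))
```

With real tensors the summed loss stays a single scalar tensor, so backward() can be called on it as usual.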
I am facing the following error, any suggestions?
py3.8.egg/torch_lr_finder/lr_finder.py", line 361, in _check_for_scheduler
AttributeError: 'dict' object has no attribute 'param_groups'
The code is a simple one
lr_finder = LRFinder(models, optimizers, criterion, device="cuda")
lr_finder.range_test(train_loader, end_lr=100, num_iter=100, step_mode='exp')
lr_finder.plot(log_lr=False) # to inspect the loss-learning rate graph
lr_finder.reset()
I am training a multilingual-bert model for a sentiment classification task. My torch dataset returns a dictionary. I tried to run lr_finder.range_test(....) with and without TrainDataLoaderIter but I get the same ValueError both times.
class JigsawDataset:
    def __init__(self, df, train_transforms = None):
        self.comment_text = df["comment_text"].values
        self.target = df["toxic"].values
        self.tokenizer = config.BERT_TOKENIZER
        self.max_len = config.MAX_LEN
        self.langs = df["lang"].values
        self.train_transforms = train_transforms
    def __len__(self):
        return len(self.comment_text)
    def __getitem__(self, item):
        comment_text = str(self.comment_text[item])
        comment_text = " ".join(comment_text.split())
        lang = self.langs[item]
        if self.train_transforms:
            comment_text, _ = self.train_transforms(data=(comment_text, lang))['data']
        inputs = self.tokenizer.encode_plus(
            comment_text,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            pad_to_max_length=True,
            truncation=True
        )
        ids = inputs["input_ids"]
        mask = inputs["attention_mask"]
        token_type_ids = inputs["token_type_ids"]
        data_loader_dict = {}
        data_loader_dict["ids"] = torch.tensor(ids, dtype=torch.long)
        data_loader_dict["mask"] = torch.tensor(mask, dtype=torch.long)
        data_loader_dict["token_type_ids"] = torch.tensor(token_type_ids, dtype=torch.long)
        data_loader_dict["targets"] = torch.tensor(self.target[item], dtype=torch.float)
        return data_loader_dict
%%time
def run():
    class CustomTrainIter(TrainDataLoaderIter):
        def input_labels_from_batch(self, batch_data):
            return batch_data["ids"], batch_data["mask"], batch_data["token_type_ids"], batch_data["targets"]
    def loss_fn(outputs, targets):
        return nn.BCEWithLogitsLoss()(outputs, targets.view(-1, 1))
    def train_fn(data_loader, model, optimizer, device,):
        model, optimizer, data_loader = accelerator.prepare(model, optimizer, data_loader)
        model.train()
        for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
            ids = d["ids"]
            token_type_ids = d["token_type_ids"]
            mask = d["mask"]
            targets = d["targets"]
            ids = ids.to(device, dtype=torch.long)
            token_type_ids = token_type_ids.to(device, dtype=torch.long)
            mask = mask.to(device, dtype=torch.long)
            targets = targets.to(device, dtype=torch.float)
            optimizer.zero_grad()
            outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)
            loss = loss_fn(outputs, targets)
            if bi % 1000 == 0:
                print(f"bi={bi}, loss={loss}")
            accelerator.backward(loss)
            optimizer.step()
    def eval_fn(data_loader, model, device):
        model.eval()
        fin_targets = []
        fin_outputs = []
        with torch.no_grad():
            for bi, d in tqdm(enumerate(data_loader), total=len(data_loader)):
                ids = d["ids"]
                token_type_ids = d["token_type_ids"]
                mask = d["mask"]
                targets = d["targets"]
                ids = ids.to(device, dtype=torch.long)
                token_type_ids = token_type_ids.to(device, dtype=torch.long)
                mask = mask.to(device, dtype=torch.long)
                targets = targets.to(device, dtype=torch.float)
                outputs = model(ids=ids, mask=mask, token_type_ids=token_type_ids)
                fin_targets.extend(targets.cpu().detach().numpy().tolist())
                fin_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())
        return fin_outputs, fin_targets
    df1 = pd.read_csv(
        "/workspace/data/jigsaw-multilingual/input/jigsaw-data/jigsaw-toxic-comment-train.csv",
        usecols = ["comment_text", "toxic"]
    )
    df1 = df1.head(1000)
    df2 = pd.read_csv(
        "/workspace/data/jigsaw-multilingual/input/jigsaw-data/jigsaw-unintended-bias-train.csv",
        usecols = ["comment_text", "toxic"]
    )
    df2 = df2.head(1000)
    df_train = pd.concat([df1, df2], axis = 0).reset_index(drop = True)
    df_train["comment_text"] = df_train["comment_text"].apply(clean_text)
    df_valid = pd.read_csv("/workspace/data/jigsaw-multilingual/input/jigsaw-data/Translated Datasets/jigsaw_miltilingual_valid_translated.csv")
    df_valid["comment_text"] = df_valid["translated"]
    df_valid.drop("translated", axis = 1, inplace = True)
    df_valid["comment_text"] = df_valid["comment_text"].apply(clean_text)
    nlp_transform = NLPTransform()
    df_train['lang'] = 'en'
    non_toxic_sentences = set()
    for comment_text in tqdm(df_train['comment_text'], total=df.shape[0]):
        non_toxic_sentences.update(nlp_transform.get_sentences(comment_text), 'en')
    transform = AddNonToxicSentencesTransform(non_toxic_sentences=list(non_toxic_sentences), p=1.0, sentence_range=(1,2))
    train_dataset = JigsawDataset(
        df = df_train,
        train_transforms = get_train_transforms()
    )
    train_data_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=config.TRAIN_BATCH_SIZE,
        num_workers=4
    )
    valid_dataset = JigsawDataset(
        df = df_valid,
    )
    valid_data_loader = torch.utils.data.DataLoader(
        valid_dataset,
        batch_size=config.VALID_BATCH_SIZE,
        num_workers=1
    )
    device = torch.device(config.DEVICE)
    model = BERTModel()
    param_optimizer = list(model.named_parameters())
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    optimizer_parameters = [
        {
            "params": [
                p for n, p in param_optimizer if not any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.001,
        },
        {
            "params": [
                p for n, p in param_optimizer if any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.0,
        },
    ]
    num_train_steps = int(len(df_train) / config.TRAIN_BATCH_SIZE * config.EPOCHS)
    optimizer = AdamW(optimizer_parameters, lr=config.LEARNING_RATE)
    criterion = nn.BCEWithLogitsLoss()
    lr_finder = LRFinder(
        model,
        optimizer,
        criterion,
        device = config.DEVICE
    )
    custom_train_iter = CustomTrainIter(train_data_loader)
    lr_finder.range_test(
        custom_train_iter,
        end_lr = 10,
        num_iter = 100,
        step_mode = "exp"
    )
    best_accuracy = 0
    for epoch in range(config.EPOCHS):
        print(f"----------EPOCH: {epoch}----------")
        train_fn(train_data_loader, model, optimizer, device)
        outputs, targets = eval_fn(valid_data_loader, model, device)
        targets = np.array(targets) >= 0.5
        accuracy = metrics.roc_auc_score(targets, outputs)
        print(f"----------ROC AUC Score = {accuracy}----------")
        print()
        if accuracy > best_accuracy:
            torch.save(model.state_dict(), config.MODEL_PATH)
            best_accuracy = accuracy
if __name__ == "__main__":
    run()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<timed exec> in <module>
<timed exec> in run()
/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
318 train_iter,
319 accumulation_steps,
--> 320 non_blocking_transfer=non_blocking_transfer,
321 )
322 if val_loader:
/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
369 self.optimizer.zero_grad()
370 for i in range(accumulation_steps):
--> 371 inputs, labels = next(train_iter)
372 inputs, labels = self._move_to_device(
373 inputs, labels, non_blocking=non_blocking_transfer
/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in __next__(self)
57 try:
58 batch = next(self._iterator)
---> 59 inputs, labels = self.inputs_labels_from_batch(batch)
60 except StopIteration:
61 if not self.auto_reset:
/opt/conda/lib/python3.6/site-packages/torch_lr_finder/lr_finder.py in inputs_labels_from_batch(self, batch_data)
34 "Your batch type is not supported: {}. Please inherit from "
35 "`TrainDataLoaderIter` or `ValDataLoaderIter` and override the "
---> 36 "`inputs_labels_from_batch` method.".format(type(batch_data))
37 )
38
ValueError: Your batch type is not supported: <class 'dict'>. Please inherit from `TrainDataLoaderIter` or `ValDataLoaderIter` and override the `inputs_labels_from_batch` method.
How do I get the lr_finder to run multiple batches for each "iteration"? Logic being that running multiple batches would give a more precise result.
Based on the naming, I'd assumed that num_iter in lr_finder.range_test() would control the number of batches/iterations for each value of lr in the given range. However, num_iter controls the number of unique lr values to test within the given interval, running only 1 batch through the network.
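One way to picture what num_iter controls is to write out the learning-rate sweep by hand. The sketch below is plain Python and not the library's code: lr_sweep is a hypothetical helper, and the interpolation formulas are an assumption about how an exponential or linear sweep is usually built. Each of the num_iter values gets one optimizer step. The traceback earlier on this page shows _train_batch looping over range(accumulation_steps) and drawing a batch on each pass, so passing accumulation_steps=k to range_test looks like the way to run k batches per learning-rate value.

```python
def lr_sweep(start_lr, end_lr, num_iter, step_mode="exp"):
    """Return the num_iter learning rates visited between start_lr and end_lr."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    lrs = []
    for i in range(num_iter):
        r = i / (num_iter - 1)  # fraction of the sweep completed, 0.0 .. 1.0
        if step_mode == "exp":
            # Geometric interpolation: equal ratios between neighbours.
            lrs.append(start_lr * (end_lr / start_lr) ** r)
        elif step_mode == "linear":
            # Arithmetic interpolation: equal differences between neighbours.
            lrs.append(start_lr + r * (end_lr - start_lr))
        else:
            raise ValueError(f"unknown step_mode: {step_mode}")
    return lrs


if __name__ == "__main__":
    # Six values between 1e-5 and 1: one learning rate per iteration.
    for lr in lr_sweep(1e-5, 1.0, num_iter=6):
        print(f"{lr:.0e}")
```

A larger num_iter only makes the grid of learning rates finer; it does not add batches per value.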
I've been following and making all the necessary changes required to run lr_finder.range_test(). However, I'm still facing this error!
Here's my code defining the Dataset class:
class HappyWhaleDataset(Dataset):
    def __init__(self, df, transforms=None):
        self.df = df
        self.file_names = df['file_path'].values
        self.labels = df['individual_id'].values
        self.transforms = transforms

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        img_path = self.file_names[index]
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        label = self.labels[index]
        if self.transforms:
            img = self.transforms(image=img)["image"]
        return {
            'image': img,
            'label': torch.tensor(label, dtype=torch.long)
        }

def prepare_loaders(df, fold):
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    train_dataset = HappyWhaleDataset(df_train, transforms=data_transforms["train"])
    valid_dataset = HappyWhaleDataset(df_valid, transforms=data_transforms["valid"])
    train_loader = DataLoader(train_dataset, batch_size=CONFIG['train_batch_size'],
                              num_workers=2, shuffle=True, pin_memory=True, drop_last=True)
    valid_loader = DataLoader(valid_dataset, batch_size=CONFIG['valid_batch_size'],
                              num_workers=2, shuffle=False, pin_memory=True)
    return train_loader, valid_loader

train_loader, valid_loader = prepare_loaders(df, fold=0)
Note: Model training goes without error when I'm just creating a usual train_loader with the above code.
class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["image"], batch_data["label"]
custom_loader = CustomTrainIter(train_loader)
lr_finder = LRFinder(model, optimizer, criterion, device=CONFIG['device'])
lr_finder.range_test(custom_loader, end_lr=1, num_iter=100, step_mode="linear")
lr_finder.plot(log_lr=False)
lr_finder.reset()
TypeError Traceback (most recent call last)
/tmp/ipykernel_34/1446799792.py in <module>
6
7 lr_finder = LRFinder(model, optimizer, criterion, device=CONFIG['device'])
----> 8 lr_finder.range_test(custom_loader, end_lr=1, num_iter=100, step_mode="linear")
9 lr_finder.plot(log_lr=False)
10 lr_finder.reset()
/opt/conda/lib/python3.7/site-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
318 train_iter,
319 accumulation_steps,
--> 320 non_blocking_transfer=non_blocking_transfer,
321 )
322 if val_loader:
/opt/conda/lib/python3.7/site-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
375
376 # Forward pass
--> 377 outputs = self.model(inputs)
378 loss = self.criterion(outputs, labels)
379
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
TypeError: forward() missing 1 required positional argument: 'labels'
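The last frames show the finder calling self.model(inputs) with a single argument, while this model's forward also expects labels. One possible workaround is to pack both values into the inputs returned by inputs_labels_from_batch and unpack them in a thin wrapper around the model. The sketch below shows the idea with plain Python callables so it runs without PyTorch. PackedInputAdapter, pack_batch and toy_model are hypothetical names; in real code the wrapper would subclass nn.Module and define forward. Whether the finder moves a tuple of tensors to the device correctly is an assumption to check against the installed version.

```python
class PackedInputAdapter:
    """Expose a two-argument model as a one-argument callable."""

    def __init__(self, model):
        self.model = model

    def __call__(self, packed):
        images, labels = packed  # unpack what pack_batch bundled together
        return self.model(images, labels)


def pack_batch(batch_data):
    """Mirror of inputs_labels_from_batch: returns (inputs, labels)."""
    labels = batch_data["label"]
    # The labels travel twice: once inside inputs for the model,
    # once on their own for the criterion.
    return (batch_data["image"], labels), labels


def toy_model(images, labels):
    # Hypothetical stand-in for a model whose forward() needs labels.
    return [img * 2 for img in images]


if __name__ == "__main__":
    batch = {"image": [1, 2, 3], "label": [0, 1, 0]}
    inputs, labels = pack_batch(batch)
    outputs = PackedInputAdapter(toy_model)(inputs)  # single-argument call
    print(outputs, labels)
```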
Hi davidtvs,
When using the LR finder, I find that the loss curve differs somewhat depending on num_iter.
For example, with 6700 images and a batch size of 252: when num_iter=27, the loss decreases noticeably from 1e-4 to 1e-3; when num_iter=270, it decreases noticeably from 1e-4 to 5e-4; and when num_iter=540, it decreases noticeably from 1e-4 to 8e-4. So I am not sure which loss curve is correct.
@davidtvs
Thanks! It is a really good tool.
Is it possible to make it also compatible when there's no dataloader? My dataset is fully loaded on memory.
I tried the LR finder with 100 and with 1000 iterations (all other parameters staying the same) and got very different recommendations for the LR - for 100 iterations it was 1.2e-3, and for 1000 iterations 3.3e-5. I tried training with both of these, and they don't produce optimal results compared to a more aggressive learning rate - 5e-2 (which is actually found by the 100 iterations to be the maximum LR).
What would be the ideal number of iterations that you would recommend? Would this number be calculated using model / dataset size in any way? I get the feeling that doing more iterations in general affects the finder, because the maximum learning rate is found faster. I did not look at the code, but I guess that the weights are not reset after each iteration - do you think it would be good to reset them, such that the previous iterations don't affect subsequent ones?
I have an optimization task that optimizes a single tensor by passing it through a set of transforms and then into the model. Losses are then calculated by using hooks attached to various model layers.
Is it possible to use this project for finding the optimal LR for my optimization task? The code looks like it requires a DataLoader instance.
Hi,
I would like to use the lr-finder with an LSTM. In the forward step of my model I do:
for epoch in range(100):
    model.train()
    hidden = model.init_hidden(batch_size)
    total_loss = 0
    for data, target in dataloader:
        hidden = repackage_hidden(hidden)
        output, hidden = model(data, hidden)
        loss = loss_fn(output, target.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
My dataloader yields x, y where x is a sequence and y is the next step in the sequence (think language model), e.g.
data = [1, 2, 3, 4]
target = [2, 3, 4, 5]
Now when I try to do:
lr_finder.range_test(dataloader, end_lr=100, num_iter=100)
... I get the following error:
TypeError: forward() missing 1 required positional argument: 'hidden'
How can I pass hidden to the model using lr-finder?
I tried to use:
class CustomTrainIter(TrainDataLoaderIter):
    def inputs_labels_from_batch(self, batch_data):
        return batch_data["img"], batch_data["target"]
with my DataLoader for lr_finder.range_test(), but I still got the error:
TypeError: list indices must be integers or slices, not str
TypeError Traceback (most recent call last)
<ipython-input-60-b2a8b27d6c88> in <module>()
3 optim = torch.optim.Adam(model_ft.parameters(), lr=1e-7, weight_decay=1e-2)
4 lr_finder = LRFinder(model_ft,optim, criterion, device='cuda')
----> 5 lr_finder.range_test( custom_train_iter ,end_lr=100,num_iter=100)
6 lr_finder.plot()
7 lr_finder.reset()
3 frames
/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
318 train_iter,
319 accumulation_steps,
--> 320 non_blocking_transfer=non_blocking_transfer,
321 )
322 if val_loader:
/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
369 self.optimizer.zero_grad()
370 for i in range(accumulation_steps):
--> 371 inputs, labels = next(train_iter)
372 inputs, labels = self._move_to_device(
373 inputs, labels, non_blocking=non_blocking_transfer
/usr/local/lib/python3.7/dist-packages/torch_lr_finder/lr_finder.py in __next__(self)
57 try:
58 batch = next(self._iterator)
---> 59 inputs, labels = self.inputs_labels_from_batch(batch)
60 except StopIteration:
61 if not self.auto_reset:
<ipython-input-58-f89d28995874> in inputs_labels_from_batch(self, batch_data)
4
5
----> 6 return batch_data["img"], batch_data["target"]
7
8 custom_train_iter = CustomTrainIter(train_dl)
TypeError: list indices must be integers or slices, not str
Any suggestions? Thanks!
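The message says batch_data is a list, which is typically what a DataLoader's default collation produces when the dataset returns a tuple or list per sample, so string keys cannot index it. A hedged sketch of an extractor that accepts either layout follows. split_batch is a hypothetical helper, the "img" and "target" keys come from the snippet above, and the positional order (inputs first, labels second) is an assumption about the dataset.

```python
def split_batch(batch_data, input_key="img", label_key="target"):
    """Return (inputs, labels) from a dict-style or sequence-style batch."""
    if isinstance(batch_data, dict):
        return batch_data[input_key], batch_data[label_key]
    if isinstance(batch_data, (list, tuple)):
        if len(batch_data) < 2:
            raise ValueError("expected at least (inputs, labels) in the batch")
        # Assumed order: inputs first, labels second; extra items are ignored.
        return batch_data[0], batch_data[1]
    raise TypeError(f"unsupported batch type: {type(batch_data).__name__}")
```

Inside a TrainDataLoaderIter subclass, inputs_labels_from_batch could then simply return split_batch(batch_data).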
Hi, I am new to this package. I have done the things here. It gives me a set of loss values with their corresponding lr, but I don't know how to find the best lr. I would like to know if there is a method within this package to automatically find the proposed best lr.
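A common heuristic reads the suggestion off the recorded curve: pick the learning rate where the loss falls fastest, rather than where it is lowest. The sketch below applies that to two lists like the ones the range test records. steepest_lr is a hypothetical helper, and it assumes lrs is positive and strictly increasing and that both lists have equal length. Newer releases of the package appear to offer a suggest_lr option on plot(); treat that as an assumption to confirm in the installed version.

```python
import math


def steepest_lr(lrs, losses, skip_start=0, skip_end=0):
    """Return the learning rate at which the loss decreases fastest."""
    if len(lrs) != len(losses):
        raise ValueError("lrs and losses must have the same length")
    # Optionally trim noisy points at the start and the diverging tail.
    end = len(lrs) - skip_end
    lrs, losses = lrs[skip_start:end], losses[skip_start:end]
    if len(lrs) < 2:
        raise ValueError("need at least two points after trimming")
    best_idx, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        # Slope of the loss with respect to log10(lr) between neighbours.
        step = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / step
        if slope < best_slope:
            best_idx, best_slope = i, slope
    if best_idx is None:
        raise ValueError("loss never decreases; no suggestion possible")
    return lrs[best_idx]
```

The returned value is a starting point for experiments, not a guarantee; the curve itself is still worth inspecting.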
Great package! Thank you for sharing :)
LRFinder with a larger batch size.
Thanks for the work! I'd like to know whether distributed training is supported. Is this package compatible with DataParallel and DistributedDataParallel?
The examples in this library's manual use a single unified model, but auto-encoders are made of two parts: an encoder and a decoder. How can I use this library with an auto-encoder?
As mentioned in title, I'm running experiments on a headless server. Can't view matplotlib graphs. It'd be nice to be able to disable the graphical pop-up (which hangs system till it's closed) and instead save the generated plot to disk.
The plot could be named by user given string and/or time of generation.
If you prefer/are busy, I can generate a pull request for the same.
With
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
criterion = nn.CrossEntropyLoss(weight=None, ignore_index=ignore_index, reduce=False)
and then executing
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(dataLoader['train'], end_lr=100, num_iter=100)
lr_finder.plot()  # to inspect the loss-learning rate graph
lr_finder.reset()  # to reset the model and optimizer to their initial state
I am getting the error,
RuntimeError: Expected object of scalar type Long but got scalar type Byte for argument #2 'target'
Please find below the whole trace.
So far training my models with the above optimizer and criterion I do not have any problem.