
Comments (6)

S-Abdelnabi commented on June 12, 2024

Hi,

Have you found a fix for this? I am having a similar issue. It was working on PyTorch 0.4.1: the compact-weight warning was displayed only once at the beginning, and training continued normally until the end.
However, after I updated to PyTorch 1.2 I am facing the same issue as yours. The warning is displayed at every call of forward, and training stops with an OOM error after around 100 epochs. I tried calling flatten_parameters() in the forward function of the WeightDrop class, but I still get the warning.
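
Roughly, what I tried looks like this (a minimal sketch, not the repo's exact weight_drop.py; the class layout and attribute names here are my own simplification):

    import torch
    import torch.nn as nn

    class WeightDrop(nn.Module):
        # Sketch of a WeightDrop wrapper, showing where I call flatten_parameters().
        def __init__(self, module, weights, dropout=0.5):
            super(WeightDrop, self).__init__()
            self.module = module      # the wrapped nn.LSTM / nn.GRU
            self.weights = weights    # names of hidden-to-hidden weights, e.g. ['weight_hh_l0']
            self.dropout = dropout

        def forward(self, *args):
            # (Here the repo re-applies DropConnect to the hidden-to-hidden weights.)
            if isinstance(self.module, nn.RNNBase):
                self.module.flatten_parameters()  # re-compact weights for cuDNN; the warning persists anyway
            return self.module.forward(*args)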

Thanks a lot.


rewicks commented on June 12, 2024

Also having this issue. I don't think it's related to the flatten_parameters() warnings. It seems to be correlated with the optimizer: specifically, the memory usage only starts to increase after the switch to ASGD.
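
One way to confirm this (a minimal sketch, assuming a single CUDA device; the helper name is mine) is to log the allocated GPU memory once per epoch and note when it starts climbing:

    import torch

    def log_gpu_memory(epoch, optimizer):
        # Report current and peak allocated memory on the default CUDA device, in MB.
        alloc = torch.cuda.memory_allocated() / 1024 ** 2
        peak = torch.cuda.max_memory_allocated() / 1024 ** 2
        name = type(optimizer).__name__  # 'SGD' before the switch, 'ASGD' after
        print('epoch %d | optimizer %s | allocated %.1f MB | peak %.1f MB' % (epoch, name, alloc, peak))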


AndreaLK3 commented on June 12, 2024

I have found a solution. If it works for others as well, this issue can be closed.

I have modified the ASGD optimizer using @mourga's port of AWD-LSTM for PyTorch 1.2.0, from: https://github.com/mourga/awd-lstm-lm

In particular, in main.py, you have to replace:

  • lines 243-245 with:

    for prm in model.parameters():
        if prm in optimizer.state.keys():
            tmp[prm] = prm.data.detach()
            prm.data = optimizer.state[prm]['ax'].detach()

  • lines 259-260 with:

    for prm in model.parameters():
        if prm in tmp.keys():
            prm.data = tmp[prm].detach()
            prm.requires_grad = True
    del tmp
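
For context, this is roughly how the two snippets fit into main.py's validation step when ASGD is active (a sketch from memory, not the exact file; evaluate_fn and val_data stand in for the script's own evaluate function and validation data):

    import torch

    def evaluate_with_averaged_weights(model, optimizer, evaluate_fn, val_data):
        # Temporarily swap each parameter's data for ASGD's averaged copy ('ax'),
        # run validation, then restore the raw weights so training can continue.
        tmp = {}
        for prm in model.parameters():            # the replacement for lines 243-245
            if prm in optimizer.state.keys():
                tmp[prm] = prm.data.detach()
                prm.data = optimizer.state[prm]['ax'].detach()

        val_loss = evaluate_fn(val_data)          # validate using the averaged weights

        for prm in model.parameters():            # the replacement for lines 259-260
            if prm in tmp.keys():
                prm.data = tmp[prm].detach()
                prm.requires_grad = True
        del tmp
        return val_loss

(The .detach() calls make sure the stashed copies are plain tensors that do not keep any autograd history alive.)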


LeePleased commented on June 12, 2024

I guess it may be related to the RNN compact weights, but I don't know how to fix it.


AndreaLK3 commented on June 12, 2024

@rewicks good call, the memory usage increases only with the ASGD optimizer. I think I have found the problem with it, but I am not sure how to solve it.

I printed the tensors living in memory using the GPU memory profiling code mentioned at https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/3, and used the PyCharm debugger to inspect the variables during training.
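
For reference, the profiling code from that thread is essentially the following (a sketch; it just lists every tensor the garbage collector can reach):

    import gc
    import torch

    def list_live_tensors():
        # Print the type, shape and device of every tensor the garbage collector can see.
        for obj in gc.get_objects():
            try:
                if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
                    print(type(obj), obj.size(), obj.device)
            except Exception:
                pass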

The ASGD optimizer is an object that contains:

  • defaults: its default settings
  • param_groups: a list containing one dictionary with the hyperparameters 'lr', 'alpha', 'lambd', 't0', 'weight_decay', and 'params' (a list of the 14 Parameters, with their Tensors)
  • state: a defaultdict with 20 entries, keyed by Parameters (each containing a Tensor)

As the epochs go on, optimizer.state comes to contain 20, 23, 26, 29, ... (unnamed) Tensors.
My hypotheses:

  • either ASGD averages over all the previous epochs, and thus eventually exhausts memory,
  • or, more likely, the Tensors holding the past gradients are never de-allocated from memory; we just keep allocating new ones.

Should we change the t0 parameter, increasing it by 1 each epoch? Or should we manually delete tensors from optimizer.state?
I would like to hear your opinions, and possibly from the authors as well, although maybe they didn't hit this problem because they did not have resource constraints (I run out of memory on a GPU with 10 GB).
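
A quick check for the second hypothesis (a sketch using only public optimizer attributes; the helper name is mine) would be to count the entries in optimizer.state, and the memory their tensors hold, at the end of each epoch. If the count keeps climbing past the number of model parameters, the old state tensors are indeed never freed:

    import torch

    def report_optimizer_state(epoch, optimizer):
        # Count per-parameter state entries and sum the memory held by their tensors.
        n_entries = len(optimizer.state)
        n_bytes = 0
        for param_state in optimizer.state.values():
            for value in param_state.values():
                if torch.is_tensor(value):
                    n_bytes += value.numel() * value.element_size()
        print('epoch %d | state entries %d | state memory %.1f MB'
              % (epoch, n_entries, n_bytes / 1024 ** 2))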


zhilizju commented on June 12, 2024

(Quoting @AndreaLK3's fix above: replace lines 243-245 and 259-260 in main.py as described.)

Hi @AndreaLK3, it works for me as well. However, I don't achieve the perplexities that this instruction claims:

The instruction below trains a PTB model that without finetuning achieves perplexities of approximately 61.2 / 58.8 (validation / testing):

    python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save PTB.pt

I only achieve perplexities of 64.74 / 62.23 (validation / testing) with the same command.
My torch version is 1.5.0 and my CUDA version is 10.1.
I'd like to know your experiment results, and I would appreciate your advice.

