
CUDA out of memory appears when fine-tuning wav2vec2-base model (espnet issue, closed)

WZLHQ commented on May 27, 2024

CUDA out of memory appears when fine-tuning wav2vec2-base model


Comments (7)

WZLHQ commented on May 27, 2024

The problem is solved.
This CUDA out-of-memory error may be caused by a version mismatch between espnet, pytorch, and fairseq.
I use the following versions:

espnet==202301
fairseq==0.12.2
pytorch==1.8.1+cuda11.1
With these versions pinned, the error no longer occurs.

If anyone wants to fine-tune wav2vec2 (used as the encoder), this issue might be helpful.
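
Before comparing against the versions listed above, it can help to print what is actually installed in the active environment. The following is a minimal sketch using only the standard library. The distribution names (`espnet`, `fairseq`, `torch`) are assumed to be the names the packages were installed under, and `installed_versions` is a hypothetical helper, not part of espnet.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("espnet", "fairseq", "torch")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside each conda environment makes it easy to see which combination of versions a given training run actually used.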


sw005320 commented on May 27, 2024

Can you try the following?

- Remove very long utterances
- Further reduce the batch size (while increasing accum_grad)
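
The two suggestions can be sketched in plain Python. The utterance table, the 5-second cutoff, and the helper names below are hypothetical and only illustrate the idea; in espnet itself these are controlled through the recipe and training configuration. The sketch drops utterances above a duration limit, then shows that a smaller per-step batch combined with a larger accum_grad keeps the number of utterances per optimizer update unchanged.

```python
def drop_long_utterances(durations, max_seconds):
    """Keep only utterances whose duration (in seconds) is within the limit."""
    return {utt: sec for utt, sec in durations.items() if sec <= max_seconds}


def utterances_per_update(batch_size, accum_grad):
    """Utterances contributing to one optimizer update with gradient accumulation."""
    return batch_size * accum_grad


# Hypothetical utterance IDs and durations in seconds.
durations = {"utt1": 3.2, "utt2": 27.5, "utt3": 4.9}
kept = drop_long_utterances(durations, max_seconds=5.0)
print(sorted(kept))  # ['utt1', 'utt3']

# Halving the batch while doubling accum_grad keeps the update size the same.
print(utterances_per_update(batch_size=4, accum_grad=2))  # 8
print(utterances_per_update(batch_size=2, accum_grad=4))  # 8
```

Peak memory per step depends on the per-step batch and the longest utterance in it, which is why both knobs are suggested together.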


WZLHQ commented on May 27, 2024

> Can you try the following?
>
> - Remove very long utterances
> - Further reduce the batch size (while increasing accum_grad)

Thanks. I did the following:

- I set "max_wav_duration" to 5 to remove very long utterances.
- I set the batch size to 2.
  - speech.size=torch.Size([1, 91498])
  - I don't know why the first dimension of speech.size is 1 rather than 2.
- I set "accum_grad" to 5.

I also created a new conda environment for espnet and rebuilt it with pytorch=2.1.0, pytorch-cuda=12.1, fairseq=1.0.0a0+313ff05, and espnet=202310.

The same error occurs, as follows:

  File "/home/rosie/espnet/espnet2/asr/espnet_model.py", line 261, in forward
    encoder_out, encoder_out_lens = self.encoder(speech, speech_lengths)
  File "/home/rosie/espnet/tools/miniconda/envs/espnet-2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rosie/espnet/tools/miniconda/envs/espnet-2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rosie/espnet/espnet2/asr/encoder/wav2vec2_encoder.py", line 116, in forward
    masks = make_pad_mask(ilens).to(xs_pad.device)
  File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 168, in make_pad_mask
    return _make_pad_mask_traceable(lengths, xs, length_dim, maxlen)
  File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 249, in _make_pad_mask_traceable
    mask = triu_onnx(mask)[1:, :-1] # onnx cannot handle diagonal argument.
  File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 261, in triu_onnx
    return x * mask
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.80 GiB. GPU 0 has a total capacty of 23.69 GiB of which 7.38 GiB is free. Including non-PyTorch memory, this process has 16.30 GiB memory in use. Of the allocated memory 15.96 GiB is allocated by PyTorch, and 55.08 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
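
The 7.80 GiB figure in the error can be reproduced with simple arithmetic. The sketch below assumes, based on the traceback lines for `_make_pad_mask_traceable` and `triu_onnx`, that a square boolean mask with side `maxlen + 1` is materialized at one byte per element, with `maxlen` equal to the 91498 reported in `speech.size`. That layout is an assumption about the espnet code at this version, not something the thread confirms, and `square_mask_gib` is a hypothetical helper.

```python
def square_mask_gib(maxlen, bytes_per_element=1):
    """Size in GiB of a (maxlen + 1) x (maxlen + 1) mask (assumed layout)."""
    side = maxlen + 1
    return side * side * bytes_per_element / 2**30


size = square_mask_gib(91498)
print(f"{size:.2f} GiB")  # 7.80 GiB
```

Under that assumption the cost grows with the square of the input length, which would explain why reducing the batch size alone did not help here.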

Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. It seems this error only occurs when fine-tuning w2v2 (encoder: wav2vec2). Please help me.


sw005320 commented on May 27, 2024

> speech.size=torch.Size([1, 91498])

Is 91498 the number of frames or the number of samples?
How long (in seconds) is this audio?

> Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. It seems this error only occurs when fine-tuning w2v2 (encoder: wav2vec2). Please help me.

Did you train it with the same data?


WZLHQ commented on May 27, 2024

> speech.size=torch.Size([1, 91498]) Is 91498 the number of frames or the number of samples? How long (in seconds) is this audio?
>
> Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. It seems this error only occurs when fine-tuning w2v2 (encoder: wav2vec2). Please help me.
>
> Did you train it with the same data?

Hi, thanks for your reply.

Is 91498 the number of frames or the number of samples? -> It is the number of frames.
How long (in seconds) is this audio? -> It is difficult to know, but I guess less than 10 seconds.
Did you train it with the same data? -> Yes, of course.
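
The duration can be estimated from the tensor shape once the unit is known. As a rough check, and assuming the value counts raw waveform samples at a 16 kHz sampling rate (an assumption; the thread states neither), 91498 corresponds to a little under 6 seconds, which would be consistent with the estimate of less than 10 seconds. `duration_seconds` is a hypothetical helper.

```python
def duration_seconds(num_samples, sample_rate=16000):
    """Duration in seconds of a waveform with the given number of samples."""
    return num_samples / sample_rate


# 91498 is the second dimension of speech.size reported in the thread.
print(round(duration_seconds(91498), 2))  # 5.72
```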


sw005320 commented on May 27, 2024

91498 is too long.
I recommend you check the front-end part to see if you're doing it correctly.
The maximum number of frames would be 2000 or so. Yours is too large.
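
To relate a raw-sample length to a frame count on the scale mentioned above, one can apply the standard output-length formula for a stack of unpadded 1-D convolutions. The kernel sizes and strides below are those of the wav2vec 2.0 base feature encoder as published (kernels 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2). Treating 91498 as a sample count is an assumption, and `conv_output_length` is a hypothetical helper rather than an espnet or fairseq function.

```python
# (kernel, stride) per layer of the wav2vec 2.0 base feature encoder (assumed).
W2V2_CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def conv_output_length(num_samples, layers=W2V2_CONV_LAYERS):
    """Number of output frames after a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in layers:
        length = (length - kernel) // stride + 1
    return length


frames = conv_output_length(91498)
print(frames)  # 285
```

Under these assumptions the encoder would produce a few hundred frames, well below 2000, so a length of 91498 at the point where the mask is built suggests the value may be a sample count rather than a frame count.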


WZLHQ commented on May 27, 2024

> 91498 is too long. I recommend you check the front-end part to see if you're doing it correctly. The maximum number of frames would be 2000 or so. Yours is too large.

Thank you very much.
