
Comments (7)

WZLHQ commented on May 27, 2024

The problem is solved.
This CUDA out-of-memory error may be caused by a version mismatch among espnet, pytorch, and fairseq.
I use the following versions:

  • espnet==202301
  • fairseq==0.12.2
  • pytorch==1.8.1+cuda11.1

With these versions pinned, the error no longer occurs.
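
One way to confirm that an environment matches the versions listed above is to compare the installed distributions against the pins. The following is a minimal sketch, not part of ESPnet: `PINS` and `check_pins` are hypothetical names, the PyPI distribution for PyTorch is assumed to be `torch`, and only the release number is compared because local build tags such as `+cu111` vary by install.

```python
from importlib import metadata

# Pins taken from the comment above; "torch" is the PyPI name for PyTorch.
PINS = {"espnet": "202301", "fairseq": "0.12.2", "torch": "1.8.1"}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package name to a (wanted, found, matches) tuple."""
    report = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu111" when comparing.
        base = found.split("+")[0] if found else None
        report[name] = (wanted, found, base == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} -> {status}")
```

The `lookup` argument exists only so the comparison can be exercised without the packages installed.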

If anyone wants to fine-tune wav2vec2 (used as the encoder), this issue might be helpful.

from espnet.

sw005320 commented on May 27, 2024

Can you try the following?

  • Remove very long utterances
  • Further reduce the batch size (while increasing accum_grad)
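
Both suggestions can be illustrated without ESPnet itself. The sketch below assumes each utterance is described by a hypothetical `(utt_id, num_samples)` pair and a 16 kHz sampling rate; `drop_long_utterances` and `effective_batch_size` are illustrative helpers, not ESPnet functions. It shows that halving the batch size while doubling `accum_grad` keeps the number of utterances per optimizer step unchanged, while each forward pass handles fewer utterances.

```python
def drop_long_utterances(utts, max_seconds, sample_rate=16000):
    """Keep only utterances whose length does not exceed max_seconds."""
    limit = int(max_seconds * sample_rate)
    return [(utt_id, n) for utt_id, n in utts if n <= limit]


def effective_batch_size(batch_size, accum_grad):
    """Number of utterances contributing to one optimizer step."""
    return batch_size * accum_grad


# Hypothetical utterance list: (id, number of waveform samples).
utts = [("utt1", 48000), ("utt2", 91498), ("utt3", 400000)]
kept = drop_long_utterances(utts, max_seconds=10)
print(kept)  # utt3 (25 s at 16 kHz) is dropped

# Same effective batch, smaller per-step memory footprint.
print(effective_batch_size(8, 1), effective_batch_size(4, 2))
```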


WZLHQ commented on May 27, 2024

Can you try the following?

  • Remove very long utterances
  • Further reduce the batch size (while increasing accum_grad)

Thanks. I did the following:

  • I set "max_wav_duration" to 5 to remove very long utterances.
  • I set "batch size" to 2.
    • speech.size=torch.Size([1, 91498])
    • I don't know why the first dimension of speech.size is 1 rather than 2.
  • "accum_grad" is set to 5

I activated a new conda environment for espnet and rebuilt it, with pytorch=2.1.0, pytorch-cuda=12.1, fairseq=1.0.0a0+313ff05, and espnet=202310.

The same error occurs, as follows:

  • File "/home/rosie/espnet/espnet2/asr/espnet_model.py", line 261, in forward
    encoder_out, encoder_out_lens = self.encoder(speech, speech_lengths)
    File "/home/rosie/espnet/tools/miniconda/envs/espnet-2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "/home/rosie/espnet/tools/miniconda/envs/espnet-2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
    File "/home/rosie/espnet/espnet2/asr/encoder/wav2vec2_encoder.py", line 116, in forward
    masks = make_pad_mask(ilens).to(xs_pad.device)
    File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 168, in make_pad_mask
    return _make_pad_mask_traceable(lengths, xs, length_dim, maxlen)
    File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 249, in _make_pad_mask_traceable
    mask = triu_onnx(mask)[1:, :-1] # onnx cannot handle diagonal argument.
    File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 261, in triu_onnx
    return x * mask
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.80 GiB. GPU 0 has a total capacty of 23.69 GiB of which 7.38 GiB is free. Including non-PyTorch memory, this process has 16.30 GiB memory in use. Of the allocated memory 15.96 GiB is allocated by PyTorch, and 55.08 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
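
The size of the failed allocation is consistent with a square mask whose side equals the input length. The arithmetic below is a sketch under two assumptions that the thread does not confirm: that the mask built in `_make_pad_mask_traceable` has roughly `maxlen × maxlen` elements, and that each element occupies one byte (as a boolean tensor would). With `maxlen = 91498`, taken from the reported `speech.size`, the estimate comes out at about 7.80 GiB, the same figure as in "Tried to allocate 7.80 GiB".

```python
def square_mask_gib(maxlen, bytes_per_element=1):
    """Estimated size in GiB of a maxlen x maxlen mask."""
    return maxlen * maxlen * bytes_per_element / 2**30


# Length taken from speech.size=torch.Size([1, 91498]) in the thread.
gib = square_mask_gib(91498)
print(f"{gib:.2f} GiB")  # 7.80 GiB

# For comparison: a mask over a 2000-step axis.
short = square_mask_gib(2000)
print(f"{short * 1024:.2f} MiB")
```

Under the same assumptions, a 2000-step input would need only a few MiB, so the cost grows quadratically with whatever length the mask is built over.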

Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. The error seems to occur only when fine-tuning w2v2 (encoder: wav2vec2). Please help me.


sw005320 commented on May 27, 2024

speech.size=torch.Size([1, 91498])
Is 91498 the number of frames or the number of samples?
How long (in seconds) is this audio?

Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. The error seems to occur only when fine-tuning w2v2 (encoder: wav2vec2). Please help me.

Did you train it with the same data?


WZLHQ commented on May 27, 2024

speech.size=torch.Size([1, 91498]) Is 91498 the number of frames or the number of samples? How long (in seconds) is this audio?

Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. The error seems to occur only when fine-tuning w2v2 (encoder: wav2vec2). Please help me.

Did you train it with the same data?

Hi, thanks for your reply.

  • Is 91498 the number of frames or the number of samples? -> This is the number of frames.
  • How long (in seconds) is this audio? -> It is difficult to know, but I guess less than 10 seconds.
  • Did you train it with the same data? -> Yes, of course.
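
The duration can be estimated from the tensor shape once the unit is known. The sketch below compares the two readings of 91498 under stated assumptions: as raw waveform samples at 16 kHz (wav2vec 2.0 encoders take the raw waveform as input), or as feature frames with a 10 ms shift (a common choice for filterbank features). Neither the sampling rate nor the frame shift is given in the thread, and both helper names are hypothetical.

```python
def samples_to_seconds(num_samples, sample_rate=16000):
    """Duration if the length counts raw waveform samples."""
    return num_samples / sample_rate


def frames_to_seconds(num_frames, shift_ms=10):
    """Duration if the length counts feature frames with a fixed shift."""
    return num_frames * shift_ms / 1000


as_samples = samples_to_seconds(91498)
as_frames = frames_to_seconds(91498)
print(f"as samples: {as_samples:.2f} s")  # 5.72 s
print(f"as frames:  {as_frames / 60:.1f} min")
```

Only the first reading is compatible with an utterance shorter than 10 seconds.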


sw005320 commented on May 27, 2024

91498 is too long.
I recommend you check the front-end part to see if you're doing it correctly.
The maximum number of frames would be 2000 or so. Yours is too large.
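
For comparison, the number of frames a wav2vec 2.0 feature extractor would produce from 91498 input samples can be computed from its convolution strides. The sketch assumes the standard wav2vec 2.0 base configuration of seven unpadded 1-D convolutions with kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2); the checkpoint used in this thread may differ, and `conv_output_length` is an illustrative name.

```python
# Assumed wav2vec 2.0 base feature-extractor layout (kernel, stride per layer).
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def conv_output_length(num_samples, kernels=KERNELS, strides=STRIDES):
    """Output length after a stack of unpadded 1-D convolutions."""
    length = num_samples
    for k, s in zip(kernels, strides):
        length = (length - k) // s + 1
    return length


frames = conv_output_length(91498)
print(frames)  # 285
```

Under these assumptions the encoder output is far below the 2000-frame guideline, so it is worth checking whether a given length refers to raw samples or to downsampled frames.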


WZLHQ commented on May 27, 2024

91498 is too long. I recommend you check the front-end part to see if you're doing it correctly. The maximum number of frames would be 2000 or so. Yours is too large.

Thank you very much.

