Comments (7)
The problem is solved.
This CUDA out-of-memory error may have been caused by a version mismatch among espnet, pytorch, and fairseq.
I use the following versions:
- espnet==202301
- fairseq==0.12.2
- pytorch==1.8.1+cuda11.1
With these versions pinned, the error no longer occurs.
If anyone wants to fine-tune wav2vec2 (used as the encoder), this issue might be helpful.
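To compare an environment against the versions listed above, a small check like the following may help. It is only a sketch: `installed_versions` is a hypothetical helper name, and it assumes the packages are installed under the distribution names `espnet`, `fairseq`, and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("espnet", "fairseq", "torch")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in both the working and the failing conda environment makes the version differences easy to spot.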
from espnet.
Can you try the following?
- Remove very long utterances
- Further reduce the batch size (while increasing accum_grad)
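The two suggestions above can be illustrated with a short sketch. The helper names and the example durations are made up for illustration and are not part of ESPnet; the second function only shows the usual arithmetic behind trading batch size for gradient accumulation.

```python
def drop_long_utterances(durations, max_seconds):
    """Keep only utterances whose duration (in seconds) is at most max_seconds."""
    return {utt: sec for utt, sec in durations.items() if sec <= max_seconds}


def effective_batch_size(batch_size, accum_grad):
    """Number of utterances contributing to one optimizer step (per device)."""
    return batch_size * accum_grad


# Made-up example durations, keyed by utterance ID.
durations = {"utt1": 3.2, "utt2": 27.5, "utt3": 4.9}

kept = drop_long_utterances(durations, max_seconds=5.0)
print(sorted(kept))  # ['utt1', 'utt3']

# Halving the batch size while doubling accum_grad keeps this product constant.
print(effective_batch_size(batch_size=2, accum_grad=5))  # 10
```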
from espnet.
Thanks. I did the following:
- I set "max_wav_duration" to 5 to remove very long utterances.
- I set the batch size to 2.
- speech.size=torch.Size([1, 91498])
- I don't know why the first dimension of speech.size is 1 rather than 2.
- "accum_grad" is set to 5.
I also created a new conda environment for espnet and re-ran make, with pytorch=2.1.0, pytorch-cuda=12.1, fairseq=1.0.0a0+313ff05, and espnet=202310.
The same error occurs, as follows:
  File "/home/rosie/espnet/espnet2/asr/espnet_model.py", line 261, in forward
    encoder_out, encoder_out_lens = self.encoder(speech, speech_lengths)
  File "/home/rosie/espnet/tools/miniconda/envs/espnet-2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rosie/espnet/tools/miniconda/envs/espnet-2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rosie/espnet/espnet2/asr/encoder/wav2vec2_encoder.py", line 116, in forward
    masks = make_pad_mask(ilens).to(xs_pad.device)
  File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 168, in make_pad_mask
    return _make_pad_mask_traceable(lengths, xs, length_dim, maxlen)
  File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 249, in _make_pad_mask_traceable
    mask = triu_onnx(mask)[1:, :-1]  # onnx cannot handle diagonal argument.
  File "/home/rosie/espnet/espnet/nets/pytorch_backend/nets_utils.py", line 261, in triu_onnx
    return x * mask
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.80 GiB. GPU 0 has a total capacty of 23.69 GiB of which 7.38 GiB is free. Including non-PyTorch memory, this process has 16.30 GiB memory in use. Of the allocated memory 15.96 GiB is allocated by PyTorch, and 55.08 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
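As a rough sanity check on the 7.80 GiB figure: the traceback fails inside `triu_onnx`, which works on a two-dimensional mask. Assuming that mask is square in the maximum input length (91498 here) and uses one byte per element, the arithmetic lines up with the reported allocation. Both assumptions are inferred from the numbers in this thread, not confirmed from the ESPnet source, and `square_mask_gib` is a hypothetical helper name.

```python
def square_mask_gib(max_len, bytes_per_element=1):
    """Size in GiB of a (max_len x max_len) mask.

    Assumes one byte per element (for example a bool tensor); pass a larger
    bytes_per_element for wider dtypes.
    """
    return max_len * max_len * bytes_per_element / 2**30


print(f"{square_mask_gib(91498):.2f} GiB")  # 7.80 GiB
```

Because the cost grows with the square of the length, halving the input length would cut this allocation to a quarter.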
Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. The error seems to occur only when fine-tuning w2v2 (encoder: wav2vec2). Please help me.
from espnet.
speech.size=torch.Size([1, 91498])
Is 91498 the number of frames or the number of samples?
How long (in seconds) is this audio?
Note that when I train a branchformer from scratch (encoder: branchformer), there are no errors. The error seems to occur only when fine-tuning w2v2 (encoder: wav2vec2). Please help me.
Did you train it with the same data?
from espnet.
Hi, thanks for your reply.
- Is 91498 the number of frames or the number of samples? -> This is the number of frames.
- How long (in seconds) is this audio? -> It is difficult to know, but I guess less than 10 seconds.
- Did you train it with the same data? -> Yes, of course.
from espnet.
91498 is too long.
I recommend checking the front-end part to see whether it is set up correctly.
The maximum number of frames should be around 2000; yours is far larger.
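One way to put the number 91498 in perspective is to ask what it would mean if it were a raw sample count instead of a frame count. The sketch below assumes 16 kHz audio and a downsampling factor of 320 samples per encoder frame (the usual stride of the wav2vec 2.0 convolutional feature extractor); neither value is stated in this thread, and `describe_length` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 16_000    # assumption: 16 kHz audio
SAMPLES_PER_FRAME = 320    # assumption: wav2vec 2.0 feature extractor stride


def describe_length(num_samples,
                    sample_rate=SAMPLE_RATE_HZ,
                    samples_per_frame=SAMPLES_PER_FRAME):
    """Return (duration in seconds, approximate number of encoder frames)."""
    seconds = num_samples / sample_rate
    frames = num_samples // samples_per_frame
    return seconds, frames


seconds, frames = describe_length(91498)
print(f"{seconds:.2f} s, about {frames} frames")  # 5.72 s, about 285 frames
```

Under these assumptions, a length of 91498 would correspond to a few seconds of audio and a few hundred frames, so it is worth checking which unit the lengths passed to the padding mask are in.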
from espnet.
Thank you very much.
from espnet.