Comments (2)
It might be a hardware issue. On that node, I now get:
zeyer@cn-244 ~ % nvidia-smi
Unable to determine the device handle for GPU0000:03:00.0: Unknown Error
And dmesg:
[ +0.000844] nvidia 0000:03:00.0: AER: can't recover (no error_detected callback)
[ +0.000001] snd_hda_intel 0000:03:00.1: AER: can't recover (no error_detected callback)
[ +0.000008] pcieport 0000:00:03.0: AER: device recovery failed
[ +0.000001] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Fatal) error received: 0000:00:03.0
[ +0.000004] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
[ +0.000898] pcieport 0000:00:03.0: device [8086:6f08] error status/mask=00004020/00000000
[ +0.000889] pcieport 0000:00:03.0: [ 5] SDES
[ +0.000894] pcieport 0000:00:03.0: [14] CmpltTO (First)
[ +0.000921] nvidia 0000:03:00.0: AER: can't recover (no error_detected callback)
[ +0.000002] snd_hda_intel 0000:03:00.1: AER: can't recover (no error_detected callback)
[ +1.050439] pcieport 0000:00:03.0: AER: Root Port link has been reset (0)
[ +0.000041] pcieport 0000:00:03.0: AER: device recovery failed
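To spot such lines quickly in a long dmesg dump, something like the following could work. It is only a rough sketch: the function name and the regex are my own, and the three phrases it looks for are just the ones that appear in the output above, not an exhaustive list of AER messages.

```python
import re

# Hypothetical helper (not part of any tool): flags dmesg lines that carry one
# of the failure phrases seen in the log above, with the driver and PCI address.
_FATAL = re.compile(
    r"(\S+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r".*(Uncorrected \(Fatal\)|device recovery failed|can't recover)"
)


def find_fatal_pcie_lines(dmesg_text):
    """Return (driver, pci_address, phrase) for every matching line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = _FATAL.search(line)
        if m:
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits


sample = (
    "[ +0.000844] nvidia 0000:03:00.0: AER: can't recover (no error_detected callback)\n"
    "[ +0.000898] pcieport 0000:00:03.0: device [8086:6f08] error status/mask=00004020/00000000\n"
    "[ +0.000008] pcieport 0000:00:03.0: AER: device recovery failed\n"
)
hits = find_fatal_pcie_lines(sample)
for driver, address, phrase in hits:
    print(driver, address, phrase)
```

The status/mask line is skipped on purpose, since it only carries register values and none of the failure phrases.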
from espnet.
I was just looking at the code. There is:
mask_length = torch.randint(
    mask_width_range[0],
    mask_width_range[1],
    (B, num_mask),
    device=spec.device,
)
And it calls the function like this (as you see from the stacktrace):
line: return mask_along_axis(
    spec,
    spec_lengths,
    mask_width_range=self.mask_width_range,
    dim=self.dim,
    num_mask=self.num_mask,
    replace_with_zero=self.replace_with_zero,
)
locals:
spec = <local> tensor[6, 1220, 80] n=585600 (2.2Mb) x∈[-0.051, 0.252] μ=-0.051 σ=0.252 cuda:1
spec_lengths = <local> tensor[6] i32 x∈[1112, 1220] μ=1.194e+03 σ=40.913 [1203, 1206, 1207, 1218, 1220, 1112]
self = <local> MaskAlongAxis(mask_width_range=[0, 27], num_mask=2, axis=freq)
self.mask_width_range = <local> [0, 27]
self.dim = <local> 2
self.num_mask = <local> 2
self.replace_with_zero = <local> True
So mask_length should have values between 0 and 26 (inclusive), since the upper bound of torch.randint is exclusive.
But then you see later in the stacktrace:
mask_length = <local> tensor[6, 2, 1] i64 n=12 x∈[-4825293490701537652, 4474139002465713060] μ=-9.419e+17 σ=4.744e+18 cuda:1
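The mismatch can be checked mechanically. Here is a minimal sketch in plain Python, so it does not touch the GPU; the helper name is my own and not from ESPnet, and the two values are the min and max reported for mask_length in the stacktrace.

```python
def out_of_range(values, mask_width_range):
    """Return the values that fall outside the half-open range [low, high)."""
    low, high = mask_width_range
    return [v for v in values if not (low <= v < high)]


# mask_width_range as shown in the locals of the stacktrace.
mask_width_range = [0, 27]

# Min and max of mask_length as reported in the stacktrace.
observed_extremes = [-4825293490701537652, 4474139002465713060]

bad = out_of_range(observed_extremes, mask_width_range)
print(bad)  # both values are far outside [0, 27)
```

A check like this right after the randint call (on the real tensor) would fail early instead of producing garbage masks, but it would not fix anything if the device itself is returning corrupted data.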
So I think it's clear that this is a hardware issue, and we can close this.