manojpamk / pytorch_xvectors
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
License: MIT License
I used the kaldi/egs/dihard_2018/v2 recipe for front-end processing and extracted MFCC features for training the x-vector model. I then reimplemented x-vector training in PyTorch; the x-vector model is exactly the same as yours, and the PLDA and AHC procedures are unchanged. I used the trained PyTorch x-vector model to test diarization performance and got the following results.
The first version does not use data augmentation and gets DER=27.38% on dihard2_dev.
The other version uses data augmentation (same as the v2 recipe) and only gets DER=27.60%.
In theory, data augmentation should improve performance, but here the DER did not improve. I don't know which part causes the problem, and I need help.
Thank you sincerely!
Hi, I followed your Google Drive link to download the pre-trained model, but when I try to decompress the file "checkpoint_step309.tar", I get an error saying that the file is not a tar archive. Could you update the model or give me a new link? Thank you very much @manojpamk
Hi! I am using custom data, which is a folder of .wav files. I am curious how to create an x-vector for each file in the dataset. I tried using 'extract.py' and pytorch_run.sh, changing only the directory names and similar settings.
As far as I understand now, some feature extraction and splitting steps must be run first. So this is a request for help on how to do the above using your pretrained models (I use xvec_preTrained/checkpoint_step309.tar from the README link).
Hi, thanks for your great contributions to speaker recognition. When I tried to run tar -xvf checkpoint_step390.tar, it showed:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Sorry to bother you, but is there anything wrong with the uploaded files?
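A possible explanation, offered as an assumption rather than something this thread confirms: PyTorch checkpoints are often written with torch.save under a .tar file name, and such a file is not a tar archive, so tar -xvf rejects it. The sketch below uses only the Python standard library to report what kind of container a file is. The function name classify_checkpoint is hypothetical.

```python
import tarfile
import zipfile


def classify_checkpoint(path):
    """Guess the on-disk container of a checkpoint file (illustrative helper)."""
    if tarfile.is_tarfile(path):
        return "tar"    # a real tar archive: `tar -xvf` should work
    if zipfile.is_zipfile(path):
        return "zip"    # torch.save writes a zip container since PyTorch 1.6
    return "other"      # e.g. a legacy pickle-based torch.save file


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "->", classify_checkpoint(name))
```

If the result is "zip" or "other", loading the file with torch.load(path, map_location="cpu") is the thing to try instead of extracting it.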
Hi Manoj,
Sorry to bother you again. I followed your reply and continued the evaluation on the AMI corpus.
First, I downloaded the dataset and split it into dev and test sets in the same way as the paper: https://arxiv.org/pdf/1902.03190.pdf. Second, I used the Kaldi "ami" recipe for data preparation and got the "segments" and "utt2spk" files. Third, I used these two files to produce the "rttm" files.
I used those files and the trained x-vector models to evaluate DER. Surprisingly, when I used the PLDA model trained on the subset of the vox dataset, the DER on both AMI dev and test was 0. So I checked the generated "rttm" files and found the problem: every recording contains only one speaker, so the result makes no sense. The details of the files are as follows.
(1)segments
AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00 34.27 37.14
AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00 37.14 39.15
AMI_ES2011a_H00_FEE041_0003915_0004332 AMI_ES2011a_H00 39.15 43.32
AMI_ES2011a_H00_FEE041_0004332_0004439 AMI_ES2011a_H00 43.32 44.39
AMI_ES2011a_H00_FEE041_0004643_0004763 AMI_ES2011a_H00 46.43 47.63
AMI_ES2011a_H00_FEE041_0004763_0005020 AMI_ES2011a_H00 47.63 50.2
AMI_ES2011a_H00_FEE041_0005020_0005133 AMI_ES2011a_H00 50.2 51.33
AMI_ES2011a_H00_FEE041_0005133_0005553 AMI_ES2011a_H00 51.33 55.53
AMI_ES2011a_H00_FEE041_0005553_0005685 AMI_ES2011a_H00 55.53 56.85
AMI_ES2011a_H00_FEE041_0005856_0006217 AMI_ES2011a_H00 58.56 62.17
AMI_ES2011a_H00_FEE041_0006217_0006428 AMI_ES2011a_H00 62.17 64.28
AMI_ES2011a_H00_FEE041_0007704_0007898 AMI_ES2011a_H00 77.04 78.98
AMI_ES2011a_H00_FEE041_0007898_0008079 AMI_ES2011a_H00 78.98 80.79
AMI_ES2011a_H00_FEE041_0008298_0008364 AMI_ES2011a_H00 82.98 83.64
AMI_ES2011a_H00_FEE041_0008364_0008924 AMI_ES2011a_H00 83.64 89.24
AMI_ES2011a_H00_FEE041_0009602_0009635 AMI_ES2011a_H00 96.02 96.35
AMI_ES2011a_H00_FEE041_0009826_0010223 AMI_ES2011a_H00 98.26 102.23
.......
.......
(2)utt2spk
AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0003915_0004332 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0004332_0004439 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0004643_0004763 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0004763_0005020 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0005020_0005133 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0005133_0005553 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0005553_0005685 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0005856_0006217 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0006217_0006428 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0007704_0007898 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0007898_0008079 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0008298_0008364 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0008364_0008924 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0009602_0009635 AMI_ES2011a_H00_FEE041
AMI_ES2011a_H00_FEE041_0009826_0010223 AMI_ES2011a_H00_FEE041
........
........
(3) rttm
SPEAKER AMI_ES2011a_H00 1 34.27 2.87 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 37.14 2.01 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 39.15 4.17 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 43.32 1.07 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 46.43 1.20 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 47.63 2.57 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 50.20 1.13 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 51.33 4.20 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 55.53 1.32 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 58.56 3.61 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 62.17 2.11 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 77.04 1.94 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 78.98 1.81 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 82.98 0.66 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 83.64 5.60 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 96.02 0.33 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 98.26 3.97 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a_H00 1 102.23 5.54 AMI_ES2011a_H00_FEE041
........
........
(4) rttm ---used the best threshold -0.5
SPEAKER AMI_ES2011a_H00 1 34.270 10.120 1
SPEAKER AMI_ES2011a_H00 1 46.430 10.420 1
SPEAKER AMI_ES2011a_H00 1 58.560 5.720 1
SPEAKER AMI_ES2011a_H00 1 77.040 3.750 1
SPEAKER AMI_ES2011a_H00 1 82.980 6.260 1
SPEAKER AMI_ES2011a_H00 1 96.020 0.330 1
SPEAKER AMI_ES2011a_H00 1 98.260 9.510 1
SPEAKER AMI_ES2011a_H00 1 108.920 45.440 1
.........
.........
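The conversion from "segments" and "utt2spk" to a reference RTTM described above can be sketched as follows. This is a minimal illustration, not the poster's script: the function names are hypothetical, the speaker label is taken directly from utt2spk, and the output uses the standard ten-field RTTM SPEAKER line rather than the abbreviated columns shown in the listings. It also counts distinct speakers per recording, which would flag the single-speaker situation directly.

```python
from collections import defaultdict


def read_table(lines):
    """Parse whitespace-separated Kaldi-style lines into lists of fields."""
    return [ln.split() for ln in lines if ln.strip()]


def segments_to_rttm(segment_lines, utt2spk_lines):
    """Build RTTM SPEAKER lines and a recording -> set-of-speakers map."""
    utt2spk = {utt: spk for utt, spk in read_table(utt2spk_lines)}
    rttm, speakers = [], defaultdict(set)
    for utt, reco, start, end in read_table(segment_lines):
        start, end = float(start), float(end)
        spk = utt2spk[utt]
        speakers[reco].add(spk)
        rttm.append(
            f"SPEAKER {reco} 1 {start:.2f} {end - start:.2f} <NA> <NA> {spk} <NA> <NA>"
        )
    return rttm, speakers


# Two rows copied from the first "segments" and "utt2spk" listings above.
segments = [
    "AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00 34.27 37.14",
    "AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00 37.14 39.15",
]
utt2spk = [
    "AMI_ES2011a_H00_FEE041_0003427_0003714 AMI_ES2011a_H00_FEE041",
    "AMI_ES2011a_H00_FEE041_0003714_0003915 AMI_ES2011a_H00_FEE041",
]

rttm_lines, speakers_per_reco = segments_to_rttm(segments, utt2spk)
for line in rttm_lines:
    print(line)
for reco, spks in speakers_per_reco.items():
    if len(spks) < 2:
        print(f"warning: {reco} has only {len(spks)} speaker(s)")
```

A recording that triggers the warning cannot produce a meaningful DER, because assigning everything to one cluster is already a perfect answer for it.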
To fix the problem, I revised the "segments", "utt2spk" and "rttm" files. To make sure that every audio file has two or more speakers, I used the corresponding "*.Mix-Headset.wav". But I got a terrible DER of 52.17%. The details of the files are as follows.
(1)segments
AMI_ES2011a_0003427_0003714 AMI_ES2011a 34.27 37.14
AMI_ES2011a_0003714_0003915 AMI_ES2011a 37.14 39.15
AMI_ES2011a_0003915_0004332 AMI_ES2011a 39.15 43.32
AMI_ES2011a_0004332_0004439 AMI_ES2011a 43.32 44.39
AMI_ES2011a_0004643_0004763 AMI_ES2011a 46.43 47.63
AMI_ES2011a_0004763_0005020 AMI_ES2011a 47.63 50.2
AMI_ES2011a_0005020_0005133 AMI_ES2011a 50.2 51.33
AMI_ES2011a_0005133_0005553 AMI_ES2011a 51.33 55.53
AMI_ES2011a_0005553_0005685 AMI_ES2011a 55.53 56.85
AMI_ES2011a_0005856_0006217 AMI_ES2011a 58.56 62.17
AMI_ES2011a_0006217_0006428 AMI_ES2011a 62.17 64.28
AMI_ES2011a_0006500_0007004 AMI_ES2011a 65.0 70.04
AMI_ES2011a_0007004_0007300 AMI_ES2011a 70.04 73.0
........
........
(2)utt2spk
AMI_ES2011a_0003427_0003714 AMI_ES2011a
AMI_ES2011a_0003714_0003915 AMI_ES2011a
AMI_ES2011a_0003915_0004332 AMI_ES2011a
AMI_ES2011a_0004332_0004439 AMI_ES2011a
AMI_ES2011a_0004643_0004763 AMI_ES2011a
AMI_ES2011a_0004763_0005020 AMI_ES2011a
AMI_ES2011a_0005020_0005133 AMI_ES2011a
AMI_ES2011a_0005133_0005553 AMI_ES2011a
AMI_ES2011a_0005553_0005685 AMI_ES2011a
AMI_ES2011a_0005856_0006217 AMI_ES2011a
AMI_ES2011a_0006217_0006428 AMI_ES2011a
AMI_ES2011a_0006500_0007004 AMI_ES2011a
AMI_ES2011a_0007004_0007300 AMI_ES2011a
........
........
(3)rttm
SPEAKER AMI_ES2011a 1 34.27 2.87 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 37.14 2.01 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 39.15 4.17 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 43.32 1.07 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 46.43 1.20 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 47.63 2.57 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 50.20 1.13 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 51.33 4.20 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 55.53 1.32 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 58.56 3.61 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 62.17 2.11 AMI_ES2011a_H00_FEE041
SPEAKER AMI_ES2011a 1 65.00 5.04 AMI_ES2011a_H03_FEE044
SPEAKER AMI_ES2011a 1 70.04 2.96 AMI_ES2011a_H03_FEE044
........
........
(4)rttm ---used the best threshold 0.2
SPEAKER AMI_ES2011a 1 34.270 2.870 3
SPEAKER AMI_ES2011a 1 37.140 1.125 4
SPEAKER AMI_ES2011a 1 38.265 0.885 3
SPEAKER AMI_ES2011a 1 39.150 1.125 4
SPEAKER AMI_ES2011a 1 40.275 0.750 3
SPEAKER AMI_ES2011a 1 41.025 0.750 4
SPEAKER AMI_ES2011a 1 41.775 0.750 2
SPEAKER AMI_ES2011a 1 42.525 0.795 3
SPEAKER AMI_ES2011a 1 43.320 1.070 4
SPEAKER AMI_ES2011a 1 46.430 1.200 3
SPEAKER AMI_ES2011a 1 47.630 1.125 4
SPEAKER AMI_ES2011a 1 48.755 2.575 3
SPEAKER AMI_ES2011a 1 51.330 1.125 4
SPEAKER AMI_ES2011a 1 52.455 3.075 3
SPEAKER AMI_ES2011a 1 55.530 1.320 4
SPEAKER AMI_ES2011a 1 58.560 1.875 4
SPEAKER AMI_ES2011a 1 60.435 0.750 3
SPEAKER AMI_ES2011a 1 61.185 3.095 4
SPEAKER AMI_ES2011a 1 65.000 4.125 4
SPEAKER AMI_ES2011a 1 69.125 0.915 3
SPEAKER AMI_ES2011a 1 70.040 7.740 4
........
........
I can't figure out this problem. Could you give me some advice?
I would also appreciate it if you could provide your "wav.scp", "segments", "utt2spk" and "rttm" files.
Yuan
@manojpamk
Hi, I was trying to train the model and it crashed at stage 6
Namespace(baseLR=0.001, batchSize=32, featDim=30, featDir='exp/xvector_nnet_1a/egs/', local_rank=0, logStepSize=200, maxLR=0.002, modelType='xvecTDNN', noiseEps=1e-05, numArchives=84, numEgsPerArk=366150, numEpochs=2, numSpkrs=7323, optimMomentum=0.5, pDropMax=0.2, preFetchRatio=30, preTrainedModelDir=None, protoEpisodesPerArk=25, protoMaxClasses=35, protoMinClasses=5, resumeModelDir=None, stepFrac=0.5, supportFrac=0.7, totalEpisodes=100, trainingMode='init')
Initializing Model..
Reading from archive 1
Traceback (most recent call last):
File "train_xent.py", line 69, in <module>
for _,(X, Y) in par_data_loader:
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
data.append(next(self.dataset_iter))
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/kaldi_python_io/inst.py", line 284, in __iter__
with ext_open(self.ark_or_pipe, "rb") as fd:
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/kaldi_python_io/inst.py", line 106, in __enter__
self.fd = _fopen(self.fname, self.mode)
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/kaldi_python_io/inst.py", line 79, in _fopen
"Could not find common file: {}".format(fname))
FileNotFoundError: Could not find common file: exp/xvector_nnet_1a/egs//egs.1.ark
I don't have the directory exp/xvector_nnet_1a. Do you know what may cause this problem?
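Before starting stage 6 it can help to confirm that the example archives exist. The sketch below is only an illustration: it assumes the archives are named egs.1.ark through egs.<numArchives>.ark, which is inferred from the error message and the "Reading from archive 1" line, and missing_archives is a hypothetical helper name.

```python
import os


def missing_archives(egs_dir, num_archives):
    """Return the expected egs.<i>.ark paths that do not exist under egs_dir."""
    expected = (os.path.join(egs_dir, f"egs.{i}.ark")
                for i in range(1, num_archives + 1))
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    # Directory and archive count copied from the Namespace printed in the log.
    missing = missing_archives("exp/xvector_nnet_1a/egs/", 84)
    if missing:
        print(f"{len(missing)} of 84 archives are missing, e.g. {missing[0]}")
        print("The egs-generation stage probably did not run or failed.")
    else:
        print("All expected archives are present.")
```

If every archive is missing, the earlier data-preparation stages are the place to look, since train_xent.py only reads what those stages write.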
Hi,
I'm trying to train speaker verification with voxceleb1 & 2 using 'pytorch_run.sh'.
However, an index error occurred in train_xent.py (stage 6).
I think the default script needs 80 archives (numArchives), but I only get 75 archives.
I only get 1483600743 frames in stage 5, so only 75 archives were generated in sid/nnet3/xvector/get_egs.sh (stage 6):
"num_train_archives=$[($num_train_frames*$num_repeats)/$frames_per_iter + 1]" in sid/nnet3/xvector/get_egs.sh, line 129.
I wonder whether simply changing numArchives from 80 to 75 is correct for reproducing the results.
Thank you
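The archive count follows from the quoted formula with integer division. As a worked check, the sketch below plugs in the frame count from this report; the values num_repeats=50 and frames_per_iter=1000000000 are assumptions taken from the get_egs.sh command line logged in another report on this page, not from this issue.

```python
def num_train_archives(num_train_frames, num_repeats, frames_per_iter):
    """Mirror the bash integer arithmetic quoted from get_egs.sh."""
    return (num_train_frames * num_repeats) // frames_per_iter + 1


# Frame count reported in the issue; the other two values are assumed
# (see the lead-in) and may differ in a given setup.
archives = num_train_archives(1483600743, num_repeats=50,
                              frames_per_iter=1_000_000_000)
print(archives)  # 75
```

Under these assumptions 75 is exactly what the formula yields for this amount of data, so the smaller archive count reflects fewer training frames rather than a failure in get_egs.sh.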
Hi,
Recently I have been trying to run your framework on my own corpus. However, train_proto.py requires an HDF5 file as input to the PyTorch data loader. I found that convert_egs_to_hdf5.py is not provided (it is listed in your .gitignore). Could you please make the file available in your git repository? Thanks!
Hi, and many thanks for this nice work. I'm trying to integrate this code into my Python project to obtain embeddings from a given WAV file. From the source files I can easily see how you apply the network and get the embeddings. However, the nnet3 egs format that is being read needs to be computed by Kaldi. Is there an option to preprocess the file with a pure Python library? Could you document the exact shape of the MFCCs that the model expects? That way I could implement the feature extraction with librosa or a similar tool.
Thank you in advance
pytorch_xvectors/train_xent.py
Line 91 in 350e4b5
Hello,
When I tried to iterate over the dataLoader (for x, _ in dataLoader:), it throws this error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mnt/ricproject/ahilan-work/kaldi/egs/sre16/v2/train_utils.py", line 65, in __getitem__
X = self.feats[idx,:,:]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ahilan/.local/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 777, in __getitem__
selection = sel.select(self.shape, args, dataset=self)
File "/home/ahilan/.local/lib/python3.6/site-packages/h5py/_hl/selections.py", line 82, in select
return selector.make_selection(args)
File "h5py/_selector.pyx", line 272, in h5py._selector.Selector.make_selection
File "h5py/_selector.pyx", line 213, in h5py._selector.Selector.apply_args
TypeError: Simple selection can't process 1453.0
I would be grateful to you if you could look into this. Thanks
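The message "Simple selection can't process 1453.0" indicates that the index reaching h5py is a float rather than an integer. Where the float comes from is not shown in the traceback; one common cause (an assumption here) is an index array created with a floating-point dtype. A minimal sketch of coercing such indices inside __getitem__ follows. IntIndexed is a hypothetical wrapper, and a plain list stands in for the HDF5 dataset so the example runs without h5py.

```python
import operator


class IntIndexed:
    """Wrap an indexable object and coerce integral float indices to int."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        try:
            idx = operator.index(idx)       # ints and NumPy integers pass through
        except TypeError:
            if float(idx) != int(idx):      # refuse 1453.5 and similar
                raise
            idx = int(idx)                  # 1453.0 -> 1453
        return self.data[idx]


feats = IntIndexed(["frame0", "frame1", "frame2"])
print(feats[1.0])   # same element as feats[1]
```

In the dataset class from the traceback, the equivalent change would be to convert idx with int(idx) before the line X = self.feats[idx,:,:], or to make sure the sampler produces integer indices in the first place.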
Hi,
First of all, very nice work. Well done.
I want to ask about your x-vector implementation.
I am not sure whether I misunderstand how Conv1D with dilation > 1 works.
Why did you use kernel_size=5 for the second TDNN layer and kernel_size=7 for the third?
I would use kernel_size=3 for both of them, with the same dilation you used.
Thank you
Gerardo
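The arithmetic behind this question can be checked with a few lines. In the original Kaldi x-vector TDNN, the second and third layers splice frames {t-2, t, t+2} and {t-3, t, t+3}, that is, three taps spanning 5 and 7 frames. The sketch below lists the taps and span of a dilated 1-D convolution for a given kernel_size and dilation. The dilation values 2 and 3 are an assumption for illustration, since the issue does not state which dilations the repository uses, and both function names are hypothetical.

```python
def dilated_taps(kernel_size, dilation):
    """Frame offsets, relative to the centre, read by one dilated 1-D convolution."""
    half = (kernel_size - 1) // 2
    return [dilation * (i - half) for i in range(kernel_size)]


def span(kernel_size, dilation):
    """Number of input frames covered from the first tap to the last."""
    return dilation * (kernel_size - 1) + 1


# Dilations 2 and 3 are assumed here; the issue does not state them.
for k, d in [(3, 2), (5, 2), (3, 3), (7, 3)]:
    print(f"kernel_size={k} dilation={d} taps={dilated_taps(k, d)} span={span(k, d)}")
```

With these assumed dilations, kernel_size=3 reproduces the 5-frame and 7-frame contexts, while kernel_size=5 and kernel_size=7 would cover 9 and 19 frames, which is the discrepancy the question is pointing at.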
Hello,
Thanks for sharing the PyTorch code for embedding training.
If we look at pytorch_xvectors/pytorch_run.sh,
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1
train_xent.py exp/xvector_nnet_1a/egs/
Looking at the line above, it seems that you are training the DNN on a single GPU. Is it possible to train using multiple GPUs?
Further, if we look at the train_utils.py script,
def prepareModel(args):
elif args.trainingMode == 'init':
net.to(device)
net = torch.nn.parallel.DistributedDataParallel(net,
device_ids=[0],
output_device=0)
if torch.cuda.device_count() > 1:
print("Using ", torch.cuda.device_count(), "GPUs!")
net = nn.DataParallel(net)
Why are both torch.nn.parallel.DistributedDataParallel and nn.DataParallel(net) used?
When I tried to train, it used a single GPU. How does it need to be modified to train on multiple GPUs?
I look forward to hearing from you.
Thanks.
K. Ahilan
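One common way to use several GPUs with DistributedDataParallel is to start one process per GPU and let each process wrap the model on the device matching its local rank, instead of the fixed device_ids=[0] quoted above. The following is a hedged sketch of that pattern, not the repository's code: the three helper names are hypothetical, and wrap_for_ddp needs PyTorch with CUDA and is therefore only defined here, not called.

```python
import sys


def launch_command(num_gpus, script, *script_args):
    """Command line that starts one training process per GPU."""
    return [sys.executable, "-m", "torch.distributed.launch",
            f"--nproc_per_node={num_gpus}", script, *script_args]


def ddp_device_args(local_rank):
    """Each process should own the GPU whose index equals its local rank."""
    return {"device_ids": [local_rank], "output_device": local_rank}


def wrap_for_ddp(net, local_rank):
    """Move `net` to this process's GPU and wrap it (needs torch and CUDA)."""
    import torch  # imported lazily so the helpers above work without torch
    import torch.distributed as dist
    torch.cuda.set_device(local_rank)
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    net = net.to(local_rank)
    return torch.nn.parallel.DistributedDataParallel(
        net, **ddp_device_args(local_rank))


print(" ".join(launch_command(4, "train_xent.py", "exp/xvector_nnet_1a/egs/")))
print(ddp_device_args(1))
```

With this pattern the extra nn.DataParallel wrapper is not needed. Each process would also have to read a different share of the training archives, which this sketch does not cover.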
Hi, thank you for the nice work. When I tried to recreate the experiment on my computer, I ran into some failures:
$ bash pytorch_run.sh
utils/fix_data_dir.sh: file data/voxceleb2_train/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb2_train/spk2utt is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb2_train/wav.scp is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 1092009 utterances.
fix_data_dir.sh: old files are kept in data/voxceleb2_train/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/voxceleb2_train
utils/fix_data_dir.sh: file data/voxceleb2_test/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb2_test/spk2utt is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb2_test/wav.scp is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 36237 utterances.
fix_data_dir.sh: old files are kept in data/voxceleb2_test/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/voxceleb2_test
utils/fix_data_dir.sh: file data/voxceleb1_train/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb1_train/spk2utt is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb1_train/wav.scp is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 148642 utterances.
fix_data_dir.sh: old files are kept in data/voxceleb1_train/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/voxceleb1_train
utils/fix_data_dir.sh: file data/voxceleb1_test/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb1_test/spk2utt is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/voxceleb1_test/wav.scp is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 4874 utterances.
fix_data_dir.sh: old files are kept in data/voxceleb1_test/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/voxceleb1_test
utils/combine_data.sh /home/tjw/pytorch_xvectors/data/train /home/tjw/pytorch_xvectors/data/voxceleb2_train /home/tjw/pytorch_xvectors/data/voxceleb2_test /home/tjw/pytorch_xvectors/data/voxceleb1_train
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh [info]: not combining utt2dur as it does not exist
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh [info]: not combining text as it does not exist
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh [info]: not combining spk2gender as it does not exist
fix_data_dir.sh: kept all 1276888 utterances.
fix_data_dir.sh: old files are kept in /home/tjw/pytorch_xvectors/data/train/.backup
steps/make_mfcc.sh --write-utt2num-frames true --mfcc-config conf/mfcc.conf --nj 40 --cmd run.pl /home/tjw/pytorch_xvectors/data/train /home/tjw/pytorch_xvectors/exp/make_mfcc mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /home/tjw/pytorch_xvectors/data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: 40 / 40 failed, log is in /home/tjw/pytorch_xvectors/exp/make_mfcc/make_mfcc_train.*.log
fix_data_dir.sh: kept all 1276888 utterances.
fix_data_dir.sh: old files are kept in /home/tjw/pytorch_xvectors/data/train/.backup
sid/compute_vad_decision.sh --nj 40 --cmd run.pl /home/tjw/pytorch_xvectors/data/train exp/make_vad mfcc
compute_vad_decision.sh: no such file /home/tjw/pytorch_xvectors/data/train/feats.scp
fix_data_dir.sh: kept all 1276888 utterances.
fix_data_dir.sh: old files are kept in /home/tjw/pytorch_xvectors/data/train/.backup
steps/make_mfcc.sh --write-utt2num-frames true --mfcc-config conf/mfcc.conf --nj 40 --cmd run.pl /home/tjw/pytorch_xvectors/data/voxceleb1_test /home/tjw/pytorch_xvectors/exp/make_mfcc mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /home/tjw/pytorch_xvectors/data/voxceleb1_test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: 40 / 40 failed, log is in /home/tjw/pytorch_xvectors/exp/make_mfcc/make_mfcc_voxceleb1_test.*.log
fix_data_dir.sh: kept all 4874 utterances.
fix_data_dir.sh: old files are kept in /home/tjw/pytorch_xvectors/data/voxceleb1_test/.backup
sid/compute_vad_decision.sh --nj 40 --cmd run.pl /home/tjw/pytorch_xvectors/data/voxceleb1_test exp/make_vad mfcc
compute_vad_decision.sh: no such file /home/tjw/pytorch_xvectors/data/voxceleb1_test/feats.scp
fix_data_dir.sh: kept all 4874 utterances.
fix_data_dir.sh: old files are kept in /home/tjw/pytorch_xvectors/data/voxceleb1_test/.backup
awk: cannot open data/train/utt2num_frames (No such file or directory)
steps/data/reverberate_data_dir.py --rir-set-parameters 0.5, /home/tjw/pytorch_xvectors/RIRS_NOISES/simulated_rirs/smallroom/rir_list --rir-set-parameters 0.5, /home/tjw/pytorch_xvectors/RIRS_NOISES/simulated_rirs/mediumroom/rir_list --speech-rvb-probability 1 --pointsource-noise-addition-probability 0 --isotropic-noise-addition-probability 0 --num-replications 1 --source-sampling-rate 16000 data/train data/train_reverb
Number of RIRs is 40000
Traceback (most recent call last):
File "steps/data/reverberate_data_dir.py", line 682, in <module>
main()
File "steps/data/reverberate_data_dir.py", line 675, in main
max_noises_per_minute = args.max_noises_per_minute)
File "steps/data/reverberate_data_dir.py", line 433, in create_reverberated_copy
pointsource_noise_addition_probability, max_noises_per_minute)
File "steps/data/reverberate_data_dir.py", line 352, in generate_reverberated_wav_scp
speech_dur = durations[recording_id]
KeyError: 'id00012-21Uxsk56VDQ-00001'
cp: cannot stat 'data/train/vad.scp': No such file or directory
copy_data_dir.sh: no such file data/train_reverb/utt2spk
mv: cannot stat 'data/train_reverb.new': No such file or directory
steps/data/make_musan.sh --sampling-rate 16000 /home/tjw/pytorch_xvectors/musan data
steps/data/make_musan.py --use-vocals true --sampling-rate 16000 /home/tjw/pytorch_xvectors/musan data/musan
Preparing data/musan/musan...
In music directory, processed 645 files; 0 had missing wav data
In speech directory, processed 426 files; 0 had missing wav data
In noise directory, processed 930 files; 0 had missing wav data
utils/fix_data_dir.sh: file data/musan/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/musan/wav.scp is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2001 utterances.
fix_data_dir.sh: old files are kept in data/musan/.backup
utils/subset_data_dir.sh: reducing #utt from 2001 to 645
utils/subset_data_dir.sh: reducing #utt from 2001 to 426
utils/subset_data_dir.sh: reducing #utt from 2001 to 930
fix_data_dir.sh: kept all 645 utterances.
fix_data_dir.sh: old files are kept in data/musan_music/.backup
fix_data_dir.sh: kept all 426 utterances.
fix_data_dir.sh: old files are kept in data/musan_speech/.backup
fix_data_dir.sh: kept all 930 utterances.
fix_data_dir.sh: old files are kept in data/musan_noise/.backup
utils/data/get_reco2dur.sh: obtaining durations from recordings
utils/data/get_reco2dur.sh: could not get recording lengths from sphere-file headers, using wav-to-duration
utils/data/get_reco2dur.sh: wav-to-duration is not on your path
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: wav-to-duration is not on your path
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: wav-to-duration is not on your path
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: wav-to-duration is not on your path
steps/data/augment_data_dir.py --utt-suffix noise --fg-interval 1 --fg-snrs 15:10:5:0 --fg-noise-dir data/musan_noise data/train data/train_noise
Traceback (most recent call last):
File "steps/data/augment_data_dir.py", line 298, in <module>
main()
File "steps/data/augment_data_dir.py", line 242, in main
dur = reco2dur[utt]
KeyError: 'id00012-21Uxsk56VDQ-00001'
steps/data/augment_data_dir.py --utt-suffix music --bg-snrs 15:10:8:5 --num-bg-noises 1 --bg-noise-dir data/musan_music data/train data/train_music
Traceback (most recent call last):
File "steps/data/augment_data_dir.py", line 298, in <module>
main()
File "steps/data/augment_data_dir.py", line 242, in main
dur = reco2dur[utt]
KeyError: 'id00012-21Uxsk56VDQ-00001'
steps/data/augment_data_dir.py --utt-suffix babble --bg-snrs 20:17:15:13 --num-bg-noises 3:4:5:6:7 --bg-noise-dir data/musan_speech data/train data/train_babble
Traceback (most recent call last):
File "steps/data/augment_data_dir.py", line 298, in <module>
main()
File "steps/data/augment_data_dir.py", line 242, in main
dur = reco2dur[utt]
KeyError: 'id00012-21Uxsk56VDQ-00001'
utils/combine_data.sh data/train_aug data/train_reverb data/train_noise data/train_music data/train_babble
utils/combine_data.sh: no such file data/train_reverb/utt2spk
utils/subset_data_dir.sh: no such file data/train_aug/utt2spk
utils/fix_data_dir.sh: no such file data/train_aug_1m/utt2spk
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 40 --cmd run.pl data/train_aug_1m exp/make_mfcc mfcc
steps/make_mfcc.sh: no such file data/train_aug_1m/wav.scp
utils/combine_data.sh data/train_combined data/train_aug_1m data/train
utils/combine_data.sh: no such file data/train_aug_1m/utt2spk
local/nnet3/xvector/prepare_feats_for_egs.sh --nj 40 --cmd run.pl data/train_combined data/train_combined_no_sil exp/train_combined_no_sil
local/nnet3/xvector/prepare_feats_for_egs.sh: No such file data/train_combined/feats.scp
utils/fix_data_dir.sh: no such file data/train_combined_no_sil/utt2spk
local/nnet3/xvector/prepare_feats_for_egs.sh --nj 10 --cmd run.pl data/voxceleb1_test data/voxceleb1_test_no_sil exp/voxceleb1_test_no_sil
local/nnet3/xvector/prepare_feats_for_egs.sh: No such file data/voxceleb1_test/feats.scp
utils/fix_data_dir.sh: no such file data/voxceleb1_test_no_sil/utt2spk
mv: cannot stat 'data/train_combined_no_sil/utt2num_frames': No such file or directory
awk: cannot open data/train_combined_no_sil/utt2num_frames.bak (No such file or directory)
Can't open data/train_combined_no_sil/utt2spk: No such file or directory at utils/filter_scp.pl line 65.
fix_data_dir.sh: no utterances remained: not proceeding further.
fix_data_dir.sh: no utterances remained: not proceeding further.
sid/nnet3/xvector/get_egs.sh --cmd run.pl --nj 8 --stage 0 --frames-per-iter 1000000000 --frames-per-iter-diagnostic 100000 --min-frames-per-chunk 200 --max-frames-per-chunk 400 --num-diagnostic-archives 3 --num-repeats 50 data/train_combined_no_sil exp/xvector_nnet_1a/egs/
sid/nnet3/xvector/get_egs.sh: expected file data/train_combined_no_sil/feats.scp
Namespace(baseLR=0.001, batchSize=32, featDim=30, featDir='exp/xvector_nnet_1a/egs/', local_rank=0, logStepSize=200, maxLR=0.002, modelType='xvecTDNN', noiseEps=1e-05, numArchives=84, numEgsPerArk=366150, numEpochs=2, numSpkrs=7323, optimMomentum=0.5, pDropMax=0.2, preFetchRatio=30, preTrainedModelDir=None, protoEpisodesPerArk=25, protoMaxClasses=35, protoMinClasses=5, resumeModelDir=None, stepFrac=0.5, supportFrac=0.7, totalEpisodes=100, trainingMode='init')
Initializing Model..
Traceback (most recent call last):
File "train_xent.py", line 32, in <module>
net, optimizer, step, saveDir = prepareModel(args)
File "/home/tjw/pytorch_xvectors/train_utils.py", line 211, in prepareModel
output_device=0)
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 250, in __init__
).format(device_ids, output_device, {p.device for p in module.parameters()})
AssertionError: DistributedDataParallel device_ids and output_device arguments only work with single-device CUDA modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}.
Traceback (most recent call last):
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/tjw/anaconda3/envs/xvec/bin/python', '-u', 'train_xent.py', '--local_rank=0', 'exp/xvector_nnet_1a/egs/']' returned non-zero exit status 1.
Traceback (most recent call last):
File "extract.py", line 96, in <module>
main()
File "extract.py", line 39, in main
key=getSplitNum)[-1].split('/')[-1].lstrip('split'))
IndexError: list index out of range
Traceback (most recent call last):
File "extract.py", line 96, in <module>
main()
File "extract.py", line 39, in main
key=getSplitNum)[-1].split('/')[-1].lstrip('split'))
IndexError: list index out of range
run.pl: job failed, log is in xvectors/xvec_preTrained/train/log/compute_mean.log
run.pl: job failed, log is in xvectors/xvec_preTrained/train/log/lda.log
run.pl: job failed, log is in xvectors/xvec_preTrained/train/log/plda.log
run.pl: job failed, log is in xvectors/xvec_preTrained/test/log/voxceleb1_test_scoring.log
Traceback (most recent call last):
File "local/prepare_for_eer.py", line 12, in <module>
scores = open(sys.argv[2], 'r').readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'xvectors/xvec_preTrained/test/scores_voxceleb1_test'
EER: %
minDCF(p-target=0.01):
minDCF(p-target=0.001):
Before running this, I modified 'voxceleb1_root' and 'voxceleb2_root' to point to my local dataset paths and set stage=0.
The first failure seems to occur in run.pl, when it outputs
run.pl: 40 / 40 failed, log is in /home/tjw/pytorch_xvectors/exp/make_mfcc/make_mfcc_train.*.log
I looked at make_mfcc_train.1.log, which shows
# compute-mfcc-feats --write-utt2dur=ark,t:/home/tjw/pytorch_xvectors/exp/make_mfcc/utt2dur.17 --verbose=2 --config=conf/mfcc.conf scp,p:/home/tjw/pytorch_xvectors/exp/make_mfcc/wav_train.17.scp ark:- | copy-feats --write-num-frames=ark,t:/home/tjw/pytorch_xvectors/exp/make_mfcc/utt2num_frames.17 --compress=true ark:- ark,scp:/home/tjw/pytorch_xvectors/mfcc/raw_mfcc_train.17.ark,/home/tjw/pytorch_xvectors/mfcc/raw_mfcc_train.17.scp
# Started at Wed Dec 9 14:43:22 CST 2020
#
bash: line 1: compute-mfcc-feats: command not found
bash: line 1: copy-feats: command not found
# Accounting: time=0 threads=1
# Ended (code 127) at Wed Dec 9 14:43:22 CST 2020, elapsed time 0 seconds
Do you know what may cause this problem? Thanks in advance.
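Exit code 127 together with "command not found" means the shell could not locate the Kaldi binaries, and the later failures in the log follow from the missing features. Kaldi recipes normally add the binaries to PATH through a path.sh that points at the Kaldi installation, so that file is worth checking first. The sketch below is a small, hypothetical pre-flight check using only the standard library; the binary names are the ones that appear in the log.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the tools that cannot be found on PATH (or on `path` if given)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    # Binaries named in the log output above.
    needed = ["compute-mfcc-feats", "copy-feats", "wav-to-duration"]
    missing = missing_tools(needed)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All listed Kaldi binaries were found.")
```

Running this in the same shell environment as pytorch_run.sh shows whether the environment, rather than the data, is the cause.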
Hi,
First of all, thank you for your fantastic project!
I can't tell which mfcc.conf file is used in your project; conf/mfcc.conf is not in the repository.
I hope to hear from you soon.
Thanks!