santi-pdp / pase

Problem Agnostic Speech Encoder

License: MIT License

Python 69.32% PHP 0.88% C++ 0.14% Assembly 0.05% Shell 11.95% Perl 17.67%
deep-learning waveform-analysis pytorch unsupervised-learning multi-task-learning speech-processing self-supervised-learning

pase's People

Contributors

dmitriy-serdyuk, edwarddixon, jianyuanzhong, joaomonteirof, mravanelli, pswietojanski, santi-pdp

pase's Issues

Unable to load models

Hello, thank you for the PASE library.

I am unable to find the model checkpoints in this repo. Is there another repo we should refer to?

Thank you.

No module named 'ahoproc_tools'

When running make_trainset_statistics.py, the following error pops up:

Traceback (most recent call last):
  File "make_trainset_statistics.py", line 5, in <module>
    from pase.transforms import *
  File "/root/sharedfolder/pase/pase/transforms.py", line 9, in <module>
    from ahoproc_tools.interpolate import interpolation
ModuleNotFoundError: No module named 'ahoproc_tools'

I did a pip install git+https://github.com/santi-pdp/ahoproc_tools@master and it seems to work.

Suggested solution:
ahoproc_tools should be included in your dependencies.
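For reference, a minimal sketch of what the dependency entry could look like, assuming a pip-style requirements file is used for installation (the entry below is hypothetical; pinning a specific commit instead of @master would be more reproducible):

    # requirements.txt -- hypothetical entry for the missing dependency
    ahoproc_tools @ git+https://github.com/santi-pdp/ahoproc_tools@master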

Invalid floating-point option Error

I wanted to try out the PASE Kaldi ASR experiment on the TIMIT dataset. The initial training phase was successful. However, during the decoding phase, the code abruptly ended with the following message:

run.pl: job failed, log is in /Project0550/zhiyang/pase/out_folder/dec/scoring/log/best_path.1.1.log

Opening the file:

# lattice-align-phones /Project0550/zhiyang/pase/out_folder/dec/../final.mdl "ark:gunzip -c /Project0550/zhiyang/pase/out_folder/dec/lat.1.gz|" ark:- | lattice-to-ctm-conf --acoustic-scale= --lm-scale=1.0 ark:- /Project0550/zhiyang/pase/out_folder/dec/scoring/1.1.ctm 
# Started at Mon Oct  5 01:28:00 +08 2020
#
lattice-align-phones /Project0550/zhiyang/pase/out_folder/dec/../final.mdl 'ark:gunzip -c /Project0550/zhiyang/pase/out_folder/dec/lat.1.gz|' ark:- 
ERROR (lattice-to-ctm-conf[5.5.809~2-484f57]:ToFloat():parse-options.cc:605) Invalid floating-point option ""

[ Stack-Trace: ]
/Project0550/zhiyang/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7fef2a8a62ca]
lattice-to-ctm-conf(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x419dc7]
/Project0550/zhiyang/kaldi/src/lib/libkaldi-util.so(kaldi::ParseOptions::ToFloat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x97) [0x7fef2aad0079]
/Project0550/zhiyang/kaldi/src/lib/libkaldi-util.so(kaldi::ParseOptions::SetOption(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x3fc) [0x7fef2aad3db0]
/Project0550/zhiyang/kaldi/src/lib/libkaldi-util.so(kaldi::ParseOptions::Read(int, char const* const*)+0x37e) [0x7fef2aad4a66]
lattice-to-ctm-conf(main+0x37f) [0x4176b5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fef299f9830]
lattice-to-ctm-conf(_start+0x29) [0x417269]

kaldi::KaldiFatalError# Accounting: time=1 threads=1
# Ended (code 255) at Mon Oct  5 01:28:01 +08 2020, elapsed time 1 seconds

Any ideas?

Load pretrained SincNet

How can I load pretrained SincNet weights into your code (they were trained with the official repo)?
I have tried to load them from the code but failed.
Thanks in advance.
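As a general PyTorch pattern (not PASE-specific; the checkpoint path and the pase_frontend variable below are hypothetical placeholders), a partial state_dict load that copies only the keys matching in name and shape would look like:

    import torch

    # Hypothetical sketch: warm-start only the parameters whose names and
    # shapes match between the SincNet checkpoint and the PASE frontend.
    ckpt = torch.load('sincnet_ckpt.pkl', map_location='cpu')
    model_dict = pase_frontend.state_dict()
    matched = {k: v for k, v in ckpt.items()
               if k in model_dict and v.shape == model_dict[k].shape}
    model_dict.update(matched)
    pase_frontend.load_state_dict(model_dict)

If the two repos name their layers differently, the keys would additionally have to be remapped by hand.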

Create chime5segment

Hi @santi-pdp

I have tried to run chime5_utils.py to create chime5segment, but I cannot load the data.

Could you please give some instructions on how to run this file, and explain these two commands?
#train_worn = '/disks/data1/pawel/repos/kaldi/egs/chime5/s5/data/train_worn_stereo'
#train_dist = '/disks/data1/pawel/repos/kaldi/egs/chime5/s5/data/train_uall'

Regarding speaker recognition in PASE

I'm a bit confused about the experimental setup for speaker recognition described in the original PASE paper. If my understanding is correct, only 15 s × 2484, i.e. 10.35 hours, of speech from LibriSpeech is used for pretraining, and only 11 s × 109, i.e. 0.33 hours, of speech from VCTK is used for fine-tuning. Both numbers seem awfully small...

Retrain PASE with dataset of different sample rate wave files

I intend to retrain the PASE model on my collected dataset. However, my dataset is collected from different sources, so it has different sample rates. After investigating your source code, I see that sometimes you load waves at their native sample rate and sometimes resample to 16 kHz. Please tell me whether I should modify the source code to always load waves at 16 kHz. Thank you a lot.
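For reference, a minimal way to force a fixed rate at load time, assuming librosa (which pase already uses elsewhere); passing an explicit sr resamples on the fly, while sr=None keeps each file's native rate:

    import librosa

    # Every file comes back at 16 kHz regardless of its original rate.
    wav, sr = librosa.load('utt.wav', sr=16000)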

Audio buffer and Padding size problems

Hi,
I am trying to train the pase model from scratch and I get the following two errors while training: "Audio buffer is not finite everywhere" and "Padding size should be less than the corresponding input dimension".
To fix the first problem, I tried adding np.nan_to_num(y) before line 706 of transforms.py, but I don't think this is a good solution.
I have no idea how to fix these two problems.
Any suggestions?

Audio buffer is not finite everywhere

Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/home/teinhonglo/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
librosa.util.exceptions.ParameterError: Caught ParameterError in DataLoader worker process 8.
Original Traceback (most recent call last):
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/pase/pase/dataset.py", line 492, in __getitem__
    pkg = self.transform(pkg)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/home/teinhonglo/pase/pase/transforms.py", line 706, in __call__
    hop_length=self.hop,
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/feature/spectral.py", line 1442, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/feature/spectral.py", line 1531, in melspectrogram
    power=power)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/core/spectrum.py", line 1557, in _spectrogram
    S = np.abs(stft(y, n_fft=n_fft, hop_length=hop_length))**power
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/core/spectrum.py", line 161, in stft
    util.valid_audio(y)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/util/utils.py", line 170, in valid_audio
    raise ParameterError('Audio buffer is not finite everywhere')
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
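For reference, the stopgap mentioned above would sit just before the librosa call in transforms.py (a sketch of the workaround only, not a root-cause fix; non-finite samples usually point at a corrupted file or an upstream transform):

    import numpy as np

    # Replace NaN/Inf samples so librosa's valid_audio check passes;
    # the source of the non-finite values still needs to be found.
    y = np.nan_to_num(y)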

Padding size should be less than the corresponding input dimension

Epoch 0/10: 5%|#####3 | 242/5205 [05:53<3:40:07, 2.66s/it]
Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/home/teinhonglo/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/pase/pase/dataset.py", line 492, in __getitem__
    pkg = self.transform(pkg)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/home/teinhonglo/pase/pase/transforms.py", line 427, in __call__
    pkg['chunk_rand'] = self.select_chunk(raw_rand)
  File "/home/teinhonglo/pase/pase/transforms.py", line 317, in select_chunk
    mode=self.pad_mode).view(-1)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 2868, in pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (0, 28656) at dimension 2 of input [1, 1, 3344]
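(For what it's worth, the padding and input sizes in this report sum to a round chunk length: 28656 + 3344 = 32000 samples. That suggests the failing utterance is simply shorter than the configured chunk size, and PyTorch's reflection padding cannot pad by more than the input length.)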

Fine tune the pre-trained weights

Hey

I want to import the pre-trained weights and then fine-tune on speech data in a different language. The repo only has steps to train the model from scratch.

I want to fine-tune the already-trained encoder+workers model on my speech data, so that I can get better features. Importing just the encoder as an nn.Module won't let me use the workers, so I won't be able to extract features the (self-/)unsupervised-learning way.

How should I do it?

[EDIT] Also, suppose I trained the PASE+ model on dataset1 and, after training, got more speech data (dataset2). In that case, I would want to fine-tune the model trained on dataset1 using the new dataset2, not train from scratch on dataset1 + dataset2.
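For what it's worth, the pase wrapper quoted in a later issue below takes a pretrained_ckpt argument, which suggests one possible route; a minimal sketch under that assumption (the config objects and the checkpoint path are placeholders, not verified against the repo):

    from pase.models.pase import *

    # Hypothetical sketch: build encoder + workers and warm-start the whole
    # thing from a previous training checkpoint, then keep training on the
    # new data instead of starting from scratch.
    ps = pase(frontend=None,
              frontend_cfg=frontend_cfg,
              minions_cfg=minions_cfg,
              cls_lst=cls_lst, regr_lst=regr_lst,
              pretrained_ckpt='ckpt/fullmodel_e199.ckpt',  # assumed path
              name='Pase_finetune')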

Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (0, 10416) at dimension 2 of input [1, 1, 5584]

The problem occurs when running the command python make_trainset_statistics.py --net_cfg cfg/workers/workers+.cfg ...
{'regr': [{'num_outputs': 1, 'dropout': 0, 'dropout_time': 0.0, 'hidden_layers': 1, 'name': 'cchunk', 'type': 'decoder', 'hidden_size': 64, 'fmaps': [512, 256, 128], 'strides': [4, 4, 10], 'kwidths': [30, 30, 30], 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab1410>}, {'num_outputs': 3075, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab1490>, 'skip': False}, {'num_outputs': 3075, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps_long', 'context': 1, 'r': 7, 'transform': {'win': 512}, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab14d0>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'fbank', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab1510>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'fbank_long', 'context': 1, 'r': 7, 'transform': {'win': 1024, 'n_fft': 1024}, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab1550>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'gtn', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab1590>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'gtn_long', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab15d0>, 'transform': {'win': 2048}, 'skip': False}, {'num_outputs': 39, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687ab1610>, 'skip': False}, {'num_outputs': 60, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc_long', 'context': 1, 'r': 7, 'transform': {'win': 2048, 'order': 20}, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687aad550>, 'skip': False}, {'num_outputs': 12, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'prosody', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687aad450>, 'skip': False}], 'cls': [{'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mi', 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687aad210>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}, {'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'cmi', 'augment': True, 'loss': <pase.losses.ContextualizedLoss object at 0x7fd687aad1d0>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}]}
Found 2445650 speakers info
Found 1980000 files in train split
Found 1980000 speakers in train split
Traceback (most recent call last):
  File "make_trainset_statistics.py", line 165, in <module>
    extract_stats(opts)
  File "make_trainset_statistics.py", line 86, in extract_stats
    for bidx, batch in enumerate(dloader, start=1):
  File "/data/app/anaconda3/envs/pytorch-1.1/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/data/app/anaconda3/envs/pytorch-1.1/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/data/app/anaconda3/envs/pytorch-1.1/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/app/anaconda3/envs/pytorch-1.1/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/app/ronlian/pase/pase/dataset.py", line 304, in __getitem__
    pkg = self.transform(pkg)
  File "/data/app/anaconda3/envs/pytorch-1.1/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/data/app/ronlian/pase/pase/transforms.py", line 427, in __call__
    pkg['chunk_rand'] = self.select_chunk(raw_rand)
  File "/data/app/ronlian/pase/pase/transforms.py", line 317, in select_chunk
    mode=self.pad_mode).view(-1)
  File "/data/app/anaconda3/envs/pytorch-1.1/lib/python3.7/site-packages/torch/nn/functional.py", line 2805, in pad
    ret = torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (0, 10416) at dimension 2 of input [1, 1, 5584]

ASR experiment on TIMIT dataset

Hi, thanks for your nice work.

I am trying the ASR experiment on the TIMIT dataset, but I got the error message below.

epoch=20 loss_tr=0.859986 err_tr=0.273605 loss_te=1.420085 err_te=0.400905 lr=0.000562
epoch=21 loss_tr=0.856079 err_tr=0.271644 loss_te=1.421840 err_te=0.401584 lr=0.000281
epoch=22 loss_tr=0.855693 err_tr=0.271957 loss_te=1.419042 err_te=0.400856 lr=0.000141
epoch=23 loss_tr=0.854678 err_tr=0.271677 loss_te=1.419402 err_te=0.400407 lr=0.000070
BEST ERR=0.400407
BEST ACC=0.599593
WARNING: QRNN ignores bidirectional flag
Waveform reading...
Computing PASE features...
Decoding...
kaldi_decoding_scripts//decode_dnn.sh /home/sysadmin/ncr/pase/pase-test/ASR/output/dec_cfg.ini ./output/dec "./output/post.ark"
Traceback (most recent call last):
  File "run_TIMIT_full_decoding.py", line 559, in <module>
    run_shell(cmd_decode)
  File "/home/sysadmin/ncr/pase/pase-test/ASR/utils.py", line 50, in run_shell
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,shell=True)
  File "/home/sysadmin/miniconda/envs/pase/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/sysadmin/miniconda/envs/pase/lib/python3.7/subprocess.py", line 1482, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

Do you have any idea about this?

Thanks so much.

TypeError: ('Unrecognized rnn type: ', 'qrnn')

I am following the example set in the README, and I am getting the following error. Is this familiar to anyone?

In [1]: from pase.models.frontend import wf_builder                                                                                            

In [2]: pase = wf_builder('pase_models/cfg/frontend/PASE+.cfg').eval()                                                                         
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8accba11a31c> in <module>
----> 1 pase = wf_builder('pase_models/cfg/frontend/PASE+.cfg').eval()

/opt/conda/lib/python3.6/site-packages/PASE-0.1.1.dev0-py3.6.egg/pase/models/frontend.py in wf_builder(cfg_path)
    21             with open(cfg_path, 'r') as cfg_f:
    22                 cfg = json.load(cfg_f)
---> 23                 return wf_builder(cfg)
    24         elif isinstance(cfg_path, dict):
    25             if "name" in cfg_path.keys():

/opt/conda/lib/python3.6/site-packages/PASE-0.1.1.dev0-py3.6.egg/pase/models/frontend.py in wf_builder(cfg_path)
    34                     raise TypeError('Unrecognized frontend type: ', model_name)
    35             else:
---> 36                 return WaveFe(**cfg_path)
    37         else:
    38             TypeError('Unexpected config for WaveFe')

/opt/conda/lib/python3.6/site-packages/PASE-0.1.1.dev0-py3.6.egg/pase/models/frontend.py in __init__(self, num_inputs, sincnet, kwidths, strides, dilations, fmaps, norm_type, pad_mode, sr, emb_dim, rnn_dim, activation, rnn_pool, rnn_layers, rnn_dropout, rnn_type, vq_K, vq_beta, vq_gamma, norm_out, tanh_out, resblocks, denseskips, densemerge, name)
   192                                        rnn_type=rnn_type,
   193                                        bidirectional=True,
--> 194                                        dropout=rnn_dropout)
   195             self.W = nn.Conv1d(rnn_dim, emb_dim, 1)
   196         else:

/opt/conda/lib/python3.6/site-packages/PASE-0.1.1.dev0-py3.6.egg/pase/models/modules.py in build_rnn_block(in_size, rnn_size, rnn_layers, rnn_type, bidirectional, dropout, use_cuda)
    57                                             bidirectional=bidirectional)
    58     else:
---> 59         raise TypeError('Unrecognized rnn type: ', rnn_type)
    60     return rnn
    61 

TypeError: ('Unrecognized rnn type: ', 'qrnn')

Training PASE architecture for only Speaker ID using Librispeech data

Hi Mirco, Santi,
Thanks again for this great contribution. I had a look at the code and the paper; the architecture is interesting. I want to train this architecture on LibriSpeech for speaker ID in the same way SincNet is trained. What would be the best way to do it? Assume I have all training and test data prepared per the protocols of the SincNet paper. I want to extract supervised bottleneck features after training, to see how the overall FER compares with the original SincNet.

Using PASE for speech emotion recognition

Hi,
Firstly, thanks for such great work!
I have trained the base PASE model on my custom data with a different number of speakers. Now I want to use it for emotion recognition on a different dataset consisting of different emotions, so I would like to know more about the emorec folder, which contains the emotion recognition script. I tried run_IEMOCAP_fast.py on my emotion data, but that is basically for evaluation. How do I use train.py, i.e., what parameters do I need to pass to train on my own emotion data? I have the training and test files, plus a JSON file with the file name as key and the label as value.

Training PASE architecture for detecting fake audios

Hi, thanks again for this great contribution!
I have read the code and the paper; using self-supervised learning for this task is really amazing.
When I want to train this model for detecting fake audio using the ASVspoof2019 LA dataset, I run into a problem:
there are countless kinds of fake audio from different people. If I simply use 1 for fake audio and 0 for bonafide ones, is that appropriate? Or might there be a more appropriate way to handle this?

import numpy as np

labels = load_label()
dic = dict()

# Binary labeling: 0 for bonafide utterances, 1 for any spoofed one.
for item in labels:
    item = item.split(" ")
    if item[-1] == "bonafide":
        dic[item[0] + ".flac"] = 0
    else:
        dic[item[0] + ".flac"] = 1
np.save("ASVspoof2019_dict.npy", dic)

Question about SimpleAdditive transform

Hello, thanks for the good code, I really appreciate it.
I'm wondering whether the SimpleAdditive noise transform
uses VAD to calculate the SNR, or just calculates the global SNR,
ignoring the silent regions in the speech?
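For context, a minimal sketch of global-SNR additive mixing (not the actual SimpleAdditive implementation; it scales the noise using signal and noise energy over the whole utterance, silence included, which is what "global SNR" means here):

    import numpy as np

    def mix_at_global_snr(speech, noise, snr_db):
        # Energies over the full signals; a VAD-based variant would compute
        # the speech energy over voiced frames only.
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
        return speech + scale * noise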

LibriSpeechSegTupleWavDataset neighbor has a bug?

neighbors = self.neighbor_prefixes[prefix]
neighbors.remove(uttname)

I found that the code in the __getitem__ function of LibriSpeechSegTupleWavDataset just removes uttname from neighbors, but since neighbors is a reference into self.neighbor_prefixes, it removes it there as well, and eventually neighbors becomes empty. I don't know if it's a bug or if you designed it that way on purpose.

Besides, I find the way unsupervised_data_cfg_librispeech.py handles speakers strange: if the valid or test split contains the same speaker, that speaker is included in neither data_cfg['speakers'] nor data_cfg['valid']['speakers']. But the valid set comes from the training scp, which means the splits will share speakers, so data_cfg['valid']['speakers'] ends up empty.
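If the in-place mutation is unintended, the usual fix is to copy the list before removing (a sketch against the two lines quoted above):

    # Copy first, so removing uttname does not mutate the list cached
    # in self.neighbor_prefixes.
    neighbors = list(self.neighbor_prefixes[prefix])
    neighbors.remove(uttname)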

Trying to load the pretained model

I ran the following code in Google Colab (CUDA 10.0):

from pase.models.frontend import wf_builder
pase = wf_builder('cfg/frontend/PASE+.cfg').eval()
pase.load_pretrained('FE_e199.ckpt', load_last=True, verbose=True)

# Now we can forward waveforms as Torch tensors
import torch
x = torch.randn(1, 1, 100000) # example with random noise to check shape
# y size will be (1, 256, 625), which are 625 frames of 256 dims each
y = pase(x)

I tried both pase(x) and pase(x.cuda())

For pase(x) I get the following error

AssertionError                            Traceback (most recent call last)
<ipython-input-11-eae0f995c36c> in <module>()
      2 x = torch.randn(1, 1, 100000) # example with random noise to check shape
      3 # y size will be (1, 256, 625), which are 625 frames of 256 dims each
----> 4 y = pase(x)

7 frames
/usr/local/lib/python3.6/dist-packages/torchqrnn/forget_mult.py in forward(self, f, x, hidden_init, use_cuda)
    173         use_cuda = use_cuda and torch.cuda.is_available()
    174         # Ensure the user is aware when ForgetMult is not GPU version as it's far faster
--> 175         if use_cuda: assert f.is_cuda and x.is_cuda, 'GPU ForgetMult with fast element-wise CUDA kernel requested but tensors not on GPU'
    176         ###
    177         # Avoiding 'RuntimeError: expected a Variable argument, but got NoneType' when hidden_init is None

AssertionError: GPU ForgetMult with fast element-wise CUDA kernel requested but tensors not on GPU

For pase(x.cuda()) I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-5f65b8d305f3> in <module>()
      2 x = torch.randn(1, 1, 100000) # example with random noise to check shape
      3 # y size will be (1, 256, 625), which are 625 frames of 256 dims each
----> 4 y = pase(x.cuda())

5 frames
/content/pase/pase/models/modules.py in forward(self, waveforms)
    900         band=(high-low)[:,0]
    901 
--> 902         f_times_t_low = torch.matmul(low, self.n_)
    903         f_times_t_high = torch.matmul(high, self.n_)
    904                 # Equivalent of Eq.4 of the reference paper (SPEAKER RECOGNITION FROM RAW WAVEFORM WITH SINCNET).

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_mm

Any suggestions?
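For reference, the second traceback says the matmul in SincConv_fast mixes a CUDA tensor (low) with a CPU tensor (self.n_). The usual first step is to move the whole model to the GPU as well (a sketch, not a verified fix; if self.n_ is a plain tensor attribute rather than a registered buffer, it would additionally need to be moved or registered inside the module):

    import torch

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    pase = pase.to(device)   # moves parameters and registered buffers
    y = pase(x.to(device))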

Reproduce Self-Supervised Training on LibriSpeech

I tried to reproduce the self-supervised training of PASE+ on the LibriSpeech 100 h training set. I'm just wondering: for the contamination part, what is the downsampling filter (downsample_irfiles in the cfg file) used for the distortion? Am I supposed to find it in the Google Drive .tar file?

Train and valid losses

Hi, is it possible for you to share the train and valid losses over time that you obtained while training PASE+?

Right now, my losses are not improving much, and after some 50 epochs I am getting NaN values. I wanted to see what the correct loss pattern looks like; I understand it will depend on the audio data as well.

load pretrained worker weights

Hi,

I am trying to load the pretrained workers (from #85)

Model that I am using:

from pase.models.pase import *

ps = pase(frontend=None,
          frontend_cfg=frontend_cfg,
          minions_cfg=minions_cfg,
          cls_lst=cls_lst, regr_lst=regr_lst,
          pretrained_ckpt=None,
          name='Pase_base')

To import worker weight (for 1 worker):

for m in ps.classification_workers:
  m.load_pretrained(ckpt_path='/workers/weights_M-mi-M-mi-721872.ckpt', load_last=True, verbose=True)
  break

Error I am getting:

Current Model keys:  5
Current Pt keys:  4
Loading matching keys:  ['minion.blocks.0.W.bias', 'minion.blocks.0.act.weight', 'minion.W.weight', 'minion.W.bias']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-3cc20e1ab527> in <module>()
      1 for m in ps.classification_workers:
----> 2   m.load_pretrained(ckpt_path='/content/workers/weights_M-mi-M-mi-721872.ckpt', load_last=True, verbose=True)
      3   break

1 frames
/content/pase/models/modules.py in load_pretrained_ckpt(self, ckpt_file, load_last, load_opt, verbose)
    289             print('Loading matching keys: ', list(pt_dict.keys()))
    290         if len(pt_dict.keys()) != len(model_dict.keys()):
--> 291             raise ValueError('WARNING: LOADING DIFFERENT NUM OF KEYS')
    292             print('WARNING: LOADING DIFFERENT NUM OF KEYS')
    293         # overwrite entries in existing dict

ValueError: WARNING: LOADING DIFFERENT NUM OF KEYS

Question about the testing dataset

Hi, thanks for this great work and for providing the repository for experiments. I have a question about the paper:
Are the datasets (TIMIT+rev+noise, DIRHA+rev+noise) used for testing PASE+ available?
If so, could you please tell me where I can download these wav files?
Thanks in advance !

Best,
Fuann

some bug

1. In classifiers.py:

ResBasicBlock1D(
    64,
    64,
    kwidth=5,
    att=att,
    att_heads=att_heads,
    att_dropout=att_dropout)

But in modules.py, ResBasicBlock1D has no parameters att, att_heads, and so on.

Issue in installation

The local installation isn't working. After running the command python setup.py install, I am only able to import pase but not pase.models etc.

On running the following python lines:

import pase
help(pase)

I get this output:

NAME
pase

PACKAGE CONTENTS
make_trainset_statistics
pase (package)
precompute_aco_data
setup
train
unsupervised_data_cfg_librispeech

FILE
/notebooks/pase/__init__.py

I think there's some problem with the setup. Could you please check it?

.scp file for train/test set

Hi, I am new to Kaldi.

I installed Kaldi and tried the yesno recipe. There I figured out that to generate .scp files, you run run.sh, which in turn calls local/create_wav_scp.pl to generate the .scp file.
There is also something called kaldi/src/featbin/copy-feats which can be used (I haven't tried it yet).

Is there any way to generate .scp files without installing Kaldi? I don't otherwise use Kaldi for ASR.
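For what it's worth, a wav.scp is plain text with one "utterance-id path" pair per line, so a minimal generator needs no Kaldi at all (a sketch; the directory layout and the choice of file stem as utterance id are assumptions):

    import os

    wav_dir = 'data/wavs'  # hypothetical location of the .wav files
    with open('wav.scp', 'w') as f:
        for name in sorted(os.listdir(wav_dir)):
            if name.endswith('.wav'):
                utt_id = os.path.splitext(name)[0]  # file stem as utterance id
                path = os.path.abspath(os.path.join(wav_dir, name))
                f.write(f'{utt_id} {path}\n')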

Add new workers to pase or modify some features in a worker

Hello!
I am working with the PASE encoder on a specific task, and I want to modify the workers, or even add new ones, to push the extraction of other specific features. For example, in the prosody worker, let's say I want to extract jitter and shimmer alongside the already-extracted features (log F0, energy, unvoiced flag, zero-crossing rate). To do this, I figured out that I should modify the prosody minion in transforms.py by adding the extraction of those features. Is there another related script that I should modify as well? Also, is there any way to give one worker more importance than another in the feature extraction?
Thanks!!

Reproduce the supervised training with TIMIT ASR

Thanks for the great work,
I followed the script (run_TIMIT_full_decoding.py) as mentioned and got 16.6 PER.
Now I want to further reproduce the PASE+ (Supervised and FineTuned) results in Table 3;
can you give me some hints?

thanks.

Error while setting trans_cache in train.py

While training with python train.py after setting 'trans_cache', I am getting the following error:

  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/content/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/pase/pase/dataset.py", line 496, in __getitem__
    pkg['overlap'] = torch.zeros(pkg['chunk'].shape[-1] // pkg['dec_resolution']).float()
KeyError: 'dec_resolution'

I am unable to figure out why pkg['dec_resolution'] is not defined.

ASR experiment with custom dataset

I have trained the PASE+ encoder on my custom dataset, but I couldn't find anything about data preparation for the ASR experiment on a custom dataset. I have a set of .wav files and their corresponding text transcripts. Can you please tell me how to arrange my dataset so that I can run run_TIMIT_full_decoding.py on it? I can see in the code that there are label files as well. How are these label files generated? Any help will be appreciated.

Running on multiple GPUs

Thanks for this repository.

I had an issue when trying to run train.py on multiple GPUs. The code gave the following error:

AttributeError: 'DataParallel' object has no attribute 'loss_weight'

It appears that when running on multiple GPUs, the model gets wrapped in a DataParallel() object. I made the following modification to the code (pase/models/core.py, line 404), and wherever else it raised the same issue:
minion.loss_weight --> minion.module.loss_weight

Everything seems to work fine after this modification.
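A slightly more general pattern, if the same edit is needed in several places (a sketch; unwrap is a hypothetical helper, not part of pase):

    import torch.nn as nn

    def unwrap(module):
        # DataParallel keeps the real model under .module; return it so
        # attribute access works whether or not the model is wrapped.
        return module.module if isinstance(module, nn.DataParallel) else module

    w = unwrap(minion).loss_weight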

ERROR: Failed building wheel for pycodec2

I am getting this error:

#include "codec2/codec2.h"
^~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/home/deepesh/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc' failed with exit status 1

Please help me with this.

self-supervised training from scratch

Hi,

Training the self-supervised model from scratch takes a lot of time on a 1-GPU machine. For the data I have, it takes ~8 hrs to train for 1 epoch.

Apart from increasing the GPU count, is there any other way to speed up the training?

No module named 'pycodec2'

Hi,

Why doesn't the requirements file list pycodec2 (and codec2)? I get the following error while training the model: No module named 'pycodec2'.

I installed pycodec2 from here, along with codec2, but now I get this error instead:

Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 272, in train
    dsets, collater_keys = build_dataset_providers(opts, minions_cfg)
  File "train.py", line 191, in build_dataset_providers
    dist_trans = config_distortions(**dtr)
  File "/content/pase/pase/transforms.py", line 123, in config_distortions
    trans.append(Codec2Buffer(report=report, kbps=codec2_kbps))
  File "/content/pase/pase/transforms.py", line 2138, in __init__
    self.c2 = pycodec2.Codec2(kbps)
AttributeError: module 'pycodec2' has no attribute 'Codec2'

Any suggestions?

Training from scratch

I would like to train the pase model from scratch, and I have two main issues.
1. How do I resolve this warning?
"""

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

"""
2. How do I resolve the StopIteration below?
Did I do something wrong?
"""
Traceback (most recent call last):
  File "E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master\pase\models\WorkerScheduler\trainer.py", line 295, in _eval
    batch = next(iterator)
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master\pase\models\WorkerScheduler\trainer.py", line 265, in train_
    device=device)
  File "E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master\pase\models\WorkerScheduler\trainer.py", line 298, in _eval
    batch = next(iterator)
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration

"""
Thanks in advance for any help that you are able to provide.

(pase-master) E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master>python -u train.py --batch_size 32 --epoch 150 --save_path pase_ckpt --num_workers 4 --net_cfg cfg/workers/workers.cfg --fe_cfg cfg/frontend/PASE.cfg --data_cfg data/librispeech_data.cfg --min_lr 0.0005 --fe_lr 0.0005 --data_root E:/SpeechEmotionDataSet/ThaiEmotionDB --stats data/librispeech_stats.pkl --lrdec_step 30 --lrdecay 0.5
################################################################################

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

################################################################################

[!] Using CPU
Seeds initialized to 2
{'regr': [{'num_outputs': 1, 'dropout': 0, 'hidden_layers': 1, 'name': 'cchunk', 'type': 'decoder', 'hidden_size': 64, 'fmaps': [512, 256, 128], 'strides': [4, 4, 10], 'kwidths': [30, 30, 30], 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFDC8DF208>}, {'num_outputs': 1025, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps', 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFDC8DF108>, 'transform': {'der_order': 0}, 'skip': False}, {'num_outputs': 20, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc', 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFDC8DF2C8>, 'transform': {'der_order': 0, 'order': 20}, 'skip': False}, {'num_outputs': 4, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'prosody', 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFDC8DF308>, 'transform': {'der_order': 0}, 'skip': False}], 'cls': [{'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'spc', 'type': 'spc', 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFD9C3EAC8>, 'skip': False}, {'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mi', 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFDC8DF408>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}, {'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'cmi', 'loss': <pase.losses.ContextualizedLoss object at 0x000001CFDC8DF4C8>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}]}
Compose(
ToTensor()
MIChunkWav(16000)
LPS(n_fft=2048, hop=160, win=400, device=cpu)
MFCC(order=20, sr=16000)
Prosody(hop=160, win=320, f0_min=60, f0_max=300, sr=16000)
ZNorm(data/librispeech_stats.pkl)
)
Preparing dset for E:/SpeechEmotionDataSet/ThaiEmotionDB
Dataset name <class 'pase.dataset.LibriSpeechSegTupleWavDataset'> and opts LibriSpeechSegTupleWavDataset
Found 2 speakers info
Found 91 files in train split
Found 2 speakers in train split
Found 2 prefixes in utterances
Found 2 speakers info
Found 10 files in valid split
Found 0 speakers in valid split
Found 2 prefixes in utterances
Dataset has a total 0.11004553819444443 hours of training data
<_io.TextIOWrapper name='cfg/frontend/PASE.cfg' mode='r' encoding='cp1252'>
{'kwidths': [251, 20, 11, 11, 11, 11, 11, 11], 'strides': [1, 10, 2, 1, 2, 1, 2, 2], 'fmaps': [64, 64, 128, 128, 256, 256, 512, 512], 'emb_dim': 100, 'norm_out': True}
True
Cls: ['spc', 'mi', 'cmi']
Regr: ['cchunk', 'lps', 'mfcc', 'prosody']
training pase...
pase config ==> {'kwidths': [251, 20, 11, 11, 11, 11, 11, 11], 'strides': [1, 10, 2, 1, 2, 1, 2, 2], 'fmaps': [64, 64, 128, 128, 256, 256, 512, 512], 'emb_dim': 100, 'norm_out': True}
==>concat features from 1 levels
==>input size for workers: 100

name cchunk

Dropout at the inputs disabled, as p=0

name lps

Dropout at the inputs disabled, as p=0.0

name mfcc

Dropout at the inputs disabled, as p=0.0

name prosody

Dropout at the inputs disabled, as p=0.0

name spc

==================================================
name spc

num_inputs: 100
ctxt_frames: 5
num_inputs: 600
Dropout at the inputs disabled, as p=0.0

name mi

==================================================
name mi

Dropout at the inputs disabled, as p=0.0

name cmi

==================================================
name cmi

Dropout at the inputs disabled, as p=0.0
Using step LR Scheduler for frontend!
Using step LR Scheduler for spc!
Using step LR Scheduler for mi!
Using step LR Scheduler for cmi!
Using step LR Scheduler for cchunk!
Using step LR Scheduler for lps!
Using step LR Scheduler for mfcc!
Using step LR Scheduler for prosody!
Use tenoserboard: True
pase(
(frontend): WaveFe(
(blocks): ModuleList(
(0): FeBlock(
(conv): SincConv_fast()
(norm): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=64)
)
(1): FeBlock(
(conv): Conv1d(64, 64, kernel_size=(20,), stride=(10,))
(norm): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=64)
)
(2): FeBlock(
(conv): Conv1d(64, 128, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=128)
)
(3): FeBlock(
(conv): Conv1d(128, 128, kernel_size=(11,), stride=(1,))
(norm): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=128)
)
(4): FeBlock(
(conv): Conv1d(128, 256, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=256)
)
(5): FeBlock(
(conv): Conv1d(256, 256, kernel_size=(11,), stride=(1,))
(norm): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=256)
)
(6): FeBlock(
(conv): Conv1d(256, 512, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=512)
)
(7): FeBlock(
(conv): Conv1d(512, 512, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=512)
)
)
(W): Conv1d(512, 100, kernel_size=(1,), stride=(1,))
(norm_out): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
)
(regression_workers): ModuleList(
(0): DecoderMinion(
(blocks): ModuleList(
(0): GDeconv1DBlock(
(deconv): ConvTranspose1d(100, 512, kernel_size=(30,), stride=(4,), padding=(13,))
(act): PReLU(num_parameters=512)
)
(1): GDeconv1DBlock(
(deconv): ConvTranspose1d(512, 256, kernel_size=(30,), stride=(4,), padding=(13,))
(act): PReLU(num_parameters=256)
)
(2): GDeconv1DBlock(
(deconv): ConvTranspose1d(256, 128, kernel_size=(30,), stride=(10,), padding=(10,))
(act): PReLU(num_parameters=128)
)
(3): MLPBlock(
(W): Conv1d(128, 64, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=64)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(64, 1, kernel_size=(1,), stride=(1,))
)
(1): MLPMinion(
(blocks): ModuleList(
(0): MLPBlock(
(W): Conv1d(100, 256, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=256)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(256, 1025, kernel_size=(1,), stride=(1,))
)
(2): MLPMinion(
(blocks): ModuleList(
(0): MLPBlock(
(W): Conv1d(100, 256, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=256)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(256, 20, kernel_size=(1,), stride=(1,))
)
(3): MLPMinion(
(blocks): ModuleList(
(0): MLPBlock(
(W): Conv1d(100, 256, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=256)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(256, 4, kernel_size=(1,), stride=(1,))
)
)
(classification_workers): ModuleList(
(0): SPC(
(minion): SPCMinion(
(blocks): ModuleList(
(0): MLPBlock(
(W): Conv1d(600, 256, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=256)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(256, 1, kernel_size=(1,), stride=(1,))
)
)
(1): LIM(
(minion): MLPMinion(
(blocks): ModuleList(
(0): MLPBlock(
(W): Conv1d(200, 256, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=256)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(256, 1, kernel_size=(1,), stride=(1,))
)
)
(2): GIM(
(minion): MLPMinion(
(blocks): ModuleList(
(0): MLPBlock(
(W): Conv1d(200, 256, kernel_size=(1,), stride=(1,))
(din): PatternedDropout()
(act): PReLU(num_parameters=256)
(dout): Dropout(p=0, inplace=False)
)
)
(W): Conv1d(256, 1, kernel_size=(1,), stride=(1,))
)
)
)
)

FeBlock(
(conv): SincConv_fast()
(norm): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=64)
)
Num params: 320


FeBlock(
(conv): Conv1d(64, 64, kernel_size=(20,), stride=(10,))
(norm): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=64)
)
Num params: 82176


FeBlock(
(conv): Conv1d(64, 128, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=128)
)
Num params: 90624


FeBlock(
(conv): Conv1d(128, 128, kernel_size=(11,), stride=(1,))
(norm): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=128)
)
Num params: 180736


FeBlock(
(conv): Conv1d(128, 256, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=256)
)
Num params: 361472


FeBlock(
(conv): Conv1d(256, 256, kernel_size=(11,), stride=(1,))
(norm): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=256)
)
Num params: 721920


FeBlock(
(conv): Conv1d(256, 512, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=512)
)
Num params: 1443840


FeBlock(
(conv): Conv1d(512, 512, kernel_size=(11,), stride=(2,))
(norm): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act): PReLU(num_parameters=512)
)
Num params: 2885632

WaveFe total params: 5818020
Frontend params: 5818020

Beginning training...
Batches per epoch: 12
Loss schedule policy: base
Reading latest checkpoint from pase_ckpt\PASE-checkpoints...
[!] No checkpoint found in pase_ckpt
[The KALDI_ROOT warning block above is re-printed verbatim by every DataLoader worker throughout the epoch; those repeats and the interleaved progress-bar output are omitted.]
==================================================
Batch 12/12 (Epoch 0) step: 12:
spc, learning rate = 0.00050000, loss = 0.7067
mi, learning rate = 0.00050000, loss = 0.7009
cmi, learning rate = 0.00050000, loss = 0.7032
cchunk, learning rate = 0.00050000, loss = 0.1004
lps, learning rate = 0.00050000, loss = 32.5080
mfcc, learning rate = 0.00050000, loss = 28.9927
prosody, learning rate = 0.00050000, loss = 13.5776
total, learning rate = 0.00050000, loss = 77.2894
Epoch 0/150: 100%|█████████████████████████████████████████████| 12/12 [06:36<00:00, 30.26s/it]
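The repeated KALDI_ROOT warnings appear to come from the kaldi_io dependency at import time and are harmless for self-supervised training. A minimal way to silence them, assuming you have a Kaldi checkout (the ~/kaldi path below is only an example), is to set the variable before anything imports kaldi_io:

# Set KALDI_ROOT before any pase/kaldi_io import so the warning never fires.
# The path is an example -- substitute your own Kaldi checkout.
import os
os.environ['KALDI_ROOT'] = os.path.expanduser('~/kaldi')

from pase.models.frontend import wf_builder  # imports that pull in kaldi_io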

Beginning evaluation...
Eval: 1/2:   0%|          | 0/1 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master\pase\models\WorkerScheduler\trainer.py", line 295, in _eval
    batch = next(iterator)
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master\pase\models\WorkerScheduler\trainer.py", line 265, in train_
    device=device)
  File "E:\Python\Jupyter Notebook\ThaiSpeechEmotion\pase-master\pase\models\WorkerScheduler\trainer.py", line 298, in _eval
    batch = next(iterator)
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "E:\Anaconda3\envs\pase-master\lib\site-packages\torch\utils\data\dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Epoch 0/150: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [01:16<00:00, 2.64s/it]

Beginning evaluation...
Eval: 1/3: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/nfs/cold_project/sunjianwei/research/pase/pase/models/WorkerScheduler/trainer.py", line 295, in _eval
    batch = next(iterator)
  File "/home/luban/sunjianwei/tf2.0_py3.6_all_luban/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/luban/sunjianwei/tf2.0_py3.6_all_luban/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/nfs/cold_project/sunjianwei/research/pase/pase/models/WorkerScheduler/trainer.py", line 265, in train_
    device=device)
  File "/nfs/cold_project/sunjianwei/research/pase/pase/models/WorkerScheduler/trainer.py", line 298, in _eval
    batch = next(iterator)
  File "/home/luban/sunjianwei/tf2.0_py3.6_all_luban/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/luban/sunjianwei/tf2.0_py3.6_all_luban/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 831, in _next_data
    raise StopIteration
StopIteration
Please help me~~~
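Both traces point at the same call site: _eval in trainer.py keeps calling next(iterator) past the end of the validation DataLoader, which happens when the eval split yields fewer batches than the loop expects. A minimal guard, assuming the surrounding loop can simply restart the loader (the helper name below is illustrative, not PASE's actual API):

def next_batch(iterator, dataloader):
    # Fetch the next batch; restart the DataLoader when it is exhausted.
    # If the eval set is truly empty, this still raises StopIteration.
    try:
        return next(iterator), iterator
    except StopIteration:
        iterator = iter(dataloader)  # fresh pass over the (small) eval set
        return next(iterator), iterator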

Question about distortion file

Hi,

I'm trying to fine-tune the PASE+ model on my own dataset, but the training script fails with the output below. I was able to produce the stats file and the .scp files correctly with the provided Python script.

Here's the output from train.py:

[!] Using CPU
Seeds initialized to 2
{'regr': [{'num_outputs': 1, 'dropout': 0, 'dropout_time': 0.0, 'hidden_layers': 1, 'name': 'cchunk', 'type': 'decoder', 'hidden_size': 64, 'fmaps': [512, 256, 128], 'strides': [4, 4, 10], 'kwidths': [30, 30, 30], 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a49d0>}, {'num_outputs': 3075, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d58bc10>, 'skip': False}, {'num_outputs': 3075, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps_long', 'context': 1, 'r': 7, 'transform': {'win': 512}, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4a50>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'fbank', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4a90>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'fbank_long', 'context': 1, 'r': 7, 'transform': {'win': 1024, 'n_fft': 1024}, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4ad0>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'gtn', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4b10>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'gtn_long', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4b50>, 'transform': {'win': 2048}, 'skip': False}, {'num_outputs': 39, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4b90>, 'skip': False}, {'num_outputs': 60, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc_long', 'context': 1, 'r': 7, 'transform': {'win': 2048, 'order': 20}, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4bd0>, 'skip': False}, {'num_outputs': 12, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'prosody', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4c10>, 'skip': False}], 'cls': [{'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mi', 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4cd0>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}, {'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'cmi', 'augment': True, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4d90>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}]}
Compose(
    ToTensor()
    MIChunkWav(32000)
    LPS(n_fft=2048, hop=160, win=400, device=cpu)
    LPS(n_fft=2048, hop=160, win=512, device=cpu)
    FBanks(n_fft=512, n_filters=40, hop=160, win=400
    FBanks(n_fft=1024, n_filters=40, hop=160, win=1024
    Gammatone(f_min=500, n_channels=40, hop=160, win=400)
    Gammatone(f_min=500, n_channels=40, hop=160, win=2048)
    MFCC(order=13, sr=16000)
    MFCC(order=20, sr=16000)
    Prosody(hop=160, win=320, f0_min=60, f0_max=300, sr=16000)
    ZNorm(data/PARK_stats.pkl)
)
Preparing dset for <MY DATASET FOLDER>
Found 0 *.npy ir_files in data/omologo_revs_bin
It seems the problem is that there is no directory called omologo_revs_bin inside data/, which should hold the impulse-response .npy files used by the reverberation distortion. If so, is it possible to get it?

Thank you in advance!
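For anyone hitting the same message: "Found 0 *.npy ir_files" means the distortion pipeline found no impulse-response files at that path. A quick sanity check before training (plain Python, not a PASE utility; the path is taken from the log above):

import glob
import os

ir_dir = 'data/omologo_revs_bin'  # path reported in the log
print('directory exists:', os.path.isdir(ir_dir))
print('npy files found:', len(glob.glob(os.path.join(ir_dir, '*.npy'))))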

I have this issue too. I encountered a CUDA mismatch after installing fastai 2.3.1.

I have this issue too. It has to do with the version of PyTorch. If you don't need the most recent version of PyTorch, you can downgrade (see issue 29 in the pytorch-qrnn repo: https://github.com/salesforce/pytorch-qrnn/issues/29#issuecomment-660058989).

A workaround I'm trying without downgrading PyTorch is to use fastai's implementation of QRNN. I've created a fork of the PASE repo. Run pip install ninja, then install or upgrade fastai with pip install -U fastai. Then, in pase/models/modules.py, import QRNN from fastai and update line 55, where the QRNN module is constructed.
This works for the inference example presented in the README of the repo:

from pase.models.frontend import wf_builder
pase = wf_builder('cfg/frontend/PASE+.cfg').eval()
pase.load_pretrained('FE_e199.ckpt', load_last=True, verbose=True)
pase.cuda()
# Now we can forward waveforms as Torch tensors
import torch
x = torch.randn(1, 1, 100000) # example with random noise to check shape
# y size will be (1, 256, 625), which are 625 frames of 256 dims each
y = pase(x.cuda(), device='cuda') 

Disclaimer: The output dimensions match. I haven't checked whether the values match because that would require downgrading pytorch. I think for inference, this should work.
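For reference, the swap looks roughly like the sketch below. This is a sketch, not the exact patch from the fork: it assumes a fastai release that still ships QRNN (it was dropped from later versions), and the import path and batch_first argument should be double-checked against your fastai version. The sizes are placeholders, not PASE's actual hyperparameters.

# pase/models/modules.py -- sketch of the QRNN swap.
# Original import (requires the unmaintained torchqrnn CUDA kernels):
#   from torchqrnn import QRNN
# fastai's pure-PyTorch QRNN (pip install ninja fastai):
from fastai.text.models.qrnn import QRNN

# torchqrnn consumes (seq_len, batch, input_size) tensors; fastai's QRNN
# defaults to batch_first=True, so pass batch_first=False to keep the
# existing call site's tensor layout unchanged:
qrnn = QRNN(input_size=512, hidden_size=512, n_layers=1, batch_first=False)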

Originally posted by @kachiO in #114 (comment)

And now my CUDA version is 11.6. How can I adapt the construction of this QRNN?
The error is as follows:

  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\nn\modules\module.py", line 491, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
    module._apply(fn)
  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
    module._apply(fn)
  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\nn\modules\module.py", line 387, in _apply
    module._apply(fn)
  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\nn\modules\module.py", line 409, in _apply
    param_applied = fn(param)
  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\nn\modules\module.py", line 491, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "C:\Users\Administrator\Desktop\ggh\CarbonPrice\CarbonPrice\venv\lib\site-packages\torch\cuda\__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
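That AssertionError comes straight from PyTorch: the installed wheel is a CPU-only build, so no amount of QRNN patching will help until a CUDA-enabled torch is installed. A quick way to confirm what the environment actually has (plain PyTorch calls, nothing PASE-specific):

import torch
print(torch.__version__)           # a '+cpu' suffix means a CPU-only wheel
print(torch.version.cuda)          # None on CPU-only builds
print(torch.cuda.is_available())   # False here, even with system CUDA 11.6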
