kaituoxu / listen-attend-spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
I downloaded the data_aishell.tgz file from the OpenSLR website. It is around 15 GB in size. Later, when I tried to extract files from it, it displayed an error that says: "Invalid Compressed Data: Unable to Inflate". I used WinZip for this. Can someone please help me?
I can only proceed once I have extracted the data. Thanks in advance.
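For what it's worth, "Invalid Compressed Data" almost always means a truncated or corrupted download rather than a WinZip problem. A minimal sketch (assuming you have the MD5 checksum published on the OpenSLR download page) that verifies the archive and extracts it with Python's tarfile instead:

```python
import hashlib
import tarfile

def verify_and_extract(path, expected_md5=None, out_dir="."):
    """Check the archive's MD5 (if known) before extracting.

    A partial download is the usual cause of "unable to inflate"
    errors; comparing against the checksum published on the
    download page catches that before extraction fails halfway.
    """
    if expected_md5 is not None:
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        if h.hexdigest() != expected_md5:
            raise IOError("MD5 mismatch, re-download: %s" % path)
    # tarfile raises ReadError on a genuinely corrupt gzip stream
    with tarfile.open(path, "r:gz") as tar:
        tar.extractall(out_dir)
```

If the MD5 doesn't match, re-downloading (ideally with a resumable tool like `wget -c`) is the fix; no extractor will recover a truncated gzip stream.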
Hi @kaituoxu,
Thanks for your great project.
If I don't use bucketing on the input data, should I still use time-resolution reduction?
I think time-resolution reduction produces unbalanced splits when bucketing is not used.
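To illustrate the interaction, here is a minimal, hypothetical sketch of length-bucketing: sorting utterances by frame count before batching, so that utterances in each batch carry similar padding and the time-resolution reduction discards a similar number of trailing frames per batch:

```python
def bucket_batches(lengths, batch_size):
    """Group utterance indices into batches of similar frame count.

    lengths: list of per-utterance frame counts.
    Returns a list of batches, each a list of utterance indices.
    Without this sorting, a batch can mix very short and very long
    utterances, so subsampling splits them very unevenly.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size]
            for i in range(0, len(order), batch_size)]
```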
Can we use the TIMIT dataset instead of the AISHELL dataset? If so, what vocabulary dictionary should we use? I am having some problems with the AISHELL dataset, which is why I have to proceed with TIMIT. Moreover, I need an English dataset to better understand the models.
Thank you very much for sharing your code! I ran the experiments with it. Some files are not in Unix line-ending style and need to be converted with the `set ff=unix` command. You recommend running the code with Python 3, but text2token.py is written in Python 2 style, so the JSON file that data2json.sh outputs afterwards is empty. How should I modify the code? Thanks!
Hello again!
Even though I didn't succeed with the training and decoding stages on the original dataset, I decided to try running run.sh on my own data (annotations in Russian), organised exactly the same way the AISHELL data is organised.
I created a rus_data dir with transcript and wav dirs inside. In the transcript dir I put a txt file in the format: filename-without-extension, space, transcription.
In the wav dir I created train, test and dev dirs, and put dirs with wavs inside them.
After that, I changed the paths in run.sh that aishell_data_prep.sh requires.
But run.sh throws the error:
Stage 0: Data Preparation
Error: local/aishell_data_prep.sh requires two directory arguments
If I run aishell_data_prep.sh separately with the two paths, I get the same error.
Do you have any idea why this happens? What should I change?
Thank you for sharing your code.
But I ran into trouble at stage=3:
TypeError: lstm() received an invalid combination of arguments - got (Tensor, Tensor, tuple, list, bool, int, float, bool, int), but expected one of: (Tensor data, Tensor batch_sizes, tuple of Tensors hx, tuple of Tensors params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional)
Have you met this problem before? Thanks again.
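That TypeError is typically a PyTorch version mismatch: the internal lstm() call signature changed between the version this repo was written against and newer releases, so the packed-sequence forward pass fails. Installing the PyTorch version the repo documents is the safest fix. For reference, a self-contained sketch of packed-sequence BLSTM usage that works on recent PyTorch (the layer hyperparameters mirror the encoder config from the training log in this thread; the input shapes are otherwise arbitrary):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Same hyperparameters as the encoder printed in the training log.
rnn = nn.LSTM(input_size=240, hidden_size=256, num_layers=3,
              batch_first=True, dropout=0.2, bidirectional=True)

x = torch.randn(4, 10, 240)             # (batch, time, features)
lengths = torch.tensor([10, 8, 6, 3])   # sorted descending (required by default)

packed = pack_padded_sequence(x, lengths, batch_first=True)
output, _ = rnn(packed)
output, out_lens = pad_packed_sequence(output, batch_first=True)
print(output.shape)                     # (4, 10, 512): hidden_size * 2 directions
```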
Hi
I tried to resume training from a previous checkpoint.
For example, I trained the model for 100 epochs and got the final.pth.tar file.
I put the absolute path to it into run.sh in these lines:
...
# logging and visualize
checkpoint=0
continue_from="/home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar"
print_freq=10
visdom=0
visdom_id="LAS Training"
...
but training exits with this log:
# train.py --train_json dump/train/deltatrue/data.json --valid_json dump/dev/deltatrue/data.json --dict data/lang_1char/train_chars.txt --einput 240 --ehidden 256 --elayer 3 --edropout 0.2 --ebidirectional 1 --etype lstm --atype dot --dembed 512 --dhidden 512 --dlayer 1 --epochs 10 --half_lr 1 --early_stop 0 --max_norm 5 --batch_size 64 --maxlen_in 800 --maxlen_out 150 --optimizer adam --lr 1e-3 --momentum 0 --l2 1e-5 --save_folder exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch10_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta --checkpoint 1 --continue_from /home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar --print_freq 10 --visdom 0 --visdom_id "LAS Training"
# Started at Fri Sep 13 03:00:41 MSK 2019
#
Namespace(atype='dot', batch_size=64, checkpoint=1, continue_from='/home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar', dembed=512, dhidden=512, dict='data/lang_1char/train_chars.txt', dlayer=1, early_stop=0, ebidirectional=1, edropout=0.2, ehidden=256, einput=240, elayer=3, epochs=10, etype='lstm', half_lr=1, l2=1e-05, lr=0.001, max_norm=5.0, maxlen_in=800, maxlen_out=150, model_path='final.pth.tar', momentum=0.0, num_workers=4, optimizer='adam', print_freq=10, save_folder='exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch10_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta', train_json='dump/train/deltatrue/data.json', valid_json='dump/dev/deltatrue/data.json', visdom=0, visdom_id='LAS Training')
Seq2Seq(
(encoder): Encoder(
(rnn): LSTM(240, 256, num_layers=3, batch_first=True, dropout=0.2, bidirectional=True)
)
(decoder): Decoder(
(embedding): Embedding(38, 512)
(rnn): ModuleList(
(0): LSTMCell(1024, 512)
)
(attention): DotProductAttention()
(mlp): Sequential(
(0): Linear(in_features=1024, out_features=512, bias=True)
(1): Tanh()
(2): Linear(in_features=512, out_features=38, bias=True)
)
)
)
Loading checkpoint model /home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar
Traceback (most recent call last):
File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/bin/train.py", line 146, in <module>
main(args)
File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/bin/train.py", line 139, in main
solver = Solver(data, model, optimizier, args)
File "/home/karina/Listen-Attend-Spell/src/solver/solver.py", line 43, in __init__
self._reset()
File "/home/karina/Listen-Attend-Spell/src/solver/solver.py", line 53, in _reset
self.tr_loss[:self.start_epoch] = package['tr_loss'][:self.start_epoch]
RuntimeError: The expanded size of the tensor (10) must match the existing size (13) at non-singleton dimension 0. Target sizes: [10]. Tensor sizes: [13]
# Accounting: time=4 threads=1
# Ended (code 1) at Fri Sep 13 03:00:45 MSK 2019, elapsed time 4 seconds
What object could cause this tensor-size problem?
Am I using training from a checkpoint correctly?
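The traceback shows solver.py copying the checkpoint's per-epoch loss history (13 entries, so the checkpoint had already completed 13 epochs) into a buffer sized by the new run's `--epochs 10`. The simplest fix is to resume with `--epochs` set higher than the checkpoint's completed epoch count (e.g. 20 rather than 10). Alternatively, the copy in `_reset` could be guarded. A sketch of such a guard (the `'epoch'` and `'tr_loss'` keys follow what the traceback implies the checkpoint package contains; I haven't verified the repo's exact field names):

```python
import torch

def restore_loss_history(package, total_epochs):
    """Copy a checkpoint's per-epoch training loss into a fresh buffer.

    Truncates when the checkpoint holds more epochs of history than the
    new run allocates, instead of crashing on the size mismatch seen in
    the traceback ("expanded size (10) must match existing size (13)").
    """
    start_epoch = int(package['epoch'])      # assumed key, see lead-in
    tr_loss = torch.zeros(total_epochs)
    n = min(start_epoch, total_epochs)
    tr_loss[:n] = package['tr_loss'][:n]
    return tr_loss, n
```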
In the original LAS paper, we can read :
In our model, we stack 3 pBLSTMs on top of the bottom BLSTM layer to reduce the time resolution 2^3 = 8 times. This allows the attention model (see next section) to extract the relevant information from a smaller number of time steps.
My understanding is that this temporal squishing is performed by the encoder. However, when I pass a tensor of size `[4, 963, 128]`, along with the length tensor, to the encoder (bs = 4, max_length = 963, num_features = 128), I get an output of size `[4, 963, 1024]`. 1024 is the size of the hidden layer, so that makes sense, but 963 is the original max_length. Is this supposed to happen? Should the length be smaller after passing through the encoder, or is this perfectly normal and something happens in the decoder?
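For what it's worth, the Seq2Seq printout in the training log above shows the encoder as a plain 3-layer BLSTM with no pyramid subsampling, so an unchanged time length is what this implementation produces. The paper's 2^3 = 8x reduction needs explicit frame concatenation between layers; a standalone sketch of that step (not this repo's code):

```python
import torch

def pyramid_reduce(x, lengths, factor=2):
    """Concatenate every `factor` consecutive frames along time.

    x: (batch, time, feat) -> (batch, time // factor, feat * factor),
    with per-utterance lengths reduced accordingly. Trailing frames
    that don't fill a group are dropped.
    """
    b, t, f = x.shape
    t = (t // factor) * factor
    x = x[:, :t, :].reshape(b, t // factor, f * factor)
    return x, lengths // factor

x = torch.randn(4, 963, 128)
lengths = torch.tensor([963, 800, 512, 300])
for _ in range(3):            # three pBLSTM layers -> 2**3 = 8x reduction
    x, lengths = pyramid_reduce(x, lengths)
print(x.shape)                # (4, 120, 1024)
```

In a real pBLSTM each reduction is followed by a BLSTM layer over the shortened sequence; this sketch only shows where the time axis shrinks.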
I do not know how to transform the VCTK data into kaldi format; the model seems to be a fit for aishell which is another (mandarin) corpus.
Here is more detail: https://docs.google.com/document/d/1jkljv9BlOkVwP7E78EpSh7nSiYqrY2tXXyWOduPD74g/edit#
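A Kaldi-style data directory mainly needs four sorted text files: wav.scp (utt-id → wav path), text (utt-id → transcript), utt2spk, and spk2utt. A minimal, hypothetical sketch that writes the first three from a list of utterances (spk2utt is normally generated afterwards with Kaldi's utils/utt2spk_to_spk2utt.pl):

```python
import os

def write_kaldi_datadir(utts, out_dir):
    """utts: iterable of (utt_id, spk_id, wav_path, transcript) tuples.

    Kaldi expects every file sorted by utterance id, and utterance
    ids conventionally start with the speaker id so both sort orders
    agree.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "wav.scp"), "w") as wav_scp, \
         open(os.path.join(out_dir, "text"), "w") as text, \
         open(os.path.join(out_dir, "utt2spk"), "w") as utt2spk:
        for utt, spk, wav, trn in sorted(utts):
            wav_scp.write(f"{utt} {wav}\n")
            text.write(f"{utt} {trn}\n")
            utt2spk.write(f"{utt} {spk}\n")
```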
Hi~ Have you tried to train on librispeech? If you have tried, could you tell me the WER on test? Thank you!
Hi! I managed to train LAS on aishell data without errors. This is the end of the log:
Epoch 20 | Iter 441 | Average Loss 0.406 | Current Loss 0.505424 | 64.8 ms/batch
Epoch 20 | Iter 451 | Average Loss 0.409 | Current Loss 0.383116 | 64.1 ms/batch
-------------------------------------------------------------------------------------
Valid Summary | End of Epoch 20 | Time 956.81s | Valid Loss 0.410
-------------------------------------------------------------------------------------
Learning rate adjusted to: 0.000000
Find better validated model, saving to exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar
# Accounting: time=21312 threads=1
# Ended (code 0) at Fri Aug 30 17:15:39 MSK 2019, elapsed time 21312 seconds
but the decoding stage gave an error:
Stage 4: Decoding
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log
2019-08-30 17:15:39,608 (json2trn:24) INFO: reading exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json
Traceback (most recent call last):
File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/json2trn.py", line 25, in <module>
with open(args.json, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json'
write a CER (or TER) result in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 0 0 | 0.0 0.0 0.0 0.0 0.0 0.0 |
I don't understand why some files are missing from that directory. I thought everything run.pl needs would be generated there automatically.
Hello! I tried to run run.sh on the AISHELL dataset to test your code, but did not succeed. I hit a problem at stage 3:
Stage 3: Network Training
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/train.log
Stage 4: Decoding
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log
2019-08-23 20:05:08,030 (json2trn:24) INFO: reading exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json
Traceback (most recent call last):
File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/json2trn.py", line 25, in <module>
with open(args.json, 'r') as f:
IOError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json'
cp: cannot stat 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/ref.trn': No such file or directory
cp: cannot stat 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/hyp.trn': No such file or directory
Traceback (most recent call last):
File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/filt.py", line 21, in <module>
with open(args.infile) as textfile:
IOError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/ref.trn.org'
Traceback (most recent call last):
File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/filt.py", line 21, in <module>
with open(args.infile) as textfile:
IOError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/hyp.trn.org'
write a CER (or TER) result in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 0 0 | 0.0 0.0 0.0 0.0 0.0 0.0 |
What should I do to fix this?
Thank you for sharing your code, but the lexicon is missing from your GitHub repo.
If it is convenient for you, could you show the format of these two files?