
kaituoxu / listen-attend-spell

201 stars · 56 forks · 670 KB

A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.

Shell 19.38% Python 79.84% Makefile 0.78%
asr end-to-end listen-attend-and-spell pytorch

listen-attend-spell's People

Contributors

kaituoxu


listen-attend-spell's Issues

Unable to extract aishell data

I downloaded the data_aishell.tgz file from the OpenSLR website. It is around 15 GB in size. When I later tried to extract files from it, it displayed an error that says: "Invalid Compressed Data: Unable to Inflate". I used WinZip for the extraction. Can someone please help me with this?
I can proceed only once I have extracted the data. Thanks in advance.
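The "Invalid Compressed Data: Unable to Inflate" message from WinZip usually indicates a truncated or corrupted download (data_aishell.tgz is roughly 15 GB, so interrupted transfers are common); re-downloading and verifying the published checksum is the usual remedy. As a minimal sketch, the archive can also be sanity-checked programmatically before extraction (`is_valid_tgz` is a hypothetical helper, not part of this repo):

```python
import tarfile

def is_valid_tgz(path):
    """Return True if the gzip'd tar opens and every member header parses.

    A truncated download typically raises EOFError or tarfile.ReadError
    while iterating -- the same failure WinZip reports as "Unable to Inflate".
    """
    try:
        with tarfile.open(path, "r:gz") as tar:
            for _ in tar:  # iterating forces each header (and the gzip stream) to be read
                pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        return False
```

If the check fails, re-download the archive rather than retrying the extraction.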

Time Resolution Question.

Hi @kaituoxu,
thanks for your good project.
If I don't use bucketing on the input data, should I still use time resolution?
I think time resolution produces unbalanced splits when bucketing is not used.

Alternate dataset

Can we use the TIMIT dataset instead of the Aishell dataset? If so, what vocabulary dictionary should we use? I am having some problems with the Aishell dataset, which is why I want to switch to TIMIT. Moreover, an English dataset would help me understand the models better.

Python encoding problem

Thank you very much for sharing the code! I ran some experiments with it. Some files are not in Unix line-ending style and need to be converted with the `set ff=unix` command. You recommend running the code with Python 3, but text2token.py is written in Python 2 style, so the JSON file produced later by data2json.sh is empty. How should the code be modified? Thanks.
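If text2token.py uses Python 2 idioms, running it under Python 3 can indeed produce no output and leave data2json.sh with an empty JSON. A minimal Python 3 sketch of what such a character tokenizer does (a hypothetical rewrite for illustration, not the repo's actual text2token.py), with UTF-8 forced explicitly so CJK characters survive:

```python
import io
import sys

def text2token(line, skip_fields=1):
    """Split a transcript line into space-separated characters,
    keeping the leading utterance-ID field(s) intact."""
    parts = line.rstrip("\n").split()
    head, body = parts[:skip_fields], parts[skip_fields:]
    tokens = [ch for word in body for ch in word]
    return " ".join(head + tokens)

def main(stdin_buffer=None):
    # Python 3 decodes stdin with the locale encoding by default; wrapping the
    # raw buffer forces UTF-8 so CJK transcripts are not mangled or dropped.
    stream = io.TextIOWrapper(stdin_buffer or sys.stdin.buffer, encoding="utf-8")
    for line in stream:
        print(text2token(line))
```

Running the original script through `2to3` plus pinning I/O to UTF-8 is the usual minimal fix.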

can't run aishell_data_prep.sh on my own data, organised the same way as aishell

Hello again!

Even though I didn't succeed with the training and decoding stages on the original dataset, I decided to try run.sh on my own data (annotations in Russian), organised exactly the same way as the aishell data.

I created a dir rus_data with transcript and wav dirs inside. In the transcript dir I put a txt file in the format: filename-without-extension, space, transcription.
In the wav dir I created train, test and dev dirs, and put dirs with wavs inside them.

After that I changed the paths in run.sh that aishell_data_prep.sh requires.

But run.sh throws the error:

Stage 0: Data Preparation
Error: local/aishell_data_prep.sh requires two directory arguments

If I run aishell_data_prep.sh separately with 2 paths, I get the same error.

Do you have any idea why this happens? What should I change?

Mistakes in training

Hello!
This is my first time running LAS, and I want to learn from your script. I get the following errors during training with Python 3.6 and torch 1.7. Is it because one of the two versions is too new?

[screenshot of the training error]

Looking forward to your reply
Thank you!

Some error in stage=3

Thank you for sharing your code.
But I have some trouble at stage=3:
TypeError: lstm() received an invalid combination of arguments - got (Tensor, Tensor, tuple, list, bool, int, float, bool, int), but expected one of: (Tensor data, Tensor batch_sizes, tuple of Tensors hs, tuple of Tensors params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional)
Have you met this problem before? Thanks again.
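This TypeError on lstm() usually points to a PyTorch version mismatch: this repo targets an older PyTorch, and newer releases changed the internal call path that packed-sequence LSTMs go through. Using a PyTorch version matching the repo's requirements, or updating the encoder to the current packing API, typically resolves it. A minimal sketch of the current `pack_padded_sequence` API (toy sizes for illustration, not the repo's code; the repo's encoder uses LSTM(240, 256, num_layers=3)):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy bidirectional LSTM encoder over variable-length sequences.
rnn = nn.LSTM(input_size=4, hidden_size=8, num_layers=1,
              batch_first=True, bidirectional=True)

x = torch.randn(3, 5, 4)           # (batch, max_time, features)
lengths = torch.tensor([5, 3, 2])  # sorted descending; kept on CPU

packed = pack_padded_sequence(x, lengths, batch_first=True)
out, _ = rnn(packed)
out, out_lens = pad_packed_sequence(out, batch_first=True)
# out: (3, 5, 16) -- hidden_size doubled because bidirectional=True
```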

train from previous checkpoint

Hi

I tried to train the model from a previous checkpoint.

For example, I trained the model during 100 epochs and got the final.pth.tar file.
I put the absolute path to it in run.sh in these lines:

...
# logging and visualize
checkpoint=0
continue_from="/home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar"
print_freq=10
visdom=0
visdom_id="LAS Training"
...

but training exits with this log:

# train.py --train_json dump/train/deltatrue/data.json --valid_json dump/dev/deltatrue/data.json --dict data/lang_1char/train_chars.txt --einput 240 --ehidden 256 --elayer 3 --edropout 0.2 --ebidirectional 1 --etype lstm --atype dot --dembed 512 --dhidden 512 --dlayer 1 --epochs 10 --half_lr 1 --early_stop 0 --max_norm 5 --batch_size 64 --maxlen_in 800 --maxlen_out 150 --optimizer adam --lr 1e-3 --momentum 0 --l2 1e-5 --save_folder exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch10_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta --checkpoint 1 --continue_from /home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar --print_freq 10 --visdom 0 --visdom_id "LAS Training" 
# Started at Fri Sep 13 03:00:41 MSK 2019
#
Namespace(atype='dot', batch_size=64, checkpoint=1, continue_from='/home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar', dembed=512, dhidden=512, dict='data/lang_1char/train_chars.txt', dlayer=1, early_stop=0, ebidirectional=1, edropout=0.2, ehidden=256, einput=240, elayer=3, epochs=10, etype='lstm', half_lr=1, l2=1e-05, lr=0.001, max_norm=5.0, maxlen_in=800, maxlen_out=150, model_path='final.pth.tar', momentum=0.0, num_workers=4, optimizer='adam', print_freq=10, save_folder='exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch10_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta', train_json='dump/train/deltatrue/data.json', valid_json='dump/dev/deltatrue/data.json', visdom=0, visdom_id='LAS Training')
Seq2Seq(
  (encoder): Encoder(
    (rnn): LSTM(240, 256, num_layers=3, batch_first=True, dropout=0.2, bidirectional=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(38, 512)
    (rnn): ModuleList(
      (0): LSTMCell(1024, 512)
    )
    (attention): DotProductAttention()
    (mlp): Sequential(
      (0): Linear(in_features=1024, out_features=512, bias=True)
      (1): Tanh()
      (2): Linear(in_features=512, out_features=38, bias=True)
    )
  )
)
Loading checkpoint model /home/karina/Listen-Attend-Spell/egs/aishell/exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch100_norm5_bs64_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar
Traceback (most recent call last):
  File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/bin/train.py", line 146, in <module>
    main(args)
  File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/bin/train.py", line 139, in main
    solver = Solver(data, model, optimizier, args)
  File "/home/karina/Listen-Attend-Spell/src/solver/solver.py", line 43, in __init__
    self._reset()
  File "/home/karina/Listen-Attend-Spell/src/solver/solver.py", line 53, in _reset
    self.tr_loss[:self.start_epoch] = package['tr_loss'][:self.start_epoch]
RuntimeError: The expanded size of the tensor (10) must match the existing size (13) at non-singleton dimension 0.  Target sizes: [10].  Tensor sizes: [13]
# Accounting: time=4 threads=1
# Ended (code 1) at Fri Sep 13 03:00:45 MSK 2019, elapsed time 4 seconds

What object could cause this tensor size problem?
Am I using training from a checkpoint correctly?
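The size mismatch (10 vs 13) suggests the resumed run's --epochs is smaller than the number of epochs already recorded in the checkpoint: solver.py allocates tr_loss with the new --epochs and then copies start_epoch entries from the checkpoint into it, which cannot fit. Setting --epochs to a value larger than the epochs already trained should avoid it. A minimal sketch of the failing copy (hypothetical shapes, mirroring the traceback):

```python
import torch

start_epoch = 13                        # epochs recorded in the checkpoint
ckpt_tr_loss = torch.randn(start_epoch) # per-epoch training losses saved in the package

# Resuming with --epochs smaller than start_epoch makes this copy fail:
#   tr_loss = torch.zeros(10); tr_loss[:13] = ckpt_tr_loss[:13]  -> RuntimeError
# With a larger epoch budget the copy is well-defined:
epochs = 20
tr_loss = torch.zeros(epochs)
tr_loss[:start_epoch] = ckpt_tr_loss[:start_epoch]
```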

At which point in the model is the temporal dimension of the input features reduced?

In the original LAS paper, we can read:

In our model, we stack 3 pBLSTMs on top of the bottom BLSTM layer to reduce the time resolution 2³ = 8 times. This allows the attention model (see next section) to extract the relevant information from a smaller number of time steps.

My understanding is that this temporal squishing is performed by the encoder. However, when I pass a tensor of size [4, 963, 128], along with the length tensor, to the encoder (bs = 4, max_length = 963, num_features = 128), I get an output of size [4, 963, 1024]. 1024 is the size of the hidden layer, so this makes sense, but 963 is the original max_length. Is this supposed to happen? Should the length be smaller after passing through the encoder, or is this perfectly normal and something happens in the decoder?
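Judging by the model summary printed in the issue above, this repo's Encoder is a plain 3-layer BLSTM (LSTM(240, 256, num_layers=3, ...)), not the paper's pyramidal BLSTM, so the time axis is not reduced and an input of length 963 comes out with length 963; that behaviour appears expected here. The paper's 8x reduction comes from a reshape between pBLSTM layers that concatenates each pair of consecutive frames, halving time (and doubling features) at each of the three layers. A minimal sketch of one such reduction step (a hypothetical helper, not code from this repo):

```python
import torch

def pyramid_reshape(x):
    """Halve the time axis by concatenating consecutive frame pairs --
    the reduction step applied between pBLSTM layers in the LAS paper."""
    b, t, d = x.shape
    if t % 2:                # drop the odd trailing frame so pairs line up
        x = x[:, :t - 1, :]
    return x.reshape(b, t // 2, 2 * d)

x = torch.randn(4, 963, 128)
y = pyramid_reshape(x)       # (4, 481, 256): time halved, features doubled
```

Applying this before each of three stacked BLSTM layers yields the paper's 2³ = 8 time reduction.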

decoding error after successful aishell train

Hi! I managed to train LAS on aishell data without errors. This is the end of the log:

Epoch 20 | Iter 441 | Average Loss 0.406 | Current Loss 0.505424 | 64.8 ms/batch
Epoch 20 | Iter 451 | Average Loss 0.409 | Current Loss 0.383116 | 64.1 ms/batch
-------------------------------------------------------------------------------------
Valid Summary | End of Epoch 20 | Time 956.81s | Valid Loss 0.410
-------------------------------------------------------------------------------------
Learning rate adjusted to: 0.000000
Find better validated model, saving to exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/final.pth.tar
# Accounting: time=21312 threads=1
# Ended (code 0) at Fri Aug 30 17:15:39 MSK 2019, elapsed time 21312 seconds

but decoding stage gave an error:

Stage 4: Decoding
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log
2019-08-30 17:15:39,608 (json2trn:24) INFO: reading exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json
Traceback (most recent call last):
 File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/json2trn.py", line 25, in <module>
   with open(args.json, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json'
write a CER (or TER) result in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/result.txt
|      SPKR        |         # Snt                   # Wrd         |      Corr              Sub              Del              Ins              Err            S.Err      |
|      Sum/Avg     |             0                       0         |       0.0              0.0              0.0              0.0              0.0              0.0      |

I don't understand why some files are missing from that directory. I thought everything run.pl needs would be generated there automatically.

can't run train and decode stages on aishell dataset

Hello! I tried to run run.sh on the aishell dataset to test your code, but I did not succeed. I ran into a problem at stage 3:

Stage 3: Network Training
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/train.log
Stage 4: Decoding
run.pl: job failed, log is in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/decode.log
2019-08-23 20:05:08,030 (json2trn:24) INFO: reading exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json
Traceback (most recent call last):
  File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/json2trn.py", line 25, in <module>
    with open(args.json, 'r') as f:
IOError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/data.json'
cp: cannot stat 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/ref.trn': No such file or directory
cp: cannot stat 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/hyp.trn': No such file or directory
Traceback (most recent call last):
  File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/filt.py", line 21, in <module>
    with open(args.infile) as textfile:
IOError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/ref.trn.org'
Traceback (most recent call last):
  File "/home/karina/Listen-Attend-Spell/egs/aishell/../../src/utils/filt.py", line 21, in <module>
    with open(args.infile) as textfile:
IOError: [Errno 2] No such file or directory: 'exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/hyp.trn.org'
write a CER (or TER) result in exp/train_in240_hidden256_e3_lstm_drop0.2_dot_emb512_hidden512_d1_epoch20_norm5_bs32_mli800_mlo150_adam_lr1e-3_mmt0_l21e-5_delta/decode_test_beam30_nbest1_ml100/result.txt
|      SPKR        |         # Snt                   # Wrd         |      Corr              Sub              Del              Ins              Err            S.Err      |
|      Sum/Avg     |             0                       0         |       0.0              0.0              0.0              0.0              0.0              0.0      |

What should I do to fix it?
