cywang97 / streamingtransformer
License: Apache License 2.0
When the chunk size is 32, the maximum decoding delay should be 1280 ms. Is that right?
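One way to arrive at that figure is sketched below. It assumes a 10 ms frame shift and 4x time subsampling in the Conv2d front end; neither value is stated in the question, and any extra look-ahead or compute time is ignored. The function name is hypothetical.

```python
def chunk_delay_ms(chunk_size, subsampling=4, frame_shift_ms=10):
    """Worst-case wait for a full chunk, in milliseconds.

    subsampling and frame_shift_ms are assumed values, not taken from the repo.
    """
    return chunk_size * subsampling * frame_shift_ms


print(chunk_delay_ms(32))  # 1280
```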
Hi,
I am currently trying to run the LibriSpeech demo, but when I run ./train.sh, the following error occurs:
/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [34,0,0], thread: [94,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [34,0,0], thread: [95,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [34,0,0], thread: [96,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [34,0,0], thread: [97,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [34,0,0], thread: [98,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"
failed.
/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [34,0,0], thread: [99,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"
failed.
Have you seen this error before?
Thank you very much!
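The assertion in the log fires when an index tensor holds a value outside the dimension it indexes. A frequent cause in ASR recipes is a label id that is not smaller than the model's output size, but that is only an assumption here, since the log does not say which tensor was involved. A minimal sketch of the same bounds check on the CPU, with a hypothetical helper name and made-up ids:

```python
def find_out_of_range(indices, size):
    """Return (position, value) pairs that violate -size <= value < size."""
    return [(pos, v) for pos, v in enumerate(indices) if not (-size <= v < size)]


token_ids = [3, 17, 5000, -1]  # hypothetical label ids
print(find_out_of_range(token_ids, size=5000))  # [(2, 5000)]
```

Running a check like this on the training labels before they reach the GPU usually gives a clearer message than the CUDA assertion.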
I ran the script with the following command:
./run --stage 2
and got this error:
File "~/StreamingTransformer/egs/librispeech/asr1/../../../utils/spm_encode", line 14, in
import sentencepiece as spm
ImportError: No module named sentencepiece
In ESPnet itself there was no such error.
I tried the fix described in the following issue, but it did not work:
espnet/espnet#1656
Is there any solution? Thanks!
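A quick diagnostic for this kind of ImportError is to confirm which interpreter is running and whether it can see the package. This is a sketch only; it assumes the script is being launched by a different Python than the one where sentencepiece was installed, which the report does not confirm. The helper name is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """True if the current interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


print(sys.executable)                    # interpreter actually in use
print(module_available("sentencepiece")) # False means it is not installed for this interpreter
```

Running this with the same interpreter that executes utils/spm_encode shows whether the package needs to be installed into that environment.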
Hello, I ran it following the README, but got the issue below:
File "/home3/mgd/NNStudy/ASR/StreamingTransformer-master/espnet/asr/pytorch_backend/asr_ddp.py", line 118, in <listcomp>
ys_pad = pad_list([torch.from_numpy(y) for y in ys]
I checked asr_ddp.py and saw the align variable. Does data preparation need to produce the alignment information?
Thank you very much!
I couldn't find the third input of the function trigger_mask, named trigger, in StreamingConverter, but I did find align. Should I change trigger to align?
I think the code you provided does not support streaming mode.
In a real streaming condition, the recognizer cannot access the full input sequence.
But prefix_recognize() assumes it receives the full sequence and performs encoding only once.
I have finished training and decoding on the AISHELL-1 dataset and got CER = 12.4% on the test set. I found that my model.json, which uses the default config, differs from that of Streaming_transformer-chunk32 with ESPnet Conv2d Encoder. My model seems to lack something, such as the adaptive decoder. Can you release the results on AISHELL-1?
I applied your viterbi-decoding step to the aishell1 dataset, the operation seems to be successful, but regarding the generated align, I don’t quite understand the meaning of the number corresponding to each sentence. Does the number represent the starting frame corresponding to the token in this sentence? Thanks a lot !
Hi,
I am confused about how to get the "/path/to/model" in Step 2. Viterbi decoding:
Step 2. Viterbi decoding
To train a TA based streaming Transformer, the alignments between CTC paths and transcriptions are required. In our work, we apply Viterbi decoding using the offline Transformer model.
cd egs/librispeech/asr1
./viterbi_decode.sh /path/to/model
Thanks a lot.
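For readers unfamiliar with what such an alignment is, here is a minimal, generic sketch of CTC Viterbi forced alignment. It is not the repository's viterbi_align; the function name, the blank id of 0, and the toy posteriors are all assumptions made for illustration.

```python
import math


def ctc_viterbi_align(log_probs, labels, blank=0):
    """Return the most likely frame-level CTC path that collapses to `labels`.

    log_probs[t][c] is the log posterior of symbol c at frame t.
    """
    ext = [blank]
    for y in labels:
        ext += [y, blank]  # blank-interleaved label sequence
    T, S = len(log_probs), len(ext)
    neg = -math.inf
    score = [[neg] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda k: score[t - 1][k])
            if score[t - 1][best] > neg:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best
    # A valid path ends in the last label or the final blank.
    ends = [S - 1] + ([S - 2] if S > 1 else [])
    s = max(ends, key=lambda k: score[T - 1][k])
    if score[T - 1][s] == neg:
        raise ValueError("not enough frames for the label sequence")
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


# Toy posteriors over {0: blank, 1, 2} for 4 frames (made-up numbers).
toy = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
toy_log = [[math.log(p) for p in row] for row in toy]
print(ctc_viterbi_align(toy_log, [1, 2]))  # [1, 0, 2, 0]
```

The output assigns one symbol (or blank) to every frame, which is the kind of frame-to-token correspondence that triggered-attention training relies on.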
Hi, did you finish implementing the joint CTC-triggered attention decoding algorithm?
I trained the Streaming Transformer model with TensorFlow, but when I tried to implement the joint CTC-triggered attention decoding algorithm, I got stuck on the CTC prefix algorithm. Is there any good idea for completing it?
Papers:
[1]. STREAMING AUTOMATIC SPEECH RECOGNITION WITH THE TRANSFORMER MODEL
[2]. Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models
Hi!
I'm currently trying to run the viterbi decoding with asr_recog.py, and when it gets to viterbi_decode in streaming_transformer.py it crashes because it doesn't find the viterbi_align function.
This seems to be solved by adding
from espnet.nets.viterbi_align import viterbi_align
at the beginning of the module.
Is this the right solution? If not, any idea why I'm getting this error?
To give a broader picture, I'm working with commit 19bcd9d.
Thank you very much in advance!
Hello, I trained a streaming Transformer with the following config. The loss seems OK,
but the decoding performance is bad. Is it necessary to use the prefix decoder?
When I use prefix_recognize, an error occurs. If I don't use prefix_recognize, the performance is bad.
File "/home/storage15/username/tools/espnet/egs/librispeech/asr1/../../../espnet/bin/asr_recog.py", line 368, in
main(sys.argv[1:])
File "/home/storage15/username/tools/espnet/egs/librispeech/asr1/../../../espnet/bin/asr_recog.py", line 335, in main
recog_v2(args)
File "/home/storage15/username/tools/espnet/espnet/asr/pytorch_backend/recog.py", line 174, in recog_v2
best, ids, score = model.prefix_recognize(feat, args, train_args, train_args.char_list, lm)
File "/home/storage15/username/tools/espnet/espnet/nets/pytorch_backend/streaming_transformer.py", line 553, in prefix_recognize
self.compute_hyps(tmp,i,h_len,enc_output, hat_att[chunk_index], mask, train_args.chunk)
File "/home/storage15/username/tools/espnet/espnet/nets/pytorch_backend/streaming_transformer.py", line 776, in compute_hyps
enc_output4use, partial_mask4use, cache4use)
File "/home/storage15/username/tools/espnet/espnet/nets/pytorch_backend/transformer/decoder.py", line 310, in forward_one_step
x, tgt_mask, memory, memory_mask, cache=c
File "/home/storage15/username/tools/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/storage15/username/tools/espnet/espnet/nets/pytorch_backend/transformer/decoder_layer.py", line 94, in forward
), f"{cache.shape} == {(tgt.shape[0], tgt.shape[1] - 1, self.size)}"
AssertionError: torch.Size([5, 1, 512]) == (5, 2, 512)
train config:
accum-grad: 1
adim: 512
aheads: 8
batch-bins: 3000000
dlayers: 6
dropout-rate: 0.1
dunits: 2048
elayers: 12
epochs: 120
eunits: 2048
grad-clip: 5
lsm-weight: 0.1
model-module: espnet.nets.pytorch_backend.streaming_transformer:E2E
mtlalpha: 0.3
opt: noam
patience: 0
sortagrad: 0
transformer-attn-dropout-rate: 0.0
transformer-init: pytorch
transformer-input-layer: conv2d
transformer-length-normalized-loss: false
transformer-lr: 1.0
transformer-warmup-steps: 2500
n-iter-processes: 0
#enc-init: exp/train_960_pytorch_train_specaug/results/model.val5.avg.best
#/path/to/model
enc-init-mods: encoder,ctc,decoder
streaming: true
chunk: true
chunk-size: 32
decode_config:
lm-weight: 0.5
beam-size: 5
penalty: 2.0
maxlenratio: 0.0
minlenratio: 0.0
ctc-weight: 0.5
threshold: 0.0005
ctc-lm-weight: 0.5
prefix-decode: true
File "/root/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/data/app/lilong/StreamingTransformer/espnet/asr/pytorch_backend/asr_ddp.py", line 319, in dist_train
train_epoch(train_loader, model, optimizer, epoch, args)
File "/data/app/lilong/StreamingTransformer/espnet/asr/pytorch_backend/asr_ddp.py", line 348, in train_epoch
for i, batch in enumerate(train_loader):
File "/root/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in next
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/root/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/root/anaconda3/lib/python3.7/site-packages/chainer/dataset/dataset_mixin.py", line 67, in getitem
return self.get_example(index)
File "/root/anaconda3/lib/python3.7/site-packages/chainer/datasets/transform_dataset.py", line 52, in get_example
return self._transform(in_data)
File "/data/app/lilong/StreamingTransformer/espnet/asr/pytorch_backend/asr_ddp.py", line 291, in
train_dataset = TransformDataset(train, lambda data: converter(load_tr(data)))
File "/data/app/lilong/StreamingTransformer/espnet/asr/pytorch_backend/asr_ddp.py", line 118, in call
ys_pad = pad_list([torch.from_numpy(y) for y in ys],
File "/data/app/lilong/StreamingTransformer/espnet/asr/pytorch_backend/asr_ddp.py", line 118, in
ys_pad = pad_list([torch.from_numpy(y) for y in ys],
TypeError: expected np.ndarray (got numpy.int64)
I print the ys, like this:
(1021,)
I ran train.sh under the egs/librispeech/asr1/ folder.
Is this a PyTorch or NumPy version problem?
Please help.
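The printed shape (1021,) suggests that ys is a single 1-D array rather than a list of per-utterance arrays. Iterating over a 1-D array yields NumPy scalars, and torch.from_numpy accepts only ndarrays, which matches the TypeError. Why ys arrives in that form is not clear from the report. The sketch below only illustrates the failure mode with NumPy and a defensive normalization; the helper name is hypothetical and this is not the repository's fix.

```python
import numpy as np


def as_list_of_arrays(ys):
    """Normalize labels to a list of arrays, one per utterance."""
    if isinstance(ys, np.ndarray) and ys.ndim == 1 and ys.dtype != object:
        return [ys]  # a single utterance passed without its batch dimension
    return [np.asarray(y) for y in ys]


ys = np.arange(1021, dtype=np.int64)   # same shape as printed above
first = next(iter(ys))
print(isinstance(first, np.ndarray))   # False: a scalar, which torch.from_numpy rejects
print([y.shape for y in as_list_of_arrays(ys)])  # [(1021,)]
```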
I trained a TA-based Transformer on AISHELL-1 with encoder left window 15 and right window 15, and decoder left window 15 and right window 2. Training accuracy was better, but when decoding with prefix_recognize the WER is 9.3 on the test set, which is worse than chunk32 at WER 6.3.
Comparing the chunk and TA training logs, TA accuracy was better than chunk, so I suspect the decoding algorithm is wrong. After removing hat_att, which acts like a cache, TA reached WER 6.5 with ctc_weight 0.5, but with a terrible RTF, maybe 8-10.
Could you modify the algorithm so that TA decoding achieves a better WER and RTF?
Is there any option to decode on CPU?
I found that there are many CUDA arrays in streaming_transformer.py.
In streaming_transformer.py, prefix_recognize looks like a frame-synchronous decoding algorithm that merges chunk decoding and trigger decoding. I searched for papers on chunk Transformer and triggered attention but could not find one. Can you point me to the paper that introduced the algorithm?
I also have some questions about the CTC prefix search in the code.
In line 662-664:
if l_plus not in hype:
Pb[l_plus] += lpz[i][0] + ...
Pb[l_plus] += lpz[i][c] * Pnb_prev[l_plus]
I suspect line 664 should be:
Pnb[l_plus] += lpz[i][c] * Pnb_prev[l_plus]
This matches Algorithm 1 in the paper "First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs".
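For reference, here is a minimal sketch of CTC prefix beam search in the probability domain, without a language model. It is not the repository's code: the function name and toy posteriors are assumptions, and the correction for prefixes that were pruned from the previous beam (the branch quoted above) is omitted, so the result is exact only when the beam is wide enough to keep every prefix. The point it illustrates is that any path ending in a non-blank symbol adds to the non-blank term, while only blank emissions add to the blank term.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size, blank=0):
    """probs[t][c] is the posterior of symbol c at frame t (plain probabilities).

    Returns {prefix: (p_blank, p_non_blank)} for the surviving prefixes.
    """
    beam = {(): (1.0, 0.0)}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            for c, p in enumerate(frame):
                if c == blank:
                    # Blank keeps the prefix and feeds the blank term.
                    nxt[prefix][0] += p * (p_b + p_nb)
                elif prefix and c == prefix[-1]:
                    # Repeated symbol: merges into the same prefix (non-blank term) ...
                    nxt[prefix][1] += p * p_nb
                    # ... or extends it, but only if the previous frame was blank.
                    nxt[prefix + (c,)][1] += p * p_b
                else:
                    nxt[prefix + (c,)][1] += p * (p_b + p_nb)
        best = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)[:beam_size]
        beam = {k: tuple(v) for k, v in best}
    return beam


# Two frames over {0: blank, 1}; made-up posteriors.
result = ctc_prefix_beam_search([[0.4, 0.6], [0.3, 0.7]], beam_size=10)
print(sum(result[(1,)]))  # about 0.88, the total mass of paths 01, 10 and 11
```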
Could you upload the dict as well? especially for the .model file
Hi, sorry to disturb you.
I am using my own corpus. When I use the offline model to align the CTC paths, some utterances hit the error below. Can you tell me how to solve this problem?
logit shape: (1220, 52) 59 52
Traceback (most recent call last):
File "../../../espnet/bin/asr_recog.py", line 180, in <module>
main(sys.argv[1:])
File "../../../espnet/bin/asr_recog.py", line 176, in main
viterbi_decode(args)
File "/espnet/asr/pytorch_backend/asr_recog.py", line 177, in viterbi_decode
align = model.viterbi_decode(feat[0][0], y)
File "/espnet/nets/pytorch_backend/e2e_asr_transformer.py", line 253, in viterbi_decode
align = viterbi_align(logit, y)[0]
File "/espnet/nets/viterbi_align.py", line 26, in viterbi_align
raise Exception("Number of expected symbols more than the time stamps"
It seems that some utterances do not have enough frames.
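One general CTC constraint that can be checked before alignment: a label sequence needs at least one frame per label plus one extra blank frame between each pair of identical adjacent labels. The sketch below filters on that bound. The helper names are hypothetical, the frame count is assumed to be the encoder output length after subsampling, and the exact condition tested inside viterbi_align.py is not shown in the report, so it may differ.

```python
def ctc_min_frames(labels):
    """Smallest number of frames any CTC path for `labels` can have."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def can_align(num_frames, labels):
    """True if a CTC alignment of `labels` onto `num_frames` frames exists."""
    return num_frames >= ctc_min_frames(labels)


print(ctc_min_frames([1, 1, 2]))  # 4: 1, blank, 1, 2
print(can_align(3, [1, 1, 2]))    # False
```

Utterances that fail this check could be dropped or logged before running the alignment step.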