tensorspeech / tensorflowasr Goto Github PK

:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords

Home Page: https://huylenguyen.com/asr

License: Apache License 2.0

Python 18.50% Shell 0.07% Dockerfile 0.04% Jupyter Notebook 81.39%

automatic-speech-recognition deepspeech2 speech-recognition speech-to-text tensorflow2 rnn-transducer conformer tflite tflite-model tflite-convertion

tensorflowasr's Introduction

TensorFlowASR ⚡

Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2

TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment 😄

What's New?

(04/17/2021) Refactor repository with new version 1.x
(02/16/2021) Supported for TPU training
(12/27/2020) Supported naive token level timestamp, see demo with flag --timestamp
(12/17/2020) Supported ContextNet http://arxiv.org/abs/2005.03191
(12/12/2020) Add support for using masking
(11/14/2020) Supported Gradient Accumulation for Training in Larger Batch Size

What's New?
Table of Contents
😋 Supported Models
- Baselines
- Publications
Installation
Setup training and testing
TFLite Convertion
Features Extraction
Augmentations
Training & Testing Tutorial
Corpus Sources and Pretrained Models
References & Credits
Contact

😋 Supported Models

Baselines

Transducer Models (End2end models using RNNT Loss for training, currently supported Conformer, ContextNet, Streaming Transducer)
CTCModel (End2end models using CTC Loss for training, currently supported DeepSpeech2, Jasper)

Publications

Conformer Transducer (Reference: https://arxiv.org/abs/2005.08100) See examples/conformer
Streaming Transducer (Reference: https://arxiv.org/abs/1811.06621) See examples/streaming_transducer
ContextNet (Reference: http://arxiv.org/abs/2005.03191) See examples/contextnet
Deep Speech 2 (Reference: https://arxiv.org/abs/1512.02595) See examples/deepspeech2
Jasper (Reference: https://arxiv.org/abs/1904.03288) See examples/jasper

Installation

For training and testing, you should use git clone for installing necessary packages from other authors (ctc_decoders, rnnt_loss, etc.)

Installing from source (recommended)

git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"

For anaconda3:

conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]"

Installing via PyPi

# Tensorflow 2.x (with 2.x >= 2.3)
pip3 install -U "TensorFlowASR[tf2.x]" # or pip3 install -U "TensorFlowASR[tf2.x-gpu]"

Running in a container

docker-compose up -d

Setup training and testing

For datasets, see datasets
For training, testing and using CTC Models, run ./scripts/install_ctc_decoders.sh
For training Transducer Models with RNNT Loss in TF, make sure that warp-transducer is not installed (by simply run pip3 uninstall warprnnt-tensorflow) (Recommended)
For training Transducer Models with RNNT Loss from warp-transducer, run export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh (Note: only export CUDA_HOME when you have CUDA)
For mixed precision training, use flag --mxp when running python scripts from examples
For enabling XLA, run TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script)
For hiding warnings, run export TF_CPP_MIN_LOG_LEVEL=2 before running any examples

TFLite Convertion

After converting to tflite, the tflite model is like a function that transforms directly from an audio signal to unicode code points, then we can convert unicode points to string.

Install tf-nightly using pip install tf-nightly
Build a model with the same architecture as the trained model (if model has tflite argument, you must set it to True), then load the weights from trained model to the built model
Load TFSpeechFeaturizer and TextFeaturizer to model using function add_featurizers
Convert model's function to tflite as follows:

func = model.make_tflite_function(**options) # options are the arguments of the function
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()

Save the converted tflite model as follows:

if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)

Then the .tflite model is ready to be deployed

Features Extraction

See features_extraction

Augmentations

See augmentations

Training & Testing Tutorial

Define config YAML file, see the config.yml files in the example folder for reference (you can copy and modify values such as parameters, paths, etc.. to match your local machine configuration)
Download your corpus (a.k.a datasets) and create a script to generate transcripts.tsv files from your corpus (this is general format used in this project because each dataset has different format). For more detail, see datasets. Note: Make sure your data contain only characters in your language, for example, english has a to z and '. Do not use cache if your dataset size is not fit in the RAM.
[Optional] Generate TFRecords to use tf.data.TFRecordDataset for better performance by using the script create_tfrecords.py
Create vocabulary file (characters or subwords/wordpieces) by defining language.characters, using the scripts generate_vocab_subwords.py or generate_vocab_sentencepiece.py. There're predefined ones in vocabularies
[Optional] Generate metadata file for your dataset by using script generate_metadata.py. This metadata file contains maximum lengths calculated with your config.yml and total number of elements in each dataset, for static shape training and precalculated steps per epoch.
For training, see train.py files in the example folder to see the options
For testing, see test.py files in the example folder to see the options. Note: Testing is currently not supported for TPUs. It will print nothing other than the progress bar in the console, but it will store the predicted transcripts to the file output.tsv and then calculate the metrics from that file.

FYI: Keras builtin training uses infinite dataset, which avoids the potential last partial batch.

See examples for some predefined ASR models and results

Corpus Sources and Pretrained Models

For pretrained models, go to drive

English

Name	Source	Hours
LibriSpeech	LibriSpeech	970h
Common Voice	https://commonvoice.mozilla.org	1932h

Vietnamese

Name	Source	Hours
Vivos	https://ailab.hcmus.edu.vn/vivos	15h
InfoRe Technology 1	InfoRe1 (passwd: BroughtToYouByInfoRe)	25h
InfoRe Technology 2 (used in VLSP2019)	InfoRe2 (passwd: BroughtToYouByInfoRe)	415h

German

Name	Source	Hours
Common Voice	https://commonvoice.mozilla.org/	750h

References & Credits

Contact

Huy Le Nguyen

Email: [email protected]

tensorflowasr's People

Contributors

Stargazers

Watchers

Forkers

marcohatran entn-at jackyvan deeplearning2012 benwaldner xrosliang templeblock whitefu hengzi52125 ahsanmemon alsm168 samsgates bliunlpr2020 humphryshikunzi thangldvn wangguangyuan unparalleled-ysj dathudeptrai honghe xiaming9880 shaunholt toanhvu ivo-gilles hannes1 jupinter studywithjeffrey whaozl joaoalvarenga spxia juneren eddy0613 tricky61 daxiafresh phattharachon hh1992 tund ductho9799 shuaijiang ai-nikolai ddoron9 zhangxinaaaa ryantang1993 gavin90s gk169 gaoyuanliang rxhmdia overflow001 songmyekyo lbxcfx bihujrj tuananhktmt enormous-system omarhory sar-dar nichongjia-2007 spacehopy zoucan520 sphara-app monatis gandroz sknadig xhtian iamweiweishi chenying99 abnerzhangxinbinappen takasyo webstorage119 mega-cqz fengredrum jldeng3 trillionmonster karibbov osamamo7sen zzpdapeng targonaut jet-voice custljc n0todd ybno1 twistedmove jsliugang rudrarobotics harrisalikhanofficial raphaelolivier lbtanh deepdubbed normalclone vaibhav016 tongning weimingtom double22a gdoras emekaborisama agadob coolwind8214 hommmm ck196 yyht leonardbongard yasinjan99

tensorflowasr's Issues

Conformer decode speed too slow

hi, I'm using Conformer example, training is good, but the decode speed is very slow.
I tried another speech-transform based project: https://github.com/ZhengkunTian/OpenTransformer
use same GPU, almost same size model (~30M parm),
TensorFlowASR's Conformer infer 3 wav/s, OpenTransformer's speech-transform infer 20 wav/s
I look at TensorFlowASR's code, it infer one wav per batch, is it the reason why it is so slow? do you have plan to improve it?

How many epochs are required for training purpose?

I was trying training conformer model for testing purpose using 1 epoch by running the train_conformer.py file.The loss value is around 400 in one epoch.So what should be the optimal no of epochs for training purpose on Librispeech dataset as right now i am getting WER=100 (weird).Kindly help me out here

about vocabulary

hello, when train transducer, whether the vocabulary should include S and /S as the sentence start and sentence end tag?thanks.

does it work with other languaje?

if i change the vocabulary file, will ir work with other languaje? or this code is for vietnamese only?

KeyError: 'dmodel' when trying to train conformer

I used the new config example in which for example dmodel has been updated to encoder_dmodel and it breaks the train_conformer.py script

Problem of warp_rnnt tensorflow in windows environment

OS: Windows 10 , Anaconda
CUDA_toolkit: 10.1
Python: 3.7
Framework : tensorflow-gpu (2.3.0), pytorch(1.7.0)

Hello. Thank you for your projects
I have a problem with warprnnt

No module named 'warprnnt_tensorflow'

When I run ./scripts/install_rnnt_loss.sh, I got error message that make command could not be found. (The cmake command worked successfully. )

So I used an alternative method.
cmake --build . --target INSTALL --config Release

However, another error message was displayed.

Building NVCC (Device) object CMakeFiles/warprnnt.dir/src/Release/warprnnt_generated_rnnt_entrypoint.cu.obj
nvcc fatal : 32 bit compilation is only supported for Microsoft Visual Studio 2013 and earlier
CMake Error at warprnnt_generated_rnnt_entrypoint.cu.obj.Release.cmake:220 (message):
Error generating
E:/Dev/Python/TEST/TensorFlowASR/externals/warp-transducer/build/CMakeFiles/warprnnt.dir/src/Release/warprnnt_gener
ated_rnnt_entrypoint.cu.obj

C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe"exited.(code: 1). [E:\Dev\Python\TEST\TensorFlowASR\externals\warp-transducer\build\warprnnt.vcxproj]

How can I solve this problem? Or your project doesn't support windows?

Can't find Residual Connections

I cant find residual connections in the conformer block. Is there something that I am missing or probably

you have missed on your end?

Training loss is always above 200 on dataset LibriSpeech/train-clean-100

Environment:

TensorFlow GPU 2.3
TensorFlowASR ade7891

Training command:

PYTHONPATH=`pwd` python examples/conformer/train_subword_conformer.py --tbs 2 --ebs 2 --mxp --devices 0 --cache --subwords ./output/librispeed \
--subwords_corpus \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/train-clean-100/transcripts.tsv \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/test-clean/transcripts.tsv \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/dev-clean/transcripts.tsv \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/dev-other/transcripts.tsv

InvalidArgumentError during test of conformer

I encountered an InvalidArgumentError error during testing phase. I haven't had any issue during training, only when trying to test the resulting model.

Traceback (most recent call last):
  File "examples/conformer/test_conformer.py", line 97, in <module>
    conformer_tester.run(test_dataset)
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/TensorFlowASR-0.4.0-py3.8.egg/tensorflow_asr/runners/base_runners.py", line 405, in run
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/TensorFlowASR-0.4.0-py3.8.egg/tensorflow_asr/runners/base_runners.py", line 416, in _test_epoch
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 846, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
    outputs = execute.execute(
  File "/home/guillaume/miniconda3/envs/stt/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
         [[{{node StatefulPartitionedCall/StatefulPartitionedCall_1/map/while/body/_3023/map/while/conformer_beam_search/TensorArrayV2Write_2/TensorListSetItem/_184}}]]
  (1) Invalid argument:  2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
         [[{{node StatefulPartitionedCall/StatefulPartitionedCall_1/map/while/body/_3023/map/while/conformer_beam_search/TensorArrayV2Write_2/TensorListSetItem/_184}}]]
         [[StatefulPartitionedCall/StatefulPartitionedCall_2/map/while/body/_3651/map/while/conformer_beam_search/TensorArrayV2Write_1/TensorListSetItem/_188]]
0 successful operations.
0 derived errors ignored. [Op:__inference__test_function_50368]

Function call stack:
_test_function -> _test_function

That's the arguments passed to the TF function:

OP name __inference__test_function_50368
Attrs ('executor_type', '', 'config_proto', b'\n\x07\n\x03CPU\x10\x01\n\x07\n\x03GPU\x10\x012\x05*\x010J\x008\x01R\x05R\x03\xb8\x01\x02\x82\x01\x00')

During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string

Hi,
First, I use conda env
When I run python test_streaming_transducer.py, I encounter the below error

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
         [[{{node StatefulPartitionedCall/StatefulPartitionedCall_2/map/while/body/_549/map/while/streaming_transducer_beam_search/TensorArrayV2Write_2/TensorListSetItem/_188}}]]
  (1) Invalid argument:  2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
         [[{{node StatefulPartitionedCall/StatefulPartitionedCall_2/map/while/body/_549/map/while/streaming_transducer_beam_search/TensorArrayV2Write_2/TensorListSetItem/_188}}]]
         [[StatefulPartitionedCall/StatefulPartitionedCall_1/map/while/body/_438/map/while/streaming_transducer_beam_search/while/body/_1485/map/while/streaming_transducer_beam_search/while/while/body/_2063/map/while/streaming_transducer_beam_search/while/while/while/body/_2487/map/while/streaming_transducer_beam_search/while/while/while/cond/pivot_f/_2813/_653]]
0 successful operations.
0 derived errors ignored. [Op:__inference__test_function_26746]

Function call stack:
_test_function -> _test_function

I thought It's problem about tensorflow version
Because My env include the below

tensorboard               2.3.0              pyh4dce500_0
tensorboard-plugin-wit    1.6.0                      py_0
tensorflow                2.2.0           gpu_py37h1a511ff_0
tensorflow-addons         0.11.2                   pypi_0    pypi
tensorflow-base           2.2.0           gpu_py37h8a81be8_0
tensorflow-datasets       3.2.1                    pypi_0    pypi
tensorflow-estimator      2.3.0                    pypi_0    pypi
tensorflow-gpu            2.3.1                    pypi_0    pypi
tensorflow-metadata       0.25.0                   pypi_0    pypi
warprnnt-tensorflow       0.1                      pypi_0    pypi

I focused tensorflow 2.2.0, So I update that using pip install -U tensorflow

Now, My env include

tensorboard               2.3.0              pyh4dce500_0
tensorboard-plugin-wit    1.6.0                      py_0
tensorflow                2.3.1                    pypi_0    pypi
tensorflow-addons         0.11.2                   pypi_0    pypi
tensorflow-datasets       3.2.1                    pypi_0    pypi
tensorflow-estimator      2.3.0                    pypi_0    pypi
tensorflow-gpu            2.3.1                    pypi_0    pypi
tensorflow-metadata       0.25.0                   pypi_0    pypi
warprnnt-tensorflow       0.1                      pypi_0    pypi

Well.. I don't know Why tensorflow-base uninstalled

Anyway, The same error occur again...

What can I do for running test code?

Thanks

tflite conversation broken.

HI -

With the latest pull, it seems the TFlite is broken.
(trying to convert a conformer model with GA training)

tensorflow.python.framework.errors_impl.InvalidArgumentError: Attempting to add a duplicate function with name: __inference_standard_lstm_20821 where the previous and current definitions differ. Previous definiton: signature {
  name: "__inference_standard_lstm_20821"
  input_arg {
    name: "inputs"
    type: DT_FLOAT
  }
  input_arg {
    name: "init_h"
    type: DT_FLOAT
  }
  input_arg {
    name: "init_c"
    type: DT_FLOAT
  }
  input_arg {
    name: "kernel"
    type: DT_FLOAT
  }
  input_arg {
    name: "recurrent_kernel"
    type: DT_FLOAT
  }

librosa sfft with center=True ，value is different with tensorflow tf.signal.sfft

librosa sfft with center=True ，value is different with tensorflow tf.signal.sfft。

librosa with center=true， it will pad the feame_length to n_fft with constant zero value 。It also pad signal with reflect mode。

The feature extraction shuold cover this difference。

Pretrained models

Hi @usimarit , is it possible if you could share the pretrained models of Deep Speech 2, Transducer, Streaming Transducer?
I only see Conformer in the Google Drive link you shared.

Thanks!

TensorFlowASR_Chinese

中文语音识别感兴趣的可以加:lp9628进群讨论。

Should I use eager mode?

I have a question about whether should I use the eager or graph mode while launching trainining scripts. The thing is that in my tf 2.4 I have it disabled by default, which results in errors in all calls to some_tensor.numpy(). However, when I enable the eager mode manually at the begining of the script, I end up getting even more errors, at even earlier stages of the training.
So far I was able to run the training script with eager mode disabled, while commenting out all calls to .numpy()

AttributeError: 'StreamingTransducerEncoder' object has no attribute 'get_initial_states'

Hi,
When I run python test_streaming_transducer.py, The below error occur

AttributeError: 'StreamingTransducerEncoder' object has no attribute 'get_initial_states'

I think attribute get_initial_states is wrong spell

The below codes are in tensorflow_asr/models/streaming_transducer.py

encoded, _ = self.encoder_inference(features, self.encoder.get_initial_states())  # at 260 line
encoded, _ = self.encoder_inference(features, self.encoder.get_initial_states())  # at 313 line

You should fix get_initial_states to get_initial_state (s is point)
And remove TensorFlowASR-0.4.0-py3.7.egg at your env (in my case, TensorFlowASR-0.4.0-py3.7.egg is in /home/user/anaconda3/envs/tfasr/lib/python3.7/site-packages/)
then python setup.py install again

I hope my issue help someone who have problem like me

New Dataset Structure

I have created my own data as documentation said ( wave files and tsv file for them. near 300h ) Now I can't understand what really should I do, to prepare it for training. is just the tsv and wav files enough to start training?

To make a custom dataset, inherit the BaseDataset class and override following methods:

create to create tf.data.Dataset instance.
parse for transforming tf.data.Dataset during creation by applyting tf.data.Dataset.map function.

I mean here. should I create a new py file and implement these 2 files? how is the implementation of these 2 functions? where this new py file should be imported?
an Example of this part may be really useful.

Receiving forward-backward warning

I am getting this warning. Is it okay to be getting this?

best practices for confidence extraction

What would be the best way to show/extract the confidence of the recognition (greedy or beam search) during inference.
The sequence/word confidence would be interesting to see.

Creating Subword Vocabulary for Custom Language

Hi,

Thank you for your great work. How do I create the subword vocabulary for any custom language? Should I do it manually or alter the config file for auto-extraction and auto-generation of subwords?

Segmentation fault while running train_conformer.py

Hello, everyone！ I was trying to running train_conformer.py on LibriSpeech dataset(in particular, dev-clean), and i've got this error below:
Run on 1 Physical GPUs
Model: "conformer_encoder"

Layer (type) Output Shape Param #

conformer_encoder_subsampling (Conv2dSubsampling) (None, None, 2880) 188208

conformer_encoder_pe (PositionalEncodingConcat) (1, None, 144) 0

conformer_encoder_linear (Dense) (None, None, 144) 414864

conformer_encoder_dropout (Dropout) (None, None, 144) 0

conformer_encoder_block_0 (ConformerBlock) (None, None, 144) 506736

conformer_encoder_block_1 (ConformerBlock) (None, None, 144) 506736

conformer_encoder_block_2 (ConformerBlock) (None, None, 144) 506736

conformer_encoder_block_3 (ConformerBlock) (None, None, 144) 506736

conformer_encoder_block_4 (ConformerBlock) (None, None, 144) 506736

conformer_encoder_block_5 (ConformerBlock) (None, None, 144) 506736

Total params: 3,643,488
Trainable params: 3,641,760
Non-trainable params: 1,728

Model: "conformer_prediction"

Layer (type) Output Shape Param #

conformer_prediction_embedding (Embedding) (None, None, 320) 9280

conformer_prediction_dropout (Dropout) (None, None, 320) 0

conformer_prediction_ln_0 (LayerNormalization) (None, None, 320) 640

conformer_prediction_lstm_0 (LSTM) [(None, None, 320), (None, 320), (None, 320)] 820480

Total params: 830,400
Trainable params: 830,400
Non-trainable params: 0

Model: "conformer_joint"

Layer (type) Output Shape Param #

conformer_joint_enc (Dense) (None, None, 320) 46400

conformer_joint_pred (Dense) (None, None, 320) 102400

conformer_joint_vocab (Dense) multiple 9309

Total params: 158,109
Trainable params: 158,109
Non-trainable params: 0

Model: "conformer"

Layer (type) Output Shape Param #

conformer_encoder (ConformerEncoder) (None, None, 144) 3643488

conformer_prediction (TransducerPrediction) (None, None, 320) 830400

conformer_joint (TransducerJoint) (None, None, None, 29) 158109

Total params: 4,631,997
Trainable params: 4,630,269
Non-trainable params: 1,728

Reading /media/huaxin/tcl1/asr/tanglei/work2020/work202012/TensorFlowASR_New/TensorFlowASR/examples/conformer/data/libri_train.tsv ...
Reading /media/huaxin/tcl1/asr/tanglei/work2020/work202012/TensorFlowASR_New/TensorFlowASR/examples/conformer/data/libri_dev.tsv ...
[Train] | | 0/14980 [00:00<?, ?batch/s]./train_conformer.sh: line 1: 4671 Segmentation fault (core dumped) python ./examples/conformer/train_conformer.py --config ./examples/conformer/config.yml --tbs 2 --ebs 2 --devices 0

Some information about the system and software is as follows:
lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04
Codename: xenial

gcc --version
gcc (Ubuntu 4.9.3-13ubuntu2) 4.9.3

g++ --version
g++ (Ubuntu 4.9.3-13ubuntu2) 4.9.3

pip list:
...
ctc-decoders 1.1
tensorboard 2.4.0
tensorboard-plugin-wit 1.7.0
tensorflow-addons 0.11.2
tensorflow-datasets 3.2.1
tensorflow-estimator 2.3.0
tensorflow-gpu 2.3.1
tensorflow-metadata 0.26.0
TensorFlowASR 0.4.3
termcolor 1.1.0
threadpoolctl 2.1.0
tqdm 4.54.1
typeguard 2.10.0
typing-extensions 3.7.4.3
urllib3 1.26.2
warprnnt-tensorflow 0.1

nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

CUDNN:v7.6.4

This is very strange. I can't find the wrong place for the moment. Do you have any ideas?Thanks very much

Augmentation not working

Augmentations giving this error (below). However, when I disable augmentation from config.yml file, the system works perfectly.

Something that I am missing?

Error numpy version

provide example

provide example in readme for quality evaluation, original text and recognised text (speak original text with your voice)

Example of conformer

Would it be possible to provide an example of running inference on the unconverted (non-tflite) model as well.

If I try to run the tflite if segfaults always. I think the conversation is a bit buggy.

my steps are:
In TensorFlowASR/examples/conformer:

train_conformer.py -> creates checkpoint and h5 file
tflite_conformer.py -> point to the h5 file
Then, I run TensorFlowASR/examples/demonstrations/conformer.py and point to tflite file produced.
INFO: TfLiteFlexDelegate delegate: 212 nodes delegated out of 2737 nodes with 130 partitions.

INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 1 nodes with 0 partitions.

INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 26 nodes with 0 partitions.

INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 1 nodes with 0 partitions.

INFO: TfLiteFlexDelegate delegate: 1 nodes delegated out of 40 nodes with 1 partitions.

2020-11-16 23:01:41.748132: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
Segmentation fault (core dumped)

@Honghe @pquochuy @entn-at
Thank you for the consideration!

Attempting to add a duplicate function when exporting to TF Lite

I want to deploy Conformer Subwords model on RPi for a PoC work. When exporting the latest pretrained model with tflite_subword_conformer.py and tf-nightly v2.5.0.dev20201106, I'm getting the following exception. Anyone getting the same?

  File "tflite_subword_conformer.py", line 70, in <module>
    tflite_model = converter.convert()
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\lite\python\lite.py", line 1118, in convert
    return super(TFLiteConverterV2, self).convert()
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\lite\python\lite.py", line 922, in convert
    self._funcs[0], lower_control_flow=False))
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 1111, in convert_variables_to_constants_v2_as_graph
    converted_input_indices)
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 1003, in _construct_concrete_function
    new_output_names)
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 650, in function_from_graph_def
    wrapped_import = wrap_function(_imports_graph_def, [])
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 630, in wrap_function
    signature=signature)
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 229, in __init__
    context.context().add_function_def(f)
  File "D:\coding\asr\examples\conformer\env\lib\site-packages\tensorflow\python\eager\context.py", line 1138, in add_function_def
    len(fdef_string))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Attempting to add a duplicate function with name: __inference_standard_lstm_20496 where the previous and current definitions differ. Previous definiton: signature {
...
}

Please note that I omited the signature definition for brevity, but I can post it separately if it matters.

Unable to train conformer

First, I used the new config example in which for example dmodel has been updated to encoder_dmodel and it breaks the train_conformer.py script. I updated the script with the config change and the script launches.

Second, I have then a TypeError

2020-12-08 15:42:02.517498: W tensorflow/core/framework/op_kernel.cc:1755] Invalid argument: TypeError: can only concatenate str (not "int") to str
Traceback (most recent call last):

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 244, in __call__
    ret = func(*args)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/TensorFlowASR-0.4.0-py3.7.egg/tensorflow_asr/datasets/asr_dataset.py", line 225, in preprocess
    data = super(ASRSliceDataset, self).preprocess(path.decode("utf-8"), transcript)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/TensorFlowASR-0.4.0-py3.7.egg/tensorflow_asr/datasets/asr_dataset.py", line 90, in preprocess
    features = self.augmentations.after.augment(features)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/flow/pipeline.py", line 58, in augment
    augmented_results = [self._augment(data) for _ in range(n)]

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/flow/pipeline.py", line 58, in <listcomp>
    augmented_results = [self._augment(data) for _ in range(n)]

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/flow/pipeline.py", line 102, in _augment
    augmented_data = aug.augment(augmented_data, n=n, num_thread=num_thread)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/base_augmenter.py", line 115, in augment
    augmented_results = self._parallel_augment(action_fx, clean_data, n=n, num_thread=num_thread)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/base_augmenter.py", line 176, in _parallel_augment
    results = pool.map(action_fx, [data] * n)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/TensorFlowASR-0.4.0-py3.7.egg/tensorflow_asr/augmentations/spec_augment.py", line 76, in substitute
    return self.flow.augment(data)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/flow/pipeline.py", line 58, in augment
    augmented_results = [self._augment(data) for _ in range(n)]

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/flow/pipeline.py", line 58, in <listcomp>
    augmented_results = [self._augment(data) for _ in range(n)]

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/flow/pipeline.py", line 102, in _augment
    augmented_data = aug.augment(augmented_data, n=n, num_thread=num_thread)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/base_augmenter.py", line 115, in augment
    augmented_results = self._parallel_augment(action_fx, clean_data, n=n, num_thread=num_thread)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/nlpaug-1.1.0-py3.7.egg/nlpaug/base_augmenter.py", line 176, in _parallel_augment
    results = pool.map(action_fx, [data] * n)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/TensorFlowASR-0.4.0-py3.7.egg/tensorflow_asr/augmentations/spec_augment.py", line 61, in substitute
    return self.model.mask(data)

  File "/home/guillaume/miniconda3/envs/tfasr/lib/python3.7/site-packages/TensorFlowASR-0.4.0-py3.7.egg/tensorflow_asr/augmentations/spec_augment.py", line 43, in mask
    freq = np.random.randint(0, self.mask_factor + 1)

TypeError: can only concatenate str (not "int") to str

and I dont know where to start with this error

Android Example App for real-time speech-to-text

Hi,

I'm evaluating different techniques for creating speech recognition models that are suitable for use on an Android app.

I'm looking for a way to perform speech-to-text in a real-time, streaming fashion, which is performant even on an Android phone. The full plan is for the output of the speech-to-text to enter a text-to-speech (FastSpeech2 + MB-MelGAN) model for speech synthesis with as little latency as possible.

I see in this library you have implemented a few models, and support exporting models to tflite. Which would you recommend for use on Android?

Finally, do you have any examples using this project in an Android app? In the other TensorSpeech project (TensorFlowTTS), there is an example, and it does not look trivial to bring such a complex project to Android.

Thanks!

RNNT Loss gives Nan values

Working on the increasing loss values to Nan problem

OOM when execute test_conformer or test_subword_conformer

hi, I test the example/conformer, use the pretrained or myself trained weight, both get OOM error, I'm using 1080Tix4.

command is:

python3 examples/conformer/test_subword_conformer.py --config=xxxxxxxx/subwords/config.yml --saved=xxxxxxxxxx/subwords/latest.h5 --subwords=xxxxxxxxx/subwords/conformer.subwords --output_name=test1.tcv

log is:

bbbbbbbbb
cccccccccc1
[Test]: 0batch [00:00, ?batch/s]cccccccccc2
cccccccccc3
cccccccccc0
cccccccccc41
cccccccccc42
Tensor("batch:0", shape=(1,), dtype=string)
cccccccccc43
cccccccccc44
cccccccccc45
cccccccccc46
cccccccccc47
cccccccccc4
cccccccccc49
cccccccccc5
[Test]: 1batch [04:42, 282.20s/batch]cccccccccc6
cccccccccc0
2020-11-21 15:06:45.863538: W tensorflow/core/common_runtime/bfc_allocator.cc:431] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.15MiB (rounded to 3304960)requested by op CudnnRNN
Current allocation summary follows.
2020-11-21 15:06:45.919955: W tensorflow/core/common_runtime/bfc_allocator.cc:439] ****************************************************************************************************

here is the debug print postion:

    def run(self, test_dataset):
        self.set_output_file();print("aaaaaaa")
        self.set_test_data_loader(test_dataset);print("bbbbbbbbb")
        self._test_epoch();print("cccccccccc")
        self._finish();print("ddddddd")

    def _test_epoch(self):
        if self.processed_records > 0:
            self.test_data_loader = self.test_data_loader.skip(self.processed_records)
        print("cccccccccc1")
        progbar = tqdm(initial=self.processed_records, total=None,
                       unit="batch", position=0, desc="[Test]")
        print("cccccccccc2")
        test_iter = iter(self.test_data_loader)
        print("cccccccccc3")
        while True:
            try:
                print("cccccccccc0")
                decoded = self._test_function(test_iter);print("cccccccccc4")
            except StopIteration:
                break
            except tf.errors.OutOfRangeError:
                break
            print("cccccccccc49")
            decoded = [d.numpy() for d in decoded]
            self._append_to_file(*decoded)
            print("cccccccccc5")
            progbar.update(1)
            print("cccccccccc6")
        print("cccccccccc7")
        progbar.close()

    @tf.function
    def _test_function(self, iterator):
        print("cccccccccc41")
        batch = next(iterator);print("cccccccccc42")
        return self._test_step(batch)

    @tf.function(experimental_relax_shapes=True)
    def _test_step(self, batch):
        """
        One testing step
        Args:
            batch: a step fed from test dataset

        Returns:
            (file_paths, groundtruth, greedy, beamsearch, beamsearch_lm) each has shape [B]
        """
        file_paths, features, _, labels, _, _ = batch
        print(file_paths)
        print("cccccccccc43")

        labels = self.model.text_featurizer.iextract(labels);print("cccccccccc44")
        greed_pred = self.model.recognize(features);print("cccccccccc45")
        if self.model.text_featurizer.decoder_config["beam_width"] > 0:
            beam_pred = self.model.recognize_beam(features=features, lm=False);print("cccccccccc46")
            beam_lm_pred = self.model.recognize_beam(features=features, lm=True);print("cccccccccc47")
        else:
            beam_pred = beam_lm_pred = tf.constant([""], dtype=tf.string);print("cccccccccc48")

        return file_paths, labels, greed_pred, beam_pred, beam_lm_pred

OOM when training with train_conformer or train_subword_conformer

Hi, I'm trying to train conformer for the Lithuanian language.

I'm training on NVIDIA Tesla P100 and I keep getting OOM after some time.
Using all default values from examples/conformer/config.yml.

Using a custom audio dataset with wav audio files up to 10 seconds. At first tried with up to 30, then up to 15.

What could I do differently?

[Train] [Epoch 1/20] |�[32m▏                   �[39m| 1316/181580 [23:54<42:25:21,  1.18batch/s, transducer_loss=110.57875]�[0m�[0;31m
[Train] [Epoch 1/20] |�[32m▏                   �[39m| 1316/181580 [23:54<42:25:21,  1.18batch/s, transducer_loss=110.32569]�[0m�[0;31m2020-12-05 14:53:17.261450: W tensorflow/core/common_runtime/bfc_allocator.cc:431] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.42GiB (rounded to 4746649600)requested by op StatefulPartitionedCall/gradient_tape/conformer/conformer_joint/conformer_joint_vocab/Tensordot/MatMul/MatMul
Current allocation summary follows.
�[0m�[0;31m2020-12-05 14:53:17.263848: W tensorflow/core/common_runtime/bfc_allocator.cc:439] *****************************************************************************************__*****____
2020-12-05 14:53:17.263895: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at matmul_op.cc:481 : Resource exhausted: OOM when allocating tensor with shape[3708320,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
�[0m�[0;31mTraceback (most recent call last):
  File "examples/conformer/train_conformer.py", line 140, in <module>
�[0m�[0;31m    �[0m�[0;31mconformer_trainer.fit(train_dataset, eval_dataset, train_bs=args.tbs, eval_bs=args.ebs)
�[0m�[0;31m  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow_asr/runners/base_runners.py", line 312, in fit
�[0m�[0;31m    self.run()
  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow_asr/runners/base_runners.py", line 192, in run
�[0m�[0;31m    self._train_epoch()
  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow_asr/runners/base_runners.py", line 213, in _train_epoch
�[0m�[0;31m    raise e
�[0m�[0;31m  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow_asr/runners/base_runners.py", line 207, in _train_epoch
�[0m�[0;31m    self._train_function(train_iterator)  # Run train step
  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
�[0m�[0;31m    �[0m�[0;31mresult = self._call(*args, **kwds)�[0m�[0;31m
�[0m�[0;31m  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 814, in _call
�[0m�[0;31m    results = self._stateful_fn(*args, **kwds)
  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
�[0m�[0;31m    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access�[0m�[0;31m
�[0m�[0;31m  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
�[0m�[0;31m    �[0m�[0;31mcancellation_manager=cancellation_manager)�[0m�[0;31m
�[0m�[0;31m  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
�[0m�[0;31m    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 550, in call
�[0m�[0;31m    �[0m�[0;31mctx=ctx)
�[0m�[0;31m  File "/root/.cache/pypoetry/virtualenvs/tf-asr-FRX1YIGf-py3.6/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
�[0m�[0;31m    �[0m�[0;31minputs, attrs, num_outputs)�[0m�[0;31m
�[0m�[0;31mtensorflow.python.framework.errors_impl.ResourceExhaustedError�[0m�[0;31m:  OOM when allocating tensor with shape[3708320,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node StatefulPartitionedCall/gradient_tape/conformer/conformer_joint/conformer_joint_vocab/Tensordot/MatMul/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference__train_function_69904]

Low GPU utilization for multi-GPU training

I have trained a Conformer model using my own custom dataset in Thai. However, GPU Utilization seems to be pretty low as the training speed is pretty slow (~2 s/batch). The GPU was utilized by around 5-10%. Are there anyways to debug this problem?

For training, I simply edited config.yaml in examples/conformer/config.yaml and run

$ python examples/conformer/train_conformer.py --device 0 1 2 3

Software Specification:
OS: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-9-cloud-amd64 x86_64\n)
GPUs: Nvidia Tesla V100 16Gb RAM
Installed by building from source

config.yaml

speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False

decoder_config:
  vocabulary: vocabularies/thai.characters
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: conformer
  subsampling:
    type: conv2d
    filters: 144
    kernel_size: 3
    strides: 2
  positional_encoding: sinusoid_concat
  dmodel: 144
  num_blocks: 16
  head_size: 36
  num_heads: 4
  mha_type: relmha
  kernel_size: 32
  fc_factor: 0.5
  dropout: 0.1
  embed_dim: 320
  embed_dropout: 0.1
  num_rnns: 1
  rnn_units: 320
  rnn_type: lstm
  layer_norm: True
  joint_dim: 320

learning_config:
  augmentations:
    after:
      time_masking:
        num_masks: 10
        mask_factor: 100
        p_upperbound: 0.05
      freq_masking:
        num_masks: 1
        mask_factor: 27

  dataset_config:
    train_paths:
      - /home/chompk/trainv1_trainscript.tsv
    eval_paths:
      -  /home/chompk/valv1_trainscript.tsv
    test_paths:
      - /mnt/d/SpeechProcessing/Datasets/LibriSpeech/test-clean/transcripts.tsv
    tfrecords_dir: null

  optimizer_config:
    warmup_steps: 40000
    beta1: 0.9
    beta2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 4
    accumulation_steps: 4
    num_epochs: 20
    outdir: /mnt/d/SpeechProcessing/Trained/local/conformer
    log_interval_steps: 300
    eval_interval_steps: 500
    save_interval_steps: 1000

GPU Utilization

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0    58W / 300W |  15752MiB / 16130MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   38C    P0    66W / 300W |  15704MiB / 16130MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:00:06.0 Off |                    0 |
| N/A   40C    P0    66W / 300W |  15752MiB / 16130MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   39C    P0    58W / 300W |  15704MiB / 16130MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      9598      C   python                                     15741MiB |
|    1      9598      C   python                                     15693MiB |
|    2      9598      C   python                                     15741MiB |
|    3      9598      C   python                                     15693MiB |
+-----------------------------------------------------------------------------+

Training Steps Example

> Start evaluation ...
[Eval] [Step 1000] |████████████████████| 4423/4423 [22:08<00:00,  3.33batch/s, transducer_loss=171.865
> End evaluation ...
[Train] [Epoch 1/20] |                    | 1500/796100 [1:38:28<421:04:34,  1.91s/batch, transducer_loss=159.42458]
> Start evaluation ...
[Eval] [Step 1500] |████████████████████| 4423/4423 [23:15<00:00,  3.17batch/s, transducer_loss=153.2395]
> End evaluation ...
[Train] [Epoch 1/20] |                    | 2000/796100 [2:18:06<456:58:56,  2.07s/batch, transducer_loss=140.7582]
> Start evaluation ...
[Eval] [Step 2000] |████████████████████| 4423/4423 [22:36<00:00,  3.26batch/s, transducer_loss=137.00543]
> End evaluation ...
[Train] [Epoch 1/20] |                    | 2500/796100 [2:57:05<409:56:45,  1.86s/batch, transducer_loss=126.64603]
> Start evaluation ...
[Eval] [Step 2500] |████████████████████| 4423/4423 [22:52<00:00,  3.22batch/s, transducer_loss=126.15583]
> End evaluation ...
[Train] [Epoch 1/20] |                    | 2648/796100 [3:23:48<506:25:46,  2.30s/batch, transducer_loss=125.96002

transducer_loss gives NAN values

I am trying to run test_conformer.py file but with the default config it is showing OOM error,so i changed the model config file to avoid memory issues.Below I am attaching the config.yml file please tell what changes should i makeso that the loss gives some specific values.

Here is my config file:
speech_config:
sample_rate: 16000
frame_ms: 25
stride_ms: 10
num_feature_bins: 80
feature_type: log_mel_spectrogram
preemphasis: 0.97
normalize_signal: True
normalize_feature: True
normalize_per_feature: False

decoder_config:
vocabulary: null
target_vocab_size: 1024
max_subword_length: 4
blank_at_zero: True
beam_width: 5
norm_score: True

model_config:
name: conformer
subsampling:
type: conv2d
filters: 32
kernel_size: 5
strides: 3
positional_encoding: sinusoid_concat
dmodel: 144
num_blocks: 16
head_size: 36
num_heads: 4
mha_type: relmha
kernel_size: 32
fc_factor: 0.5
dropout: 0.1
embed_dim: 320
embed_dropout: 0.1
num_rnns: 1
rnn_units: 100
rnn_type: lstm
layer_norm: True
projection_units: 0
joint_dim: 320

learning_config:
augmentations:
after:
time_masking:
num_masks: 10
mask_factor: 100
p_upperbound: 0.05
freq_masking:
num_masks: 1
mask_factor: 27

dataset_config:
train_paths:
- /mnt/d/SpeechProcessing/Datasets/LibriSpeech/train-clean-100/transcripts.tsv
eval_paths:
- /mnt/d/SpeechProcessing/Datasets/LibriSpeech/dev-clean/transcripts.tsv
- /mnt/d/SpeechProcessing/Datasets/LibriSpeech/dev-other/transcripts.tsv
test_paths:
- /mnt/d/SpeechProcessing/Datasets/LibriSpeech/test-clean/transcripts.tsv
tfrecords_dir: null

optimizer_config:
warmup_steps: 40000
beta1: 0.9
beta2: 0.98
epsilon: 1e-9

running_config:
batch_size: 16
accumulation_steps: 4
num_epochs: 1
outdir: /mnt/d/SpeechProcessing/Trained/local/conformer
log_interval_steps: 300
eval_interval_steps: 500
save_interval_steps: 1000

Question about training setup

Hi,

I'm curious to learn what training setup you are using to train the subword conformer librispeech model. I'm currently training such a model (on a different dataset) on a RTX2080Ti but noticed very low GPU utilization and generally slow training speed. I am trying to figure out how to improve training speed.

Thanks!

Reshape error with tflite model

I run the conformer model without any error.
But I got the following error when I run the tflite model with the same inputs.

RuntimeError: tensorflow/lite/kernels/reshape.cc:66 num_input_elements != num_output_elements (0 != 144)Node number 5 (RESHAPE) failed to prepare.
Node number 2840 (WHILE) failed to invoke.

Any idea why?

tflite converter error: failed to legalize operation

I have trained a streaming transducer for a couple of epochs. Tried converting it to a tflite function using the following code.
train_streaming_transducer_tflite.txt

I am getting the following traceback. Any ideas what the problem might be?
train_streaming_transducer_tflite_traceback.txt

p.s. I am not using a GPU at the moment because of libcudnn errors.

Any help would be help :)

Question: Transducer.recognize for streaming-decode

I am trying to understand how the streaming-decode works. There's a few things which I am not sure whether I completely understand them so I hope it's okay if I'm asking here.

The first part concerns the memory of the prediction network. In TransducerPrediction I see that there's two arguments p_memory_states and p_carry_states.

outputs = self.embed(inputs, training=training)
outputs = self.do(outputs, training=training)

n_memory_states = []
n_carry_states = []

for i, lstm in enumerate(self.lstms):
    initial_state = [p_memory_states[i], p_carry_states[i]] if has_memories else None

    outputs, new_memory_state, new_carry_state = lstm(outputs, training=training, initial_state=initial_state)

    n_memory_states.append(tf.expand_dims(new_memory_state, 0))
    n_carry_states.append(tf.expand_dims(new_carry_state, 0))

return outputs, tf.concat(n_memory_states, axis=0), tf.concat(n_carry_states, axis=0)

These arguments are used in Tramsducer.perform_greedy to initialize the states of the LSTM-stack during prediction/recognition.

So, if I get this right, what this does is, as we stream-decode, initialize each LSTM with its previous state from the last time-step. Is that correct?

And, we have to keep track of each individual layer (instead of forward passing the last state) because during streaming-decode we're essentially looking at only one time slice every time we run the model:

hi = tf.reshape(enc[i], [1, 1, -1])  # <-- Take the i-th slice of the encoder output
y, n_memory_states, n_carry_states = self.predict_network(
    inputs=tf.reshape(new_hyps[0]["yseq"][-1], [1, 1]),  # <-- Take the previously predicted symbol
    p_memory_states=new_hyps[0]["p_memory_states"],
    p_carry_states=new_hyps[0]["p_carry_states"],
    has_memories=new_hyps[0]["has_memories"],
    training=False
)

I think I understand this part so far but:

Q: Why are we not storing the states of the EncoderNetwork like we do for the PredictionNetwork?

If we're streaming, where features are the spectrogram features, wouldn't it make sense to also keep the internal LSTM-state(s) of the encoder?

My own implementation of the model is slightly different: The encoder-network is a stack of LSTMs whereas in your example you're only using one LSTM. But in both cases we're having internal states which we're not carrying along for Transducer.recognize and not sure if I understand why this is the case.

EncoderNetwork Code (click to expand)

class EncoderNetwork(network.Network):

    def __init__(
        self,
        num_layers: int,
        lstm_units: int,
        time_reduction_index: int = None,
        time_reduction_factor: int = 2,
        dropout: float = 0,
        *args,
        **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.reduction_index = time_reduction_index
        self.reduction_factor = time_reduction_factor
        self.lstm_stack = list()

        for i in range(num_layers):
            lstm = layers.LSTM(
                units=lstm_units,
                return_sequences=True,
                return_state=True,
                dropout=dropout
            )
            norm = layers.LayerNormalization()
            self.lstm_stack.append((lstm, norm))

        if self.reduction_index:
            self.time_reduction = TimeReduction(self.reduction_factor)

    def call(self, inputs, training=None, mask=None):

        x = inputs
        states = None
        for i, (lstm, norm) in enumerate(self.lstm_stack):
            x, state_h, state_c = lstm(x, initial_state=states)
            x = norm(x)
            states = state_h, state_c
            if self.reduction_index and i == self.reduction_index:
                x = self.time_reduction(x)

        return x

Shouldn't we keep those states as well? What if I stream the first 2 seconds of an audio and then the next 2 seconds and so on. Shouldn't we keep track of the state for the EncoderNetwork as well in that case?

The second part concerns the input of the prediction network. I can see that you're prepending the ids with the blank (0) symbol. So [1, 2, 3] will be changed to [0, 1, 2 3]. Now, we're then also using Dataset.padded_batch in order to align examples and here we're also using the same blank symbol. This means the sample could end up looking something like this: [0, 1, 2, 3, 0, 0] - is this correct? One-hot encoded this would take the form:

[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
[1, 0, 0, 0],
[1, 0, 0, 0],

I am asking this because in https://arxiv.org/pdf/1211.3711.pdf the blank-symbol is actually a vector containing all zeros:

[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 0],
[0, 0, 0],

and I was wondering whether this could make a difference?

Thank you for shedding any light on this :)

streaming tflite conformer giving pretty bad results

Hi,
I tested the demonstration/streaming_tflite_conformer.py file as following
python examples/demonstration/streaming_tflite_conformer.py "/home/ahsanmemon/Desktop/LibriSpeech/test-clean/61/70968/61-70968-0000.flac" --tflite "models/conformer-tflite/subsampling-conformer.latest.tflite"

and
python examples/demonstration/streaming_tflite_conformer.py "/home/ahsanmemon/Desktop/LibriSpeech/test-clean/61/70968/61-70968-0000.flac" --tflite "models/conformer-tflite/subword-conformer.latest.tflite"

However, both the times, I got really funny results.
Ground Truth was: he began a confused complaint against the wizard who had vanished behind the curtain.
Output was as following

and

for subword and subsampling tflite pretrained models that you uploaded on your drive.

Is there something that I am missing in particular?

Cannot access to dataset InfoRe

Hi,
I cannot access to dataset InfoRe. Could you please update the link?
Thank you!

about mask in conformer?

hello，
I see the multi-head-attention have mask input in your code , but I can't find any code that generate the mask，Is this a bug?thank you very much.

TODO

ValueError: The channel dimension of the inputs should be defined. Found 'None'

I tried to test with "python examples/conformer/train_conformer.py" when I encounted the above error. It happend in line 114:"conformer._build(speech_featurier.shape)" and the shape is [None,80,1], But in the last line of error, happened in "site-packages/tensorflow/python/keras/layers/convolutional.py:361 _get_input_channel" raise ValueError("The channel dimension of the inputs is None" I wondered what's wrong?

InvalidArgumentError when converting conformer model to tflite

I was trying to convert my conformer model to tflite using the provided python script but I've encountered this error. I've trained this model by modifying just its batch size to 8 and using my own vocabulary.

config.yml

# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
  
speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False

decoder_config:
  vocabulary: vocabularies/thai.characters
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: conformer
  subsampling:
    type: conv2d
    filters: 144
    kernel_size: 3
    strides: 2
  positional_encoding: sinusoid_concat
  dmodel: 144
  num_blocks: 16
  head_size: 36
  num_heads: 4
  mha_type: relmha
  kernel_size: 32
  fc_factor: 0.5
  dropout: 0.1
  embed_dim: 320
  embed_dropout: 0.1
  num_rnns: 1
  rnn_units: 320
  rnn_type: lstm
  layer_norm: True
  joint_dim: 320
  
learning_config:
  augmentations:
    after:
      time_masking:
        num_masks: 10
        mask_factor: 100
        p_upperbound: 0.05
      freq_masking:
        num_masks: 1
        mask_factor: 27

  dataset_config:
    train_paths:
      - /home/chompk/dataset/trainv1_transcript.tsv
    eval_paths:
      - /home/chompk/dataset/valv1_transcript.tsv
    test_paths:
      - /mnt/d/SpeechProcessing/Datasets/LibriSpeech/test-clean/transcripts.tsv
    tfrecords_dir: /home/chompk/dataset/tfrecords_databs16

  optimizer_config:
    warmup_steps: 40000
    beta1: 0.9
    beta2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 8
    accumulation_steps: 4
    num_epochs: 20
    outdir: /home/chompk/dataset/conformer_bs8
    log_interval_steps: 200
    eval_interval_steps: 2000
    save_interval_steps: 1000

Error message:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-7-c548bab089a8> in <module>
----> 1 tflite_model = converter.convert()

~/.env/lib/python3.8/site-packages/tensorflow/lite/python/lite.py in convert(self)
   1116         Invalid quantization parameters.
   1117     """
-> 1118     return super(TFLiteConverterV2, self).convert()
   1119 
   1120 

~/.env/lib/python3.8/site-packages/tensorflow/lite/python/lite.py in convert(self)
    919 
    920     frozen_func, graph_def = (
--> 921         _convert_to_constants.convert_variables_to_constants_v2_as_graph(
    922             self._funcs[0], lower_control_flow=False))
    923 

~/.env/lib/python3.8/site-packages/tensorflow/python/framework/convert_to_constants.py in convert_variables_to_constants_v2_as_graph(func, lower_control_flow, aggressive_inlining)
   1108       converter_data=converter_data)
   1109 
-> 1110   frozen_func = _construct_concrete_function(func, output_graph_def,
   1111                                              converted_input_indices)
   1112   return frozen_func, output_graph_def

~/.env/lib/python3.8/site-packages/tensorflow/python/framework/convert_to_constants.py in _construct_concrete_function(func, output_graph_def, converted_input_indices)
    999   new_input_names = [tensor.name for tensor in not_converted_inputs]
   1000   new_output_names = [tensor.name for tensor in func.outputs]
-> 1001   new_func = wrap_function.function_from_graph_def(output_graph_def,
   1002                                                    new_input_names,
   1003                                                    new_output_names)

~/.env/lib/python3.8/site-packages/tensorflow/python/eager/wrap_function.py in function_from_graph_def(graph_def, inputs, outputs)
    648     importer.import_graph_def(graph_def, name="")
    649 
--> 650   wrapped_import = wrap_function(_imports_graph_def, [])
    651   import_graph = wrapped_import.graph
    652   return wrapped_import.prune(

~/.env/lib/python3.8/site-packages/tensorflow/python/eager/wrap_function.py in wrap_function(fn, signature, name)
    618   if name is not None:
    619     func_graph_name = "wrapped_function_" + name
--> 620   return WrappedFunction(
    621       func_graph.func_graph_from_py_func(
    622           func_graph_name,

~/.env/lib/python3.8/site-packages/tensorflow/python/eager/wrap_function.py in __init__(self, fn_graph, variable_holder, attrs, signature)
    227     # properly reflects the new captured inputs.
    228     for f in fn_graph.as_graph_def().library.function:
--> 229       context.context().add_function_def(f)
    230     self._signature = signature
    231     super(WrappedFunction, self).__init__(fn_graph, attrs=attrs)

~/.env/lib/python3.8/site-packages/tensorflow/python/eager/context.py in add_function_def(self, fdef)
   1135     self.ensure_initialized()
   1136     fdef_string = fdef.SerializeToString()
-> 1137     pywrap_tfe.TFE_ContextAddFunctionDef(self._handle, fdef_string,
   1138                                          len(fdef_string))
   1139 

InvalidArgumentError: Attempting to add a duplicate function with name: __inference_standard_lstm_32975 where the previous and current definitions differ. Previous definition: signature {
  name: "__inference_standard_lstm_32975"
  input_arg {
    name: "inputs"
    type: DT_FLOAT
...

The error is very long and the full version is on pastebin

Issues running conformer example with RTX 3090

Hi!

First of all, very nice repository you have. Great work, I like your work.

I've been trying to run your example for the conformer with a rtx 3090 from the new nvidia series and I was wondering if its something you have tried/tested or even support.

Im running cuda 11.1 and cudnn cudnn-11.1-v8.0.5.39 and I tried running your installation commands with conda:

conda create -y -n tfasr tensorflow-gpu python=3.7 # tensorflow if using CPU
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow 
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
python setup.py install

Then install the rnnt_loss with

export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh

And got this output

Cloning into 'warp-transducer'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 914 (delta 2), reused 9 (delta 1), pack-reused 894
Receiving objects: 100% (914/914), 252.69 KiB | 1014.00 KiB/s, done.
Resolving deltas: 100% (463/463), done.
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "11.1") 
-- cuda found TRUE
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/kingston/github/TensorFlowASR/externals/warp-transducer/build
[  7%] Building NVCC (Device) object CMakeFiles/warprnnt.dir/src/warprnnt_generated_rnnt_entrypoint.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_30'
CMake Error at warprnnt_generated_rnnt_entrypoint.cu.o.cmake:220 (message):
  Error generating
  /mnt/kingston/github/TensorFlowASR/externals/warp-transducer/build/CMakeFiles/warprnnt.dir/src/./warprnnt_generated_rnnt_entrypoint.cu.o


make[2]: *** [CMakeFiles/warprnnt.dir/build.make:65: CMakeFiles/warprnnt.dir/src/warprnnt_generated_rnnt_entrypoint.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:191: CMakeFiles/warprnnt.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
2020-11-22 19:08:46.598045: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Could not find libwarprnnt.so in ../build.
Build warp-rnnt and set WARP_RNNT_PATH to the location of libwarprnnt.so (default is '../build')

It looked like the warp-transducer wont compile so following this post I commented the following lines of the warp-transducer cmake file:

# set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_30,code=sm_30 -O2")
# set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_35,code=sm_35")

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_50,code=sm_50")
# set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_52,code=sm_52")

And I was able to compile it. After this I tried running the example python examples/conformer/train_conformer.py but it wont start running due to a gpu error:

Run on 1 Physical GPUs
Traceback (most recent call last):
  File "examples/conformer/train_conformer.py", line 57, in <module>
    strategy = setup_strategy(args.devices)
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/TensorFlowASR-0.3.1-py3.7.egg/tensorflow_asr/utils/__init__.py", line 63, in setup_strategy
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 269, in __init__
    self, devices=devices, cross_device_ops=cross_device_ops)
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 306, in __init__
    devices = devices or all_local_devices()
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 172, in all_local_devices
    devices = config.list_logical_devices("GPU")
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/framework/config.py", line 403, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1344, in list_logical_devices
    self.ensure_initialized()
  File "/home/jiwidi/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 539, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

So I upgraded tensorflow to pip install tf-nightly-gpu==2.5.0.dev20201028 and solved it. Now im able to run the code
in the example script but I have loss equal to 0 and I wonder if this is something normal or could be a bug from my installation

[Train] [Epoch 1/20] |                    | 25/142680 [05:19<504:44:03, 12.74s/batch, transducer_loss=0.0]

Is this related to the tf version or the warp-transducer version? Has anyone run examples from this repository with the new nvidia 3000 cards? Could you provide me with some information about your installation?

Here is the full output from my execution of the conformer example:

Run on 1 Physical GPUs
Model: "conformer_encoder"
________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #           
========================================================================================================================
conformer_encoder_subsampling (Conv2dSubsampling)     multiple                                        188208            
________________________________________________________________________________________________________________________
conformer_encoder_pe (PositionalEncodingConcat)       multiple                                        0                 
________________________________________________________________________________________________________________________
conformer_encoder_linear (Dense)                      multiple                                        414864            
________________________________________________________________________________________________________________________
conformer_encoder_dropout (Dropout)                   multiple                                        0                 
________________________________________________________________________________________________________________________
conformer_encoder_block_0 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_1 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_2 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_3 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_4 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_5 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_6 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_7 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_8 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_9 (ConformerBlock)            multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_10 (ConformerBlock)           multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_11 (ConformerBlock)           multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_12 (ConformerBlock)           multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_13 (ConformerBlock)           multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_14 (ConformerBlock)           multiple                                        506736            
________________________________________________________________________________________________________________________
conformer_encoder_block_15 (ConformerBlock)           multiple                                        506736            
========================================================================================================================
Total params: 8,710,848
Trainable params: 8,706,240
Non-trainable params: 4,608
________________________________________________________________________________________________________________________
Model: "conformer_prediction"
________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #           
========================================================================================================================
conformer_prediction_embedding (Embedding)            multiple                                        9280              
________________________________________________________________________________________________________________________
conformer_prediction_dropout (Dropout)                multiple                                        0                 
________________________________________________________________________________________________________________________
conformer_prediction_ln_0 (LayerNormalization)        multiple                                        640               
________________________________________________________________________________________________________________________
conformer_prediction_lstm_0 (LSTM)                    multiple                                        820480            
========================================================================================================================
Total params: 830,400
Trainable params: 830,400
Non-trainable params: 0
________________________________________________________________________________________________________________________
Model: "conformer_joint"
________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #           
========================================================================================================================
conformer_joint_enc (Dense)                           multiple                                        46400             
________________________________________________________________________________________________________________________
conformer_joint_pred (Dense)                          multiple                                        102400            
________________________________________________________________________________________________________________________
conformer_joint_vocab (Dense)                         multiple                                        9309              
========================================================================================================================
Total params: 158,109
Trainable params: 158,109
Non-trainable params: 0
________________________________________________________________________________________________________________________
Model: "conformer"
________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #           
========================================================================================================================
conformer_encoder (ConformerEncoder)                  multiple                                        8710848           
________________________________________________________________________________________________________________________
conformer_prediction (TransducerPrediction)           multiple                                        830400            
________________________________________________________________________________________________________________________
conformer_joint (TransducerJoint)                     multiple                                        158109            
========================================================================================================================
Total params: 9,699,357
Trainable params: 9,694,749
Non-trainable params: 4,608
________________________________________________________________________________________________________________________
Reading /mnt/kingston/asr-datasets/LibriSpeech/train-clean-100/transcripts.tsv ...
2020-11-22 19:36:11.117394: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Found an unshardable source dataset: name: "TensorSliceDataset/_1"
op: "TensorSliceDataset"
input: "Placeholder/_0"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_STRING
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: 2
        }
      }
    }
  }
}

Reading /mnt/kingston/asr-datasets/LibriSpeech/dev-clean/transcripts.tsv ...
Reading /mnt/kingston/asr-datasets/LibriSpeech/dev-other/transcripts.tsv ...
2020-11-22 19:36:11.158860: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Found an unshardable source dataset: name: "TensorSliceDataset/_1"
op: "TensorSliceDataset"
input: "Placeholder/_0"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_STRING
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: 2
        }
      }
    }
  }
}

[Train] |                    | 0/142680 [00:00<?, ?batch/s]2020-11-22 19:36:35.725016: W tensorflow/stream_executor/gpu/asm_compiler.cc:63] Running ptxas --version returned 256
2020-11-22 19:36:35.808323: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
[Train] [Epoch 1/20] |                    | 1/142680 [00:37<1497:40:58, 37.79s/batch, transducer_loss=0.0]

Thanks

InfoRe data is not available anymore

Hi,

I have tried to download the InfoRe dataset with the torrent link provided in the Post shared by you. But there is no seed anymore so I cannot download it. Could you please share the dataset via Google Drive or other hosts?

Thank you in advance.

GPU Training broken on Conformer and Conformer with Gradient Accumulation

The transducer loss runs into a NaN and starts off already negative on the GPU (cuda10.2) - works fine on CPU.

Using supplied config - with small libri dataset.

transducer_loss is negative

Environment:

Ubuntu 20.04
TensorFlow 2.3
TensorFlowASR main

Command:

python examples/conformer/train_subword_conformer.py --tbs 8 --ebs 8 --mxp --devices 0 --cache --subwords ./output/librispeed \
--subwords_corpus \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/train-clean-100/transcripts.tsv \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/test-clean/transcripts.tsv \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/dev-clean/transcripts.tsv \
/home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/dev-other/transcripts.tsv

Log:

Model: "conformer"
________________________________________________________________________________________________________________________
Layer (type)                                          Output Shape                                    Param #           
========================================================================================================================
conformer_encoder (ConformerEncoder)                  (None, None, 144)                               8710848           
________________________________________________________________________________________________________________________
conformer_prediction (TransducerPrediction)           (None, None, 320)                               1151040           
________________________________________________________________________________________________________________________
conformer_joint (TransducerJoint)                     (None, None, None, 1031)                        479751            
========================================================================================================================
Total params: 10,341,639
Trainable params: 10,337,031
Non-trainable params: 4,608
________________________________________________________________________________________________________________________
Reading /home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/train-clean-100/transcripts.tsv ...
Reading /home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/dev-clean/transcripts.tsv ...
Reading /home/ubuntu/Data/LibriSpeechConformer/LibriSpeech/dev-other/transcripts.tsv ...
[Train] |                    | 0/71340 [00:00<?, ?batch/s]2020-10-21 10:26:47.344713: W tensorflow/stream_executor/gpu/asm_compiler.cc:81] Running ptxas --version returned 256
2020-10-21 10:26:47.417289: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
[Train] [Epoch 1/20] |                    | 45/71340 [02:59<71:52:55,  3.63s/batch, transducer_loss=-298.91907]

token level timestep

Is it possible to output token level timestep？

eg：
hello 100-600
world 712-900
.......

Setup: Could not find a version that satisfies the requirement ctc-decoders

Running pip install . after cloning the repository gives me:

ERROR: Could not find a version that satisfies the requirement ctc-decoders (from tiramisu-asr==0.0.1) (from versions: none)
ERROR: No matching distribution found for ctc-decoders (from tiramisu-asr==0.0.1)

How can I set this up correctly?

Enrionment

$ python --version
Python 3.7.0
$ uname -sv
Linux #100~16.04.1-Ubuntu SMP Wed Apr 22 23:56:30 UTC 2020

SimCLR loss for self-supervised speech representation learning

Hi,
Self-supervised pretraining for speech representation is a promising technique for developing ASR in resource-constraint languages with little transcribed data, and SimCLR is applied with success for this purpose in recent papers. Any plan to add support for SimCLR contrastive objective to this package?

Relevant papers

tensorspeech / tensorflowasr Goto Github PK

tensorflowasr's Introduction

TensorFlowASR ⚡

Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2

What's New?

Table of Contents

😋 Supported Models

Baselines

Publications

Installation

Installing from source (recommended)

Installing via PyPi

Running in a container

Setup training and testing

TFLite Convertion

Features Extraction

Augmentations

Training & Testing Tutorial

Corpus Sources and Pretrained Models

English

Vietnamese

German

References & Credits

Contact

tensorflowasr's People

Contributors

Stargazers

Watchers

Forkers

tensorflowasr's Issues

Q: Why are we not storing the states of the EncoderNetwork like we do for the PredictionNetwork?

Enrionment

Recommend Projects

Recommend Topics

Recommend Org

Jobs