TiramisuASR implements several speech recognition and speech enhancement architectures, such as CTC-based models (Deep Speech 2, etc.), the Speech Enhancement Generative Adversarial Network (SEGAN), and RNN Transducer models (Conformer, etc.). These models can be converted to TFLite to reduce memory and computation for deployment.
- Supported transducer TFLite greedy decoding (conversion and invocation)
- Distributed training using `tf.distribute.MirroredStrategy`
- Fixed transducer beam search
- Added `log_gammatone_spectrogram`
- CTCModel (end-to-end models using CTC loss for training)
- SEGAN (refer to https://github.com/santi-pdp/segan), see examples/segan
- Transducer Models (end-to-end models using RNNT loss for training)
  - Conformer Transducer (reference: https://arxiv.org/abs/2005.08100), see examples/conformer
- Ubuntu distribution (`ctc-decoders` and `semetrics` require some packages from apt)
- Python 3.6+
- TensorFlow 2.2+: `pip install tensorflow`
- Install gammatone: `pip3 install git+https://github.com/detly/gammatone.git`
- Install TensorFlow: `pip3 install tensorflow` or `pip3 install tf-nightly` (for using TFLite)
- Install packages: `python3 setup.py install`
- For setting up datasets, see datasets
- For training, testing and using CTC Models, run `./scripts/install_ctc_decoders.sh`
- For training Transducer Models, export `CUDA_HOME` and run `./scripts/install_rnnt_loss.sh`
- For testing the speech enhancement model (i.e. SEGAN), install `octave` and run `./scripts/install_semetrics.sh`
- The method `tiramisu_asr.utils.setup_environment()` automatically enables mixed precision if available.
- To enable XLA, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script`
- Clean up: `python3 setup.py clean --all` (this will remove the contents of `/build`)
After conversion to TFLite, the model acts as a function that maps an audio signal directly to unicode code points, which can then be converted to a string.
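That last decoding step can be sketched in plain Python; the code points below are illustrative placeholders, not real model output:

```python
# A TFLite-converted model emits unicode code points; joining the chr()
# values of those points recovers the transcript string.
code_points = [104, 101, 108, 108, 111]  # illustrative output
transcript = "".join(chr(cp) for cp in code_points)
print(transcript)  # hello
```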
- Install `tf-nightly` using `pip install tf-nightly`
- Build a model with the same architecture as the trained model (if the model has a `tflite` argument, you must set it to `True`), then load the weights from the trained model into the built model
- Load `TFSpeechFeaturizer` and `TextFeaturizer` into the model using the function `add_featurizers`
- Convert the model's function to TFLite as follows:

```python
func = model.make_tflite_function(greedy=True)  # or greedy=False
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
```
- Save the converted tflite model as follows:

```python
import os

os.makedirs(os.path.dirname(tflite_path), exist_ok=True)
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
```
- The `.tflite` model is then ready to be deployed
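End to end, conversion and invocation can be sketched with a toy `tf.function` standing in for the trained model's exported function (a real TiramisuASR model returns unicode code points rather than the doubled samples used here, and `make_tflite_function` is what produces the function in practice):

```python
import numpy as np
import tensorflow as tf


# Toy stand-in for the model's exported function: it just doubles the
# samples, but it is converted and invoked the same way an ASR model is.
@tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
def toy_fn(signal):
    return signal * 2.0


# Convert the concrete function to a TFLite flatbuffer in memory.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [toy_fn.get_concrete_function()])
tflite_model = converter.convert()

# Invocation mirrors how a converted model is called on a raw audio signal.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

audio = np.array([0.1, 0.2, 0.3], dtype=np.float32)
interpreter.resize_tensor_input(inp["index"], audio.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(inp["index"], audio)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```

With a real model, `result` would hold the code points of the transcript instead of the doubled samples.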
See augmentations
Example YAML Config Structure

```yaml
speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
```
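Once parsed (e.g. with `yaml.safe_load`), the config above becomes a nested mapping; a minimal sketch of reading its fields, with placeholder values filled in for the elided `...` entries:

```python
# Parsed form of the example config; placeholder values are illustrative only.
config = {
    "speech_config": {},
    "model_config": {},
    "decoder_config": {},
    "learning_config": {
        "augmentations": {},
        "dataset_config": {
            "train_paths": ["/data/train.tsv"],
            "eval_paths": [],
            "test_paths": [],
            "tfrecords_dir": "/data/tfrecords",
        },
        "optimizer_config": {},
        "running_config": {
            "batch_size": 8,
            "num_epochs": 20,
            "outdir": "/tmp/outdir",
            "log_interval_steps": 500,
        },
    },
}

# Training parameters live under learning_config -> running_config.
running = config["learning_config"]["running_config"]
print(running["batch_size"])  # 8
```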
See examples for some predefined ASR models.
For pretrained models, go to drive
| Name | Source | Hours |
| --- | --- | --- |
| LibriSpeech | LibriSpeech | 970h |
| Common Voice | https://commonvoice.mozilla.org | 1932h |
| Name | Source | Hours |
| --- | --- | --- |
| Vivos | https://ailab.hcmus.edu.vn/vivos | 15h |
| InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h |
| InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h |
| Name | Source | Hours |
| --- | --- | --- |
| Common Voice | https://commonvoice.mozilla.org/ | 750h |