reith / deepspeech-playground

Baidu's DeepSpeech updated for better training

License: Apache License 2.0

Python 82.17% Shell 0.79% Jupyter Notebook 17.04%

deepspeech deep-learning

deepspeech-playground's Introduction

deepspeech-playground

This repo is a fork of Baidu's DeepSpeech. Unlike Baidu's repo:

It works with both TensorFlow and Theano
It has helpers for better training by training against auto-generated phonograms
Training with Theano can be much faster, since the CTC calculation can run on the GPU

Training

If you want to train with Theano, you'll need Theano>=0.10, since it has bindings for Baidu's CTC.

Using Phonogram

The HalfPhonemeModelWrapper class in the model_wrp module implements training of a model in which half of the RNN layers are trained for phonograms and the rest for the actual output text. To generate phonograms, the Logios tool of CMU Sphinx can be used. Sphinx phonogram symbols are called Arpabets. To generate Arpabets from Baidu's DeepSpeech description files you can:

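# Extract the transcript text from each line of the description file
$ cat train_corpus.json | sed -e 's/.*"text": "\([^"]*\)".*/\1/' > train_corpus.txt
# make_pronunciation.pl script is provided by logios
# https://github.com/skerit/cmusphinx/tree/master/logios/Tools/MakeDict
# (adjust the paths so that -words points at the train_corpus.txt written above)
$ perl ./make_pronunciation.pl -tools ../ -dictdir . -words prons/train_corpus.txt -dict prons/train_corpus.dict
# Combine the description file and the generated dictionary into an Arpabet description file
$ python create_arpabet_json.py train_corpus.json train_corpus.dict train_corpus.arpadesc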
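
The sed step above, and the dictionary lookup that follows it, can also be sketched in Python. This is a minimal sketch under stated assumptions: the description file holds one JSON object per line with a "text" key (which the sed expression suggests), and the pronunciation dictionary follows the usual CMU Sphinx layout of a word followed by whitespace-separated Arpabet symbols, with alternates marked like WORD(2). The helper names are hypothetical and are not part of this repo; create_arpabet_json.py may work differently.

```python
import json


def extract_texts(desc_lines):
    """Yield the "text" field of each JSON line in a description file."""
    for line in desc_lines:
        line = line.strip()
        if line:
            yield json.loads(line)["text"]


def parse_pronunciations(dict_lines):
    """Map each word to its first listed pronunciation (a list of symbols).

    Assumes lines such as "HELLO  HH AH L OW"; alternate entries such as
    "HELLO(2)" are skipped.
    """
    prons = {}
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2 or "(" in parts[0]:
            continue
        prons.setdefault(parts[0].lower(), parts[1:])
    return prons


def text_to_arpabets(text, prons):
    """Replace each known word by its symbols; unknown words are dropped."""
    symbols = []
    for word in text.lower().split():
        symbols.extend(prons.get(word, []))
    return symbols
```

Dropping unknown words is a simplification chosen for the sketch; a real pipeline would more likely report them so the dictionary can be extended.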

Choose backend

Select the Keras backend by setting the environment variable KERAS_BACKEND to theano or tensorflow.
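
The backend can also be chosen from Python rather than the shell. The sketch below relies on Keras reading KERAS_BACKEND when it is first imported, so the variable has to be set before `import keras`. The `select_backend` helper is hypothetical and not part of this repo.

```python
import os

SUPPORTED_BACKENDS = ("theano", "tensorflow")


def select_backend(name):
    """Set KERAS_BACKEND; must run before the first `import keras`."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError("unsupported backend: %r" % (name,))
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("theano")
# import keras  # Keras picks up the variable at import time
```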

Train!

Write a training routine, a function like this:

def train_sample_half_phoneme(datagen, save_dir, epochs, sortagrad,
                              start_weights=False, mb_size=60):
    # Build the wrapped model: 5 recurrent layers of 1000 nodes each
    model_wrp = HalfPhonemeModelWrapper()
    model = model_wrp.compile(nodes=1000, conv_context=5, recur_layers=5)
    logger.info('model :\n%s' % (model.to_yaml(),))

    # Optionally resume from previously saved weights
    if start_weights:
        model.load_weights(start_weights)

    train_fn, test_fn = (model_wrp.compile_train_fn(1e-4),
                         model_wrp.compile_test_fn())
    # Train on both the text and the phoneme outputs
    trainer = Trainer(model, train_fn, test_fn, on_text=True, on_phoneme=True)
    trainer.run(datagen, save_dir, epochs=epochs, do_sortagrad=sortagrad,
                mb_size=mb_size, stateful=False)
    return trainer, model_wrp

Then call it from main() in train.py. Training can be started with:

$ KERAS_BACKEND="tensorflow" python train.py descs/small.arpadesc descs/test-clean.arpadesc models/test --epochs 20 --use-arpabets --sortagrad 1
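
How the command-line arguments might map onto the routine above can be sketched with argparse. The argument names mirror the command shown, and reading the second positional argument as a validation description file is an assumption. The parser itself is hypothetical, written for illustration only; the actual train.py may define its options differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Train a half-phoneme model")
    parser.add_argument("train_desc_file", help="training description file")
    parser.add_argument("val_desc_file", help="validation description file")
    parser.add_argument("save_dir", help="directory for weights and logs")
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--use-arpabets", action="store_true",
                        help="description files carry Arpabet targets")
    parser.add_argument("--sortagrad", type=int, default=0)
    return parser


args = build_parser().parse_args([
    "descs/small.arpadesc", "descs/test-clean.arpadesc", "models/test",
    "--epochs", "20", "--use-arpabets", "--sortagrad", "1",
])
# args.save_dir, args.epochs and args.sortagrad could then be passed on to
# train_sample_half_phoneme(datagen, args.save_dir, args.epochs, args.sortagrad)
```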

Evaluation

visualize.py gives you a semi-shell for testing your model on input files you provide. There is also a models-evaluation notebook, though it may look rather messy.

Pre-trained models

These models were trained for about three days on the LibriSpeech corpus using a GTX 1080 Ti GPU:

A five-layer unidirectional RNN model trained on LibriSpeech using Theano: mega, drive
A five-layer unidirectional RNN model trained on LibriSpeech using TensorFlow: mega, drive

Validation ~~WER~~ CER of these models is about 5% on test-clean and about 15% on test-other.
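
For reference, character error rate is usually computed as the Levenshtein (edit) distance between the predicted and reference transcripts, divided by the reference length. A minimal sketch follows; the function names are illustrative, and the repo's own evaluation code may normalize text differently before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / float(len(reference))
```

Passing lists of words instead of strings to edit_distance gives the word-level distance used for WER.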

deepspeech-playground's Issues

Want to train DeepSpeech for Indian English

Hi Reith,

We are currently using a pre-trained DeepSpeech model for our project and need to train the model for Indian English. Could you give me some pointers or suggestions?

Regards,
Gaurav

model predicts blank for every test sample

Hello,
I tried testing the model 45-best-val-weights.h5 but consistently got blank outputs.
I tested with my own samples as well as LibriSpeech samples, with no change in the result.
I am using the models-evaluation.ipynb notebook mentioned, with no changes to the notebook except for the input paths.
I am running it with the latest Keras 2.0.8 and TF 1.3.0.
Could you look into the matter?
Thanks

language model

Can you please publish the language model you used in models-evaluation.ipynb?

What is the language of your pre-trained model?

Trying to load sample model fails

I ran the following command on the current source branch, with the TensorFlow-trained model downloaded:

python visualize.py --interactive --weights-file pre-trained/45-best-val-weights.h5 --train-desc-file pre-trained/model_45_config.json

After setting up model_wrp in the interactive session via the commands provided, I receive the following error:

Traceback (most recent call last):
  File "visualize.py", line 218, in <module>
    main()
  File "visualize.py", line 205, in main
    args.weights_file)
  File "visualize.py", line 107, in interactive_vis
    model.load_weights(weights_file)
  File "/data/Documents/Projects/TUM/IDP/TabShare/TabShare/venv/lib/python3.5/site-packages/keras/engine/topology.py", line 2619, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/data/Documents/Projects/TUM/IDP/TabShare/TabShare/venv/lib/python3.5/site-packages/keras/engine/topology.py", line 3068, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 7 layers into a model with 14 layers.

What's the correct way of testing out the pretrained model?

No module named 'ctc'

In the model.py file you import ctc, but it's not found inside the repo!

'860-1000' dataset?

I found this repo very useful, and appreciate your effort on this! :)
I was wondering what the "860-1000" dataset in your data_generator.py means -- it looks like a subset of the LibriSpeech dataset used for extracting mean/std. Do you have a sense of whether this mean/std can be applied to speech data from other domains or datasets?

How to combine my own custom model and pretrained deepspeech language model

Sir, I have my own trained model, but it only works for the particular audio files it was trained on. How can I combine it with the DeepSpeech pre-trained language model, or with another Common Voice model? Please help me; I have very little knowledge of this topic. Thank you, sir.

How to integrate/use this with tensorflow model ?

Hi, I am trying to see how much improvement I get after combining this with a DeepSpeech-generated model. Do you have any ideas on how to integrate the two?

Keras package not found

Hi,
Can you specify which versions of TensorFlow and Keras you are using, and whether the other package installations are the same as for Baidu's original DeepSpeech?

Using TensorFlow backend.
Traceback (most recent call last):
  File "test.py", line 63, in <module>
    main(args.test_desc_file, args.model_config, args.weights_file)
  File "test.py", line 36, in main
    model_wrapper = load_model_wrapper(model_config_file, weights_file)
  File "test.py", line 19, in load_model_wrapper
    import model_wrp
  File "/home/prashant/Documents/speech_recognition/speechRecognition/thirdpartyDP/deepspeech-playground/model_wrp.py", line 10, in <module>
    from keras.layers import (BatchNormalization, Dense, Input, GRU, concatenate,
ImportError: cannot import name concatenate
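
This import error is what one would expect from a Keras 1.x installation, since the functional `concatenate` helper in `keras.layers` arrived with Keras 2. Below is a small sketch for checking an installed version string. The `supports_concatenate` helper is hypothetical, and the "major version 2 or later" threshold is stated from memory of the Keras 2 release, so treat it as an assumption rather than a confirmed requirement of this repo.

```python
import re


def parse_version(version):
    """Turn '2.0.8' or '1.2.2rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_concatenate(keras_version):
    """True if the major version is at least 2 (assumed threshold)."""
    return parse_version(keras_version)[:1] >= (2,)


# Typical use, once Keras is importable:
# import keras
# print(supports_concatenate(keras.__version__))
```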

Python package dependencies are not clear

I wasn't able to test the pre-trained model because the Python package dependencies are unclear.

I tried to run python3 test.py but I got this error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    from model import compile_test_fn
  File "/home/mertyildiran/Downloads/deepspeech-playground/model.py", line 6, in <module>
    import ctc
ImportError: No module named 'ctc'

What is the ctc library? How can I install it? There is no library named ctc on PyPI. As a suggestion, it would be better to have a requirements.txt file.

I tried to run python3 visualize.py but I got:

Using TensorFlow backend.
usage: visualize.py [-h] [--test_file TEST_FILE] [--load_dir LOAD_DIR]
                    [--weights_file WEIGHTS_FILE] [--interactive]
                    train_desc_file
visualize.py: error: the following arguments are required: train_desc_file

Which of the files is the train_desc_file? Could you improve README.md by adding a step-by-step guide for testing the pre-trained models? 😊 Thanks so much, great work 👍

input shape for inference with new data

Hello, first, thank you for sharing your code. I tried to export the model to run inference and would like some clarification on the input data shape. If I try to send an audio file (WAV, 1 channel at 16 kHz), I get this error:

INVALID_ARGUMENT, details="input must be 4-dimensional[102400,1]
	 [[Node: conv_1d_1/convolution/Conv2D = Conv2D[T=DT_FLOAT, _output_shapes=[[?,1,?,1000]], data_format="NHWC", padding="VALID", strides=[1, 1, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](conv_1d_1/convolution/ExpandDims, conv_1d_1/convolution/ExpandDims_1)]]")

thank you for your help
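
The shape in the error, [102400, 1], looks like raw audio samples, whereas a model of this kind is normally fed spectrogram features batched as (batch, time, frequency). The sketch below only works out that shape. The 20 ms window, the 10 ms step and the helper name are assumptions based on defaults commonly used with Baidu's original code, not values confirmed for this repo.

```python
def spectrogram_shape(n_samples, sample_rate=16000, window_ms=20, step_ms=10):
    """Return (batch, frames, freq_bins) for one utterance.

    window_ms and step_ms are assumed defaults; adjust them to match the
    feature extraction the model was trained with.
    """
    window = sample_rate * window_ms // 1000   # samples per frame
    step = sample_rate * step_ms // 1000       # hop between frames
    if n_samples < window:
        raise ValueError("audio shorter than one window")
    frames = (n_samples - window) // step + 1
    freq_bins = window // 2 + 1                # one-sided FFT bins
    return (1, frames, freq_bins)


shape = spectrogram_shape(102400)
```

Under these assumptions the exported model would receive a 3-dimensional array of features per request rather than the 2-dimensional waveform.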