reith / deepspeech-playground

Baidu's DeepSpeech updated for better training

License: Apache License 2.0

Python 82.17% Shell 0.79% Jupyter Notebook 17.04%
deepspeech deep-learning

deepspeech-playground's Introduction

deepspeech-playground

This repo is a fork of Baidu's DeepSpeech. Unlike Baidu's repo:

  • It works with both Tensorflow and Theano
  • It has helpers for better training by training against auto-generated phonograms
  • Training by Theano can be much faster, since CTC calculation may be done by GPU

Training

If you want to train with Theano, you'll need Theano>=0.10, since it has bindings for Baidu's CTC.

Using Phonogram

The HalfPhonemeModelWrapper class in the model_wrp module implements training of a model in which half of the RNN layers are trained for phonograms and the rest for the actual output text. To generate phonograms, the Logios tool of CMU Sphinx can be used. Sphinx phonogram symbols are called Arpabets. To generate Arpabets from Baidu's DeepSpeech description files, you can run:

$ cat train_corpus.json | sed -e 's/.*"text": "\([^"]*\)".*/\1/' > train_corpus.txt
# make_pronunciation.pl script is provided by logios
# https://github.com/skerit/cmusphinx/tree/master/logios/Tools/MakeDict
$ perl ./make_pronunciation.pl -tools ../ -dictdir .  -words prons/train_corpus.txt -dict prons/train_corpus.dict
$ python create_arpabet_json.py train_corpus.json train_corpus.dict train_corpus.arpadesc
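
The pipeline above can be approximated in Python as a rough sketch. It assumes that each line of the description file is a JSON object with a "text" field (as the sed expression implies) and that the dictionary uses the usual CMU Sphinx layout of a word followed by its phonemes. The function names are hypothetical, and this is not the code of create_arpabet_json.py, whose output format is not shown here.

```python
import json


def extract_texts(desc_lines):
    """Return the "text" value of every JSON line, like the sed command above."""
    return [json.loads(line)["text"] for line in desc_lines if line.strip()]


def load_pronunciations(dict_lines):
    """Parse 'WORD PH1 PH2 ...' lines into {word: [phonemes]} (assumed layout)."""
    prons = {}
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0].lower()
        if "(" in word:
            continue  # skip alternate pronunciations such as WORD(2)
        prons[word] = parts[1:]
    return prons


def to_arpabet(text, prons):
    """Map each word to its phoneme list; raises KeyError for unknown words."""
    return [prons[word] for word in text.split()]
```

Reading train_corpus.json and the generated .dict file line by line and passing the lines to these helpers gives one phoneme sequence per utterance.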

Choose backend

Select the Keras backend by setting the environment variable KERAS_BACKEND to theano or tensorflow.
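
The variable can also be set from Python, as long as this happens before keras is imported, because multi-backend Keras reads KERAS_BACKEND at import time. The helper name below is hypothetical; it is a minimal sketch, not part of this repo.

```python
import os


def select_keras_backend(name):
    """Set KERAS_BACKEND; must be called before `import keras`."""
    if name not in ("theano", "tensorflow"):
        raise ValueError("unsupported backend: %r" % (name,))
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("theano")
# import keras  # would now report "Using Theano backend."
```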

Train!

Make a train routine, a function like this:

from model_wrp import HalfPhonemeModelWrapper

# Trainer and logger are assumed to be already available in train.py.


def train_sample_half_phoneme(datagen, save_dir, epochs, sortagrad,
                              start_weights=False, mb_size=60):
    # Build a model whose RNN layers are split between phonemes and text.
    model_wrp = HalfPhonemeModelWrapper()
    model = model_wrp.compile(nodes=1000, conv_context=5, recur_layers=5)
    logger.info('model :\n%s' % (model.to_yaml(),))

    # Optionally resume from previously saved weights.
    if start_weights:
        model.load_weights(start_weights)

    train_fn, test_fn = (model_wrp.compile_train_fn(1e-4),
                         model_wrp.compile_test_fn())
    # Train on both the text and the phoneme outputs.
    trainer = Trainer(model, train_fn, test_fn, on_text=True, on_phoneme=True)
    trainer.run(datagen, save_dir, epochs=epochs, do_sortagrad=sortagrad,
                mb_size=mb_size, stateful=False)
    return trainer, model_wrp

Then call it from main() of train.py. Training can be started with:

$ KERAS_BACKEND="tensorflow" python train.py descs/small.arpadesc descs/test-clean.arpadesc models/test --epochs 20 --use-arpabets --sortagrad 1
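
For orientation, the command line above could be parsed roughly as follows. This is a hypothetical sketch: only the positional order and the flag names are taken from the command, while the argument names, types and defaults are assumptions and may differ from the real train.py.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the training command shown above."""
    parser = argparse.ArgumentParser(description="Train a DeepSpeech model")
    parser.add_argument("train_desc_file", help="training description file")
    parser.add_argument("val_desc_file", help="validation description file")
    parser.add_argument("save_dir", help="directory for weights and logs")
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--use-arpabets", action="store_true",
                        help="description files carry Arpabet labels")
    parser.add_argument("--sortagrad", type=int, default=0,
                        help="assumed to be an integer setting, as in '--sortagrad 1'")
    return parser
```

The parsed values would then be passed on to a train routine such as train_sample_half_phoneme.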

Evaluation

visualize.py gives you a semi-shell for testing your model on input files. There is also a models-evaluation notebook, though it is rather messy.

Pre-trained models

These models were trained for about three days on the LibriSpeech corpus using a GTX 1080 Ti GPU:

  • A five-layer unidirectional RNN model trained on LibriSpeech using Theano: mega, drive
  • A five-layer unidirectional RNN model trained on LibriSpeech using Tensorflow: mega, drive

Validation WER CER of these models is about 5% on test-clean and about 15% on test-other.
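
For reference, both error rates are usually defined as the edit distance between the reference and the hypothesis divided by the reference length, counted in words for WER and in characters for CER. The sketch below shows that standard definition. It is not the evaluation code of this repo, and the function names are chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(ref_text, hyp_text):
    """Character error rate."""
    return error_rate(list(ref_text), list(hyp_text))


def wer(ref_text, hyp_text):
    """Word error rate."""
    return error_rate(ref_text.split(), hyp_text.split())
```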

deepspeech-playground's People

Contributors

reith

deepspeech-playground's Issues

Want to train DeepSpeech for Indian English

Hi Reith,

Currently we are using a pre-trained DeepSpeech model for our project. We need to train the model for Indian English. Could you give me some pointers or suggestions?

Regards,
Gaurav

model predicts blank for every test sample

Hello,
I tried testing the model 45-best-val-weights.h5 but consistently got blank outputs.
I tried testing with my own samples as well as LibriSpeech samples, but the result did not change.
I am using the models-evaluation.ipynb notebook mentioned, with no changes except for the path inputs.
I am running it with the latest Keras 2.0.8 and TF 1.3.0.
Could you look into the matter?
Thanks

language model

can you please publish the language model you used in models-evaluation.ipynb ?

Trying to load sample model fails

I load the downloaded TensorFlow-trained model with the following command on the current source branch:

python visualize.py --interactive --weights-file pre-trained/45-best-val-weights.h5 --train-desc-file pre-trained/model_45_config.json

During the interactive session, after I set up model_wrp via the commands provided, I receive the following error:

Traceback (most recent call last):
  File "visualize.py", line 218, in <module>
    main()
  File "visualize.py", line 205, in main
    args.weights_file)
  File "visualize.py", line 107, in interactive_vis
    model.load_weights(weights_file)
  File "/data/Documents/Projects/TUM/IDP/TabShare/TabShare/venv/lib/python3.5/site-packages/keras/engine/topology.py", line 2619, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/data/Documents/Projects/TUM/IDP/TabShare/TabShare/venv/lib/python3.5/site-packages/keras/engine/topology.py", line 3068, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 7 layers into a model with 14 layers.
(venv) ➜  deepsp

What's the correct way of testing out the pretrained model?

'860-1000' dataset?

I found this repo very useful, and appreciated your effort on this! :)
Was wondering what "860-1000" dataset meant in your data_generator.py -- looked like a subset of librispeech dataset used for extracting mean/std. Do you have a sense if this mean/std can be applied to speech data from other domain or dataset?

Keras package not found

Hi,
Can you specify which versions of TensorFlow and Keras you are using, and whether the other package installations are the same as for Baidu's original DeepSpeech?

Using TensorFlow backend.
Traceback (most recent call last):
  File "test.py", line 63, in <module>
    main(args.test_desc_file, args.model_config, args.weights_file)
  File "test.py", line 36, in main
    model_wrapper = load_model_wrapper(model_config_file, weights_file)
  File "test.py", line 19, in load_model_wrapper
    import model_wrp
  File "/home/prashant/Documents/speech_recognition/speechRecognition/thirdpartyDP/deepspeech-playground/model_wrp.py", line 10, in <module>
    from keras.layers import (BatchNormalization, Dense, Input, GRU, concatenate,
ImportError: cannot import name concatenate

Python package dependencies are not clear

I wasn't able to test the pre-trained model because the Python package dependencies are unclear.

I tried to run python3 test.py but I got this error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    from model import compile_test_fn
  File "/home/mertyildiran/Downloads/deepspeech-playground/model.py", line 6, in <module>
    import ctc
ImportError: No module named 'ctc'

What is the ctc library? How can I install it? There is no library named ctc on PyPI. As a suggestion, it would be better to have a requirements.txt file.

I tried to run python3 visualize.py but I got:

Using TensorFlow backend.
usage: visualize.py [-h] [--test_file TEST_FILE] [--load_dir LOAD_DIR]
                    [--weights_file WEIGHTS_FILE] [--interactive]
                    train_desc_file
visualize.py: error: the following arguments are required: train_desc_file

Which one of the files is the train_desc_file? Could you improve README.md by adding a step-by-step guide for testing pre-trained models? 😊 Thanks so much, great work!

input shape for inference with new data

Hello, first, thank you for sharing your code. I tried to export the model to run inference and would like some clarification on the input data shape. If I send an audio file (WAV, 1 channel at 16 kHz), I get this error:

INVALID_ARGUMENT, details="input must be 4-dimensional[102400,1]
	 [[Node: conv_1d_1/convolution/Conv2D = Conv2D[T=DT_FLOAT, _output_shapes=[[?,1,?,1000]], data_format="NHWC", padding="VALID", strides=[1, 1, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](conv_1d_1/convolution/ExpandDims, conv_1d_1/convolution/ExpandDims_1)]]")

thank you for your help
