mozilla / deepspeech-playbook

A crash course for training speech recognition models using DeepSpeech.

Home Page: https://mozilla.github.io/deepspeech-playbook/

License: Other

deepspeech speech-recognition acoustic-model language-model common-voice

deepspeech-playbook's Introduction

DeepSpeech Playbook

A crash course on training speech recognition models using DeepSpeech.

Quick links

Introduction

Start here. This section sets your expectations for what you can achieve with the DeepSpeech Playbook, and describes the prerequisites you'll need to train your own speech recognition models.

Once you know what you can achieve with the DeepSpeech Playbook, this section provides an overview of DeepSpeech itself, its component parts, and how it differs from other speech recognition engines you may have used in the past.

Formatting your training data

Before you can train a model, you will need to collect and format your corpus of data. This section provides an overview of the data format required for DeepSpeech, and walks through an example of preparing a dataset from Common Voice.
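For orientation: DeepSpeech's training and import scripts work with CSV files that have three columns, `wav_filename`, `wav_filesize` (in bytes) and `transcript`. The minimal sketch below writes such a file from (audio path, transcript) pairs. The helper name `write_deepspeech_csv` and the strip-and-lowercase normalisation are illustrative assumptions, not the Playbook's own code.

```python
import csv
import os
from typing import Iterable, Tuple

# Column order used by DeepSpeech training CSVs.
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def write_deepspeech_csv(rows: Iterable[Tuple[str, str]], csv_path: str) -> int:
    """Write (wav_path, transcript) pairs to a DeepSpeech-style CSV.

    Hypothetical helper: transcripts are stripped and lowercased as an
    example normalisation; adapt this to match your alphabet.txt.
    Returns the number of rows written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for wav_path, transcript in rows:
            writer.writerow({
                # Absolute paths avoid ambiguity about where the audio lives.
                "wav_filename": os.path.abspath(wav_path),
                # Size of the audio file in bytes.
                "wav_filesize": os.path.getsize(wav_path),
                "transcript": transcript.strip().lower(),
            })
            count += 1
    return count
```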

The alphabet.txt file

If you are training a model for a language whose alphabet differs from English, for example a language with diacritical marks, you will need to modify the alphabet.txt file.
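As a minimal sketch, assuming the format of DeepSpeech's bundled data/alphabet.txt (one character per line, lines beginning with `#` treated as comments, `\#` for a literal hash, and a line holding a single space for the space character), the snippet below loads an alphabet file and lists the characters in your transcripts that it does not cover. Both function names are hypothetical.

```python
from typing import Iterable, Set


def load_alphabet(path: str) -> Set[str]:
    """Parse an alphabet.txt file into a set of labels (characters).

    Assumed format: one label per line, '#' lines are comments,
    and a line containing only a backslash and '#' means a literal '#'.
    """
    labels = set()
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.rstrip("\r\n")  # keep a lone space: it is a valid label
            if line == "\\#":
                labels.add("#")
            elif line == "" or line.startswith("#"):
                continue
            else:
                labels.add(line)
    return labels


def unsupported_characters(transcripts: Iterable[str], alphabet: Set[str]) -> Set[str]:
    """Return every character used in the transcripts but missing from the alphabet."""
    missing = set()
    for text in transcripts:
        missing.update(ch for ch in text if ch not in alphabet)
    return missing
```

Any character this reports either needs adding to alphabet.txt or removing from the transcripts before training.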

Building your own scorer

Learn what the scorer does, and how you can go about building your own.

Acoustic model and language model

Learn about the differences between DeepSpeech's acoustic model and language model and how they combine to provide end-to-end speech recognition.
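Broadly, the two are combined during beam-search decoding by giving each candidate transcript c for audio x a weighted score, as in the original Deep Speech paper; in DeepSpeech these weights correspond to the `lm_alpha` and `lm_beta` parameters:

```latex
Q(c) = \log P_{\mathrm{AM}}(c \mid x) + \alpha \log P_{\mathrm{LM}}(c) + \beta \, \mathrm{word\_count}(c)
```

Here alpha controls how much the language model (scorer) is trusted relative to the acoustic model, and beta is a word insertion bonus that counteracts the language model's preference for shorter transcripts.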

Setting up your training environment

This section walks you through building a Docker image, and running DeepSpeech in a Docker container with persistent storage. This approach avoids the complexity of managing dependencies such as TensorFlow yourself.

Training a model

Once you have your training data formatted, and your training environment established, this section will show you how to train a model, and provide guidance for overcoming common pitfalls.

Testing a model

Once you've trained a model, you will need to validate that it works for the context it's been designed for. This section walks you through this process.

Deploying your model

Once your model is trained and tested, it is ready to deploy. This section provides an overview of how you can deploy your model.

Applying DeepSpeech to real world problems

This section covers specific use cases where DeepSpeech can be applied to real-world problems, such as transcription, keyword searching and voice-controlled applications.

Setting up Continuous Integration

Learn how to set up Continuous Integration (CI) for your own fork of DeepSpeech. Intended for developers who are utilising DeepSpeech for their own specific use cases.

Introductory courses on machine learning

Providing an introduction to machine learning is beyond the scope of this PlayBook; however, an understanding of machine learning and deep learning concepts will aid your efforts in training speech recognition models with DeepSpeech.

Here, we've linked to several resources that you may find helpful; they're listed in the order we recommend reading them in.

DigitalOcean's introductory machine learning tutorial provides an overview of different types of machine learning. The diagrams in this tutorial are a great way of explaining key concepts.
Google's machine learning crash course provides a gentle introduction to the main concepts of machine learning, including gradient descent, learning rate, training, test and validation sets and overfitting.
If machine learning is something that sparks your interest, then you may enjoy the MIT Open Learning Library's Introduction to Machine Learning course, a 13-week college-level course covering perceptrons, neural networks, support vector machines and convolutional neural networks.

How you can help provide feedback on the DeepSpeech PlayBook

You can help to make the DeepSpeech PlayBook even better by providing feedback via a GitHub Issue:

Please try these instructions, particularly for building a Docker image and running a Docker container, on multiple distributions of Linux so that we can identify corner cases.
Please contribute your tacit knowledge - such as:
- common errors encountered in data formatting, environment setup, training and validation
- techniques or approaches for improving the scorer or alphabet file, or for reducing Word Error Rate (WER) and Character Error Rate (CER).
- case studies of the work you or your organisation have been doing, showing your approaches to data validation, training or evaluation.
Please identify errors in the text - with many eyes, all bugs are shallow :-)
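For reference, WER and CER are usually computed as the Levenshtein (edit) distance between the reference transcript and the model's output, divided by the length of the reference: over words for WER and over characters for CER. The sketch below is a plain-Python illustration of that definition, not the code DeepSpeech's evaluation uses internally. Lower is better, and values can exceed 1.0 when the output contains many insertions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is 1/6 (one deleted word out of six).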


deepspeech-playbook's Issues

Importing CV data in DATA_FORMATTING.md fails due to `sox` deps not in Docker Hub image

The instructions given for importing Common Voice datasets in DATA_FORMATTING.md fail as the Docker Hub training image for DeepSpeech does not include sox dependencies.

If you try to import Common Voice using the current instructions, it will fail with:

root@c7f3e6f3c302:/DeepSpeech# bin/import_cv2.py deepspeech-data/cv-corpus-6.1-2020-12-11/vi
/bin/sh: 1: sox: not found
SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.
    
Loading TSV file:  /DeepSpeech/deepspeech-data/cv-corpus-6.1-2020-12-11/vi/test.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
...
This install of SoX cannot process .mp3 files.
...
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "bin/import_cv2.py", line 65, in one_sample
    _maybe_convert_wav(mp3_filename, wav_filename)
  File "bin/import_cv2.py", line 185, in _maybe_convert_wav
    transformer.build(mp3_filename, wav_filename)
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 594, in build
    input_filepath, input_array, sample_rate_in
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 496, in _parse_inputs
    input_format['channels'] = file_info.channels(input_filepath)
  File "/usr/local/lib/python3.6/dist-packages/sox/file_info.py", line 82, in channels
    output = soxi(input_filepath, 'c')
  File "/usr/local/lib/python3.6/dist-packages/sox/core.py", line 149, in soxi
    stderr=subprocess.PIPE
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sox': 'sox'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bin/import_cv2.py", line 221, in <module>
    main()
  File "bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: 'sox': 'sox'

upstream PR at:
mozilla/DeepSpeech#3488
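Until the training image ships with SoX, a pre-flight check can catch both failures seen in the log above (no `sox` binary at all, and a `sox` without MP3 support) before the importer starts its worker pool. This is a minimal sketch that assumes `sox -h` lists the supported formats on a line beginning `AUDIO FILE FORMATS:`, as recent SoX releases appear to do; the function names are hypothetical. It avoids `subprocess.run(capture_output=...)` because the training image in the log uses Python 3.6.

```python
import shutil
import subprocess
from typing import Optional


def sox_supports_mp3(help_text: str) -> bool:
    """True if 'mp3' appears on the 'AUDIO FILE FORMATS:' line of `sox -h` output."""
    for line in help_text.splitlines():
        if line.startswith("AUDIO FILE FORMATS:"):
            return "mp3" in line.split(":", 1)[1].split()
    return False


def check_sox() -> Optional[str]:
    """Return a description of the problem, or None if sox looks usable for MP3 import."""
    if shutil.which("sox") is None:
        return "sox is not on PATH"
    # Python 3.6-compatible way of capturing the help text.
    result = subprocess.run(["sox", "-h"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    if not sox_supports_mp3(result.stdout):
        return "sox is installed but cannot read .mp3 files"
    return None
```

On Debian and Ubuntu, MP3 support for SoX is typically provided by the `libsox-fmt-mp3` package, in addition to `sox` itself.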

docker pull should refer to stable rather than latest

Moved from https://github.com/JRMeyer/deepspeech-playbook/issues/32 raised by @lissyx

docker pull mozilla/deepspeech-train:latest will pull the latest image pushed from master, which is not unlikely to be broken.
We should instead direct people to a stable tag, e.g. v0.9.3, as in mozilla/deepspeech-train:v0.9.3.

@KathyReid Do you think it's good enough to point to a fixed version tag? We should be able to augment the code that pushes to Docker Hub to also make a tag for this, but it would require some work.

Add information to TRAINING.md on `dropout_rate`

The default dropout_rate parameter for training, documented at https://deepspeech.readthedocs.io/en/v0.9.3/Flags.html#training-flags, is 0.05.

Discussions on Discourse suggest that, for small datasets, a dropout_rate of 0.3 or 0.4 gives better results.

This should be documented in TRAINING.md.

Add example to ALPHABET.md on how to diagnose a mis-matched alphabet

ALPHABET.md contains the following information, which would be stronger with an example. This Issue is to add a worked example showing how to diagnose a mis-matched alphabet.txt file.

### How to diagnose mis-matched alphabets?

If you think you used different alphabets to create a [language model and an acoustic model](AM_vs_LM.md), try decoding _without_ the scorer. If decoding the audio without a scorer gives reasonable output, but decoding the same audio with the scorer does _not_, then your alphabets may be mismatched. Usually the easiest fix is to re-compile the scorer with the correct alphabet.

[Read more information on building a language model (scorer)](SCORER.md).
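The scorer/no-scorer comparison described above can be sketched with the `deepspeech` Python package. This assumes the v0.9.x API (`Model`, `sampleRate`, `stt`, `enableExternalScorer`), NumPy, and a 16-bit mono WAV file at the model's sample rate; the helper names and paths are hypothetical. A freshly created `Model` has no scorer enabled, so the first `stt` call decodes with the acoustic model alone.

```python
import wave
from typing import Tuple


def read_wav(path: str) -> Tuple[bytes, int]:
    """Read a 16-bit mono WAV file, returning (raw PCM bytes, sample rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        return w.readframes(w.getnframes()), w.getframerate()


def compare_with_and_without_scorer(model_path: str, scorer_path: str,
                                    wav_path: str) -> Tuple[str, str]:
    """Transcribe one file twice: acoustic model only, then with the scorer."""
    # Imported here so read_wav can be used without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    pcm, rate = read_wav(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"audio is {rate} Hz but the model expects {model.sampleRate()} Hz")
    audio = np.frombuffer(pcm, dtype=np.int16)

    without_scorer = model.stt(audio)  # no external scorer enabled yet
    model.enableExternalScorer(scorer_path)
    with_scorer = model.stt(audio)
    return without_scorer, with_scorer
```

If the first transcript is reasonable and the second is not, suspect mismatched alphabets and re-compile the scorer with the alphabet used for training.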

Provide some GPU recommendations / hints

It would be useful to help people scope their GPU requirements, so that expectations are set properly:

how much VRAM is required for training from scratch
how much VRAM is required for transfer learning
some rough ratios of GPU model / audio volume / training time

Leverage existing community-oriented docker work?

Raised by @lissyx in https://github.com/JRMeyer/deepspeech-playbook/issues/15

https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/

This is work I have been conducting explicitly to factorize and simplify most of the pipeline to help the community build their own models. To the best of my knowledge, several languages are building on top of this, including Italian, Kabyle and Spanish.

Rather than re-inventing the wheel, I think it would be wise to move this work out of that repo into a new one, and use that for the Playbook.

TESTING.md - does not contain example code to run a test

Picked up via Twitter by @hammono with thanks

In TESTING.md, there is no code example to show how to run a test on a language or acoustic model.
Code examples should be shown.
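Until TESTING.md has an example, here is a minimal sketch that builds a DeepSpeech v0.9.x test run as a command list for `subprocess`. It assumes the `DeepSpeech.py` entry point and its `--test_files`, `--checkpoint_dir` and `--scorer_path` flags (see the training flags documentation linked in the `dropout_rate` issue); the paths and the helper name are hypothetical. Running once with and once without a scorer separates the acoustic model's accuracy from the scorer's contribution, but check what `--scorer_path` defaults to in your version before relying on the no-scorer run.

```python
from typing import List, Optional


def build_test_command(test_csv: str, checkpoint_dir: str,
                       scorer_path: Optional[str] = None) -> List[str]:
    """Build a DeepSpeech.py command that evaluates a checkpoint on a test CSV.

    If scorer_path is None, --scorer_path is left at its default value.
    """
    cmd = ["python3", "DeepSpeech.py",
           "--test_files", test_csv,
           "--checkpoint_dir", checkpoint_dir]
    if scorer_path is not None:
        cmd += ["--scorer_path", scorer_path]
    return cmd
```

For example, from inside the training container: `subprocess.run(build_test_command("data/test.csv", "checkpoints", "kenlm.scorer"), check=True, cwd="/DeepSpeech")`, where /DeepSpeech is the working directory shown in the import log above.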

Add information on checkpoint storage to TRAINING.md

Per discussions with @JRMeyer:

The name of each checkpoint file records the total number of steps the model has ever taken. If the model has trained for 10 epochs at 500 steps per epoch, the final checkpoints will be labeled with "5000". Checkpoint files are saved not at the end of every epoch, but at a set time interval [link]. The default is every 10 minutes.

Information needed in PlayBook on:

naming format of checkpoint files
how to alter the checkpointing interval (i.e. for more or less frequent checkpointing); checkpoints take up a fair amount of hard drive space, so it's useful to be able to tweak this.
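A minimal sketch of the naming scheme described above, assuming TensorFlow-style checkpoint files named `<prefix>-<global step>` plus a suffix (DeepSpeech v0.9.x appears to use the prefixes `train` and `best_dev`, e.g. `train-5000.index`). With the numbers in this issue, 10 epochs at 500 steps per epoch gives a final step of 10 × 500 = 5000. On the interval question, v0.9.x appears to expose `--checkpoint_secs` (600 seconds, i.e. 10 minutes, by default) and `--max_to_keep` for how many checkpoints are retained; confirm both against the flags documentation for your version. The helper names are hypothetical.

```python
import os
import re
from typing import Iterable, Optional

# Assumed TensorFlow-style names, e.g. "train-5000.index",
# "train-5000.meta" or "best_dev-4500.data-00000-of-00001".
CHECKPOINT_RE = re.compile(r"^(?P<prefix>\w+)-(?P<step>\d+)\.(?:index|meta|data-\d+-of-\d+)$")


def latest_step(filenames: Iterable[str], prefix: str = "train") -> Optional[int]:
    """Highest global step among checkpoint files with the given prefix, or None."""
    steps = []
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match and match.group("prefix") == prefix:
            steps.append(int(match.group("step")))
    return max(steps, default=None)


def checkpoint_disk_usage(checkpoint_dir: str) -> int:
    """Total size in bytes of the files directly inside checkpoint_dir."""
    with os.scandir(checkpoint_dir) as entries:
        return sum(entry.stat().st_size for entry in entries if entry.is_file())
```

For example, `latest_step(os.listdir("checkpoints"))` reports the step label of the newest `train` checkpoint, and `checkpoint_disk_usage("checkpoints")` shows how much space the retained checkpoints occupy.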

Add GitHub Actions Continuous Integration information to the PlayBook

User Story

As a developer who has forked DeepSpeech to extend it for a particular use case, I want to be able to set up Continuous Integration so that I can test my fork and ensure that it exhibits correct behaviour.
As the Applied ML team manager, I want to have the CI pipeline for DeepSpeech well documented so that it reduces the dependency the community has on my team for support.

Acceptance criteria

Outlines the concepts of events, workflows, jobs and runners
Provides enough information so someone who forks DeepSpeech can run CI and know where the CI files are for DeepSpeech

Add example to SCORER.md on how to rebuild scorer

Community feedback has indicated that we need a tutorial in SCORER.md on how to rebuild the scorer / language model for specific vocabulary, or a new language. This Issue is to update SCORER.md with an example of rebuilding the scorer.
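As a starting point for that tutorial, here is a minimal sketch of preparing the text corpus. It assumes, as we understand DeepSpeech v0.9.x's `data/lm/generate_lm.py`, that the language model is built from a plain-text file with one sentence per line (passed via `--input_txt`) and a vocabulary limited by `--top_k`; the normalisation choices (lowercasing, dropping characters outside the alphabet) and the function names are illustrative assumptions. Keeping the corpus within the acoustic model's alphabet avoids the mismatch described in the ALPHABET.md issue, and the top-k list here only mirrors what `--top_k` selects, for inspection.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def normalise_line(line: str, alphabet: Set[str]) -> str:
    """Lowercase a sentence and drop characters the alphabet cannot represent.

    The alphabet must include " " or every sentence collapses into one word.
    """
    kept = "".join(ch for ch in line.lower() if ch in alphabet)
    return " ".join(kept.split())  # collapse runs of whitespace


def prepare_corpus(lines: Iterable[str], alphabet: Set[str],
                   top_k: int) -> Tuple[List[str], List[str]]:
    """Return (non-empty normalised sentences, the top_k most frequent words)."""
    sentences = [s for s in (normalise_line(line, alphabet) for line in lines) if s]
    counts = Counter(word for s in sentences for word in s.split())
    vocab = [word for word, _ in counts.most_common(top_k)]
    return sentences, vocab
```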

people should refer to docker images using tags

Moved from https://github.com/JRMeyer/deepspeech-playbook/issues/33 raised by @lissyx

You will need the id of the Docker image that you created when you set up your DeepSpeech training environment.

We should instead instruct people to apply tags to their image so they can, e.g., docker run [...] deepspeech-local:stable.
