GithubHelp home page GithubHelp logo

adhnguyen / deepspeech-german Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ynop/deepspeech-german

0.0 0.0 0.0 33 KB

Scripts for training Mozilla's DeepSpeech using german speech data

Python 75.03% Shell 24.97%

deepspeech-german's Introduction

DeepSpeech German

This repository contains scripts used to test deepspeech (https://github.com/mozilla/DeepSpeech) (v0.2.0-alpha.8). This is just for prototyping. The results, on speech that is noisy or very dissimilar to the training data, are really bad.

Used data

Language model

For the language model a text corpus is used, also provided by the people of the " German speech data corpus" (https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/acoustic-models.html).

Speech data

Training

First the following path have to be defined:

  • tuda_corpus_path: Path where the German Distant Speech Corpus is stored.
  • voxforge_corpus_path: Path where the Voxforge German Speech data is stored.
  • swc_corpus_path: Path where the Spoken Wikipedia Corpus is stored.
  • text_corpus_path: Path where the text corpus is stored.
  • exp_path: A directory where all output files are written to.
  • kenlm_bin: Path to the kenLM tool
  • deepspeech: Path to the cloned DeepSpeech repository

The commands are expected to be executed from the path where this repository is cloned. Take a look at run_all.sh as an example for executing all the commands.

Install Python Requirements

pip install -r requirements.txt

For requirements regarding DeepSpeech checkout their repository. For the native-client with gpu use:

python3 util/taskcluster.py --target native_client --branch "v0.2.0-alpha.8" --arch gpu

Download the data

  1. Download the text corpus from http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/German_sentences_8mil_filtered_maryfied.txt.gz and store it to text_corpus_path.
  2. Download the German Distant Speech Corpus (TUDA) from http://www.repository.voxforge1.org/downloads/de/german-speechdata-package-v2.tar.gz and store it to tuda_corpus_path.
  3. Download the Spoken Wikipedia Corpus (SWC) from https://nats.gitlab.io/swc/ and prepare it according to https://audiomate.readthedocs.io/en/latest/documentation/indirect_support.html.
  4. Download the Voxforge German Speech data (via pingu python library):
from audiomate.corpus import io

dl = io.VoxforgeDownloader(lang='de')
dl.download(voxforge_corpus_path)

Prepare audio data

# prepare_data.py creates the csv files defining the audio data used for training
./prepare_data.py $tuda_corpus_path $voxforge_corpus_path $exp_path/data

Build LM

# First the text is normalized and cleaned.
./prepare_vocab.py $text_corpus_path $exp_path/clean_vocab.txt

# KenLM is used to build the LM
$kenlm_bin/lmplz --text $exp_path/clean_vocab.txt --arpa $exp_path/words.arpa --o 3
$kenlm_bin/build_binary -T -s $exp_path/words.arpa $exp_path/lm.binary

Build trie

# The deepspeech tools are used to create the trie
$deepspeech/native_client/generate_trie data/alphabet.txt $exp_path/lm.binary $exp_path/clean_vocab.txt $exp_path/trie

Run training

./run_training.sh $deepspeech $(realpath data/alphabet.txt) $exp_path

Results

I Test of Epoch 19 - WER: 0.667205, loss: 69.56213065852289, mean edit distance: 0.287312
I --------------------------------------------------------------------------------
I WER: 0.333333, loss: 0.544307, mean edit distance: 0.200000
I  - src: "p c b"
I  - res: "p c "
I --------------------------------------------------------------------------------
I WER: 0.500000, loss: 0.533773, mean edit distance: 0.142857
I  - src: "oder bundesrat"
I  - res: "der bundesrat "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.102555, mean edit distance: 0.125000
I  - src: "handlung"
I  - res: "handlunge"
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.152456, mean edit distance: 0.500000
I  - src: "erde"
I  - res: "er "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.152456, mean edit distance: 0.500000
I  - src: "erde"
I  - res: "er "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.555418, mean edit distance: 0.500000
I  - src: "form"
I  - res: "vor "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.714687, mean edit distance: 0.250000
I  - src: "werk"
I  - res: "wer "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.851070, mean edit distance: 0.200000
I  - src: "texte"
I  - res: "text "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 0.912912, mean edit distance: 0.600000
I  - src: "misst"
I  - res: "mit "
I --------------------------------------------------------------------------------
I WER: 2.000000, loss: 0.898193, mean edit distance: 0.250000
I  - src: "beilagen"
I  - res: "bei lagen "
I --------------------------------------------------------------------------------

deepspeech-german's People

Contributors

ynop avatar adhnguyen avatar sebastiandeutsch avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.