
noahchalifour / baidu-deepspeech2


A TensorFlow implementation of Baidu's Deep Speech 2 paper

License: MIT License

Python 100.00%
python deep-learning tensorflow deepspeech2 deepspeech machine-learning speech speech-recognition

baidu-deepspeech2's Introduction

Baidu's Deep Speech 2 (TensorFlow)

(This is a work in progress)

This is a Python implementation of Baidu's Deep Speech 2 paper (https://arxiv.org/pdf/1512.02595.pdf) using TensorFlow.

TODO:

  • Fix GPU memory
  • Add batch normalization to RNN
  • Implement row convolution layer
  • Add other dataset support
  • Create pretrained models
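The row convolution layer in the TODO list refers to the lookahead convolution described in the Deep Speech 2 paper: each output feature at time t is a per-feature weighted sum of the current frame and the next tau frames, r[t][i] = sum over j = 0..tau of w[i][j] * h[t + j][i]. The plain-Python sketch below only illustrates that formula and is not code from this repository; the function name, the [T][d] list layout, and zero-padding past the end of the sequence are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def row_convolution(h: Matrix, w: Matrix) -> Matrix:
    """Sketch of a lookahead ("row") convolution.

    h: activations with shape [T][d] (T time steps, d features).
    w: weights with shape [d][tau + 1]; row i is the filter for feature i.
    Returns r with shape [T][d] where
        r[t][i] = sum_j w[i][j] * h[t + j][i]  for j = 0..tau.
    Assumption: time steps past the end of the sequence count as zeros.
    """
    num_steps = len(h)
    num_features = len(w)
    context = len(w[0])  # tau + 1: the current step plus tau future steps
    r = [[0.0] * num_features for _ in range(num_steps)]
    for t in range(num_steps):
        for i in range(num_features):
            total = 0.0
            for j in range(context):
                if t + j < num_steps:  # implicit zero padding at the end
                    total += w[i][j] * h[t + j][i]
            r[t][i] = total
    return r
```

For example, with h = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]] and w = [[1.0, 1.0], [0.5, 0.0]] (tau = 1), the result is [[3.0, 5.0], [5.0, 10.0], [3.0, 15.0]]. Because each feature only looks a few frames ahead, this layer lets a unidirectional model use a small amount of future context without a full backward RNN.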

Preprocessing

To preprocess your data, first download one of the supported datasets and extract it to a folder. Then run the following script to preprocess the data (this may take a while, depending on how much data you have):

python preprocess.py --data-dir=<your data directory> --dataset=<dataset name>
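The README does not describe what preprocess.py produces. As an illustration of one step a CTC-based speech pipeline typically needs, the sketch below turns transcripts into integer label sequences. The alphabet and the names ALPHABET, CHAR_TO_INDEX, BLANK_INDEX, and encode_transcript are hypothetical and not taken from this repository.

```python
from typing import Dict, List

# Hypothetical alphabet: the real preprocessing may use a different
# character set or index layout.
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
CHAR_TO_INDEX: Dict[str, int] = {char: i for i, char in enumerate(ALPHABET)}

# CTC needs one extra class for the blank symbol. Here it is the last index,
# which is the convention tf.keras.backend.ctc_batch_cost expects.
BLANK_INDEX = len(ALPHABET)


def encode_transcript(text: str) -> List[int]:
    """Lower-case a transcript and map each supported character to a label."""
    labels: List[int] = []
    for char in text.lower():
        if char in CHAR_TO_INDEX:
            labels.append(CHAR_TO_INDEX[char])
        # Characters outside the alphabet (digits, punctuation) are dropped.
    return labels
```

For instance, encode_transcript("Hi there!") gives [8, 9, 0, 20, 8, 5, 18, 5], and the model's output layer would need len(ALPHABET) + 1 = 29 classes to include the blank.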

Training

Now that you have preprocessed your data, you can train a model. If you want, edit the settings in the config.py file first. Then run the following command to train the model:

python train.py

Testing your model

Now that you have trained a model, you can start using it. We have created two scripts to help you do this: infer.py and streaming_infer.py. The infer.py script transcribes an audio file that you give it:

python infer.py -f <your audio file name>
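The decoding strategy used by infer.py is not documented here. A common way to turn the per-frame character scores of a CTC-trained model into text is best-path (greedy) decoding: take the highest-scoring class in each frame, merge consecutive repeats, then drop blanks. The sketch below shows that technique in plain Python; the alphabet, blank position, and function name are assumptions and must match the labels the model was trained with.

```python
from typing import List, Optional, Sequence

# Hypothetical alphabet and blank index (blank = last class).
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
BLANK_INDEX = len(ALPHABET)


def greedy_ctc_decode(probs: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding.

    probs: per-frame class scores with shape [T][len(ALPHABET) + 1].
    """
    # Highest-scoring class in every frame (ties go to the lowest index).
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    chars: List[str] = []
    previous: Optional[int] = None
    for index in best_path:
        # Merge repeats of the same class, and never emit the blank itself.
        if index != previous and index != BLANK_INDEX:
            chars.append(ALPHABET[index])
        previous = index
    return "".join(chars)
```

A blank between two identical classes keeps both characters, which is how CTC represents double letters such as the "ll" in "hello". The Deep Speech 2 paper decodes with beam search plus a language model instead, which usually yields better transcripts than this greedy pass.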

The streaming_infer.py script uses PyAudio to record audio from your computer's microphone and transcribes it in real time. To run it, simply use:

python streaming_infer.py
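How streaming_infer.py buffers audio is not shown here. One way to feed microphone input to a model in real time is to accumulate the raw chunks PyAudio returns and cut them into overlapping windows; the sketch below covers only that buffering step. It assumes a mono stream opened with format=pyaudio.paInt16, whose chunks would come from repeated stream.read(...) calls; the function name and the window and hop sizes are hypothetical.

```python
from array import array
from typing import Iterable, Iterator, List


def sliding_windows(
    chunks: Iterable[bytes], window_size: int, hop_size: int
) -> Iterator[List[int]]:
    """Yield overlapping windows of 16-bit samples from raw byte chunks.

    chunks: native-endian int16 audio, e.g. what a paInt16 stream delivers.
    window_size: samples per yielded window.
    hop_size: samples to advance between consecutive windows.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    buffer: List[int] = []
    for chunk in chunks:
        samples = array("h")
        samples.frombytes(chunk)  # interpret the bytes as signed 16-bit samples
        buffer.extend(samples)
        # Emit every complete window currently available in the buffer.
        while len(buffer) >= window_size:
            yield buffer[:window_size]
            del buffer[:hop_size]
```

For example, with 16 kHz audio, window_size=16000 and hop_size=8000 would produce one-second windows every half second, each of which could then be converted to features and passed to the model.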

baidu-deepspeech2's People

Contributors

nchalifo, noahchalifour


baidu-deepspeech2's Issues

How to predict on unlabeled test data?

Hi. I'm studying speech recognition and have a question.
This model takes four inputs, as shown in the code below:

self.model = tf.keras.Model(inputs=[input_data, labels, input_length, label_length], outputs=[loss_out])

If I want this model to predict on unlabeled test data, I can't provide 'labels' and 'label_length' as inputs.
What should I do?

Contact info

Hi Noah,

I am an undergraduate student and I am using your project as a guideline for creating a similar speech recognition system. I have a few questions, but I have not been able to contact you. Please contact me by email at [email protected] or on Facebook (Koko Tonev). I found your profile on Facebook and messaged you, but I got no response, and there are some problems with my system that I don't understand. Many thanks in advance!

Kind regards,
Koko Tonev
