
noahchalifour / baidu-deepspeech2


A Tensorflow implementation of Baidu's Deep Speech 2 paper

License: MIT License

Python 100.00%

python deep-learning tensorflow deepspeech2 deepspeech machine-learning speech speech-recognition

baidu-deepspeech2's Introduction

Baidu's Deep Speech 2 (Tensorflow)

(This is a work in progress)

This is a Python implementation of Baidu's Deep Speech 2 paper (https://arxiv.org/pdf/1512.02595.pdf) using TensorFlow.

TODO:

Fix GPU memory
Add batch normalization to RNN
Implement row convolution layer
Add other dataset support
Create pretrained models

Preprocessing

To preprocess your data, first download one of the supported datasets and extract it to a folder. Then run the following script (this might take a while, depending on the amount of data you have):

python preprocess.py --data-dir=<your data directory> --dataset=<dataset name>

Training

Now that you have preprocessed your data, you can train a model. Optionally edit the settings in the config.py file first, then run the following command:

python train.py

Testing your model

Now that you have trained a model, you can start using it. Two scripts are provided for this: infer.py and streaming_infer.py. The infer.py script transcribes an audio file that you give it:

python infer.py -f <your audio file name>

The streaming_infer.py script uses PyAudio to record audio from your computer's microphone and transcribes it in real time. To run it:

python streaming_infer.py


baidu-deepspeech2's Issues

Why log_linear_specgram?

The paper does not make reference to a log_linear spectrogram.
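For context, the name log_linear_specgram most plausibly denotes the logarithm of a spectrogram with linearly spaced frequency bins (as opposed to a mel-scaled one). That reading is an assumption based on the name alone, not on the repository's code. A minimal standard-library sketch of that feature follows; the function name, parameters, and the naive DFT are illustrative choices, and a real pipeline would use an FFT routine instead.

```python
import cmath
import math


def log_linear_spectrogram(samples, window_size, hop, eps=1e-10):
    """Return one row of log-power values per frame, linear frequency bins.

    Hypothetical helper for illustration; requires window_size >= 2.
    """
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
              for n in range(window_size)]
    n_bins = window_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(window_size)]
        row = []
        for k in range(n_bins):
            # Naive DFT coefficient for bin k (O(N^2), kept for clarity).
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            # Log of the power; eps avoids log(0) on silent bins.
            row.append(math.log(abs(coeff) ** 2 + eps))
        frames.append(row)
    return frames


# A sine wave that completes two cycles per 8-sample window.
samples = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = log_linear_spectrogram(samples, window_size=8, hop=4)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
print(len(spec), len(spec[0]), peak_bin)
```

The log compresses the large dynamic range of speech energy, which is a common reason for applying it before feeding features to a network.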

How to predict on unlabeled test data?

Hi. I'm studying speech recognition and have a question.
This model takes 4 inputs, as shown in the code below.

self.model = tf.keras.Model(inputs=[input_data, labels, input_length, label_length], outputs=[loss_out])

If this model has to predict on an unlabeled test case, I can't supply 'labels' and 'label_length' as inputs.
What should I do?
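One common approach with a CTC-trained network (assumed here because the model outputs loss_out from labels and lengths) is to build a second model over the same layers that takes only input_data and returns per-frame character probabilities, then decode those probabilities without any labels. The sketch below shows only the decoding step, using greedy (best-path) CTC decoding in plain Python. The names ctc_greedy_decode, frame_probs, alphabet, and blank_index are hypothetical and do not come from the repository.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank_index):
    """Best-path CTC decoding: collapse repeated symbols, then drop blanks."""
    decoded = []
    previous = None
    for probs in frame_probs:
        # Pick the most likely symbol for this time step.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Emit only when the symbol changes and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


# Toy output for 5 time steps over the symbols "a", "b", and blank (index 2).
frame_probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a again (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank separates the two a's
    [0.6, 0.3, 0.1],  # a
    [0.2, 0.7, 0.1],  # b
]
text = ctc_greedy_decode(frame_probs, alphabet="ab", blank_index=2)
print(text)  # aab
```

Greedy decoding is the simplest option; beam search, optionally with a language model, usually gives better transcriptions at higher cost.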

Using pretrained model without downloading the data

Is it possible to use the pre-trained model directly, without downloading the whole 20-30 GB of data?

What accuracy are you able to attain with this?

And how does it compare to the reference from the paper?

How to evaluate and test this? Infer.py and stream_infer.py files not available

Hi, I am studying ASR and its implementation. I have completed the training part and now want to evaluate and test the model. How can I do this, given that no evaluation or testing script is available?
Thanks.

only one epoch is ok

Contact info

Hi Noah,

I am an undergraduate student and I am using your project as a guideline for creating a similar speech recognition system. I have a few questions, however, I had not been able to contact you. Please contact me on my email: [email protected] or on facebook - Koko Tonev. I found your profile on Facebook and messaged you but I got no response and I have some problems with my system that I don't understand... Many thanks in advance!

Kind regards,
Koko Tonev
