robmsmt / kerasdeepspeech

A Keras CTC implementation of Baidu's DeepSpeech for model experimentation

License: GNU Affero General Public License v3.0

keras deepspeech asr ctc coreml speechrecognition speech-to-text deep-learning machine-learning neural-networks

kerasdeepspeech's Introduction

Keras DeepSpeech


Repository for experimenting with different CTC-based model designs for ASR. Supports live recording and testing of speech, and quickly creates customised datasets using the own-voice dataset creation scripts!

OVERVIEW

SETUP

  1. Recommended: use a virtualenv with Python 2.7 (3.x is untested and will not work with Core ML)
  2. git clone https://github.com/robmsmt/KerasDeepSpeech
  3. pip install -r requirements.txt
  4. Get the data using the import/download scripts in the data folder; LibriSpeech is a good example.
  5. Download the language model (large file): run ./lm/get_lm.sh

RUN

  1. To train, simply run python run-train.py. To specify training/validation files, use python run-train.py --train_files <csvfile> --valid_files <csvfile> (see run-train.py for the complete argument list)
  2. To test, run python run-test.py --test_files <datacsvfile>

CREDIT

  1. Mozilla DeepSpeech
  2. Baidu DS1 & DS2 papers

Licence

The content of this project itself is licensed under the GNU General Public License. Copyright © 2018

Contributing

Have a question? Like the tool? Don't like it? Open an issue and let's talk about it! Pull requests are appreciated!


kerasdeepspeech's Issues

Input dimension mismatch error on training the model ds1 dropout

This is the error on running model.fit_generator with Keras (2.0.9) and Theano (0.9.0) as the backend, under Python 3:

ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 16)
Apply node that caused the error: Elemwise{eq,no_inplace}(training/ctc_target, Elemwise{round_half_to_even,no_inplace}.0)
Toposort index: 762 Inputs types: [TensorType(float64, matrix), TensorType(float64, row)]
Inputs shapes: [(16, 1), (1, 16)]
Inputs strides: [(8, 8), (128, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=int64}(Elemwise{eq,no_inplace}.0)]]
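The traceback shows an elementwise comparison between tensors of shape (16, 1) and (1, 16). A plausible reading, sketched below in numpy for illustration (the variable names are assumptions, not from the repo): NumPy silently broadcasts such a comparison, whereas Theano only broadcasts axes that are declared broadcastable, so a runtime size-1 axis on a plain matrix triggers exactly this "Input dimension mis-match" error.

```python
import numpy as np

batch_size = 16
ctc_target = np.zeros((batch_size, 1))    # shape reported: (16, 1)
rounded_pred = np.zeros((1, batch_size))  # shape reported: (1, 16)

# NumPy broadcasts (16, 1) == (1, 16) to a (16, 16) result; Theano
# refuses because neither size-1 axis is declared broadcastable.
broadcast_eq = (ctc_target == rounded_pred)
print(broadcast_eq.shape)  # (16, 16)

# Flattening both tensors to (batch_size,) makes the elementwise
# comparison unambiguous for either backend.
flat_eq = (ctc_target.ravel() == rounded_pred.ravel())
print(flat_eq.shape)  # (16,)
```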

WER

Hi, I want to know why the WER is 0.8. I used the default parameters with TIMIT. Have I done something wrong?

How to adjust the timesteps

Hello, thank you for sharing the great project.

I want to adjust the timesteps in ownModel, but I can't find where it should be adjusted.
In def ownModel(), it has:

input_data = Input(name='_the_input', shape=(None,input_dim))
...
x = TimeDistributed(Dense(fc_size...))(x)

Where is the definition of timesteps? Thanks a lot!
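For what it's worth, with shape=(None, input_dim) the time dimension is deliberately left variable, so there is no timesteps constant to edit: the number of timesteps is whatever length each padded batch has at runtime. A minimal numpy sketch of that idea (sizes here are illustrative assumptions, not values from the repo):

```python
import numpy as np

# Each utterance is (timesteps, input_dim); timesteps varies per file,
# which is why the model declares Input(shape=(None, input_dim)).
input_dim = 26
batch = [np.random.rand(t, input_dim) for t in (73, 120, 95)]

# The effective "timesteps" of a batch is fixed at runtime by padding
# every sequence up to the longest one in that batch.
max_t = max(x.shape[0] for x in batch)
padded = np.zeros((len(batch), max_t, input_dim))
for i, x in enumerate(batch):
    padded[i, :x.shape[0], :] = x

print(padded.shape)  # (3, 120, 26)
```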

link is broken!

" live recording and testing of speech and quickly creates customised datasets using own-voice dataset creation scripts!"

The "live recording and testing of speech" link and the "own-voice dataset creation scripts" link are broken.
Could you re-link those?

ZeroPadding1D at the ds2_gru_model

Hi, I don't understand why you use ZeroPadding1D in this model; you are adding 2048 zeros in the second (time) dimension.
For example, when the input shape is (1, 280, 161), after passing through the ZeroPadding1D layer the output is (1, 2328, 161).

Do you want to keep the sequence fixed at 2048?
If yes, isn't it necessary to calculate the number of zeros required for each input?

Thank you for your response.
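As I understand the Keras semantics, ZeroPadding1D(padding=(0, 2048)) appends 2048 zero frames after the sequence (and none before); it does not clamp the length to 2048. A numpy sketch of that behaviour, matching the shapes in the issue:

```python
import numpy as np

# Sketch of ZeroPadding1D((left, right)) semantics: zeros are prepended
# and appended along the time axis; the sequence itself is unchanged.
def zero_pad_1d(x, left, right):
    batch, steps, feats = x.shape
    out = np.zeros((batch, left + steps + right, feats), dtype=x.dtype)
    out[:, left:left + steps, :] = x
    return out

x = np.ones((1, 280, 161))
y = zero_pad_1d(x, 0, 2048)
print(y.shape)  # (1, 2328, 161) -- 280 + 2048, as reported in the issue
```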

Only 1 conv layer where supposed to be many

At model.py line 241 you have code like:

if use_conv:
    conv = ZeroPadding1D(padding=(0, 2048))(x)
    for l in range(conv_layers):
        x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)

It should be something like:

if use_conv:
    conv = ZeroPadding1D(padding=(0, 2048))(x)
    x = Conv1D(filters=fc_size, name='conv_{}'.format(1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
    for l in range(1, conv_layers):
        x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(x)
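A minimal pure-Python sketch of the dataflow bug being described (the layer function here is a stand-in, not Keras code): because every loop iteration consumes the same tensor conv, each new layer replaces x rather than stacking on top of it, so only one conv layer ends up connected in the graph.

```python
# Stand-in for applying a named layer to an input tensor; the returned
# string records the dataflow so the graph structure is visible.
def layer(name, inp):
    return '%s(%s)' % (name, inp)

conv = 'padded_input'
conv_layers = 3

# Buggy version: every Conv1D reads from `conv`, so only the last
# assignment to x survives -- a single conv layer in the graph.
for l in range(conv_layers):
    x = layer('conv_%d' % (l + 1), conv)
print(x)   # conv_3(padded_input)

# Fixed version: feed each layer the previous layer's output.
x = conv
for l in range(conv_layers):
    x = layer('conv_%d' % (l + 1), x)
print(x)   # conv_3(conv_2(conv_1(padded_input)))
```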

Padding character: #27 or 28?

t.append(27) # replace with a space char to pad

In generator.py, get_intseq(), padding is done with character 27. In the char map, that index stands for an apostrophe, not for the extra 28th padding character. In utils.py, int_to_text_sequence, character 28 is mentioned as the one for padding. Is that intended?
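A sketch of the index clash being described. The actual char map lives in the repo's utils.py; the map and helper below are illustrative assumptions (1–26 for a–z, 27 for apostrophe, 28 as the pad symbol the decoder skips), not the repo's exact code.

```python
# Hypothetical char map: 1-26 -> a-z, 27 -> apostrophe, 28 -> pad.
char_map = {i: chr(ord('a') + i - 1) for i in range(1, 27)}
char_map[27] = "'"
PAD = 28  # extra symbol the decoder is expected to skip

def pad_labels(seq, target_len, pad_index):
    return seq + [pad_index] * (target_len - len(seq))

# Padding with 27 makes the padding decode as apostrophes...
print(''.join(char_map.get(i, '') for i in pad_labels([3, 1, 20], 6, 27)))
# ...while padding with the dedicated PAD symbol keeps the text clean.
print(''.join(char_map.get(i, '') for i in pad_labels([3, 1, 20], 6, PAD)))
```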

Accuracy of `model_arch==3` i.e. `own_model`

  1. Is there any result on any dataset for your own model, i.e. model_arch == 3?
  2. Secondly, if I select model_arch == 3, the console prints it as DS3. I don't suppose it is that model, or is it?
    Thanks in advance.

Language model for bangla

I want to implement this for both Bangla isolated and continuous speech. Where can I find a language model for Bangla? If one is not available, how can I make a language model?

Could you give some examples of the shapes below?

3. input_length (required for CTC loss)

    # this is the time dimension of CTC (batch x time x mfcc)
    #input_length = np.array([get_xsize(mfcc) for mfcc in X_data])
    input_length = np.array(x_val)
    # print("3. input_length shape:", input_length.shape)   
    # print("3. input_length =", input_length)
    assert(input_length.shape == (self.batch_size,))

    # 4. label_length (required for CTC loss)
    # this is the length of the number of label of a sequence
    #label_length = np.array([len(l) for l in labels])
    label_length = np.array(y_val)
    # print("4. label_length shape:", label_length.shape)
    # print("4. label_length =", label_length)
    assert(label_length.shape == (self.batch_size,))

Hi, I want to make a CTC demo, but I do not know what label_length.shape and input_length.shape are. How do I calculate them, and what do they mean? Thank you.
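Based on the commented-out lines in the snippet above, the two arrays are per-utterance lengths, one entry per batch item. A numpy sketch with illustrative values (the frame counts and transcripts are made up):

```python
import numpy as np

# input_length[i]: number of valid time steps fed to CTC for utterance
# i (after any downsampling), i.e. the time axis of the softmax output.
# label_length[i]: number of characters in transcript i before padding.
mfcc_frames = [120, 95]          # valid time steps per utterance
labels = ['hello', 'hi there']   # ground-truth transcripts
batch_size = 2

input_length = np.array(mfcc_frames)
label_length = np.array([len(l) for l in labels])

assert input_length.shape == (batch_size,)
assert label_length.shape == (batch_size,)
print(input_length)  # [120  95]
print(label_length)  # [5 8]
```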

About Lookahead Convolution!

Could you tell me how to design a lookahead convolution with Keras? I have designed one but it doesn't work. Thanks!
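For reference, the lookahead ("row") convolution from the Deep Speech 2 paper computes each output frame as a per-feature weighted sum of the current frame and the next few future frames. A numpy sketch of that operation under my reading of the paper (weights and sizes here are illustrative; in Keras it would need a custom layer or a depthwise Conv1D with the input shifted so padding covers only future frames):

```python
import numpy as np

def lookahead_conv(h, W):
    """Row convolution: out[t, d] = sum_j W[j, d] * h[t + j, d].

    h: (timesteps, features) activations; W: (context + 1, features)
    per-feature weights over the current and future frames."""
    T, D = h.shape
    C = W.shape[0]
    padded = np.vstack([h, np.zeros((C - 1, D))])  # zero-pad the future
    out = np.zeros_like(h)
    for t in range(T):
        out[t] = (W * padded[t:t + C]).sum(axis=0)
    return out

h = np.ones((5, 3))
W = np.ones((3, 3))  # context of 2 future frames, all-ones weights
y = lookahead_conv(h, W)
print(y[0])   # [3. 3. 3.]  full context available at t=0
print(y[-1])  # [1. 1. 1.]  only the current frame at the last step
```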

Why for loop adding only one Conv1D layer in ds2_gru_model

Hello @robmsmt,

I'm working with your repo. In your model.py file, the code below should add three Conv1D layers, but it adds only one:

conv = ZeroPadding1D(padding=(0, 2048))(x)
for l in range(conv_layers):
  x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)

This is the model summary I get,

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, 161)    644         the_input[0][0]                  
__________________________________________________________________________________________________
zero_padding1d_1 (ZeroPadding1D (None, None, 161)    0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv_3 (Conv1D)                 (None, None, 512)    907264      zero_padding1d_1[0][0]           
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, 512)    2048        conv_3[0][0]                     
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 1024)   9443328     batch_normalization_2[0][0]      
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 1024)   12589056    bidirectional_1[0][0]            
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 1024)   12589056    bidirectional_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 1024)   4096        bidirectional_3[0][0]            
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 512)    524800      batch_normalization_3[0][0]      
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 1102)   565326      time_distributed_1[0][0]         
__________________________________________________________________________________________________
the_labels (InputLayer)         (None, None)         0                                            
__________________________________________________________________________________________________
input_length (InputLayer)       (None, 1)            0                                            
__________________________________________________________________________________________________
label_length (InputLayer)       (None, 1)            0                                            
__________________________________________________________________________________________________
ctc (Lambda)                    (None, 1)            0           time_distributed_2[0][0]         
                                                                 the_labels[0][0]                 
                                                                 input_length[0][0]               
                                                                 label_length[0][0]               
==================================================================================================
Total params: 36,625,618
Trainable params: 36,622,224
Non-trainable params: 3,394
__________________________________________________________________________________________________

What could be the possible reason?
Thanks in advance
