robmsmt / kerasdeepspeech Goto Github PK
View Code? Open in Web Editor NEW
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
License: GNU Affero General Public License v3.0
For my language (Vietnamese) I can't find an LM, so how can I create one to train the model? I have 70,000 sentences.
Must I download an existing LM model?
How can I build an LM from scratch?
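In practice an n-gram LM for a new language is usually trained with an external toolkit such as KenLM (e.g. `lmplz -o 3 < corpus.txt > lm.arpa` over the 70,000 sentences). As a minimal sketch of what such training does, here is a toy bigram model with add-one smoothing in pure Python; the corpus and probabilities are illustrative only:

```python
# Toy bigram LM built from scratch; real pipelines use KenLM's lmplz.
from collections import Counter

sentences = [          # stand-in for the real 70,000-sentence corpus
    "xin chao",
    "xin cam on",
]

unigrams = Counter()
bigrams = Counter()
for s in sentences:
    words = ["<s>"] + s.split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(w1, w2):
    # Add-one (Laplace) smoothing over the observed vocabulary.
    vocab = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)

print(bigram_prob("<s>", "xin"))   # seen pair: higher probability
print(bigram_prob("chao", "xin"))  # unseen pair: smoothed floor
```

A real decoder would consume the ARPA file produced by `lmplz`, not this in-memory table; the sketch only shows the counting and smoothing idea.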
Hello, thank you for sharing this great project.
I want to adjust the timesteps in ownModel, but I can't find where they should be adjusted.
In def ownModel(), there is:
input_data = Input(name='_the_input', shape=(None, input_dim))
...
x = TimeDistributed(Dense(fc_size...))(x)
Where are the timesteps defined? Thanks a lot!
Hi, I want to know why the WER is 0.8. I used the default parameters with TIMIT; have I done something wrong?
I want to implement it for both Bangla isolated and continuous speech. Where can I find a language model for Bangla, and if one is not available, how can I make one?
Hello @robmsmt,
I'm working with your repo. In your model.py, the code below should create three Conv1D layers, but it adds only one:
conv = ZeroPadding1D(padding=(0, 2048))(x)
for l in range(conv_layers):
    x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
This is the model summary I get:
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
the_input (InputLayer) (None, None, 161) 0
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, 161) 644 the_input[0][0]
__________________________________________________________________________________________________
zero_padding1d_1 (ZeroPadding1D (None, None, 161) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv_3 (Conv1D) (None, None, 512) 907264 zero_padding1d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, 512) 2048 conv_3[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 1024) 9443328 batch_normalization_2[0][0]
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 1024) 12589056 bidirectional_1[0][0]
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 1024) 12589056 bidirectional_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 1024) 4096 bidirectional_3[0][0]
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 512) 524800 batch_normalization_3[0][0]
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 1102) 565326 time_distributed_1[0][0]
__________________________________________________________________________________________________
the_labels (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_length (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
label_length (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
ctc (Lambda) (None, 1) 0 time_distributed_2[0][0]
the_labels[0][0]
input_length[0][0]
label_length[0][0]
==================================================================================================
Total params: 36,625,618
Trainable params: 36,622,224
Non-trainable params: 3,394
__________________________________________________________________________________________________
What could be the possible reason?
Thanks in advance.
Line 215 in 5536388
In generator.py, get_intseq(), padding is done with character 27. In the char map that stands for an apostrophe, not the extra 28th padding character. In utils.py, int_to_text_sequence mentions character 28 as the one for padding. Is that intended?
In model.py at line 241 you have code like:
if use_conv:
    conv = ZeroPadding1D(padding=(0, 2048))(x)
    for l in range(conv_layers):
        x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
It should be something like:
if use_conv:
    conv = ZeroPadding1D(padding=(0, 2048))(x)
    x = Conv1D(filters=fc_size, name='conv_1', kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
    for l in range(1, conv_layers):
        x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(x)
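The effect of the bug can be traced without Keras at all. In this toy sketch each "layer" just records which tensor it consumed, mimicking how the functional API wires the graph; it shows why the summary above contains only `conv_3` connected directly to the zero-padding layer:

```python
# Toy trace of functional-API wiring: a "layer" returns a string
# describing its input instead of a real tensor.
def conv_layer(name):
    def apply(tensor):
        return "{}({})".format(name, tensor)
    return apply

conv_layers = 3
conv = "zero_padding1d_1"

# Buggy loop: every iteration reads `conv`, so the outputs of conv_1
# and conv_2 dangle and only the last assignment survives in the graph.
x_bug = None
for l in range(conv_layers):
    x_bug = conv_layer("conv_{}".format(l + 1))(conv)
print(x_bug)    # conv_3(zero_padding1d_1) -- a single effective layer

# Corrected loop: feed the running output forward so the layers chain.
x_fixed = conv
for l in range(conv_layers):
    x_fixed = conv_layer("conv_{}".format(l + 1))(x_fixed)
print(x_fixed)  # conv_3(conv_2(conv_1(zero_padding1d_1)))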
# this is the time dimension of CTC (batch x time x mfcc)
#input_length = np.array([get_xsize(mfcc) for mfcc in X_data])
input_length = np.array(x_val)
# print("3. input_length shape:", input_length.shape)
# print("3. input_length =", input_length)
assert(input_length.shape == (self.batch_size,))
# 4. label_length (required for CTC loss)
# this is the length of the number of label of a sequence
#label_length = np.array([len(l) for l in labels])
label_length = np.array(y_val)
# print("4. label_length shape:", label_length.shape)
# print("4. label_length =", label_length)
assert(label_length.shape == (self.batch_size,))
Hi, I want to make a CTC demo, but I don't understand input_length.shape and label_length.shape. How are they calculated, and what do they mean? Thank you.
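Both arrays have shape (batch_size,): one integer per utterance. A minimal sketch with made-up frame counts and transcripts (the real values come from your feature extractor and label encoding):

```python
# Toy batch: per-utterance frame counts and transcripts (assumed values).
batch_frames = [320, 280, 301]
transcripts = ["hello", "hi", "hey you"]

# input_length[i]: timesteps the network emits for utterance i. Here the
# raw frame count; with strided convs, divide by the downsampling factor.
input_length = list(batch_frames)

# label_length[i]: number of label symbols (characters) in transcript i.
label_length = [len(t) for t in transcripts]

print(input_length)   # [320, 280, 301]
print(label_length)   # [5, 2, 7]

# CTC requires input_length[i] >= label_length[i] for every utterance.
assert all(i >= l for i, l in zip(input_length, label_length))
```

In the generator snippet above, both are then wrapped in np.array, which is what gives them the asserted shape (batch_size,).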
model_arch == 3
I actually need to know why Core ML won't work with Python 3.
"Live recording and testing of speech and quickly creates customised datasets using own-voice dataset creation scripts!"
The live recording and testing of speech link and the own-voice dataset creation script link are broken.
Could you re-link those?
This is the error when running model.fit_generator with Keras (2.0.9), Theano (0.9.0) as backend, and Python 3:
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 16)
Apply node that caused the error: Elemwise{eq,no_inplace}(training/ctc_target, Elemwise{round_half_to_even,no_inplace}.0)
Toposort index: 762
Inputs types: [TensorType(float64, matrix), TensorType(float64, row)]
Inputs shapes: [(16, 1), (1, 16)]
Inputs strides: [(8, 8), (128, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=int64}(Elemwise{eq,no_inplace}.0)]]
I am very confident that I have configured the GPU correctly, and I can train an example on the GPU. But it doesn't seem to work with "run-train.py".
Executing the command below to download/import LibriSpeech throws an error:
python data/import_librispeech.py path_of_the_file
Doing a pip install of sox and progressbar2 solved the issue.
Hi, I don't understand why you use ZeroPadding1D in this model. You are adding 2048 zeros along the second (time) dimension.
For example, when the input shape is (1, 280, 161), after passing through the ZeroPadding1D layer the output is (1, 2328, 161).
Do you want to keep the sequence fixed at 2048?
If yes, isn't it necessary to calculate the number of zeros required for each of the inputs?
Thank you for your response.
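Note that ZeroPadding1D(padding=(0, 2048)) appends a constant 2048 zeros regardless of the input length, so the sequence is not fixed at 2048; it just grows by 2048. The valid-mode conv then shortens it again. A pure-Python check of the shapes in question, using the layer parameters quoted above (kernel_size=11, strides=2, padding='valid'):

```python
# Output-length arithmetic for the two layers under discussion.
def zero_padding1d(length, left, right):
    # ZeroPadding1D appends a fixed number of zeros on each side.
    return length + left + right

def conv1d_valid(length, kernel_size, stride):
    # Standard output-length formula for a 'valid' 1D convolution.
    return (length - kernel_size) // stride + 1

t = 280                                  # input timesteps from the example
t_padded = zero_padding1d(t, 0, 2048)    # 280 + 2048 = 2328, as observed
t_conv = conv1d_valid(t_padded, 11, 2)   # length after the strided conv
print(t_padded, t_conv)                  # 2328 1159
```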
Could you tell me how to design a lookahead convolution with Keras? I have designed one, but it doesn't work. Thanks!