robmsmt / kerasdeepspeech Goto Github PK
View Code? Open in Web Editor NEW
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
License: GNU Affero General Public License v3.0
For my language (Vietnamese) I can't find an LM, so how can I create one to train the model? I have 70,000 sentences.
Must I download an existing LM model?
How can I build an LM from scratch?
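In practice an n-gram LM for a new language is usually trained with an external toolkit such as KenLM (e.g. `lmplz -o 3 < corpus.txt > lm.arpa` over the 70,000 sentences). As a minimal sketch of what such training does, here is a toy bigram model with add-one smoothing in pure Python; the corpus and probabilities are illustrative only:

```python
# Toy bigram LM built from scratch; real pipelines use KenLM's lmplz.
from collections import Counter

sentences = [          # stand-in for the real 70,000-sentence corpus
    "xin chao",
    "xin cam on",
]

unigrams = Counter()
bigrams = Counter()
for s in sentences:
    words = ["<s>"] + s.split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(w1, w2):
    # Add-one (Laplace) smoothing over the observed vocabulary.
    vocab = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)

print(bigram_prob("<s>", "xin"))   # seen pair: higher probability
print(bigram_prob("chao", "xin"))  # unseen pair: smoothed floor
```

A real decoder would consume the ARPA file produced by `lmplz`, not this in-memory table; the sketch only shows the counting and smoothing idea.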
Hello, thank you for sharing this great project.
I want to adjust the timesteps in ownModel, but I can't find where they should be adjusted.
In def ownModel(), there is:
input_data = Input(name='_the_input', shape=(None, input_dim))
...
x = TimeDistributed(Dense(fc_size...))(x)
Where are the timesteps defined? Thanks a lot!
Hi, I want to know why the WER is 0.8. I used the default parameters with TIMIT; have I done something wrong?
I want to implement it for both Bangla isolated and continuous speech. Where can I find a language model for Bangla, and if one is not available, how can I make one?
Hello @robmsmt,
I'm working with your repo. In your model.py, the code below should create three Conv1D layers, but it adds only one:
conv = ZeroPadding1D(padding=(0, 2048))(x)
for l in range(conv_layers):
    x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
This is the model summary I get:
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
the_input (InputLayer) (None, None, 161) 0
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, 161) 644 the_input[0][0]
__________________________________________________________________________________________________
zero_padding1d_1 (ZeroPadding1D (None, None, 161) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv_3 (Conv1D) (None, None, 512) 907264 zero_padding1d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, 512) 2048 conv_3[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 1024) 9443328 batch_normalization_2[0][0]
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 1024) 12589056 bidirectional_1[0][0]
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 1024) 12589056 bidirectional_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 1024) 4096 bidirectional_3[0][0]
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 512) 524800 batch_normalization_3[0][0]
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 1102) 565326 time_distributed_1[0][0]
__________________________________________________________________________________________________
the_labels (InputLayer) (None, None) 0
__________________________________________________________________________________________________
input_length (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
label_length (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
ctc (Lambda) (None, 1) 0 time_distributed_2[0][0]
the_labels[0][0]
input_length[0][0]
label_length[0][0]
==================================================================================================
Total params: 36,625,618
Trainable params: 36,622,224
Non-trainable params: 3,394
__________________________________________________________________________________________________
What could be the possible reason?
Thanks in advance.
Line 215 in 5536388
In generator.py, get_intseq(), padding is done with character 27. In the char map that stands for an apostrophe, not the extra 28th padding character. In utils.py, int_to_text_sequence mentions character 28 as the one for padding. Is that intended?
In model.py at line 241 you have code like:
if use_conv:
    conv = ZeroPadding1D(padding=(0, 2048))(x)
    for l in range(conv_layers):
        x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
It should be something like:
if use_conv:
    conv = ZeroPadding1D(padding=(0, 2048))(x)
    x = Conv1D(filters=fc_size, name='conv_1', kernel_size=11, padding='valid', activation='relu', strides=2)(conv)
    for l in range(1, conv_layers):
        x = Conv1D(filters=fc_size, name='conv_{}'.format(l+1), kernel_size=11, padding='valid', activation='relu', strides=2)(x)
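The effect of the bug can be traced without Keras at all. In this toy sketch each "layer" just records which tensor it consumed, mimicking how the functional API wires the graph; it shows why the summary above contains only `conv_3` connected directly to the zero-padding layer:

```python
# Toy trace of functional-API wiring: a "layer" returns a string
# describing its input instead of a real tensor.
def conv_layer(name):
    def apply(tensor):
        return "{}({})".format(name, tensor)
    return apply

conv_layers = 3
conv = "zero_padding1d_1"

# Buggy loop: every iteration reads `conv`, so the outputs of conv_1
# and conv_2 dangle and only the last assignment survives in the graph.
x_bug = None
for l in range(conv_layers):
    x_bug = conv_layer("conv_{}".format(l + 1))(conv)
print(x_bug)    # conv_3(zero_padding1d_1) -- a single effective layer

# Corrected loop: feed the running output forward so the layers chain.
x_fixed = conv
for l in range(conv_layers):
    x_fixed = conv_layer("conv_{}".format(l + 1))(x_fixed)
print(x_fixed)  # conv_3(conv_2(conv_1(zero_padding1d_1)))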
# this is the time dimension of CTC (batch x time x mfcc)
#input_length = np.array([get_xsize(mfcc) for mfcc in X_data])
input_length = np.array(x_val)
# print("3. input_length shape:", input_length.shape)
# print("3. input_length =", input_length)
assert(input_length.shape == (self.batch_size,))
# 4. label_length (required for CTC loss)
# this is the length of the number of label of a sequence
#label_length = np.array([len(l) for l in labels])
label_length = np.array(y_val)
# print("4. label_length shape:", label_length.shape)
# print("4. label_length =", label_length)
assert(label_length.shape == (self.batch_size,))
Hi, I want to make a CTC demo, but I don't understand input_length.shape and label_length.shape. How are they calculated, and what do they mean? Thank you.
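Both arrays have shape (batch_size,): one integer per utterance. A minimal sketch with made-up frame counts and transcripts (the real values come from your feature extractor and label encoding):

```python
# Toy batch: per-utterance frame counts and transcripts (assumed values).
batch_frames = [320, 280, 301]
transcripts = ["hello", "hi", "hey you"]

# input_length[i]: timesteps the network emits for utterance i. Here the
# raw frame count; with strided convs, divide by the downsampling factor.
input_length = list(batch_frames)

# label_length[i]: number of label symbols (characters) in transcript i.
label_length = [len(t) for t in transcripts]

print(input_length)   # [320, 280, 301]
print(label_length)   # [5, 2, 7]

# CTC requires input_length[i] >= label_length[i] for every utterance.
assert all(i >= l for i, l in zip(input_length, label_length))
```

In the generator snippet above, both are then wrapped in np.array, which is what gives them the asserted shape (batch_size,).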
model_arch == 3
I actually need to know why Core ML won't work with Python 3.
"Live recording and testing of speech and quickly creates customised datasets using own-voice dataset creation scripts!"
The live recording and testing of speech link and the own-voice dataset creation script link are broken.
Could you re-link those?
This is the error when running model.fit_generator with Keras (2.0.9), Theano (0.9.0) as backend, and Python 3:
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 16)
Apply node that caused the error: Elemwise{eq,no_inplace}(training/ctc_target, Elemwise{round_half_to_even,no_inplace}.0)
Toposort index: 762
Inputs types: [TensorType(float64, matrix), TensorType(float64, row)]
Inputs shapes: [(16, 1), (1, 16)]
Inputs strides: [(8, 8), (128, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=int64}(Elemwise{eq,no_inplace}.0)]]
I am very confident that I have configured the GPU correctly, and I can train an example on the GPU. But it doesn't seem to work with "run-train.py".
Executing the command below to download/import LibriSpeech throws an error:
python data/import_librispeech.py path_of_the_file
Doing a pip install of sox and progressbar2 solved the issue.
Hi, I don't understand why you use ZeroPadding1D in this model. You are adding 2048 zeros along the second (time) dimension.
For example, when the input shape is (1, 280, 161), after passing through the ZeroPadding1D layer the output is (1, 2328, 161).
Do you want to keep the sequence fixed at 2048?
If yes, isn't it necessary to calculate the number of zeros required for each of the inputs?
Thank you for your response.
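Note that ZeroPadding1D(padding=(0, 2048)) appends a constant 2048 zeros regardless of the input length, so the sequence is not fixed at 2048; it just grows by 2048. The valid-mode conv then shortens it again. A pure-Python check of the shapes in question, using the layer parameters quoted above (kernel_size=11, strides=2, padding='valid'):

```python
# Output-length arithmetic for the two layers under discussion.
def zero_padding1d(length, left, right):
    # ZeroPadding1D appends a fixed number of zeros on each side.
    return length + left + right

def conv1d_valid(length, kernel_size, stride):
    # Standard output-length formula for a 'valid' 1D convolution.
    return (length - kernel_size) // stride + 1

t = 280                                  # input timesteps from the example
t_padded = zero_padding1d(t, 0, 2048)    # 280 + 2048 = 2328, as observed
t_conv = conv1d_valid(t_padded, 11, 2)   # length after the strided conv
print(t_padded, t_conv)                  # 2328 1159
```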
Could you tell me how to design a lookahead convolution with Keras? I have designed one, but it doesn't work. Thanks!