astorfi / 3d-convolutional-speaker-recognition
Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
License: Apache License 2.0
Running just the train_softmax.py command in the example run.sh script with the sample data doesn't seem to converge, even at 50 epochs.
Command:
python -u ./code/1-development/train_softmax.py --num_epochs=50 --batch_size=3 --development_dataset_path=data/development_sample_dataset_speaker.hdf5 --train_dir=results/TRAIN_CNN_3D/train_logs
Output: [plots of loss and learning rate not preserved]
I have an idea of how a speaker identification model using a CNN works.
My question is how to build a verification model from data that contains only positive examples.
Suppose I have recordings of my own voice only, and I want to create a verification model from this data so that, when I run it, it recognizes only me and no one else.
Please provide the code for it.
How do I record my voice as an input with an identity and then verify it against another input? Will it recognize the speaker or not, and what is the workflow?
Is speaker recognition possible with this framework? I want to store the input voice pattern as part of an enrollment process, then verify a new input voice and find out who is currently speaking.
Is it possible?
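A generic enrollment-then-verification workflow can be sketched as follows. This is a minimal illustration and not code from this repository: it assumes some model has already turned each utterance into a fixed-length embedding vector, and the names enroll and verify, the toy vectors, and the 0.7 threshold are all hypothetical. Enrollment averages a speaker's embeddings into one stored model; verification accepts a new utterance when its cosine similarity to that model reaches the threshold. The threshold is what allows rejecting other voices even though only one speaker was enrolled, and in practice it would have to be tuned on held-out data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Average several utterance embeddings into one speaker model."""
    count = len(embeddings)
    return [sum(values) / count for values in zip(*embeddings)]


def verify(speaker_model, test_embedding, threshold=0.7):
    """Accept the claim if the test utterance is close enough to the model."""
    return cosine_similarity(speaker_model, test_embedding) >= threshold


# Toy embeddings (made up for illustration only).
enrolled = enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
same_speaker = verify(enrolled, [0.95, 0.05, 0.05])
other_speaker = verify(enrolled, [0.0, 1.0, 0.2])
print(same_speaker, other_speaker)
```

To find out who is speaking among several enrolled speakers, the same score could be computed against each stored model and the highest one taken, again subject to a threshold.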
Traceback (most recent call last):
File "./code/1-development/train_softmax.py", line 602, in <module>
tf.app.run()
File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "./code/1-development/train_softmax.py", line 414, in main
logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
File "/opt/speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
return func(images, num_classes, is_training=is_training)
File "/opt/speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
conv_dims=2)
File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
(conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
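The last frames of the traceback show the check that fails: a convolution with conv_dims spatial dimensions expects an input of rank conv_dims + 2, and the conv2d call arrives with conv_dims=2. The sketch below reproduces only that arithmetic in plain Python. The helper names are hypothetical and the example batch shape is an assumption, not something read from the repository. It suggests that a rank-5 input with a 3-element kernel calls for a 3-D convolution rather than a 2-D one.

```python
def expected_rank(conv_dims):
    """Rank a convolution expects: spatial dims plus batch and channel axes."""
    return conv_dims + 2


def check_conv_input(input_shape, conv_dims):
    """Mimic the rank check seen at the bottom of the traceback."""
    input_rank = len(input_shape)
    if input_rank != expected_rank(conv_dims):
        raise ValueError("Convolution expects input with rank %d, got %d"
                         % (expected_rank(conv_dims), input_rank))


def required_conv_dims(input_shape):
    """Number of spatial dimensions implied by the input rank."""
    return len(input_shape) - 2


# Assumed layout: (batch, utterances, frames, features, channels).
batch_shape = (3, 20, 80, 40, 1)
print(required_conv_dims(batch_shape))
```

With this shape, check_conv_input(batch_shape, 2) raises the same message as in the report, while check_conv_input(batch_shape, 3) passes.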
Hello,
I have a dataset of voices. I want to generate development and enrollment hdf5 file.
The input_feature.py file seems to generate development files (nx80x40x20). How can I generate the enrollment file?
Hi @astorfi
I have some questions about input dataset.
According to the paper, the number of speakers is 511 in the development phase.
But how long is the input audio file per speaker ??
Although there is the function of CMVN preprocessing in input_feature.py, I'm not sure whether CMVN preprocessing is appropriate for the output of speechpy.feature.lmfe function.
Did you use CMVN preprocessing in the experiment of the paper??
Thank you for your work!!
Hi,
I think train_softmax.py, enrollment.py and evaluation.py get their inputs from the hdf5 files stored in the data folder.
I also think that input_feature.py is supposed to store its results in these hdf5 files.
But I am not able to figure out which part of the input_feature.py code is responsible for writing the results into the hdf5 files.
Can someone please help me out with this?
Thanks
Hi astorfi,
I'm trying to train my own data set on your model. Is there also an update for the development files to feed dataset from input_features.py? It looks like train_softmax.py still takes in an hdf5 file.
Thanks,
Lucas
Hello,
Thank you for your wonderful work on speaker verification.
I am trying to execute the code and it's giving me the following error.
RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
Is it because of mismatched versions of numpy and scipy? If so, which versions did you use for training?
Thank you for your help!
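This warning generally appears when a compiled extension was built against a different NumPy version than the one installed, so the first step is usually to see which versions are actually present. The sketch below is one way to list them. It assumes Python 3.8 or newer (for importlib.metadata), and the package names are only examples of what this project appears to depend on. It does not say which versions the author used.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package list (an assumption, adjust to your environment).
report = installed_versions(["numpy", "scipy", "tables", "tensorflow"])
for name, version in sorted(report.items()):
    print(name, version or "not installed")
```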
Hi astorfi:
I tried to run the program, but I can't find a Python script to create the enrollment hdf5 file.
The code folder only contains a script to create the development hdf5 file.
How do I create the enrollment hdf5 file?
Mark
When I run run.sh, something goes wrong:
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Traceback (most recent call last):
File "./code/1-development/train_softmax.py", line 602, in <module>
tf.app.run()
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "./code/1-development/train_softmax.py", line 414, in main
logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
return func(images, num_classes, is_training=is_training)
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
conv_dims=2)
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
(conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/2-enrollment/enrollment.py", line 330, in <module>
tf.app.run()
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "./code/2-enrollment/enrollment.py", line 201, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/enrollment-evaluation_sample_dataset.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/3-evaluation/evaluation.py", line 380, in <module>
tf.app.run()
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "./code/3-evaluation/evaluation.py", line 202, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...donedata/development_sample_dataset_speaker.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
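Two separate failures appear in this log besides the rank error. The NameError comes from xrange, which exists in Python 2 but not in Python 3, where range plays the same role. And because evaluation.py stops at that error, it likely never writes results/SCORES/score_vector.npy, which would explain why the four plotting scripts then fail with FileNotFoundError. A minimal compatibility shim, assuming the scripts are otherwise Python 3 compatible (not verified here), could look like this near the top of enrollment.py and evaluation.py:

```python
# Make the Python 2 name available when running under Python 3.
try:
    xrange  # defined in Python 2
except NameError:
    xrange = range  # Python 3: range is already lazy

# Stand-in for FLAGS.num_clones, used only to exercise the loop.
num_clones = 2
clone_indices = [i for i in xrange(num_clones)]
print(clone_indices)
```

Replacing xrange with range directly in the two loops would have the same effect under Python 3.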
Hi @astorfi,
Thanks for your work; it's really helpful to me.
I would like to know which algorithm you used for the VAD task.
Thank you.
How is the shape of utterance_enrollment in enrollment-evaluation_sample_dataset.hdf5 obtained: no_of_sample x 80 x 40 x 1?
Hi,
First, that is a great job and very well done :)
I am now trying to use your source code and maybe contribute to it. I am working on a speaker recognition problem: detecting from the voice whether a tutorial was recorded by a given teacher. I have about 10 hours of historical recordings for 6 teachers. First I used speechpy to get 3D npy files from the wav files, then used your create_development.py to create the hdf5 files for training and evaluation. Is that correct? In particular, I got a feature vector length of 13 instead of 40 in the npy files! I ran the run.bash file and it also gave me an error like this: ValueError: Negative dimension size caused by subtracting 2 from 1 for 'MaxPool_7' (op: 'MaxPool') with input shapes: [?,1,112,128].
Can you provide the input pipeline? Thanks!
Hi @astorfi
I have gone through your code. When I extract MFCC features for a sample audio file, the result has shape (420,40), where 420 is the number of frames and 40 is the number of features. But in the sample data for your code, the MFEC feature file has shape (3209,40,3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels. I don't understand what the channels are used for. Can you please suggest how to create a Feature_mfec.npy file in your format?
I have used some utterances to test the code, and the program runs, but the console shows that the minibatch loss always stays between 7 and 9 and does not decrease, and accuracy=0. What's wrong here? Thx!
Hi Astorfi,
Your paper is awesome.
I am trying to train on speech data using a 3D CNN.
I have prepared the data as described in the paper, but during the development phase I get a mean and standard deviation of "nan" in each epoch.
I am getting following output:
Epoch 1, Minibatch 1 of 3 , Minibatch Loss= 2.1972, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 2 of 3 , Minibatch Loss= 2.2215, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 3 of 3 , Minibatch Loss= 2.2637, TRAIN ACCURACY= 0.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= nan, std= nan
Can you please help me, why am I getting this problem?
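One thing the numbers above do reveal: the first minibatch loss of 2.1972 matches ln(9) to four decimals, which is the softmax cross-entropy of a model that spreads probability evenly over 9 classes. If the dataset has 9 speakers (an assumption, the post does not say), the network is starting at chance level, and the slowly rising loss with zero accuracy suggests it has not begun to learn. The short check below only verifies that arithmetic; it does not explain the nan mean and standard deviation, which is a separate symptom.

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy loss when every class gets probability 1/num_classes."""
    return -math.log(1.0 / num_classes)


reported_first_loss = 2.1972  # first minibatch loss from the log above
chance_loss = uniform_cross_entropy(9)  # 9 classes is an assumption

# Number of classes implied by the reported loss, if the model is at chance.
implied_classes = round(math.exp(reported_first_loss))
print(round(chance_loss, 4), implied_classes)
```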
I've been trying to figure out your pipeline, including by reading the paper, with no luck so far.
Clearly, based on the open and closed issues, I'm not the only one. It seems a lot of work has been done here, and quality work too.
However, this repository cries out for a solid example that goes from a WAV file through feature extraction, development, enrollment, and prediction.
I know that each use case needs to customize its own pipeline, but in my view the example, paper, and documentation don't give enough infrastructure to continue on your own.
Again, it really seems I'm not the only one. Can you please upload a pipeline example, refer me to one, or at least upload a clear description of the steps from WAV file to prediction?
Hi @astorfi
In the development step, I tried running train_softmax.py on the VoxCeleb2 dataset and got nan output (the logits variable in the code). How can I solve this?
Hi astorfi,
Thanks for your great job. These days I am running your code on my dataset, but I found that the validation accuracy is low in my experiments. I have no idea whether something is wrong. What validation accuracy did you get in your experiments once the network had converged?
Hi guys,
I have a question regarding the input wav files used for training.
What are the audio format specifications?
I used voxceleb ( http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ ) as dataset, but it is giving me some troubles.
Do you know about any other usable dataset?
Thank you ;)
Hi astorfi,
Thanks for such great work. The pipeline is really great.
But when I try the ai-shell dataset, the Kaldi i-vector baseline gets around 2% EER, while 3D-convolutional-speaker-recognition with LDA gets 17% EER.
What's wrong? Any help would be much appreciated!
Hi Astorfi,
I am trying to train a speaker model with your model.
How can I prepare my speech data (train_data, development_data, evaluation_data) for your model?
Thank you very much!
Hello everyone,
In the context of 3D-CNN audio features extraction, what does low-level and high-level features extraction mean?
Thank you,
Hi, I am using your code for speaker identification.
I am new to machine learning. Could you tell me whether I can use this code to do speaker recognition?
I am now able to run your code and output some plots, but I do not know how to use those plots to recognize a speaker.
I am also trying to use my own WAV file for training.
However, I get an error:
...3D-convolutional-speaker-recognition-master\code\0-input\create_hdf5\pair_generation.py", line 42, in feed_to_hdf5
:, 0]
IndexError: too many indices for array
Closing remaining open files:development.hdf5...done
The default feature_mfec.npy file looks like this:
array([[[ 1.41430335e+01, 1.38114970e+00, 8.35106419e-02],
[ 1.41430335e+01, 1.36457288e+00, 8.67885390e-03],
[ 1.39772653e+01, 1.11641647e+00, -2.56141085e-02],
...,
[ 9.24432067e+00, 9.21401209e-01, 1.09648406e-01],
[ 9.19465798e+00, 9.45404618e-01, 1.09012088e-01],
[ 9.37358081e+00, 9.68176377e-01, 1.03772331e-01]]])
I converted my WAV file to npy, and its content looks like this:
array([[ 143., 143.],
[ 136., 136.],
[ 121., 121.],
...,
[ 72., 72.],
[ 81., 81.],
[ 90., 90.]])
My transform function:
import numpy as np
import scipy.io.wavfile as wav
temp_npy =wav.read('...\\19-198-0000.wav')
print(temp_npy)
result = np.array(temp_npy[1],dtype=float)
np.save('test_wav_r_values.npy', result)
Do I need to use a different type of WAV file, or should I use a different function to convert from WAV to npy?
I also found import scipy.io.wavfile as wav in create_development.py. Can I just feed in the WAV file for training?
Thank you for providing this code.
Hi Astorfi,
Your work is perfect! I've read your paper and it's actually great.
I'm new to TensorFlow and this field, and I'm trying to learn.
I'm having a problem when executing ./run.sh, and this is the error text:
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\Boulbaba Zitouni\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
I just cloned the project and tried to run it, that's it.
I have already installed all the requirements (from requirements.txt).
Any solutions?
slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
throws an exception:
The kernel_size argument must be a tuple of 2 integers. Received: [3, 1, 5]
I've read your paper and it's really impressive.
I would like to ask you about the input preprocessing:
Assume I have a wav file that is 0.8 sec long
fs, signal = wav.read(file_name)
Then I use mfec=speechpy.feature.mfe(signal,fs)
The size of mfec is [79,40], so I changed the input file to be 0.81 sec long
and then I received [80,40]...
According to your paper I need [20,80,40] to create one training example, so I can create this either by duplicating my original [80,40] 20 times (this is how you did it in the testing phase) or by concatenating 20 different utterances of 0.81 sec. Is that correct?
Any clarifications would be appreciated!
Alan
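The 79 versus 80 frame counts reported above follow from ordinary framing arithmetic. The sketch below assumes 20 ms frames with a 10 ms stride and no zero padding, which is consistent with the numbers in the question. The 16 kHz sampling rate and the helper name num_frames are assumptions made for illustration only.

```python
def num_frames(num_samples, fs, frame_length=0.020, frame_stride=0.010):
    """Number of full frames that fit in a signal when no padding is applied."""
    frame_len = int(round(fs * frame_length))  # samples per frame
    stride = int(round(fs * frame_stride))     # samples between frame starts
    if num_samples < frame_len:
        return 0
    return (num_samples - frame_len) // stride + 1


fs = 16000  # assumed sampling rate
for seconds in (0.80, 0.81):
    n = int(round(seconds * fs))
    print(seconds, "sec ->", num_frames(n, fs), "frames")
```

Under these assumptions a signal needs at least 0.81 sec to yield 80 frames, because the last frame must fit entirely inside the signal.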
Hi Astorfi,
First, your work is perfect! I've read your paper and it's really great.
In the sample data of your previous code, the MFEC feature file (feature_mfec.npy) has shape (3209, 40, 3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels (static, first-order and second-order derivative features, obtained with speechpy.feature.extract_derivative_feature(feature)). Is that right?
But in input_feature.py (which you provided recently), the feature output has shape (1, 20, 80, 40): 80 is the number of frames, 40 is the number of features, 20 is the number of utterances, and 1 represents one cube for one speaker. Is that right?
So does input_feature.py use only the first channel of the MFEC features of the audio?
Thanks!
When I ran the run.sh, the execution terminated saying:
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Where do I get this score file from? Do I need to create it myself? I just ran run.sh for the demo.
Can you please help?
Regards!
Hello, first of all thank you for releasing the code. Unfortunately, I'm stuck at step 1 (step 0 works.)
I installed all the requirements, this is my setup:
Win7 64
Python 3.5.4
Tensorflow 1.6 (installed in a separate Anaconda environment, but still with pip install)
Tables 3.4.3
I also installed pytables from conda, I thought it was missing, but the result is still the same:
When running train_softmax, at 'import tables', I get:
File "C:\Users...\AppData\Local\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\tables\__init__.py", line 90, in <module>
from .utilsextension import (
ImportError: DLL load failed: The specified procedure could not be found.
The thing is, if I simply import tables with no code preceding it, it's fine. If I import it after tensorflow (as in your code), it gives me the error. If I move 'import tables' before 'import tensorflow', then Python crashes.
I tried to find answers on the net, but none were useful.
Thanks
Should the data for Enrollment and Development be same?
The term utterances has not been defined anywhere in the paper. I am new to the field of speaker recognition.
Can someone tell me what utterances means in the context of this project?
Thanks in advance
Hi @astorfi, thanks for your great work. I use all the same settings, but I store the training data in hdf5 instead of using the Audio Dataset. However, my evaluation result is poor: the EER is as high as 40%. I think there is something wrong with my work. Do you have any idea how to fix this?
I use the VoxCeleb dataset for the background model, with only 1 sample per speaker.
50 people for enrollment, 50 not enrolled (to be rejected).
4 samples for evaluation.
Thanks for your help.
Hi Astorfi,
I'm trying to train a speaker recognition model with your code.
Since I'm a beginner at programming, I don't understand your code well.
For the enrollment and evaluation phases, do I just have to prepare data of shape (sample, 1, 80, 40)?
I read the paper, and I don't know whether I have to copy the data of a single utterance to build data of shape (sample, 20, 80, 40).
Also, I prepared the development data with shape (97, 20, 80, 40) using input_feature.py. Do I have to convert it to shape (97, 80, 40, 20)?
Thank you very much.
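The two layouts mentioned above differ only in axis order. The sample run logs elsewhere on this page print "Train data shape: (12, 80, 40, 20)", and train_softmax.py is quoted as applying np.transpose(speech_train[None, :, :, :, :], axes=(1, 4, 2, 3, 0)). Assuming that is the layout the training script expects, a minimal sketch of converting (97, 20, 80, 40) data and checking what the quoted transpose then produces is shown here. The random array is a dummy stand-in for real features.

```python
import numpy as np

# Dummy stand-in for development data produced as (speakers, 20, 80, 40).
dev = np.random.rand(97, 20, 80, 40).astype(np.float32)

# Move the utterance axis to the end: (97, 20, 80, 40) -> (97, 80, 40, 20).
stored = np.transpose(dev, (0, 2, 3, 1))

# The transpose quoted from train_softmax.py, applied to the stored layout.
fed = np.transpose(stored[None, :, :, :, :], axes=(1, 4, 2, 3, 0))

print(stored.shape)  # layout assumed to be kept in the hdf5 file
print(fed.shape)     # layout that reaches the network after the transpose
```

With this ordering the network input recovers the original (20, 80, 40) cubes with one trailing channel axis.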
Hi,
I am getting a training accuracy of 95% on VoxCeleb, but the testing accuracy is only around 10%. What could be the reason for this?
Speakers=1211
Batch size =100
epoch = 50
Data size = 24000
Any ideas why I'm receiving different prediction values when running with batch_size=1,16?
find code below:
Thanks!
def predict(self, speech_input):
    labels = np.empty(0, int)
    labels = np.append(labels, range(speech_input.shape[0]), axis=0)
    feature, logits, _ = self.session.run(
        [self.features, self.logits, self.end_points_speech],
        feed_dict={self.is_training: False,
                   self.batch_dynamic: labels.shape[0],
                   self.margin_imp_tensor: 50,
                   self.batch_speech: speech_input})
                   # self.batch_labels: labels.reshape([labels.shape[0], 1])})
    # Extracting the associated numpy array.
    # print(feature[0])
    return feature, logits
I followed all the instructions given in the video and I got this error:
No such file or directory: 'results/SCORES/score_vector.npy'
Please tell me how to resolve it.
fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
WARNING:tensorflow:From ./code/1-development/train_softmax.py:423: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.
2018-06-08 08:58:18.940789: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1, Minibatch 1 of 4 , Minibatch Loss= 1.3863, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 2 of 4 , Minibatch Loss= 1.2341, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 3 of 4 , Minibatch Loss= 0.0000, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 4 of 4 , Minibatch Loss= 1.0951, TRAIN ACCURACY= 100.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= 50.0000, std= 50.000
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/2-enrollment/enrollment.py", line 330, in
tf.app.run()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
sys.exit(main(argv))
File "./code/2-enrollment/enrollment.py", line 201, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/enrollment-evaluation_sample_dataset.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/3-evaluation/evaluation.py", line 380, in
tf.app.run()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
sys.exit(main(argv))
File "./code/3-evaluation/evaluation.py", line 202, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...donedata/development_sample_dataset_speaker.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
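In the log above, enrollment.py and evaluation.py both stop with NameError: name 'xrange' is not defined, which is what Python 3 raises because xrange only exists in Python 2. Since evaluation never completes, it is plausible (though not confirmed here) that score_vector.npy is simply never written, and the later FileNotFoundError messages are a consequence. A minimal compatibility shim, assuming it is placed near the top of each affected script, could look like this. The num_clones variable is a hypothetical stand-in for FLAGS.num_clones.

```python
# Make `xrange` available on both Python 2 and Python 3.
try:
    xrange  # Python 2: the built-in already exists
except NameError:
    xrange = range  # Python 3: range is already lazy

num_clones = 1  # hypothetical stand-in for FLAGS.num_clones
for i in xrange(num_clones):
    print("clone", i)
```

Replacing xrange with range directly in the scripts would have the same effect on Python 3.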
Hi astorfi:
Can you show an example of how to prepare data for the enrollment stage?
I ran into a problem at this stage: I processed the data the same way as when generating the development data,
but it doesn't work. The hdf5 format is somewhat annoying. Can you show how to implement it just by
using some random data as an example? Thanks a lot.
Hi astorfi,
Thanks for such great work. I want to ask you some questions. I am fairly familiar with DL but new to the speech field, so please forgive me if I ask any dumb questions :D
Hello! We are trying to build our own input pipeline. However, when we follow the __getitem__ method in Audioset (with cube_shape set to (20, 80, 40)), there is a shape mismatch when the model tries to feed data to batch_speech (a placeholder with shape (20, 80, 40, 1)).
After carefully reviewing the code in train_softmax.py, we found that the input shape conflicts with the transpose operation in the following code:
speech_train = np.transpose(speech_train[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
What is the solution? Could you give us any help?
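One possible way around the conflict, sketched under assumptions: if a custom pipeline already yields cubes of shape (20, 80, 40), the quoted transpose is not needed at all, because its only job is to turn stored (N, 80, 40, 20) data into (N, 20, 80, 40, 1). Stacking the cubes and appending a channel axis gives the same final layout. The function name cubes_to_batch is hypothetical, and the random cubes stand in for real features.

```python
import numpy as np


def cubes_to_batch(cubes):
    """Stack (20, 80, 40) cubes into a (N, 20, 80, 40, 1) batch."""
    batch = np.stack(cubes, axis=0)
    if batch.shape[1:] != (20, 80, 40):
        raise ValueError("expected cubes of shape (20, 80, 40), got %s"
                         % (batch.shape[1:],))
    return batch[..., np.newaxis]  # trailing channel axis


# Three dummy cubes in place of what a custom __getitem__ would return.
cubes = [np.random.rand(20, 80, 40).astype(np.float32) for _ in range(3)]
batch = cubes_to_batch(cubes)
print(batch.shape)
```

If the existing transpose line is kept instead, the data would have to be rearranged to (N, 80, 40, 20) before reaching it.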
To make multiple models from multiple wav files, I added the following to input_features.py in order to generate an .hdf5 file for all the wav files I have:
idx = 0
f = open('file_path_test1.txt','r')
for line in f:
    idx = idx + 1
lab = []
feat = []
for i in range(idx):
    feature, label = dataset.__getitem__(i)
    lab.append(label)
    feat.append(feature)
    print(feature.shape)
    print(label)
######################
## creating hdf5 file ##
######################
h5file = tables.open_file('/root/3D_CNN/3D-convolutional-speaker-recognition/data/evaluation_test.hdf5', 'w')
label_test = h5file.create_carray(where = '/', name = 'label_enrollment', obj = lab, byteorder = 'little')
label_array = h5file.create_carray(where = '/', name = 'label_evaluation', obj = lab, byteorder = 'little')
utterance_test = h5file.create_earray(where = '/', name = 'utterance_enrollment', chunkshape = [1,20,80,40], obj = feat, byteorder = 'little')
utterance_train = h5file.create_earray(where = '/', name = 'utterance_evaluation', chunkshape = [1,20,80,40], obj = feat, byteorder = 'little')
h5file.close()
When I ran input_features.py, it gave me the following error:
ValueError: the shape ((0, 1, 20, 80, 40)) and chunkshape ((1, 20, 80, 40)) ranks must be equal.
I realized that lab and feat are arrays with 9 elements each (the number of wav files I want to test). Each element of feat holds the features of one wav file in my wav list. So I changed the chunkshape value to chunkshape = [9,1,20,80,40], and the evaluation_test.hdf5 file was created with no errors.
When I used the hdf5 file that I created to run run.sh, I got this:
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Epoch 1, Minibatch 1 of 4 , Minibatch Loss= 0.0000, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 2 of 4 , Minibatch Loss= 1.2341, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 3 of 4 , Minibatch Loss= 1.4641, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 4 of 4 , Minibatch Loss= 1.4434, TRAIN ACCURACY= 0.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= 50.0000, std= 50.000
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
Enrollment data shape: (9, 1, 20, 80, 40)
Enrollment label shape: (9,)
Evaluation data shape: (9, 1, 20, 80, 40)
Evaluation label shape: (9,)
INFO:tensorflow:Scale of 0 disables regularizer.
.
.
Traceback (most recent call last):
File "./code/2-enrollment/enrollment.py", line 330, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./code/2-enrollment/enrollment.py", line 289, in main
assert len(speaker_index) >= NumUtterance, "At least %d utterances is needed for each speaker" % NumUtterance
AssertionError: At least 20 utterances is needed for each speaker
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/eval_try.hdf5...done
Enrollment data shape: (9, 1, 20, 80, 40)
Enrollment label shape: (9,)
Evaluation data shape: (9, 1, 20, 80, 40)
Evaluation label shape: (9,)
INFO:tensorflow:Scale of 0 disables regularizer.
.
.
Traceback (most recent call last):
File "./code/3-evaluation/evaluation.py", line 380, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./code/3-evaluation/evaluation.py", line 329, in main
speech_evaluation = np.transpose(speech_evaluation[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 598, in transpose
return _wrapfunc(a, 'transpose', axes)
File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 51, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
ValueError: axes don't match array
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/eval_try.hdf5...done
('EER=', 43.75, 0.0)
('AUC=', 48.4375, 0.0)
('EER = ', 0.44)
('AUC = ', 0.48)
('AP = ', 0.33)
I'm not sure how to fix this: ValueError: axes don't match array
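The "axes don't match array" error can be reproduced with NumPy alone: the transpose in evaluation.py lists five axes, but indexing a 5-D array of shape (9, 1, 20, 80, 40) with [None, :, :, :, :] produces a 6-D array. The sketch below reproduces the failure and then shows one possible fix, assuming the scripts expect the same layout as the sample file, whose log prints "Enrollment data shape: (108, 80, 40, 1)", that is, one row per utterance. Repeating each label 20 times is part of that assumption, and the zero array is a dummy stand-in for real features.

```python
import numpy as np

# Dummy stand-in for `feat`: 9 wav files, each giving a (1, 20, 80, 40) cube.
feat = np.zeros((9, 1, 20, 80, 40), dtype=np.float32)
lab = np.arange(9)


def evaluation_transpose(x):
    # The same call that appears in the traceback above.
    return np.transpose(x[None, :, :, :, :], axes=(1, 4, 2, 3, 0))


try:
    evaluation_transpose(feat)
    failed = False
except ValueError as err:
    failed = True
    print("5-D input fails:", err)

# Assumed fix: one row per utterance, matching the sample (N, 80, 40, 1) layout.
utterances = feat.reshape(-1, 80, 40)[..., np.newaxis]  # 9 * 20 utterances
labels = np.repeat(lab, 20)                             # one label per utterance

print(utterances.shape, labels.shape)
print(evaluation_transpose(utterances).shape)
```

With 20 rows per speaker, the "At least 20 utterances is needed for each speaker" assertion in enrollment.py would also have enough rows to work with, provided the labels identify speakers rather than files.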