:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

License: Apache License 2.0


3d-convolutional-speaker-recognition's Issues

Default training not converging

Running just the train_softmax.py command in the example run.sh script with the sample data doesn't seem to converge, even at 50 epochs.

Command:

python -u ./code/1-development/train_softmax.py --num_epochs=50 --batch_size=3 --development_dataset_path=data/development_sample_dataset_speaker.hdf5 --train_dir=results/TRAIN_CNN_3D/train_logs

Output: [figures: loss curve and learning-rate curve]

How to build a speaker verification (1:1 recognition) model in Keras?

I have a working idea of how to build a speaker identification model with a CNN.

My question is how to build a verification model from data that contains only positive examples.
Suppose I have recordings of my own voice only, and I want to train a verification model from them so that it recognizes me and no one else.

Please provide code for this.
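
One common way to frame 1:1 verification, sketched here as an assumption and not as this repository's method: map each utterance to a fixed-length embedding, average the enrollment embeddings into a speaker model, and accept a test utterance if its cosine similarity to that model reaches a threshold. The embedding values, the function names, and the 0.7 threshold below are all hypothetical. In practice the threshold has to be tuned on data that includes other speakers, which is why positive-only data is usually not enough.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_speaker_model(embeddings):
    """Average enrollment embeddings element-wise (hypothetical helper)."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]

def verify(speaker_model, test_embedding, threshold=0.7):
    """Accept the claimed identity if similarity reaches the threshold."""
    return cosine_similarity(speaker_model, test_embedding) >= threshold

# Toy embeddings standing in for network outputs (made-up numbers).
enrollment = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]
model = build_speaker_model(enrollment)
same_speaker = verify(model, [0.95, 0.05, 0.05])
other_speaker = verify(model, [0.0, 1.0, 0.0])
```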

Speaker recognition

How can I record my voice as an enrollment input tied to an identity and then verify it against a second input, so that the system reports whether the speaker is recognized or not? What is the workflow?

Is speaker recognition possible?

Is speaker recognition possible with this framework? I want to store the input voice pattern as part of an enrollment process, then verify a new voice input and find out who is currently speaking.

Is it possible?

Convolution expects input with rank 4, got 5

Traceback (most recent call last):
  File "./code/1-development/train_softmax.py", line 602, in <module>
    tf.app.run()
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/1-development/train_softmax.py", line 414, in main
    logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
  File "/opt/speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
    return func(images, num_classes, is_training=is_training)
  File "/opt/speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
    net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
    (conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
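
The last frame of the traceback compares the input rank against `conv_dims + 2`. A plausible reading, not confirmed by the maintainer, is that the installed TensorFlow treats `slim.conv2d` as strictly two-dimensional, while the network feeds it a five-dimensional batch with a three-element kernel, which is what a 3D convolution (for example `slim.conv3d`) expects. The sketch below only reproduces that rank arithmetic in plain Python; the batch shape and its trailing channel axis are assumptions made for illustration.

```python
def expected_input_rank(conv_dims):
    """Rank a convolution input needs: batch + spatial dims + channels."""
    return conv_dims + 2

def check_conv_input(input_shape, kernel_size):
    """Return the number of spatial dims implied by the kernel, or raise."""
    conv_dims = len(kernel_size)
    input_rank = len(input_shape)
    if input_rank != expected_input_rank(conv_dims):
        raise ValueError(
            "Convolution expects input with rank %d, got %d"
            % (expected_input_rank(conv_dims), input_rank)
        )
    return conv_dims

# Assumed batch shape: a rank-4 development batch plus a trailing channel axis.
batch_shape = (12, 80, 40, 20, 1)

# A three-element kernel matches a rank-5 input (3D convolution).
dims_3d = check_conv_input(batch_shape, kernel_size=[3, 1, 5])

# A two-element kernel (true 2D convolution) rejects the same input.
try:
    check_conv_input(batch_shape, kernel_size=[3, 3])
    message = None
except ValueError as err:
    message = str(err)
```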

How to generate data

Hello,

I have a dataset of voices and want to generate the development and enrollment hdf5 files.
The input_feature.py file seems to generate the development data (n x 80 x 40 x 20). How can I generate the enrollment file?
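
As an illustration of the shapes mentioned above, the sketch below stacks per-utterance feature maps into the `(n, 80, 40, 20)` layout. It assumes 80 frames by 40 features per utterance and 20 utterances per sample, which is only inferred from the shape in the question; `make_cube` and the random features are hypothetical stand-ins, not code from the repository.

```python
import numpy as np

NUM_FRAMES = 80      # frames per utterance (assumed from the 80 in the shape)
NUM_FEATURES = 40    # features per frame (assumed from the 40)
NUM_UTTERANCES = 20  # utterances stacked per sample (assumed from the 20)

def make_cube(utterances):
    """Stack (frames, features) feature maps along a new last axis."""
    cropped = [u[:NUM_FRAMES, :NUM_FEATURES] for u in utterances]
    for c in cropped:
        if c.shape != (NUM_FRAMES, NUM_FEATURES):
            raise ValueError("utterance too short: %s" % (c.shape,))
    return np.stack(cropped, axis=-1)

rng = np.random.default_rng(0)
# Three samples, each built from 20 random "utterances" of 100 frames.
samples = [
    make_cube([rng.normal(size=(100, NUM_FEATURES)) for _ in range(NUM_UTTERANCES)])
    for _ in range(3)
]
dataset = np.stack(samples, axis=0)
print(dataset.shape)  # (3, 80, 40, 20)
```

With a single utterance per sample the same function yields a trailing axis of 1, which may be how a shape such as `(n, 80, 40, 1)` arises for enrollment data, though that is a guess.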

Input dataset

Hi @astorfi
I have some questions about input dataset.

According to the paper, there are 511 speakers in the development phase.
But how long is the input audio per speaker?
input_feature.py includes a CMVN preprocessing function, but I'm not sure whether CMVN is appropriate for the output of the speechpy.feature.lmfe function.
Did you use CMVN preprocessing in the experiments in the paper?

Thank you for your work!!
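
For readers unfamiliar with the term, cepstral mean and variance normalization (CMVN) standardizes each feature dimension over the frames of an utterance. The sketch below is a generic global CMVN in NumPy, not the routine in `input_feature.py` and not `speechpy`'s implementation; whether it suits log Mel-filterbank energies is exactly the open question above. The `eps` guard is an added assumption to avoid division by zero.

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Normalize a (frames, features) array to zero mean, unit variance per column."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

rng = np.random.default_rng(0)
# Fake log-energy features: 200 frames, 40 filterbank channels.
features = rng.normal(loc=5.0, scale=3.0, size=(200, 40))
normalized = cmvn(features)
```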

Where does input_feature.py store its results?

Hi,

I think train_softmax.py, enrollment.py and evaluation.py get their inputs from the hdf5 files stored in the data folder.
I also think that input_feature.py is supposed to store its results in these hdf5 files.
But I can't figure out which part of the input_feature.py code is responsible for writing the results to the hdf5 files.
Can someone please help me out with this?

Thanks

inputting dataset to development

Hi astorfi,

I'm trying to train my own data set on your model. Is there also an update for the development files to feed dataset from input_features.py? It looks like train_softmax.py still takes in an hdf5 file.

Thanks,
Lucas

RuntimeWarning: numpy.dtype size changed

Hello,
Thank you for your wonderful work on speaker verification.
I am trying to run the code and it gives me the following message:
RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88

Is this caused by a version mismatch between numpy and scipy? If so, which versions did you use for training?

Thank you for your help !
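
This message is a warning, not an error, and it typically appears when a compiled extension was built against a different NumPy version than the one installed. Reinstalling the affected packages against the current NumPy is the cleaner fix; if the code otherwise runs, a filter like the one sketched below silences it. The demonstration raises a fake warning with the same text purely to show the filter working.

```python
import warnings

def silence_numpy_binary_warnings():
    """Ignore the 'numpy.dtype size changed' RuntimeWarning by message prefix."""
    warnings.filterwarnings(
        "ignore",
        message="numpy.dtype size changed",
        category=RuntimeWarning,
    )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")      # record everything by default
    silence_numpy_binary_warnings()      # then drop the noisy message
    warnings.warn(
        "numpy.dtype size changed, may indicate binary incompatibility. "
        "Expected 96, got 88",
        RuntimeWarning,
    )
    warnings.warn("unrelated warning", RuntimeWarning)

remaining = [str(w.message) for w in caught]
print(remaining)  # ['unrelated warning']
```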

Can't create enrollment hdf5 file

Hi astorfi,
I am trying to run the program, but I can't find a Python script that creates the enrollment hdf5 file.
The code folder only contains the script that creates the development hdf5 file.
How do I create the enrollment hdf5 file?

Mark

ValueError: Convolution expects input with rank 4, got 5

When I run run.sh, it fails with the following output:

/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Traceback (most recent call last):
  File "./code/1-development/train_softmax.py", line 602, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/1-development/train_softmax.py", line 414, in main
    logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
    return func(images, num_classes, is_training=is_training)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
    net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
    (conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
  File "./code/2-enrollment/enrollment.py", line 330, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/2-enrollment/enrollment.py", line 201, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/enrollment-evaluation_sample_dataset.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
  File "./code/3-evaluation/evaluation.py", line 380, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/3-evaluation/evaluation.py", line 202, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...donedata/development_sample_dataset_speaker.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
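
Two of the failures in this log look like consequences of running Python 2 code under Python 3.5: `xrange` no longer exists in Python 3, so `enrollment.py` and `evaluation.py` stop early, and the later `FileNotFoundError` for `score_vector.npy` is most likely just the missing output of those crashed steps. A small compatibility shim, sketched below as one possible fix and not the repository's own, lets the same loop run on both versions; `num_clones` here is a stand-in for `FLAGS.num_clones`.

```python
try:
    xrange  # Python 2: the name already exists
except NameError:
    xrange = range  # Python 3: range is already lazy, so alias it

num_clones = 2  # stand-in for FLAGS.num_clones
visited = [i for i in xrange(num_clones)]
print(visited)  # [0, 1]
```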

Extracting VAD for our own dataset

Hi @astorfi,
Thanks for your work; it's really helpful to me.
I would like to know which algorithm you used for the VAD task.

Thank you.

Enrollment and Evaluation Dataset Problem

How is the shape of utterance_enrollment in enrollment-evaluation_sample_dataset.hdf5 (no_of_sample x 80 x 40 x 1) obtained?

Retrain on new and own dataset

Hi,

First, this is great work and very well done :)
I am trying to use your source code and maybe contribute to it. I am working on a speaker recognition problem: detecting whether a teacher's tutorial was recorded in his own voice. I have about 10 hours of historical recordings for 6 teachers. First I used speechpy to get 3D npy files from the WAV files, then used your create_development.py to create the hdf5 files for training and evaluation. Is that correct? In particular, the feature vector length in my npy files is 13 instead of 40! I also ran the run.bash file and it gave an error along these lines: ValueError: Negative dimension size caused by subtracting 2 from 1 for 'MaxPool_7' (op: 'MaxPool') with input shapes: [?,1,112,128].
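
A feature length of 13 where 40 is expected would be consistent with extracting 13 cepstral coefficients instead of 40 log filterbank energies, although that is a guess about what happened here. Since the pooling error appears only deep inside the network, it can help to check the dataset shape before training. The sketch below is a hypothetical pre-flight check; the expected `(80, 40, 20)` sample shape is taken from the sample data shapes reported elsewhere on this page, and the axis names are assumptions.

```python
EXPECTED_SAMPLE_SHAPE = (80, 40, 20)  # per-sample shape seen in the sample data

def check_dataset_shape(shape, expected=EXPECTED_SAMPLE_SHAPE):
    """Return a list of human-readable problems; empty means the shape matches."""
    problems = []
    if len(shape) != len(expected) + 1:
        problems.append(
            "expected rank %d, got %d" % (len(expected) + 1, len(shape))
        )
        return problems
    names = ("frames", "features", "utterances")  # assumed axis meanings
    for name, got, want in zip(names, shape[1:], expected):
        if got != want:
            problems.append("%s axis is %d, expected %d" % (name, got, want))
    return problems

ok = check_dataset_shape((12, 80, 40, 20))
bad = check_dataset_shape((12, 80, 13, 20))
print(bad)  # ['features axis is 13, expected 40']
```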

Can you provide an input pipeline?

Can you provide an input pipeline? Thanks!

Regarding input data

Hi @astorfi
I have gone through your code. When I extract MFCC features from a sample audio file, the result has shape (420, 40), where 420 is the number of frames and 40 is the number of features. But the MFEC feature file in your sample data has shape (3209, 40, 3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels. I don't understand what the channels are used for. Can you please suggest how to create a Feature_mfec.npy file in your format?
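
A third axis of size 3 is commonly the static features stacked with their first and second temporal derivatives; whether that is what the sample file contains is an inference and not confirmed here. The sketch below builds such a `(frames, 40, 3)` cube with `numpy.gradient` as a simple derivative estimate. The repository (or `speechpy`) may compute derivatives differently, and `stack_with_derivatives` is a hypothetical name.

```python
import numpy as np

def stack_with_derivatives(static):
    """Turn (frames, features) into (frames, features, 3): static, delta, delta-delta."""
    delta = np.gradient(static, axis=0)        # first derivative over time
    delta_delta = np.gradient(delta, axis=0)   # second derivative over time
    return np.stack([static, delta, delta_delta], axis=-1)

# Stand-in for a (420, 40) feature matrix like the one in the question.
rng = np.random.default_rng(0)
static = rng.normal(size=(420, 40))
cube = stack_with_derivatives(static)
print(cube.shape)  # (420, 40, 3)
```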

Minibatch loss does not change

I used some utterances to test the code and the program runs fine, but the console shows the minibatch loss always staying between 7 and 9 without decreasing, and accuracy = 0. What is wrong here? Thanks!

Can I use this code for speech recognition?

Mean and standard deviation come out as NaN

Hi Astorfi,

Your paper is awesome.
I am trying to train a 3D CNN on speech data.
I have prepared the data as described in the paper, but during the development phase the mean and standard deviation are "nan" in every epoch.
I am getting the following output:

Epoch 1, Minibatch 1 of 3 , Minibatch Loss= 2.1972, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 2 of 3 , Minibatch Loss= 2.2215, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 3 of 3 , Minibatch Loss= 2.2637, TRAIN ACCURACY= 0.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= nan, std= nan

Can you please help me understand why I am getting this problem?
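
One observation that may help narrow this down: a softmax classifier that assigns equal probability to every class has a cross-entropy loss of ln(C) for C classes, and ln(9) ≈ 2.1972 matches the first minibatch loss above. Assuming nine speakers in the data, that would suggest the network starts at chance level and is not yet learning. The arithmetic is sketched below.

```python
import math

def chance_level_loss(num_classes):
    """Cross-entropy of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)

reported_first_loss = 2.1972  # from the log above
num_classes = 9               # assumed; chosen because exp(2.1972) is close to 9

print(round(chance_level_loss(num_classes), 4))  # 2.1972
```

The nan mean and standard deviation is a separate symptom. One possibility, offered only as a guess, is that the test statistics were computed over an empty set of batches, since NumPy returns nan for the mean of an empty array.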

Data pipeline example

I'm trying to figure out your pipeline, including by reading the paper, with no luck so far.
Judging by the open and closed issues, I'm clearly not the only one. It seems a lot of work has been done here, and quality work too.
However, this repository cries out for a solid example that goes from a WAV file through feature extraction, development, and enrollment to prediction.
I know that each use case needs to customize its own pipeline, but in my view the example, paper, and documentation don't give enough infrastructure to continue on your own.

Again, it really seems I'm not the only one. Can you please upload a pipeline example, refer me to one, or at least upload a clear description of the steps from WAV file to prediction?

Nan output during training

Hi @astorfi
In the development step, I tried running train_softmax.py on the VoxCeleb2 dataset and got NaN output (the logits variable in the code). How can I solve this?
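
A first thing worth ruling out, offered as a general debugging step and not as a known cause for VoxCeleb2, is non-finite values in the input features: the log of a zero energy is `-inf`, which turns into `nan` after a few layers. The helper below is hypothetical and only reports which samples contain such values.

```python
import numpy as np

def find_non_finite(features):
    """Return indices of samples (first axis) containing nan or inf values."""
    flat = features.reshape(features.shape[0], -1)
    bad_rows = ~np.isfinite(flat).all(axis=1)
    return np.flatnonzero(bad_rows).tolist()

# Toy batch of 4 samples; sample 2 gets a log(0) = -inf on purpose.
batch = np.ones((4, 80, 40, 20))
with np.errstate(divide="ignore"):
    batch[2, 0, 0, 0] = np.log(0.0)

print(find_non_finite(batch))  # [2]
```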

Validation accuracy on development dataset

Hi astorfi,
Thanks for your great work. I have been running your code on my dataset, but the validation accuracy is low in my experiments and I have no idea whether something is wrong. What validation accuracy do you get in your experiments once the network has converged?

.wav input specifics

Hi guys,
I have a question regarding the input WAV files used for training.
What are the audio format specifications?
I used VoxCeleb ( http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ ) as the dataset, but it is giving me some trouble.
Do you know of any other usable dataset?

Thank you ;)
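
The format a given WAV file actually has can be inspected with Python's standard `wave` module, as sketched below. The sketch writes a short silent file first so that it is self-contained; the 16 kHz, mono, 16-bit values are example settings and not a requirement stated by this repository. Note that `wave` only reads uncompressed PCM files.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return sample rate, channel count, sample width in bytes, and duration."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_seconds": frames / float(rate),
        }

# Write one second of 16-bit mono silence at 16 kHz (example settings only).
path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

info = describe_wav(path)
```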

EER vs i-vector

Hi astorfi,
Thanks for such great work. The pipeline is really great.
But when I try the ai-shell dataset, the Kaldi i-vector system gets around 2% EER, while 3D-convolutional-speaker-recognition with LDA gets 17% EER.
What's wrong? Any help would be much appreciated!
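
For reference when comparing numbers like these, the equal error rate (EER) is the operating point where the false acceptance rate equals the false rejection rate. The sketch below is a simple threshold sweep over made-up scores; it is not the repository's `calculate_roc.py`, and other toolkits may interpolate between thresholds differently, so small discrepancies between implementations are expected.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by trying each observed score as a threshold.

    labels: 1 for genuine (same speaker) trials, 0 for impostor trials.
    A trial is accepted when its score is >= threshold.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        far = sum(s >= threshold for s in impostor) / float(len(impostor))
        frr = sum(s < threshold for s in genuine) / float(len(genuine))
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Made-up trial scores: higher means "more likely the same speaker".
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer = equal_error_rate(scores, labels)
print(eer)  # 0.25
```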

Prepare data

Hi Astorfi,
I am trying to train a speaker model with your code.
How can I prepare my speech data (train_data, development_data, evaluation_data) for your model?
Thank you very much!

Do you mean "train_files_subjects_list.append(subject)" rather than "train_files_subjects_list.append(file_name.split('/')[7])"?

3D-convolutional-speaker-recognition/code/0-input/create_hdf5/create_development.py

Line 132 in 61969eb

train_files_subjects_list.append(file_name.split('/')[7])
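
The concern with `file_name.split('/')[7]` is that a fixed index only works for one particular directory depth. Assuming, hypothetically, that each file sits in a directory named after its speaker, the speaker ID can be derived relative to the file itself, as sketched below. The example paths and the `speaker_from_path` helper are invented for illustration.

```python
import posixpath

def speaker_from_path(file_name):
    """Return the name of the directory that directly contains the file."""
    return posixpath.basename(posixpath.dirname(file_name))

# Invented paths with different depths; both resolve to the parent directory.
deep = "/home/user/data/voices/train/wav/speaker_0012/utt_001.wav"
shallow = "data/speaker_0012/utt_001.wav"

print(deep.split('/')[7])          # speaker_0012 (only because of this exact depth)
print(speaker_from_path(deep))     # speaker_0012
print(speaker_from_path(shallow))  # speaker_0012
```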

What do low-level and high-level feature extraction mean?

Hello everyone,

In the context of 3D-CNN audio feature extraction, what do low-level and high-level feature extraction mean?

Thank you,

speaker identification input WAV file

Hi, I am using your code for speaker identification.

I am new to machine learning. Could you tell me whether I can use this code for speaker recognition?

I am able to run your code and it outputs some plots, but I do not know how to use those plots to recognize a speaker.

I am also trying to train on my own WAV files.
However, I get an error:

...3D-convolutional-speaker-recognition-master\code\0-input\create_hdf5\pair_generation.py", line 42, in feed_to_hdf5
    :, 0]
IndexError: too many indices for array
Closing remaining open files:development.hdf5...done

The default feature_mfec.npy file looks like this:

array([[[  1.41430335e+01,   1.38114970e+00,   8.35106419e-02],
        [  1.41430335e+01,   1.36457288e+00,   8.67885390e-03],
        [  1.39772653e+01,   1.11641647e+00,  -2.56141085e-02],

        ...,
        [  9.24432067e+00,   9.21401209e-01,   1.09648406e-01],
        [  9.19465798e+00,   9.45404618e-01,   1.09012088e-01],
        [  9.37358081e+00,   9.68176377e-01,   1.03772331e-01]]])

I converted my WAV file to .npy, and its content looks like this:

array([[ 143.,  143.],
       [ 136.,  136.],
       [ 121.,  121.],
       ...,
       [  72.,   72.],
       [  81.,   81.],
       [  90.,   90.]])

My transform function:

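import numpy as np
import scipy.io.wavfile as wav

# wav.read returns a (sample_rate, samples) tuple.
temp_npy = wav.read('...\\19-198-0000.wav')

print(temp_npy)

# temp_npy[1] holds the raw samples; no MFEC features are computed here.
result = np.array(temp_npy[1], dtype=float)

np.save('test_wav_r_values.npy', result)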

Do I need a different type of WAV file, or should I change the function that converts WAV to .npy?

I also see import scipy.io.wavfile as wav in create_development.py. Can I feed the WAV files directly for training?

Thank you for providing this code.
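
A note on the two dumps above: the sample feature_mfec.npy is a 3-D array (another report on this page gives its shape as (3209, 40, 3)), while the converted file holds raw samples in a 2-D (samples, channels) array. Indexing a 2-D array with three indices raises the same class of IndexError as in the traceback. The sketch below is only an illustration with NumPy stand-in arrays; the helper name is hypothetical, and the exact indexing used in pair_generation.py is assumed, since the traceback shows only its tail.

```python
import numpy as np

def looks_like_mfec_features(arr):
    """Return True if arr has a (frames, filters, 3) layout like the sample file."""
    return arr.ndim == 3 and arr.shape[2] == 3

# Stand-in for the sample feature file: 100 frames, 40 filterbank energies, 3 channels.
features = np.zeros((100, 40, 3))
# Stand-in for raw stereo samples saved straight from wav.read: (samples, 2).
raw_stereo = np.zeros((16000, 2))

print(looks_like_mfec_features(features))
print(looks_like_mfec_features(raw_stereo))

# Three indices on a 2-D array raise IndexError, as in the traceback.
try:
    raw_stereo[:, :, 0]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Averaging the two channels gives a mono signal that a feature extractor can consume.
mono = raw_stereo.mean(axis=1)
print(mono.shape)
```

In other words, the raw samples would first need to go through feature extraction before they resemble the sample file.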

FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\\score_vector.npy'

Hi Astorfi,
Your work is perfect! I've read your paper and it's great.
I'm new to TensorFlow and am trying to learn.
I'm having a problem when executing ./run.sh. This is the error text:

Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\Boulbaba Zitouni\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'

I just cloned the project and tried to run it, that's all.
I have already installed all the requirements (from requirements.txt).

Any solutions?
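
The plotting scripts in the traceback only load score_vector.npy, so the file presumably has to be written by an earlier stage of run.sh (an assumption drawn from the traceback, not something confirmed here). A minimal sketch of a check to run before plotting, with a hypothetical helper name:

```python
import os

def find_missing_scores(evaluation_dir, filename="score_vector.npy"):
    """Return the expected score path if it is missing, else None."""
    path = os.path.join(evaluation_dir, filename)
    return None if os.path.isfile(path) else path

missing = find_missing_scores("results/SCORES")
if missing is not None:
    print("Missing %s: check the output of the earlier stages before plotting." % missing)
```

If the file is missing, the console output of the stages that ran before the plotting scripts is the first place to look.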

I can't build the code

slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')

throws an exception:
The kernel_size argument must be a tuple of 2 integers. Received: [3, 1, 5]
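
A three-element kernel such as [3, 1, 5] describes a convolution over three spatial dimensions, whereas the message says this conv2d accepts only two. The pure-Python sketch below illustrates the mismatch with a hypothetical helper; it assumes no padding ('VALID') and the (20, 80, 40) cube mentioned elsewhere on this page, and it does not reproduce the library's own validation.

```python
def conv_output_shape(input_shape, kernel_size, stride):
    """Spatial output shape of an N-D convolution without padding ('VALID')."""
    if len(kernel_size) != len(input_shape) or len(stride) != len(input_shape):
        raise ValueError("kernel_size and stride need one entry per spatial dimension")
    return tuple((n - k) // s + 1 for n, k, s in zip(input_shape, kernel_size, stride))

# A (20, 80, 40) cube has three spatial dimensions, so a 3-element kernel fits.
print(conv_output_shape((20, 80, 40), [3, 1, 5], [1, 1, 1]))  # (18, 80, 36)

# A 2-D layer sees only two spatial dimensions; the same kernel is rejected.
try:
    conv_output_shape((80, 40), [3, 1, 5], [1, 1, 1])
except ValueError as err:
    print("rejected:", err)
```

Which 3-D convolution layer to call instead depends on the installed TensorFlow version, which the report does not state.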

explain how to take a single wav file and extract features

I've read your paper and it's really impressive.

I would like to ask you about the input preprocessing:

Assume I have a WAV file 0.8 sec long:
fs, signal = wav.read(file_name)
Then I use mfec = speechpy.feature.mfe(signal, fs).
The size of mfec is [79, 40], so I changed the input file to be 0.81 sec long
and then I got [80, 40].

According to your paper I need [20,80,40] to create one training example. I can create this either by duplicating my original [80,40] 20 times (this is how you did it in the testing phase) or by concatenating 20 different utterances of 0.81 sec. Is that correct?

Any clarifications would be appreciated!

Alan
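
The reported sizes are consistent with ordinary framing arithmetic. The sketch below assumes 20 ms frames with a 10 ms stride and a 16 kHz sampling rate (all assumptions; speechpy's own rounding may differ), and then builds a [20, 80, 40] cube in the two ways described in the question, using random stand-in data. It makes no claim about which of the two the repository itself uses.

```python
import numpy as np

def num_frames(num_samples, fs, frame_length=0.020, frame_stride=0.010):
    """Number of full frames that fit in a signal (no padding)."""
    frame_len = int(round(fs * frame_length))
    stride = int(round(fs * frame_stride))
    if num_samples < frame_len:
        return 0
    return (num_samples - frame_len) // stride + 1

fs = 16000  # assumed sampling rate
print(num_frames(12800, fs))  # 0.80 s -> 79 frames
print(num_frames(12960, fs))  # 0.81 s -> 80 frames

# Option 1: repeat one [80, 40] feature matrix 20 times.
utterance = np.random.rand(80, 40)
cube_repeated = np.repeat(utterance[None, :, :], 20, axis=0)

# Option 2: stack 20 different [80, 40] feature matrices.
cube_stacked = np.stack([np.random.rand(80, 40) for _ in range(20)], axis=0)

print(cube_repeated.shape, cube_stacked.shape)
```

Both options give the same shape; they differ only in whether the 20 slices carry different utterances.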

the feature of input data

Hi Astorfi,
First, your work is perfect! I've read your paper and it's really great.
In the sample data of your previous code, the MFEC feature file (feature_mfec.npy) has shape (3209, 40, 3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels (static, first-order and second-order derivative features, from speechpy.feature.extract_derivative_feature(feature)). Is that right?
But with the input_feature.py you provided recently, the output feature file has shape (1, 20, 80, 40): 80 is the number of frames, 40 is the number of features, 20 is the number of utterances, and 1 represents one cube for one speaker. Is that right?
So does input_feature.py use only the first channel of the MFEC features of the audio?
Thanks!

loss = 0,train acc = 0

Hi astorfi,
I'm trying to use your code to train a model with 31 labels and 60 samples per label. However, when I use train_softmax.py, the last minibatches return loss = 0 while train acc = 0. Do you have any idea how to fix it?

Thank you.

Before running

Run time error in the demo

When I ran the run.sh, the execution terminated saying:
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'

Where do I get this score file from? Do I need to create one? I just ran run.sh for the demo.

Can you please help?

Regards!

Error importing Tables

Hello, first of all thank you for releasing the code. Unfortunately, I'm stuck at step 1 (step 0 works.)
I installed all the requirements, this is my setup:
Win7 64
Python 3.5.4
Tensorflow 1.6 (installed in a separate Anaconda environment but still with pip install)
Tables 3.4.3
I also installed pytables from conda, I thought it was missing, but the result is still the same:

When running train_softmax, at 'import tables', I get:
File "C:\Users...\AppData\Local\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\tables_init_.py", line 90, in
from .utilsextension import (

ImportError: DLL load failed: The specified procedure could not be found.

The thing is, if I simply import tables with no code preceding it, it's fine. If I import it after tensorflow (as in your code), it gives me the error. If I move 'import tables' before 'import tensorflow', then Python crashes.

I tried to find answers on the net but none was useful...

Thanks

Data for Enrollment and Development

Should the data for Enrollment and Development be the same?

What is the exact meaning of "utterances"?

The term utterances has not been defined anywhere in the paper. I am new to the field of speaker recognition.
Can someone tell me what utterances means in the context of this project?

Thanks in advance

Problem with evaluation.

Hi @astorfi, thanks for your great work. I use all the same settings, but store the training data in hdf5 instead of using the Audio Dataset. However, my evaluation result is poor: the EER is up to 40%. I think there is something wrong with my setup. Do you have any idea how to fix this?
I use the VoxCeleb dataset for the background model and only 1 sample per speaker.
50 people for enrollment, 50 not enrolled (to be rejected).
4 samples for evaluation.

Thanks for your help.

Dataset for evaluation

Hi Astorfi,
I'm trying to train a speaker recognition model with your code.
Since I'm a beginner at programming, I don't understand your code well.
For the enrollment and evaluation phases, do I just have to prepare data of shape (sample, 1, 80, 40)?
I read the paper, and I don't know whether I have to copy the data of a single utterance to make data of shape (sample, 20, 80, 40).

I also prepared the development data (shape (97, 20, 80, 40)) using input_feature.py, but do I have to prepare it with shape (97, 80, 40, 20) instead?

Thank you very much.
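
The run logs quoted elsewhere on this page print "Train data shape: (12, 80, 40, 20)" for the development data and "Enrollment data shape: (108, 80, 40, 1)" for enrollment, which suggests the stored arrays keep the utterance axis last. Under that reading (an inference from the logs, not a confirmed specification), cubes of shape (N, 20, 80, 40) can be rearranged as sketched below with a small random stand-in array:

```python
import numpy as np

# Random stand-in for development cubes: (N, 20, 80, 40), with N = 4 to keep it small.
cubes = np.random.rand(4, 20, 80, 40).astype(np.float32)

# Move the utterance axis to the end: (N, 20, 80, 40) -> (N, 80, 40, 20).
stored = np.transpose(cubes, axes=(0, 2, 3, 1))
print(stored.shape)

# The element at [n, u, t, f] in the input sits at [n, t, f, u] in the output.
assert stored[1, 10, 3, 7] == cubes[1, 7, 10, 3]
```

np.transpose only reorders axes, so no feature values are changed by this step.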

Testing accuracy is not increasing

Hi,

I am getting a training accuracy of 95% on VoxCeleb, but the testing accuracy is only around 10%. What could be the reason for this?
Speakers=1211
Batch size =100
epoch = 50
Data size = 24000

prediction difference between batch=1 and batch=16

Any ideas why I'm receiving different prediction values when running with batch_size=1 versus batch_size=16?
The code is below.
Thanks!

def predict(self, speech_input):
    labels = np.empty(0, int)
    labels = np.append(labels, range(speech_input.shape[0]), axis=0)
    feature, logits, _ = self.session.run(
        [self.features, self.logits, self.end_points_speech],
        feed_dict={self.is_training: False, self.batch_dynamic: labels.shape[0],
                   self.margin_imp_tensor: 50,
                   self.batch_speech: speech_input})
                   # self.batch_labels: labels.reshape([labels.shape[0], 1])})

    # Extracting the associated numpy array.
    # print(feature[0])

    return feature, logits

No such file or directory: 'results/SCORES/score_vector.npy'

I followed all the instructions given in the video and got this error:

No such file or directory: 'results/SCORES/score_vector.npy'

Please tell me how to resolve it.

where is score_vector.npy

fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in

Testing error

C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init_.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
WARNING:tensorflow:From ./code/1-development/train_softmax.py:423: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

2018-06-08 08:58:18.940789: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1, Minibatch 1 of 4 , Minibatch Loss= 1.3863, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 2 of 4 , Minibatch Loss= 1.2341, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 3 of 4 , Minibatch Loss= 0.0000, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 4 of 4 , Minibatch Loss= 1.0951, TRAIN ACCURACY= 100.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= 50.0000, std= 50.000
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init_.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/2-enrollment/enrollment.py", line 330, in
tf.app.run()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
sys.exit(main(argv))
File "./code/2-enrollment/enrollment.py", line 201, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/enrollment-evaluation_sample_dataset.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/3-evaluation/evaluation.py", line 380, in
tf.app.run()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
sys.exit(main(argv))
File "./code/3-evaluation/evaluation.py", line 202, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...donedata/development_sample_dataset_speaker.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py_init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
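
In the log above, both enrollment.py and evaluation.py stop at "NameError: name 'xrange' is not defined" under Python 3.6. xrange exists only in Python 2, so those two stages end early, and it is plausible (though not confirmed here) that score_vector.npy is never written, which would explain the FileNotFoundError in every later script. A minimal compatibility sketch, assuming xrange is the only Python 2 construct involved:

```python
try:
    xrange  # Python 2: the built-in already exists.
except NameError:
    xrange = range  # Python 3: range is already lazy, so alias it.

num_clones = 1  # hypothetical stand-in for FLAGS.num_clones
for i in xrange(num_clones):
    print("clone", i)
```

Replacing xrange with range directly in the affected loops has the same effect on Python 3.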

About the data pipeline

Hi astorfi:
Can you show an example of how to prepare data for the enrollment stage?
I ran into a problem at this stage. I processed the data the same way as for generating the development data,
but it doesn't work. The hdf5 format is somewhat annoying. Can you show how to implement it
using some random data as an example? Thanks a lot.
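
As a rough starting point, the sketch below builds random arrays with the shapes printed for the sample data in the logs on this page: (108, 80, 40, 1) for enrollment and (12, 80, 40, 1) for evaluation. The variable names follow the node names used in the hdf5 snippet in another issue here; whether they match what enrollment.py reads is an assumption, and the speaker counts and helper name are arbitrary. The enrollment script's assertion message quoted on this page asks for at least 20 utterances per speaker, so each speaker gets 27 enrollment samples.

```python
import numpy as np

def make_random_split(num_speakers, samples_per_speaker, seed=0):
    """Random features shaped (N, 80, 40, 1) plus one integer speaker label per sample."""
    rng = np.random.RandomState(seed)
    n = num_speakers * samples_per_speaker
    utterances = rng.rand(n, 80, 40, 1).astype(np.float32)
    labels = np.repeat(np.arange(num_speakers), samples_per_speaker)
    return utterances, labels

# 4 speakers x 27 samples for enrollment, 4 speakers x 3 samples for evaluation.
utterance_enrollment, label_enrollment = make_random_split(4, 27)
utterance_evaluation, label_evaluation = make_random_split(4, 3, seed=1)

print(utterance_enrollment.shape, label_enrollment.shape)
print(utterance_evaluation.shape, label_evaluation.shape)
```

Each of the four arrays would then be written as one node of the hdf5 file, for instance with PyTables.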

About voice/speaker verification task.

Hi astorfi,
Thanks for such great work. I want to ask you some questions. I am fairly experienced in DL but new to the speech field, so please forgive me if I ask any dumb questions :D

What do you mean by 'input as stacked utterances'? Does an utterance here mean a sentence, a word, or something else?
How large should the dataset be (the number of different speakers, the number of samples) for the model to work well on voice verification task?

shape mismatch problem in input_feature.py

Hello! We are trying to build our own input pipeline. However, when we follow the __getitem__ method in Audioset (with cube_shape set to (20,80,40)), there is a shape mismatch when the model tries to feed data to batch_speech (a placeholder with shape (20,80,40,1)).

After carefully reviewing the code in train_softmax.py, we found that the input shape conflicts with the transpose operation in the following code:

speech_train = np.transpose(speech_train[None, :, :, :, :], axes=(1, 4, 2, 3, 0))

What is the solution? Could you give us any help?

ValueError: axes don't match array

To make multiple models from multiple wav files, I added the following to the input_features.py in order to generate .hdf5 file for all wav files I have:

idx = 0
f = open('file_path_test1.txt', 'r')
for line in f:
    idx = idx + 1

lab = []
feat = []
for i in range(idx):
    feature, label = dataset.__getitem__(i)
    lab.append(label)
    feat.append(feature)
    print(feature.shape)
    print(label)
######################
## creating hdf5 file ##
######################
h5file = tables.open_file('/root/3D_CNN/3D-convolutional-speaker-recognition/data/evaluation_test.hdf5', 'w')
label_test = h5file.create_carray(where = '/', name = 'label_enrollment', obj = lab, byteorder = 'little')
label_array = h5file.create_carray(where = '/', name = 'label_evaluation', obj = lab, byteorder = 'little')
utterance_test = h5file.create_earray(where = '/', name = 'utterance_enrollment', chunkshape = [1,20,80,40], obj = feat, byteorder = 'little')
utterance_train = h5file.create_earray(where = '/', name = 'utterance_evaluation', chunkshape = [1,20,80,40], obj = feat, byteorder = 'little')
h5file.close()

When I ran input_features.py, it gave me the following error:
ValueError: the shape ((0, 1, 20, 80, 40)) and chunkshape ((1, 20, 80, 40)) ranks must be equal.
I realized that lab and feat are arrays with 9 elements each (the number of wav files I want to test). Each element of feat holds the features of one wav file in my list. So I changed chunkshape to [9,1,20,80,40], and the evaluation_test.hdf5 file was created with no errors.

When I ran run.sh with the hdf5 file that I created, I got this:


Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Epoch 1, Minibatch 1 of 4 , Minibatch Loss= 0.0000, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 2 of 4 , Minibatch Loss= 1.2341, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 3 of 4 , Minibatch Loss= 1.4641, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 4 of 4 , Minibatch Loss= 1.4434, TRAIN ACCURACY= 0.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= 50.0000, std= 50.000
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
Enrollment data shape: (9, 1, 20, 80, 40)
Enrollment label shape: (9,)
Evaluation data shape: (9, 1, 20, 80, 40)
Evaluation label shape: (9,)
INFO:tensorflow:Scale of 0 disables regularizer.
.
.
Traceback (most recent call last):
  File "./code/2-enrollment/enrollment.py", line 330, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./code/2-enrollment/enrollment.py", line 289, in main
    assert len(speaker_index) >= NumUtterance, "At least %d utterances is needed for each speaker" % NumUtterance
AssertionError: At least 20 utterances is needed for each speaker
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/eval_try.hdf5...done
Enrollment data shape: (9, 1, 20, 80, 40)
Enrollment label shape: (9,)
Evaluation data shape: (9, 1, 20, 80, 40)
Evaluation label shape: (9,)
INFO:tensorflow:Scale of 0 disables regularizer.
.
.
Traceback (most recent call last):
  File "./code/3-evaluation/evaluation.py", line 380, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./code/3-evaluation/evaluation.py", line 329, in main
    speech_evaluation = np.transpose(speech_evaluation[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 598, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 51, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: axes don't match array
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/eval_try.hdf5...done
('EER=', 43.75, 0.0)
('AUC=', 48.4375, 0.0)
('EER = ', 0.44)
('AUC = ', 0.48)
('AP = ', 0.33)

I'm not sure how to fix this: ValueError: axes don't match array
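
The ValueError can be reproduced with NumPy alone. Indexing the 5-D array (9, 1, 20, 80, 40) with [None, :, :, :, :] yields six dimensions, so the five-element axes tuple no longer matches. The sketch below uses random stand-in data and hypothetical labels; the target layout (N, 80, 40, 1) is inferred from the sample-data shapes printed in other logs on this page, not from the code itself.

```python
import numpy as np

# Random stand-in with the shape printed in the log: (9, 1, 20, 80, 40).
speech_evaluation = np.random.rand(9, 1, 20, 80, 40)

# A 5-D input becomes 6-D after [None, ...], so five axes do not match.
try:
    np.transpose(speech_evaluation[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

# Give every utterance its own row: (9, 1, 20, 80, 40) -> (180, 80, 40, 1).
fixed = speech_evaluation.reshape(-1, 80, 40)[..., None]
# Hypothetical labels: one speaker id per file, repeated for its 20 utterances.
labels = np.repeat(np.arange(9), 20)
print(fixed.shape, labels.shape)

# With a 4-D input, the transpose from evaluation.py runs.
batch = np.transpose(fixed[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
print(batch.shape)
```

With this layout each label also has 20 rows, which is the minimum the enrollment assertion in the log asks for.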
