
ppwwyyxx / speaker-recognition


A Speaker Recognition System

License: Apache License 2.0

Python 25.82% Shell 0.13% Makefile 4.31% C++ 63.52% MATLAB 5.38% M 0.05% Dockerfile 0.80%

speaker-recognition's Introduction

About

This is a Speaker Recognition system with GUI.

For more details of this project, please see the reports in this repository.

Dependencies

The Dockerfile can be used to get started with the project more easily.

  • Linux, Python 2
  • scikit-learn, scikits.talkbox, pyssp, PyAudio:
    pip install --user scikit-learn scikits.talkbox pyssp PyAudio
    
  • PyQt4, which can usually be installed with your package manager.
  • (Optional) Python bindings for bob:
    • install blitz, openblas, boost, then:
     for p in bob.extension bob.blitz bob.core bob.sp bob.ap; do
     	pip install --user $p
     done
    

Note: We have our own MFCC implementation, which is used as a fallback when bob is unavailable, but it is not as efficient as the C implementation in bob.
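For illustration, the fallback amounts to a conditional import along these lines (a simplified sketch, not the repository's actual code):

    # Sketch of the bob fallback; the real logic lives under src/gui/feature.
    try:
        from bob.ap import Ceps  # fast C++ cepstral (MFCC) extractor from bob
        BOB_AVAILABLE = True
    except ImportError:
        BOB_AVAILABLE = False
        print("Warning: failed to import Bob, will use a slower version of MFCC instead.")
    # When BOB_AVAILABLE is False, the pure-Python MFCC implementation is used instead.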

Algorithms Used

Voice Activity Detection (VAD):

Feature:

Model:

GUI Demo

Our GUI has basic functionality for recording, enrollment, training and testing, plus a visualization of real-time speaker recognition:

[GUI screenshot]

You can see our demo video (in Chinese). Note that real-time speaker recognition is extremely hard, because we only use about one second of audio to identify the speaker. Therefore the system does not work perfectly.

The GUI part is quite hacky, was built for demo purposes, and is no longer maintained. Take it as a reference, but don't expect it to work out of the box. Use the command line tools to try the algorithms instead.

Command Line Tools

usage: speaker-recognition.py [-h] -t TASK -i INPUT -m MODEL

Speaker Recognition Command Line Tool

optional arguments:
  -h, --help            show this help message and exit
  -t TASK, --task TASK  Task to do. Either "enroll" or "predict"
  -i INPUT, --input INPUT
                        Input Files(to predict) or Directories(to enroll)
  -m MODEL, --model MODEL
                        Model file to save(in enroll) or use(in predict)

Wav files in each input directory will be labeled as the basename of the directory.
Note that wildcard inputs should be *quoted*, and they will be sent to glob module.

Examples:
    Train:
    ./speaker-recognition.py -t enroll -i "./bob/ ./mary/ ./person*" -m model.out

    Predict:
    ./speaker-recognition.py -t predict -i "./*.wav" -m model.out
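For reference, the enroll input is expanded roughly as follows (a simplified sketch; see task_enroll in speaker-recognition.py for the actual logic):

    # Sketch: expand a quoted, space-separated pattern list into (label, wav) pairs.
    import glob
    import itertools
    import os

    def expand_enroll_input(input_arg):
        dirs = itertools.chain.from_iterable(glob.glob(p) for p in input_arg.split())
        for d in dirs:
            label = os.path.basename(d.rstrip('/'))  # directory name becomes the speaker label
            for wav in glob.glob(os.path.join(d, '*.wav')):
                yield label, wav

    # e.g. list(expand_enroll_input("./bob/ ./mary/ ./person*"))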

speaker-recognition's People

Contributors

chanansh, hudsantos, leetz, ppwwyyxx, qacollective, zxytim


speaker-recognition's Issues

errors in prediction

  1. Trained the software on different voices.
  2. Each voice sample used for training is at least one minute long.
  3. For each voice a separate .out model file is created.

Now I tested new sample files, each 5 seconds long, and the predictions were very random. Sometimes they were accurate and other times they showed wrong results. Even testing voices that weren't in the database showed a close match. Can you tell me what I am doing wrong? And what's the correct way to feed background noise, and how much does it matter?

Note: I edited the /testbench/gmmset.py file to get scores and the scores came around -5 ~ -10 for wrong matches.

Can I use this in iOS?

Hi @ppwwyyxx

Can I use this on a mobile platform? I want to do the same thing in iOS, so I've been searching but haven't found anything for iOS. I saw your demo video and it's working fine. So I want to implement this in iOS. Can you guide me on how to do it? Or how can I run this on a Mac?

-Ekta

ImportError

Hello,
Please, I am trying to run the GUI to test your program, and when I type "python gui.py" in \speaker-recognition-master\src\gui,

I get the following:

Traceback (most recent call last):
  File "gui.py", line 23, in <module>
    from interface import ModelInterface
  File "C:\Important Documents GP\speaker-recognition-master\src\gui\interface.py", line 16, in <module>
    from feature import mix_feature
ImportError: No module named feature

Any help?

Thank You

installation

Do you have installation instructions?
Moreover, how can you enroll before a UBM? Who trains the UBM? Was it pre-trained? Or do you assume a closed set and use all speaker models to create a UBM?

using cRBM?

hi

Is the continuous RBM you have mentioned used in the UI Demo?
I ask this because it seems that the UI demo works better on my dataset than the provided python script (speaker_recognition.py)

why can't I use fast-gmm?

Hi, Yuxin Wu,
Thank you for your excellent project.
But I've met a problem and I hope I can get some help!
When I was training my database, the system warned me: "Warning: failed to import fast-gmm, use gmm from scikit-learn instead". So I tried to force it to use fast-gmm, and the following error appeared:

Traceback (most recent call last):
  File "./speaker-recognition.py", line 17, in <module>
    from gui.interface import ModelInterface
  File "/home/taowzzz/Downloads/speaker-recognition-master/src/gui/interface.py", line 20, in <module>
    from gmmset import GMMSetPyGMM as GMMSet
  File "/home/taowzzz/Downloads/speaker-recognition-master/src/gui/gmmset.py", line 14, in <module>
    from gmm.python.pygmm import GMM
  File "/home/taowzzz/Downloads/speaker-recognition-master/src/gui/gmm/python/pygmm.py", line 16, in <module>
    pygmm = cdll.LoadLibrary(path.join(dirname, '../lib/pygmm.so'))
  File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/taowzzz/Downloads/speaker-recognition-master/src/gui/gmm/python/../lib/pygmm.so: cannot open shared object file: No such file or directory

"
Could you please give me some advice?
Thank you!

NoiseFilter Not Initialized

I'm trying to figure out how to use the GUI to see how this works. I want to train a user with multiple wave files.

Reproducible steps:

  • Command Run (in Conda environment): python gui.py
  • Go To Enrollment, fill in User Info, choose training wav file for Voice Enrollment
  • Press "Train!"
  • Press "Enroll!" and it gives this error:
No module named ap
Warning: failed to import Bob, will use a slower version of MFCC instead.
Warning: failed to import fast-gmm, use gmm from scikit-learn instead
libpng warning: iCCP: known incorrect sRGB profile
avatar/doubao.jpg doubao
avatar/ltz.jpg ltz
avatar/wyx.jpg wyx
avatar/zxy.jpg zxy
Warning: failed to import gmmset. You may forget to compile gmm:
dlopen(/Users/conzor/Downloads/speaker-recognition-master/src/gui/gmm/python/../lib/pygmm.so, 6): image not found
Try running `make -C src/gmm` to compile gmm module.
But gmm from sklearn will work as well! Using it now!
Start training...
1.59740447998e-05  seconds
Traceback (most recent call last):
  File "gui.py", line 300, in do_enroll
    new_signal = self.backend.filter(*self.enrollWav)
  File "/Users/conzor/Downloads/speaker-recognition-master/src/gui/interface.py", line 47, in filter
    ret, intervals = self.vad.filter(fs, signal)
  File "/Users/conzor/Downloads/speaker-recognition-master/src/gui/filters/VAD.py", line 29, in filter
    raise Exception("NoiseFilter Not Initialized")
Exception: NoiseFilter Not Initialized

Cool project by the way!

Thanks in advance,
Connor

Implement speaker-recognition to alexylem/jarvis project

Hi everyone,

I'd like to know if your team could help us integrate your solution into Jarvis, which is a lightweight, configurable, multi-language Jarvis-like bot meant for home automation on slow computers.

More information here: https://github.com/alexylem/jarvis
Although the creator is French, the project is mostly carried out in English.

We would like to use your solution to adapt commands to the requester.
Ex: Read a personal music playlist
=> If John says: "Play some music", Jarvis answers "OK, let's play some rock!"
=> If Jane says: "Play some music", Jarvis answers "OK, let's play some pop."

Thanks in advance.

Training UBM issue

What is the function 'get_all_data_fpaths()' in src/testbench/train-ubm.py expected to read from 'test-data/mfcc-lpc-data/' (the directory doesn't even exist)?
Is it some '.mfcc-lpc' file? If so, what does it contain?
I can't train the UBM on my machine due to this issue.

Automating adding of new speakers

Would it be possible to add a 10-15 minute audio clip of 4-5 people conversing and have it recognize the different voices in the clip without a sample voice clip of each person?
That is, figuring out the voices without a separate sample of each voice in the 10-15 minute clip.

LPC contains NaN or Infinity

Hello,
I've tested the tool on my simple data set. When the gmm from sklearn is used, it raises this error:

Warning: failed to import fast-gmm, use gmm from scikit-learn instead
Label person_xcq has files data/person_xcq/2.wav,data/person_xcq/1.wav
Label person_wzw has files data/person_wzw/wzw3.wav,data/person_wzw/wzw2.wav,data/person_wzw/wzw1.wav
Label person_ffy has files data/person_ffy/ffy3.wav,data/person_ffy/ffy1.wav,data/person_ffy/ffy2.wav
Label person_lj has files data/person_lj/2.wav,data/person_lj/1.wav,data/person_lj/3.wav
Label person_czy has files data/person_czy/czy2.wav,data/person_czy/czy3.wav,data/person_czy/czy1.wav
Start training...
Traceback (most recent call last):
  File "./src/speaker-recognition.py", line 89, in <module>
    task_enroll(args.input, args.model)
  File "./src/speaker-recognition.py", line 73, in task_enroll
    m.train()
  File "/home/xcq/work/speaker-recognition/src/gui/interface.py", line 72, in train
    self.gmmset.fit_new(feats, name)
  File "/home/xcq/work/speaker-recognition/src/gui/skgmm.py", line 21, in fit_new
    gmm.fit(x)
  File "/usr/lib/python2.7/dist-packages/sklearn/mixture/gmm.py", line 437, in fit
    random_state=self.random_state).fit(X).cluster_centers_
  File "/usr/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 702, in fit
    X = self._check_fit_data(X)
  File "/usr/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 668, in _check_fit_data
    X = atleast2d_or_csr(X, dtype=np.float64)
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 134, in atleast2d_or_csr
    "tocsr", force_all_finite)
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 111, in _atleast2d_or_sparse
    force_all_finite=force_all_finite)
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 93, in array2d
    _assert_all_finite(X_2d)
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 27, in _assert_all_finite
    raise ValueError("Array contains NaN or infinity.")
ValueError: Array contains NaN or infinity.

If I run make -C src/gmm, no error is reported, but the resulting models contain NaN.

32
0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125 0.03125
28 1
-nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
-nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan

Finally, I found that the LPC feature contains NaN. I modified mix_feature in src/feature/__init__.py to use the MFCC only, and then it works.
I have uploaded the simple dataset I used to my forked repo.
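The change was roughly the following (simplified; the real mix_feature also computes LPC and concatenates the two feature sets):

    # src/gui/feature/__init__.py -- use MFCC only, since LPC produced NaN on my data
    import MFCC

    def mix_feature(tup):
        return MFCC.extract(tup)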

Thanks.

Failed to import Bob, use a slower version of mfcc instead

Hello, I've been trying to use your application, but every time I try to run gui.py or even speaker-recognition.py I get this warning message: "Failed to import Bob, use a slower version of mfcc instead". The thing is, bob and its packages are installed, and I have no idea how to use a slower version of MFCC.
Can I get some help please?

is the filter very sensitive to background noise?

Yuxin, thanks for the great work. I'm using your package and have worked out an online conversation mode based on your GUI version. The only problem I have is that the filter doesn't seem to work well.
If I turn off the filter, it runs normally, but it cannot recognize that nobody is speaking. It works well when I keep talking, though. But if I turn on the filtering, it filters out all the recorded content and decides nobody is speaking even when I keep talking. So I'm wondering if the filter is too sensitive to the background noise?
Or is there any trick to make it work well?

Gaussian model training

Hello, thank you very much for sharing; I have benefited a lot.
I ran into a small problem while using it. If the amount of enrollment data is rather small, will the learned Gaussian mixture model not perform very well? If I want the model to learn better, is there any other way? After all, it's impossible to record what a person says in every situation for training.

Predicting wrong speaker many times

Hi Yuxin Wu,

I've cloned your nice solution for some testing, but after training, it is not predicting as expected.
I am not using the gui. Only using ./speaker-recognition.py
In fact, it trained well and responds to all prediction attempts, but it points to a person who is not the one speaking.
Here is an explanation of how I am handling it:

I've created a training/ directory with the following content:

training/
├── fegens
│   ├── Fegens2.wav
│   └── Fegens.wav
├── hudson
│   ├── hudson2.wav
│   ├── hudson3.wav
│   └── Hudson.wav
├── jan
│   └── Jan.wav
├── paulo
│   └── Paulo.wav
└── pedreau
    ├── Pedreau2.wav
    ├── pedreau3.wav
    └── Pedreau.wav
5 directories, 10 files

..and ran enroll task like you've documented:

$ ./speaker-recognition.py -t enroll -i "./training/*" -m model.out

Then I've tried predictions this way:

$ ./speaker-recognition.py -t predict -i "already_trained_file.wav" -m model.out

If, and only if, the prediction input file a) is exactly the same file, or b) is a re-recording of it with exactly the same content and the same (or nearly the same) duration, then obviously it predicts very well! Even if I re-record those same audios with a smartphone, as if some of those people were talking to the microphone, and predict on the new recordings, the software can say who is talking!! It's amazing!! Very nice!

But if I play only five or seven seconds of a random voice, even a voice the software should know, it predicts with far too high an error rate. It seems random.
Here is the duration of my voice files:

Duration 5.120 seconds
Duration 3.560 seconds
Duration 40.320 seconds
Duration 28.860 seconds
Duration 20.480 seconds
Duration 13.200 seconds
Duration 14.160 seconds
Duration 19.320 seconds
Duration 15.880 seconds
Duration 23.360 seconds

Note: I am working at home, with good silence conditions. No significant SNR affecting the audios.

I've also tried duplicating those same voice samples under different filenames within each person's directory (just for testing, since I think it makes little difference for machine learning), ending up with the same 5 directories but 70 files, training again into a new model_2 and predicting with this new model_2, and even so the error rate is too high, as if it were random.

I would like to let you know this is very nice software! Very nice documentation PDFs. Congratulations!
And if you can, it would be nice to get some help on what you think is going on... Am I using the right method to train it? Is the directory structure fine? Are the quantities I've used fine? Do you have any other suggestions?

Thank you very much! Greetings from Brazil.

The program does not run

It gives the following error:

No module named bob
Warning: failed to import Bob, will use a slower version of MFCC instead.
Traceback (most recent call last):
  File "C:/Users/User/Desktop/speaker-recognition-master/speaker-recognition-master/src/speaker-recognition.py", line 17, in <module>
    from gui.interface import ModelInterface
  File "C:\Users\User\Desktop\speaker-recognition-master\speaker-recognition-master\src\gui\interface.py", line 17, in <module>
    from filters.VAD import VAD
  File "C:\Users\User\Desktop\speaker-recognition-master\speaker-recognition-master\src\filters\VAD.py", line 7, in <module>
    from noisered import NoiseReduction
  File "C:\Users\User\Desktop\speaker-recognition-master\speaker-recognition-master\src\filters\noisered.py", line 13, in <module>
    from utils import monophonic
  File "C:\Users\User\Desktop\speaker-recognition-master\speaker-recognition-master\src\filters\utils.py", line 1
    ../gui/utils.py
    ^
SyntaxError: invalid syntax

Process finished with exit code 1

Adding dynamic recognition of speakers - possible? Practical?

Hi team!
Your project is great because it's fast (real-time!) and the GMMs seem quite flexible. For example, from my reading of the source code, it seems possible to run enrolment and prediction on a GMM as audio comes in piece by piece. So I hope those assumptions are correct to start with!

I have an idea that I'd like to get your opinion on whether you think it will be possible and practical.

I want to diarize speakers in real-time and then train GMMs to recognise a voice, and then use that GMM to recognise that voice in future.

My idea is:

  1. Train ~ 30 GMMs (?!?) (15 male, 15 female) on selected different voices - they are marked as 'generic GMMs'.

  2. Run audio through VAD, then through the GMMs, purely to detect speaker changes in, say, 5 second blocks of the audio stream. Train a new GMM on the current voice until a speaker change is detected. Add the newly trained GMM to the GMM set (30+1), marked as a 'non-generic GMM'.
    For this step, you would need to spend time to define things like:
    a) Minimum probability for same speaker threshold
    b) Minimum speaking time to add new speaker threshold
    c) ... probably more

I'm guessing that the probabilities for A - C may also change depending on the number of speakers.

  3. Repeat step 2 until the end of the audio stream.

  4. Dump to disk all GMMs marked as 'non-generic'. Now you have a set of GMMs which have a good chance of recognising the speakers in your audio file and you can:

  • Extract metadata from the file (number of speakers, when and for how long each person spoke etc)
  • Assign person labels to the trained GMMs later (fine for my purposes)

So of course all of this rests on the assumption that the probabilities coming from the pre-trained GMMs set will change significantly whenever a speaker change happens. But this seems like a reasonable assumption?

I'm struggling to find a flaw in my idea. I welcome anyone to add some thinking before I start to code!
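To make the idea concrete, here is a rough sketch of the loop I have in mind (the thresholds and block length are made up, and I'm using scikit-learn's GMM rather than this repo's classes):

    # Rough sketch of the proposed loop; thresholds are placeholders to be tuned.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    SAME_SPEAKER_THRESHOLD = -45.0   # (a) minimum score to treat a block as the same speaker
    MIN_BLOCKS_FOR_NEW_GMM = 3       # (b) minimum speaking time (in 5 s blocks) before enrolling

    def diarize(blocks, generic_gmms):
        """blocks: iterable of per-block feature matrices (frames x dims), already VAD-filtered."""
        gmms = list(generic_gmms)    # ~30 pre-trained 'generic' GMMs
        non_generic = []             # GMMs trained on speakers found in this stream
        current = []                 # feature blocks of the speaker currently talking
        for feats in blocks:
            score = max(g.score(feats) for g in gmms)   # best average log-likelihood
            if current and score < SAME_SPEAKER_THRESHOLD:
                # speaker change detected: enroll the accumulated blocks as a new GMM
                if len(current) >= MIN_BLOCKS_FOR_NEW_GMM:
                    g = GaussianMixture(n_components=32).fit(np.vstack(current))
                    gmms.append(g)
                    non_generic.append(g)
                current = []
            current.append(feats)
        if len(current) >= MIN_BLOCKS_FOR_NEW_GMM:        # enroll the last speaker as well
            non_generic.append(GaussianMixture(n_components=32).fit(np.vstack(current)))
        return non_generic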

Andrew

Only one speaker is predicted.

Hi, Yuxin Wu.
Thank you for publishing a cool library.

Unfortunately, I faced a problem while testing to use it.
The test environment is as follows.
Four voices were added to the learning set, and each voice consists of two to four files.
The voices were registered through the command
"speaker-recognition.py -t enroll -i "./voice/*" -m model.out"
and enrollment completed successfully.

Then, when prediction was performed on files not used during enrollment,
the following results were obtained.

./voice/test/hoseok/hoseok.wav -> seonyoung [failed]
./voice/test/christi/christi.wav -> seonyoung [failed]
./voice/test/seongjun/seongjun.wav -> seonyoung [failed]
./voice/test/seonyoung/seonyoung.wav -> seonyoung
./voice/test/ziye/ziye.wav -> seonyoung [failed]

The results of all test sets appear to be the same.
One suspicious point is that seonyoung's learning data was the longest.
Do you have any idea what the problem is?

cannot run noisered.py

Hi ,
I am new to Python. I tried to run filters/noisered.py on the command line, but got this error:
File "", line 1, in
File "utils.py", line 1
../gui/utils.py
^
SyntaxError: invalid syntax

I think I can use sys.path.insert(0, "../gui/") to resolve this problem, but I wonder why the original way of setting it up does not work.
In this directory, the file utils.py has just one line: ../gui/utils.py. How does this work?
I am using Python 2.7.6.

Thanks,
Vic

What's the problem?

Sorry to bother but I am doing my dissertation and need to take your code as an example.
Can you tell me what's wrong with this? I just enrolled myself and ran a conversation with myself, and this problem occurred. I would be so grateful if you could help me out!
#################################
Traceback (most recent call last):
  File "gui.py", line 188, in do_conversation
    signal = self.backend.filter(Main.FS, signal)
  File "/home/bruce/speaker-recognition-master/src/gui/interface.py", line 47, in filter
    ret, intervals = self.vad.filter(fs, signal)
  File "/home/bruce/speaker-recognition-master/src/gui/filters/VAD.py", line 33, in filter
    filtered, intervals = self.ltsd.filter(signal)
  File "/home/bruce/speaker-recognition-master/src/gui/filters/ltsd.py", line 54, in filter
    res, ltsds = self._get_ltsd().compute_with_noise(signal, self.noise_signal)
  File "/home/bruce/.local/lib/python2.7/site-packages/pyssp/vad/ltsd.py", line 70, in compute_with_noise
    return self._compute(signal)
  File "/home/bruce/.local/lib/python2.7/site-packages/pyssp/vad/ltsd.py", line 73, in _compute
    ltsds = sp.zeros(self._windownum)
ValueError: negative dimensions are not allowed

negative dimensions are not allowed
###################################

run on smartphone?

Have you ever seen this implementation built for and running on a smartphone?
Would it be possible to build and embed the software stack in an (android|ios) app?

Not working when UBM is used

After the UBM is loaded, the enroll task gives the following error:
python2: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
Aborted (core dumped)

gmm_order=32
UBM_MODEL_FILE = 'gui/model/ubm.mixture-32.utt-300.model'

I am able to use this program in terminal but not with the GUI

I can't load the 'model.out' that I saved using speaker-recognition.py into the GUI. I used the GUI like this:
I recorded or loaded the file, trained on it, tried enrolling, and then dumped the trained model so that I can use it again next time. Can I train on multiple users that way and then use that model to make real-time predictions in the GUI? The enrolling always fails, and the training is suspiciously fast. I have trained on a 2-minute wav file using speaker-recognition.py and it was able to recognize the voice in a wav file which was not in the training set, but I am not able to do this in the GUI.
Thank you.

Segmentation Fault : 11, while running enroll on OSX

Running the following command python speaker-recognition.py -t enroll -i "./ ./sanjay" -m model.out gives the following output.

Start training...
nr_instance   :   94
nr_dim        :   28
nr_mixture    :   32
min_covar     :   0.001000
threshold     :   0.010000
nr_iteration  :   200
init_with_kmeans: 0
concurrency   :   4
verbosity     :   0
Segmentation fault: 11

The program closes saying Segmentation fault : 11. Can you please help me with it?

Accuracy, number of enrolled speakers and length of speech samples

Hello!

Firstly, congratulations on a fantastic project. It is the ONLY project on the Internet that is practical, easy to understand and able to operate in real time. Awesome stuff! 👍 I hope my Dockerfile helps. I'm happy to answer any issues you have about that. Soon, I'll put the image binary on the docker hub so anyone can just download a pre-built version, if that's okay by you.

I would like to be able to recognize hundreds, possibly thousands of speakers, so I read section 5.6 'Accuracy Curve on Different Number of Speakers' of your complete report with high interest. For each of my speakers, I will have 60 to 600+ seconds of speech for enroll data.

Could you please give me some advice on a few questions? Sorry if they seem silly, I'm very new to this.

  1. With enroll data, I understand that generally, 'more is better' per speaker. However, how many seconds do you think becomes too much (no longer having benefit)?

  2. I note that you tested with up to 80 speakers and up to 5 seconds of enroll data (did I read your graph on p.18 correctly ... 5s = 5seconds?). If I were to enroll and predict on a far larger set of speakers, like 600, am I right to expect that the accuracy in the downward trend on p.19 would continue?

  3. What is the relationship between volume of enroll data per speaker and number of speakers? e.g. if my accuracy got too low with 200 speakers, could I fix that by adding enroll data for those 200 speakers?

In these 3 questions, I am simply trying to understand the best solution for my problem - so if you have any general advice outside these questions - that would be much appreciated!

Thank you in advance! I hope to help contribute to your project if I ever can!
Andrew (安德鲁)

Unable to use the dockerized version nor a local version

root@e0f1565c4cc3:~/speaker-recognition/src# python speaker-recognition.py -t enroll -i "in/" -m m.out
Label in has files in/noise.wav,in/in_dat.wav
Traceback (most recent call last):
  File "speaker-recognition.py", line 89, in <module>
    task_enroll(args.input, args.model)
  File "speaker-recognition.py", line 73, in task_enroll
    m.train()
  File "/root/speaker-recognition/src/gui/interface.py", line 78, in train
    self.gmmset = self._get_gmm_set()
  File "/root/speaker-recognition/src/gui/interface.py", line 64, in _get_gmm_set
    if os.path.isfile(self.UBM_MODEL_FILE):
  File "/usr/lib/python2.7/genericpath.py", line 37, in isfile
    st = os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

I followed the instructions in your readme and tried to enroll a person using a mono audio wav file recorded in Audacity but it keeps throwing this error.

Getting Error

Whenever I try to train it with my own voice, it gives me an error and the GUI aborts.

python: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
python: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
python: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
Aborted (core dumped)

Please suggest !!

VAD not used in the code?

Hi

Even though Voice Activity Detection is present in the repository, I don't see "interface.py" calling the VAD helper methods (init_noise() and filter(), respectively) on the input data. It seems like the GMMs are trained directly on the generated features. Why is it this way?

If I am wrong, can you point me to the location where VAD is being done on enrolled speech data?
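For reference, I expected the enroll path to look something like this (just a sketch; the VAD signatures are guessed from filters/VAD.py and the tracebacks in other issues, and may differ):

    def enroll_with_vad(gmmset, label, fs, signal, noise_signal):
        # Hypothetical enroll path: run VAD before feature extraction.
        from filters.VAD import VAD
        from feature import mix_feature
        vad = VAD()
        vad.init_noise(fs, noise_signal)              # calibrate on a noise-only recording (assumed signature)
        filtered, intervals = vad.filter(fs, signal)  # keep only the voiced part of the signal
        feat = mix_feature((fs, filtered))
        gmmset.fit_new(feat, label)                   # same call the training code uses today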
Thanks

UBM training guidelines

Can you explain to me how to train the UBM on my dataset?
I know we can use testbench/train-ubm.py to train the ubm model.

I want to know:

  1. what's the use of adapt-ubm.py?
  2. how to setup the directory structure of the data set for training.
  3. Why does read_raw_data() in src/testbench/datautil.py use numpy.loadtxt?

Conversation mode from command line?

I can train the model at the command line and have it predict successfully too. But I can't get the UI to work for whatever reason. I don't need the UI but I do want conversation mode. Can you outline how this might be done? I would be happy to help with a pull request but need some guidance. Thanks!

Multiple Models

Hi, I would like to train multiple models from the UBM. However, the UBM means are dumped into a file that gets overwritten every time a new model adapts; as a result, it seems that all data is lost and only the last trained model is available in gmm-training-intermediate-dump.model.
Could you please explain how I can use multiple models (for multiple speakers) adapted from a UBM?

ValueError: need at least one array to concatenate

I'm seeing the following crash when trying to enroll:

Traceback (most recent call last):
  File "./speaker-recognition.py", line 89, in <module>
    task_enroll(args.input, args.model)
  File "./speaker-recognition.py", line 71, in task_enroll
    m.enroll(label, fs, signal)
  File "/Users/richardpenner/Downloads/speaker-recognition-master/src/gui/interface.py", line 60, in enroll
    feat = mix_feature((fs, signal))
  File "/Users/richardpenner/Downloads/speaker-recognition-master/src/gui/feature/__init__.py", line 26, in mix_feature
    mfcc = MFCC.extract(tup)
  File "/Users/richardpenner/Downloads/speaker-recognition-master/src/gui/feature/MFCC.py", line 128, in extract
    ret = get_mfcc_extractor(fs, **kwargs).extract(signal)
  File "/Users/richardpenner/Downloads/speaker-recognition-master/src/gui/feature/MFCC.py", line 70, in extract
    feature = row_stack(feature)
  File "/Users/richardpenner/PROJECTS/miniconda2/envs/speaker3/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

Any ideas what might be wrong?

Audio spoofing

Hello,

Nice project.
Is there any function or planned feature for audio spoofing detection (something like detecting fake voices) in this project?

Thanks
Javenosa

Android Source

Hi, can I make the same application on Android? And can you explain to me whether there is some mathematics required to build this application?

Thanks in advance

have some errors

Hi,
I have gcc >= 4.7 on Mac OS X 10.11. When I make gmm, I get these errors. Can you help me?
[cc] src//kmeans.cc ...
clang: warning: -lm: 'linker' input unused
clang: warning: -lpthread: 'linker' input unused
src//kmeans.cc:165:19: error: use of undeclared identifier 'ceil'
int block_size = ceil((double)n / concurrency);
^
src//kmeans.cc:193:7: error: use of undeclared identifier 'fabs'
if (fabs(last_distsqr_sum - distsqr_sum) < 1e-6) { // || fabs(dist...
^
src//kmeans.cc:266:19: error: use of undeclared identifier 'ceil'
int block_size = ceil((double)n / concurrency);
^
src//kmeans.cc:298:7: error: use of undeclared identifier 'fabs'
if (fabs(last_distsqr_sum - distsqr_sum) < 1e-6)// || fabs(distsqr...
^
4 errors generated.
make: *** [obj/src//kmeans.o] Error 1

Unpredictable results.

I trained on the same dataset on different machines, and the misclassified samples are different.
Although only a few of all the data samples are misclassified on the given dataset, the results vary from machine to machine: test samples properly identified on one machine are mislabeled on the other.
Do you have any idea about this?

Means extraction

Hi, I would like to extract the means from my models for creating supervectors, using Python. How can I do this?

Could you please clarify how to run the script?

First, your results seem pretty good. I have some problems I hope you can help with.

This command works fine:
make -C src/gmm

Then I run ./speaker-recognition.py, some errors occur
Traceback (most recent call last):
  File "./speaker-recognition.py", line 17, in <module>
    from gui.interface import ModelInterface
  File "/root/speaker-recognition/src/gui/interface.py", line 16, in <module>
    from feature import mix_feature
  File "/root/speaker-recognition/src/gui/feature/__init__.py", line 17, in <module>
    import LPC
  File "/root/speaker-recognition/src/gui/feature/LPC.py", line 9, in <module>
    from scikits.talkbox.linpred import levinson_lpc
  File "/usr/local/lib/python2.7/dist-packages/scikits/talkbox/__init__.py", line 3, in <module>
    from tools import *
  File "/usr/local/lib/python2.7/dist-packages/scikits/talkbox/tools/__init__.py", line 7, in <module>
    import cffilter
  File "numpy.pxd", line 30, in cffilter (scikits/talkbox/tools/src/cffilter.c:2795)
ValueError: numpy.dtype does not appear to be the correct type object

Have you ever encountered such a problem?
BTW, how do I open the GUI? Thanks!

SyntaxError: invalid syntax

In filters/utils.py, there is only the following single line:
../gui/utils.py

The error reported here says it's a syntax error, and indeed this doesn't look like valid syntax. I don't know why it fails on my machine. I would really appreciate it if someone could help explain this. Thank you.

Failed to run application

Tried to run this application. Copied the whole repository, compiled GMM, and installed all needed dependencies.
In the src folder I created a train folder and added some wav files. Then I ran it like this:

python speaker-recognition.py -t enroll -i "/home/username/Desktop/speaker-recognition-master/src/train" -m model.out

Then I get these warnings:

Warning: failed to import fast-gmm, use gmm from scikit-learn instead
Label train has files /home/username/Desktop/speaker-recognition-master/src/train/Test1.wav,/home/username/Desktop/speaker-recognition-master/src/train/Test2.wav
Start training...
/home/username/.local/lib/python2.7/site-packages/sklearn/utils/deprecation.py:52: DeprecationWarning: Class GMM is deprecated; The class GMM is deprecated in 0.18 and will be removed in 0.20. Use class GaussianMixture instead.
warnings.warn(msg, category=DeprecationWarning)
/home/username/.local/lib/python2.7/site-packages/sklearn/utils/deprecation.py:70: DeprecationWarning: Function distribute_covar_matrix_to_match_covariance_type is deprecated; The function distribute_covar_matrix_to_match_covariance_type is deprecated in 0.18 and will be removed in 0.20.
warnings.warn(msg, category=DeprecationWarning)
/home/username/.local/lib/python2.7/site-packages/sklearn/utils/deprecation.py:70: DeprecationWarning: Function log_multivariate_normal_density is deprecated; The function log_multivariate_normal_density is deprecated in 0.18 and will be removed in 0.20.
warnings.warn(msg, category=DeprecationWarning)
[the log_multivariate_normal_density warning repeats many more times]
3.25849986076 seconds

I assume it's not working. How to fix this?

Open or closed set

What would happen if I have a model trained on 3 speakers and then try to identify a 4th speaker who is not in the trained model? What would the answer be in this case?
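For context, what I naively imagine is something like this (a sketch with a made-up threshold and a scikit-learn-style GMM API, not necessarily what this repo does):

    import numpy as np

    REJECT_THRESHOLD = -50.0   # placeholder: would need tuning on held-out impostor data

    def predict_open_set(gmms, labels, feats):
        # Return the best-matching label, or None if no model scores above the threshold.
        scores = [g.score(feats) for g in gmms]   # average log-likelihood per speaker model
        best = int(np.argmax(scores))
        if scores[best] < REJECT_THRESHOLD:
            return None                           # unseen (4th) speaker -> reject
        return labels[best]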

work flow and other questions

Your package is really a good one, but I have a lot of questions about using it. I just started reading, and:

  1. Is it free to use your package? Should I include a citation?
  2. You reported MFCC and LPC performance; they are separate experiments, not combined, right? Is there any comparison between MFCC and LPC?
  3. There are several methods for removing noise; which one is the best? The VAD one just removes non-speech, but the noisered one also does some noise cancellation?
  4. What is the sampling frequency of the speech data in the wav files? 16 kHz by default? I did not see this in your doc.
  5. In the Final-report, page 10 under 3. Implementation, it says "we have found that the following parameters are optimal" and "number of cepstral coefficient is 15", but on page 16, under 5.5, it states that a 19-cepstrum MFCC is the best. Please clarify and confirm.
  6. Can you give me a work flow of the processes? I especially want to use it piece by piece, not in the GUI, for example:
    speech files from a few speakers --> remove_noise ---> ubm
    ---> gmms
    I am new to Python, so I hope for more detailed info, like the parameters for running your tools.

Thanks
Vic

Error Loading UBM for training

Hello,

When I try to train the UBM Model I receive the following error

the model file:  /var/webapps/speaker-recognition-master/speaker-recognition-master/src/gui/model/ubm.mixture-32.person-20.immature.model

Start training...
training from ubm ...
nr_instance   :   2099
nr_dim        :   28
nr_mixture    :   32
min_covar     :   0.001000
threshold     :   0.010000
nr_iteration  :   200
init_with_kmeans: 0
concurrency   :   1
verbosity     :   0
python2: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
Aborted (core dumped)

Any idea why I get the assertion failure error?

GUI - Version calculation of score

Hi,
Thanks for the great work. I have learned a lot, and speaker recognition on the command line works fine. The GUI has a problem in my environment (Linux, Python 2.7): the calculation of scores is not working.
Is there an option to check whether the "learned models" are available and consistent?
During my investigation I found one little mistake in gui.py, in def updateUserInfo: line 349,
u = self.serdata[userindex]
should be updated to
u = self.userdata[userindex]
Regards from Germany

Unknown speaker recognition with predict_one_with_rejection - Assertion `(int)x.size() == dim' failed

Hi Again,

I am also trying to conduct unknown speaker recognition so I can later add unknown speaker samples to future training data.

It seems that the predict_one_with_rejection method of the GMMSet object is written to help identify unknown speakers?

Unfortunately, when I begin enrollment with the UBM, I receive the following error:

Start training...
training from ubm ...
nr_instance   :   18175
nr_dim        :   28
nr_mixture    :   32
min_covar     :   0.001000
threshold     :   0.010000
nr_iteration  :   200
init_with_kmeans: 0
concurrency   :   4
verbosity     :   0
python: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
python: src/gmm.cc:177: real_t Gaussian::probability_of_fast_exp(std::vector<double>&, double*): Assertion `(int)x.size() == dim' failed.
Aborted (core dumped)

However when I enroll without the UBM, everything works fine.

Do you know what this problem is? Could you explain to me what this assertion is checking?

Is it:
x.size() the number of features
dim the number of dimensions in the GMM
???

Note : for me, training without the UBM was the default setting because the path to the UBM in interface.py was incorrect for my system and _get_gmm_set automatically returns a non-ubm GMM set if the model file can't be found. To fix this, I changed:

UBM_MODEL_FILE = 'model/ubm.mixture-32.utt-300.model'
to
UBM_MODEL_FILE = 'gui/model/ubm.mixture-32.utt-300.model'

Andrew

version of bob libraries not known, undefined symbol: _ZN5boost9iostreams4zlib8deflatedE in libbob_core.so

While installing bob.sp, the following traceback occurs:
Traceback (most recent call last):
  File "", line 1, in
  File "/tmp/pip-build-kQnxpH/bob.sp/setup.py", line 45, in <module>
    bob_packages = bob_packages,
  File "/project/local/lib/python2.7/site-packages/bob/blitz/extension.py", line 52, in __init__
    BobExtension.__init__(self, *args, **kwargs)
  File "/project/local/lib/python2.7/site-packages/bob/extension/__init__.py", line 301, in __init__
    bob_includes, bob_libraries, bob_library_dirs, bob_macros = get_bob_libraries(self.bob_packages)
  File "/project/local/lib/python2.7/site-packages/bob/extension/__init__.py", line 193, in get_bob_libraries
    pkg = importlib.import_module(package)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/project/local/lib/python2.7/site-packages/bob/core/__init__.py", line 3, in <module>
    bob.extension.load_bob_library('bob.core', __file__)
  File "/project/local/lib/python2.7/site-packages/bob/extension/__init__.py", line 244, in load_bob_library
    ctypes.cdll.LoadLibrary(full_libname)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /project/local/lib/python2.7/site-packages/bob/core/libbob_core.so: undefined symbol: _ZN5boost9iostreams4zlib8deflatedE

The conversation mode is not working

Using gui.py I tried running the conversation mode. It wasn't successful: it randomly selects one voice sample and shows it for every conversation. Is the conversation mode working fine for you?
