
Comments (16)

astorfi commented on July 19, 2024

Please read the following posts for further information:

What's state of the art for speaker recognition and verification?
Is there any open source deep learning tool available for speaker recognition?
For speaker recognition, what are the best ML algorithms? What are the features that I should get from a voice?
How can I understand the speaker recognition, speaker identification and speaker verification?
https://www.quora.com/What-are-some-speaker-recognition-toolkits

from 3d-convolutional-speaker-recognition.

SpongebBob commented on July 19, 2024

I tried the ai-shell dataset: the Kaldi i-vector system gets around 2% EER,
but this 3D-convolutional-speaker-recognition model gets 20% EER.
I don't think it is such a novel model, actually.
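
For readers comparing the EER figures quoted in this thread, here is a minimal sketch of how an equal error rate can be computed from verification scores. It is not taken from the repository or from Kaldi; the function name `compute_eer` and the convention that a higher score means "same speaker" are assumptions made for illustration.

```python
def compute_eer(genuine_scores, impostor_scores):
    """Return (eer, threshold) for scores where higher means 'same speaker'.

    Hypothetical helper: it sweeps every observed score as a candidate
    threshold and keeps the one where the two error rates are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Toy scores, made up for the example.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    eer, thr = compute_eer(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

Two EER numbers are only comparable when both systems are scored on the same trial list, with the same utterance durations and the same enrollment data.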


duynguyen5896 commented on July 19, 2024

@SpongebBob, how did you get 20% EER in the evaluation phase? Did you reuse this code?


astorfi commented on July 19, 2024

@duynguyen5896
I think one sample per speaker is not much for training the background model.
What do you mean by unenrollment?

The evaluation samples cannot reflect the correct statistics either.


SpongebBob commented on July 19, 2024

@duynguyen5896 More details are in #33.


astorfi commented on July 19, 2024

@SpongebBob Please make sure to do the correct experiments ... This is a deep learning method and for a new dataset, it needs a lot of tweaking.


astorfi commented on July 19, 2024

@SpongebBob Please reopen #33 if the problem has not been resolved.


duynguyen5896 commented on July 19, 2024

@astorfi, by unenrollment I mean rejection: I don't enroll those people, and I want to see whether the model can classify them or not.
As for the evaluation samples, how many samples per person do you use for evaluation?


duynguyen5896 commented on July 19, 2024

@SpongebBob Can you share your updated evaluation source code? I don't understand from #33 how you evaluate the model.


astorfi commented on July 19, 2024

@duynguyen5896 I don't think SpongebBob used a comparable experimental setup for 3D-Conv and the Kaldi i-vector system. 2% EER is not very realistic for 0.8 seconds of data in a text-independent setting.

About VoxCeleb, I am trying to use PyTorch for the same setup. However, VoxCeleb is huge, and parameter tuning does not seem to be trivial.


duynguyen5896 commented on July 19, 2024

@astorfi Can you give me more detail about your experiment in the enrollment and evaluation phases? I can see in your paper that you used 100 speakers for enrollment and evaluation.
Did you enroll all 100 speakers?


astorfi commented on July 19, 2024

Yes, all 100 speakers are enrolled. In the evaluation, different enrollments of the same speakers are used.


duynguyen5896 commented on July 19, 2024

@astorfi, for the enrollment phase, I see that you merge 20 utterances of one speaker. Are those utterances selected randomly, or are they a continuous stretch of speech?

I tried two setups: enrolling all 100 speakers, and enrolling 50 speakers while leaving 50 unenrolled for testing. However, the results are not good for either; they seem to be independent of the number of enrolled speakers and stay at about 40% EER.


astorfi commented on July 19, 2024

@duynguyen5896 For selecting utterances, either approach works. However, I selected them randomly.

May I know why you are splitting the speakers like that, 50 enrolled and 50 unenrolled?
I think you are making a mistake. Please read the paper; I have to emphasize this once again.

All 100 speakers must be used in both the enrollment and evaluation stages, as we are comparing the known speakers against the speaker models. For unenrolled subjects, we do not have any model, since this model is not end-to-end. Please make sure that you understand the speaker verification setup we are using.
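
A rough sketch of the closed-set protocol described above, for illustration only. It assumes each utterance has already been turned into a fixed-length embedding, and it builds each speaker model by averaging randomly chosen enrollment embeddings; that averaging is a stand-in assumption and not necessarily how the repository merges its 20 enrollment utterances. The names `enroll`, `score_trials`, `mean_vector`, and `cosine` are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(utterances_by_speaker, n_enroll, rng):
    """Build one model per speaker from randomly selected utterances.

    Returns (models, held_out): the remaining utterances of every speaker
    are kept for evaluation, so every evaluated speaker is also enrolled.
    """
    models, held_out = {}, {}
    for spk, utts in utterances_by_speaker.items():
        picked = set(rng.sample(range(len(utts)), n_enroll))
        models[spk] = mean_vector([utts[i] for i in picked])
        held_out[spk] = [u for i, u in enumerate(utts) if i not in picked]
    return models, held_out


def score_trials(models, held_out):
    """Score every held-out utterance against every speaker model."""
    genuine, impostor = [], []
    for spk, utts in held_out.items():
        for utt in utts:
            for model_spk, model in models.items():
                score = cosine(utt, model)
                # Same identity gives a genuine trial, otherwise an impostor trial.
                (genuine if model_spk == spk else impostor).append(score)
    return genuine, impostor
```

With this layout, a speaker who was never enrolled has no entry in `models`, so there is no genuine trial to score for them; only impostor trials exist for such a speaker.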


duynguyen5896 commented on July 19, 2024

@astorfi Actually, I want to test whether the model can handle unenrolled speakers well. However, when I use the same setup as you, with all 100 speakers enrolled, the result is also not good. I think the reason is the different dataset.
Anyway, thanks for your help and kindness.
I'm now checking whether the model works well on a small dataset. I have one with only 46 speakers, and I want to see whether the result improves when using 10 utterances per sample. How did you set up the model structure for 10 utterances?


astorfi commented on July 19, 2024

@duynguyen5896 Yes, unfortunately the dataset is not public, and adapting the model to a new dataset needs tuning.
I don't think it works for small datasets.

