
Comments (6)

imranparuk avatar imranparuk commented on August 20, 2024

VoxCeleb is a good one. Could you be more specific about the issues you are having?
From what I know, you need mono WAV files, for one. The shape needs to be 2-dimensional.
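
As a rough illustration of the mono requirement, here is a minimal sketch that downmixes a multi-channel signal by averaging its channels. It assumes the audio is already loaded as a NumPy array shaped (num_samples,) or (num_samples, num_channels); the helper name `to_mono` is hypothetical and is not part of the repository.

```python
import numpy as np


def to_mono(signal):
    """Return a 1-D mono signal (hypothetical helper, not from the repo).

    Accepts an array shaped (num_samples,) or (num_samples, num_channels).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim == 1:
        # Already mono, nothing to do.
        return signal
    if signal.ndim == 2:
        # Average across the channel axis to downmix.
        return signal.mean(axis=1)
    raise ValueError("expected a 1-D or 2-D array, got %d dimensions" % signal.ndim)


stereo = np.array([[1.0, 3.0], [2.0, 4.0]])  # two samples, two channels
mono = to_mono(stereo)
```

Averaging is only one way to downmix; keeping a single channel would also give a mono signal.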

from 3d-convolutional-speaker-recognition.

loregagliard avatar loregagliard commented on August 20, 2024

Does it work for you?
I ask because I am checking the files, and some of the audio is noisy even though the interviewee talks most of the time (for example, the interviewer talking or a guitar playing).
I got it to run with a batch size of 3, and it gives me a training accuracy of either 100 or 0, with no values in between.
With larger batches I don't have much more luck.
I know that the dataset needs Voice Activity Detection to remove silence and be effective, so maybe that is the cause. What algorithm did you use?
I'd also like to know whether there were any requirements on the 'quality' of the audio files.
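
To make the VAD question concrete, a very simple energy-threshold VAD could look like the sketch below. This is only an illustration of the general idea, not the algorithm used for the paper; the name `energy_vad`, the frame sizes (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz) and the -35 dB threshold are all assumptions.

```python
import numpy as np


def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Return a boolean mask with one entry per frame (True = likely speech).

    A frame counts as speech when its energy is within `threshold_db`
    of the loudest frame in the signal. Hypothetical helper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        return np.zeros(0, dtype=bool)
    num_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty(num_frames)
    for i in range(num_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.mean(frame ** 2)
    # Small constant avoids log10(0) on digital silence.
    energy_db = 10.0 * np.log10(energies + 1e-12)
    return energy_db > energy_db.max() + threshold_db


# Synthetic check: 0.1 s of silence followed by 0.1 s of a 440 Hz tone.
silence = np.zeros(1600)
tone = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)
mask = energy_vad(np.concatenate([silence, tone]))
```

A relative threshold like this will not separate the interviewer or background music from the target speaker; it only drops quiet frames.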


MSAlghamdi avatar MSAlghamdi commented on August 20, 2024

@loregagliard

I made it run with a batch size of 3 and it gives me a train accuracy of 100 or 0, no middle values.

I have the same issue, and I think it is due to the feature map values. Did you use the input_feature.py published with the project as it is? If you did, then the problem is in the input features produced by input_feature.py. I think (correct me if I'm wrong) that's because it uses the log-energy, which most likely yields negative values.
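
To show what I mean, here is a small sketch with made-up energies (not the output of input_feature.py): the log of any energy below 1 is negative. It also applies per-feature mean and variance normalization, which is a common preprocessing step and only an assumption here; I have not confirmed that it fixes the accuracy problem. The array shape and the `normalize` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical filterbank energies in (0, 1), shaped (frames, filters).
energies = rng.uniform(1e-6, 1.0, size=(80, 40))

# The log of a value below 1 is negative, so every entry here is negative.
log_energies = np.log(energies)


def normalize(features):
    """Zero-mean, unit-variance normalization per feature column."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + 1e-8)


normalized = normalize(log_energies)
```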

Please let me know once you solve this issue since I'm stuck with it.


imranparuk avatar imranparuk commented on August 20, 2024

Hi guys, if you read the paper, the author does use VAD; however, he stated that it was done in Matlab, if I'm not mistaken. You will be able to find some VAD solutions in Python, but they do not produce good results. My advice is not to worry about the VAD; the models will work without it. Please try out the Keras implementation here -> https://github.com/imranparuk/speaker-recognition-3d-cnn and see if that works for you. It's a work in progress, but if it works, it will help you understand what is being accomplished in this repository.


loregagliard avatar loregagliard commented on August 20, 2024

@MSAlghamdi

Did you use input_feature.py published with the project as it is?

Yes, the input_feature.py file is the same as in the project.
I just added code to generate the HDF5 files, 'development_sample_dataset_speaker.hdf5' and 'enrollment-evaluation_sample_dataset.hdf5'. I took the code from other issue discussions here.
Just to be completely clear, VoxCeleb appears to me as a directory containing one directory per identity.
Each of those speaker directories contains sub-directories containing the wav files (finally!).
Thus I generated the dataset by copying the audio files and adding the name of the speaker directory and the sub-directory to the file name.
The speaker labels were generated by applying the ASCII table to the names of the speaker directories and then reindexing them to 0, 1, 2, 3, ...
The audio files range in duration from a few seconds to minutes.
Should I maybe merge the audio of each speaker?
Anyway, I chose the even-numbered files as my training set and the odd-numbered files as the testing set, so that each speaker has a sufficient number of audio files.
Is there a way to feed just a single feature to the network and see what the outcome is?
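
The labelling and split described above can be sketched roughly as follows. This is a simplified stand-in rather than my exact code: it sorts the directory names instead of going through ASCII codes, and the helper names, speaker ids and file names are made up for the example.

```python
def build_label_map(speaker_dirs):
    """Map each distinct speaker directory name to a label 0, 1, 2, ..."""
    return {name: idx for idx, name in enumerate(sorted(set(speaker_dirs)))}


def split_even_odd(files):
    """Alternate files between training and testing after sorting them."""
    files = sorted(files)
    # Positions 0, 2, 4, ... go to training; 1, 3, 5, ... go to testing.
    return files[0::2], files[1::2]


# Hypothetical speaker directories and file names.
label_map = build_label_map(["id10003", "id10001", "id10002", "id10001"])
train_files, test_files = split_even_odd(
    ["spk_00003.wav", "spk_00001.wav", "spk_00002.wav"]
)
```

The split should be applied per speaker, so that every speaker appears in both sets.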

Thank you! ( and happy new year!!)


astorfi avatar astorfi commented on August 20, 2024

Dear all,

Please refer to the PyTorch implementation, which uses the VoxCeleb dataset.
