Hi guys, I have a question regarding the input wav files used for training. Wh


.wav inputs specifics about 3d-convolutional-speaker-recognition HOT 6 CLOSED

astorfi commented on August 20, 2024

.wav inputs specifics

from 3d-convolutional-speaker-recognition.

Comments (6)

imranparuk commented on August 20, 2024

VoxCeleb is a good one. Could you be more specific about the issues you are having?
As far as I know, you need mono wav files, for one. The shape needs to be 2-dimensional.
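
For anyone who wants to check the mono requirement before extracting features, here is a minimal sketch using Python's standard `wave` module. The helper names `wav_info` and `is_mono` are hypothetical and not part of the repository, and the sketch assumes the files are uncompressed PCM wav, which is the only kind `wave` reads.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate, n_frames) for a PCM wav file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getnframes()


def is_mono(path):
    """True if the wav file has exactly one channel."""
    return wav_info(path)[0] == 1
```

Running `is_mono` over the dataset directory before feature extraction should flag any stereo files that need to be down-mixed first.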

loregagliard commented on August 20, 2024

Does it work for you?
I ask because I am checking the files, and some recordings are noisy even though the interviewee talks most of the time (the interviewer talking, a guitar playing, etc.).
I made it run with a batch size of 3, and it gives me a train accuracy of either 100 or 0, with no middle values.
With larger batches I don't have much more luck.
I know that the dataset needs Voice Activity Detection (VAD) to remove silence in order to be effective, so maybe that is the cause. What algorithm did you use?
I'd also like to know whether there are any requirements on the 'quality' of the audio files.
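
As a rough illustration of what silence removal can look like, here is a minimal energy-threshold sketch in plain Python. It is not the VAD used in the paper (the thread does not say which algorithm that was), and the function name `energy_vad`, the frame length, and the threshold ratio are all assumptions chosen for illustration.

```python
def energy_vad(samples, frame_len=400, threshold_ratio=0.1):
    """Keep frames whose mean energy is at least threshold_ratio * peak frame energy.

    samples: sequence of numeric audio samples (mono).
    A trailing partial frame is dropped.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    if not energies or max(energies) == 0:
        return []  # nothing but silence, or input shorter than one frame
    peak = max(energies)
    kept = []
    for frame, energy in zip(frames, energies):
        if energy >= threshold_ratio * peak:
            kept.extend(frame)
    return kept
```

A simple threshold like this removes silence but will not remove a second speaker or background music, so it does not address the noisy recordings mentioned above.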

MSAlghamdi commented on August 20, 2024

@loregagliard

I made it run with a batch size of 3, and it gives me a train accuracy of either 100 or 0, with no middle values.

I have the same issue, and I think it is due to the feature map values. Did you use input_feature.py published with the project as it is? If you did, then the problem is in the input features coming out of input_feature.py. I think (correct me if I'm wrong) that's because it uses the log-energy, which most likely takes negative values.

Please let me know once you solve this issue, since I'm stuck on it.
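
The point about log-energy is easy to check in isolation: the logarithm of any energy below 1 is negative. The sketch below shows that, together with per-feature standardization (zero mean, unit variance), which is one common preprocessing step. The function names are hypothetical and do not come from input_feature.py, and whether standardization resolves the 100-or-0 accuracy is not established in this thread.

```python
import math


def log_energy(energies, eps=1e-12):
    """Natural log of each energy, floored at eps to avoid log(0)."""
    return [math.log(max(e, eps)) for e in energies]


def standardize(values):
    """Shift and scale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant input
    return [(v - mean) / std for v in values]
```

Energies such as 0.01, 0.1 and 0.5 all map to negative log values, while the standardized features are centred on zero regardless of the original scale.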

imranparuk commented on August 20, 2024

Hi guys, if you read the paper, the author does apply VAD; however, he stated that it was done in Matlab, if I'm not mistaken. You can find some VAD solutions in Python, but they do not produce good results. My advice is not to worry about VAD: the models will work without it. Please try out the Keras implementation here -> https://github.com/imranparuk/speaker-recognition-3d-cnn and see if that works for you. It's a work in progress, but if it works, it will help you understand what is being accomplished in this repository.

loregagliard commented on August 20, 2024

@MSAlghamdi

Did you use input_feature.py published with the project as it is?

Yes, the input_feature.py file is the same as in the project.
I just added code to generate the HDF5 files, 'development_sample_dataset_speaker.hdf5' and 'enrollment-evaluation_sample_dataset.hdf5'. I took that code from other issue discussions here.
Just to be completely clear, VoxCeleb appears to me as a directory containing one directory per identity.
Each of those speaker directories contains sub-directories, which contain the wav files (finally!).
I therefore generated the dataset by copying the audio files and adding the names of the speaker directory and the sub-directory to each file name.
The speaker labels were generated by applying the ASCII table to the names of the speaker directories and then reindexing them to 0, 1, 2, 3, ...
The audio files range in duration from a few seconds to minutes.
Should I perhaps merge the audio of each speaker?
Anyway, I chose the even files as my training set and the odd ones as my testing set, so that each speaker has a sufficient number of audio files.
Is there a way to feed just a single feature to the network and see what the outcome is?

Thank you! ( and happy new year!!)
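
The labelling and even/odd split described above could be sketched as follows. This is an illustration under stated assumptions, not the code actually used: the function names and example file names are hypothetical, and labels are assigned by sorting the speaker directory names rather than by the ASCII-based step mentioned in the comment.

```python
def index_speakers(speaker_dirs):
    """Map speaker directory names to contiguous integer labels 0, 1, 2, ..."""
    return {name: i for i, name in enumerate(sorted(set(speaker_dirs)))}


def even_odd_split(files):
    """Sort the files, then send even positions to training and odd positions to testing."""
    ordered = sorted(files)
    return ordered[0::2], ordered[1::2]
```

Applying `even_odd_split` per speaker, rather than to the whole file list at once, keeps every speaker represented in both the training and testing sets.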

astorfi commented on August 20, 2024

Dear all,

Please refer to the PyTorch implementation, which uses the VoxCeleb dataset.
