philipperemy commented on August 15, 2024

Hi Robin,

Thank you for your question.

  1. Here is the paragraph where the authors mention the normalization:

From: https://arxiv.org/pdf/1705.02304.pdf

It's not crystal clear how they did it, so I had to make a guess. As you pointed out, I used instance normalization and did not use batch normalization. The 26D FBanks are normalized within a single frame for a given speaker. I'm not a big fan of BN because it's batch dependent.
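
For reference, here is a minimal NumPy sketch of per-frame (instance-style) normalization, assuming the features are stored as a `(num_frames, 26)` array. The function name `normalize_per_frame` and the `eps` guard are illustrative assumptions, not the repository's actual code.

```python
import numpy as np

def normalize_per_frame(fbank: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Instance-style normalization: each frame (row) is scaled to zero mean
    and unit variance using only its own filterbank coefficients.

    fbank: array of shape (num_frames, num_filters), e.g. num_filters = 26.
    """
    mean = fbank.mean(axis=1, keepdims=True)      # one mean per frame
    std = fbank.std(axis=1, keepdims=True)        # one std per frame
    return (fbank - mean) / np.maximum(std, eps)  # eps guards constant frames

# Hypothetical input: 100 frames of random 26-D filterbank energies.
rng = np.random.default_rng(0)
frames = rng.random((100, 26)) * 50
normalized = normalize_per_frame(frames)
print(normalized.mean(axis=1)[:3])  # each frame now has ~0 mean
print(normalized.std(axis=1)[:3])   # and ~1 standard deviation
```

Note that the statistics never look beyond a single frame, so the result does not depend on the batch.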

Instance normalization looked like the most natural (and easiest) way to get the normalization working well. Ideally, we would confirm that this is what they did in the paper. Is your guess that they used batch normalization for the FBank normalization instead? In your example, m is the batch and v is the frame (just to make sure my understanding is correct).
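
As a point of comparison, a batch-normalization-style alternative would standardize each filterbank coefficient using statistics pooled over every frame of every utterance in the batch. The sketch below assumes a batch of shape `(batch_size, num_frames, num_filters)`; `normalize_per_coefficient` is a hypothetical helper, and it covers only the normalization step (a real BN layer also learns a scale and shift and keeps running statistics for inference).

```python
import numpy as np

def normalize_per_coefficient(batch: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """BN-style normalization without learned parameters: each filterbank
    coefficient is standardized with statistics over the whole batch.

    batch: array of shape (batch_size, num_frames, num_filters).
    """
    mean = batch.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, num_filters)
    std = batch.std(axis=(0, 1), keepdims=True)
    return (batch - mean) / np.maximum(std, eps)

rng = np.random.default_rng(0)
batch = rng.random((8, 100, 26)) * 50  # hypothetical batch of 8 utterances
out = normalize_per_coefficient(batch)
print(out.mean(axis=(0, 1))[:3])  # ~0 for every coefficient
print(out.std(axis=(0, 1))[:3])   # ~1 for every coefficient

# The same utterance is normalized differently when batched with other data.
other = normalize_per_coefficient(np.concatenate([batch[:1], batch[1:] + 10.0]))
print(np.allclose(out[0], other[0]))  # False
```

The last lines illustrate the batch dependence mentioned above: the first utterance is unchanged, yet its normalized values change because its batch mates did.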

  2. Regarding the naming, I agree with you: it should be read_fbank() rather than read_mfcc().
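
To make the naming point concrete: MFCCs are conventionally obtained by taking the log of the mel filterbank energies and applying a DCT, keeping only the first few coefficients, so a function that stops at the filterbank energies returns FBanks, not MFCCs. The sketch below assumes the filterbank energies are already available as a `(num_frames, num_filters)` array; `dct_ii` and `fbank_to_mfcc` are hypothetical helpers, and keeping 13 coefficients is a common default, not something taken from the paper or the repository. In practice `scipy.fft.dct(x, type=2, norm='ortho')` computes the same transform.

```python
import numpy as np

def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II along the last axis (same convention as norm='ortho')."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]  # output (cepstral) index
    i = np.arange(n)[None, :]  # input (filter) index
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def fbank_to_mfcc(fbank: np.ndarray, num_ceps: int = 13, eps: float = 1e-10) -> np.ndarray:
    """MFCCs: truncated DCT of the log filterbank energies (eps avoids log(0))."""
    return dct_ii(np.log(fbank + eps))[..., :num_ceps]

rng = np.random.default_rng(0)
fbank = rng.random((100, 26)) + 1e-3  # hypothetical positive filterbank energies
mfcc = fbank_to_mfcc(fbank)
print(fbank.shape, mfcc.shape)  # (100, 26) (100, 13)
```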

PS: I'm not an expert in audio analysis.

from deep-speaker.

RobinROAR commented on August 15, 2024

Hi Philip, thanks for your quick reply!

I'm not an audio expert either, but I doubt that per-frame IN (instance normalization) is right, because I feel it would remove the difference between audio frames like [0,0,0,0,0,...] and [100,100,100,100,100,...]. Besides, as far as I know, in the computer vision field IN is usually used for style transfer, GAN-based image synthesis, or image-to-image translation tasks. For classification, BN is mostly applied.
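
A small NumPy check of this argument, using the two constant 5-D frames from the example. Here "per-coefficient" means statistics taken across the frames for each coefficient, as a batch-level normalization of the inputs would do; the `eps` floor is an illustrative assumption.

```python
import numpy as np

eps = 1e-12
frames = np.array([[0.0] * 5, [100.0] * 5])  # the two constant frames

# Per-frame (instance-style): mean and std computed along each row.
per_frame = (frames - frames.mean(axis=1, keepdims=True)) / np.maximum(
    frames.std(axis=1, keepdims=True), eps)

# Per-coefficient: mean and std computed across the two frames.
per_coeff = (frames - frames.mean(axis=0, keepdims=True)) / np.maximum(
    frames.std(axis=0, keepdims=True), eps)

print(per_frame)  # both rows are all zeros: the level difference is gone
print(per_coeff)  # rows are all -1 and all +1: the difference is preserved
```

The `eps` floor only avoids dividing by zero; without it, per-frame normalization would turn both constant rows into NaN. Either way the two frames can no longer be told apart.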

To check this, I made a minor revision to your code (using 64D MFCC and BN) and retrained the model. The test script result is SAME SPEAKER [0.73709786] DIFF SPEAKER [-0.00997655] (yours: SAME SPEAKER [0.8112024] DIFF SPEAKER [0.02534033]). No significant improvement...
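
One rough way to compare the two runs is the gap between the same-speaker and different-speaker scores from the test script. This sketch only reuses the numbers quoted above; the run labels are descriptive, and a single pair of scores is a weak signal compared with evaluating many trial pairs (for example with an equal error rate).

```python
# Scores quoted from the test script: (same-speaker, different-speaker).
runs = {
    "instance norm, 26-D fbank": (0.8112024, 0.02534033),
    "batch norm, 64-D MFCC": (0.73709786, -0.00997655),
}

# A larger margin means same-speaker pairs score further above different-speaker pairs.
margins = {name: same - diff for name, (same, diff) in runs.items()}

for name, margin in margins.items():
    print(f"{name}: margin = {margin:.4f}")
# instance norm, 26-D fbank: margin = 0.7859
# batch norm, 64-D MFCC: margin = 0.7471
```

On these two numbers alone, the BN run does not separate the speakers any better.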

I found a similar discussion here: https://www.kaggle.com/c/freesound-audio-tagging/discussion/54082
I'm also trying to contact the authors of the original paper to find out what they actually did.

Best,

-Robin

philipperemy commented on August 15, 2024

@RobinROAR thank you for running those experiments. They are very insightful. I would have expected it to work better too, but it seems that in practice it yields similar results. I tried to contact the authors of the original paper, without success. It's a paper by Baidu and they probably don't want to release too much information. I had to "guess" a lot of small details when implementing the paper. Anyway, let me know if they reply!

philipperemy commented on August 15, 2024

I'll close this issue for inactivity. Please re-open it if you have something new!