Comments (11)

sayakpaul commented on May 27, 2024

I think to keep it tidy we could use this repo, and once we have settled on something we could incorporate it into the GSoC repo. WDYT?

I will check the results tomorrow and share my comments.

from gsoc-wav2vec2.

sayakpaul commented on May 27, 2024

Gotcha. Thank you.

sayakpaul commented on May 27, 2024

@vasudevgupta7 I get a 404 after clicking on the above-mentioned link.

I think we need to design an augmentation pipeline to regularize the student training so that it is able to match the underlying teacher. The FunMatch paper circumvents this by using an aggressive form of MixUp and a much longer training schedule to compensate.

Translating that to speech is difficult, I agree, and this is where I believe we have opportunities. It might be worth taking a look at AugLy, an open-source framework providing augmentation transformations for different data modalities, including audio. It might help us curate an augmentation pipeline for our purpose.
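As a rough illustration of what such a pipeline could look like (this is a minimal numpy sketch, not AugLy's actual API — the function names and parameters here are made up for illustration):

```python
import numpy as np

def add_noise(wav: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Mix in white noise at a target signal-to-noise ratio (in dB)."""
    signal_power = np.mean(wav ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wav.shape)
    return wav + noise

def random_gain(wav: np.ndarray, rng: np.random.Generator,
                low_db: float = -6.0, high_db: float = 6.0) -> np.ndarray:
    """Scale the waveform by a random gain drawn in dB."""
    gain_db = rng.uniform(low_db, high_db)
    return wav * (10 ** (gain_db / 20))

def time_mask(wav: np.ndarray, rng: np.random.Generator,
              max_frac: float = 0.05) -> np.ndarray:
    """Zero out a random contiguous chunk (SpecAugment-style, but on the raw waveform)."""
    n = len(wav)
    width = int(rng.integers(0, int(max_frac * n) + 1))
    start = int(rng.integers(0, n - width + 1))
    out = wav.copy()
    out[start:start + width] = 0.0
    return out

def augment(wav: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Compose the individual transforms into one pipeline."""
    wav = add_noise(wav, snr_db=rng.uniform(10, 30), rng=rng)
    wav = random_gain(wav, rng=rng)
    wav = time_mask(wav, rng=rng)
    return wav

rng = np.random.default_rng(0)
# One second of a 440 Hz tone at 16 kHz as a stand-in for a real utterance.
wav = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
aug = augment(wav, rng)
```

In practice AugLy (or any similar library) would replace these hand-rolled transforms; the point is just that each transform keeps the waveform shape intact so the pipeline can be dropped in front of the student's feature extractor.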

On the other hand, your last thought on this comment also seems like a pretty good direction. If we do try to figure out that mapping (two conv blocks from the teacher = one conv block in the student, for example), I think we could introduce another bottleneck layer to help make that transfer learnable.
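To sketch what that bottleneck idea could mean concretely (a numpy toy, with hypothetical dimensions — not the actual wav2vec2 sizes or training code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: teacher hidden size 768, student hidden size 384.
teacher_dim, student_dim, seq_len = 768, 384, 50

# Teacher features, e.g. the output of two stacked conv blocks, and the
# student features from the single conv block meant to mimic them.
teacher_feats = rng.normal(size=(seq_len, teacher_dim))
student_feats = rng.normal(size=(seq_len, student_dim))

# Learnable bottleneck: a linear projection that maps teacher features into
# the student's (smaller) feature space so the two become comparable.
W = rng.normal(scale=0.02, size=(teacher_dim, student_dim))
projected = teacher_feats @ W  # shape: (seq_len, student_dim)

# Feature-matching loss that training would minimize w.r.t. both the
# student parameters and the projection W.
feat_loss = np.mean((projected - student_feats) ** 2)
```

The projection is discarded after distillation; it only exists so the mismatched block outputs can be compared during training.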

thevasudevgupta commented on May 27, 2024

> I think to keep it tidy we could use this repo, and once we have settled on something we could incorporate it into the GSoC repo. WDYT?

Yeah! That would be good.

sayakpaul commented on May 27, 2024

@vasudevgupta7 seems like the training is now done? The training progress (loss-wise) looks good to me.

Also, just for my own reference: this is with regard to distilling the wav2vec2 model fine-tuned for speech recognition, correct?

Wanted to know a bit more about the student architectures. Could you provide brief overviews?

thevasudevgupta commented on May 27, 2024

@sayakpaul,

The above experiments are just normal fine-tuning of wav2vec2 on 100h of LibriSpeech data. Since training on 960h takes a lot of time, I want to establish some kind of baseline on a small amount of data so that further experiments can be started on small data. (We will definitely train on the 960h data in the end; this is just to cut experimentation time for now, as the 100h model is also giving reasonable WER.)
Further, since the experiments involve two-stage training, I wanted to check whether we can follow only stage 1 for further experimentation.

I will post brief overviews for every training experiment (in the table) by tonight!

I am going to do distillation training today.

sayakpaul commented on May 27, 2024

Got it. But didn't we have models fine-tuned on the LibriSpeech dataset (100h) already?

> Further, since the experiments involve two-stage training, I wanted to check whether we can follow only stage 1 for further experimentation.

By two-stage, do you mean training of both student and teacher models? In any case, I think when it's applicable we should be able to use the pre-trained (fine-tuned) models as teachers.

> I want to establish some kind of baseline on a small amount of data so that further experiments can be started on small data.

Perfectly fine.

thevasudevgupta commented on May 27, 2024

> Got it. But didn't we have models fine-tuned on the LibriSpeech dataset (100h) already?

No, I directly trained on 960h earlier.

> By two-stage, do you mean training of both student and teacher models? In any case, I think when it's applicable we should be able to use the pre-trained (fine-tuned) models as teachers.

By two stages, I mean this: #17 (comment)

thevasudevgupta commented on May 27, 2024

Hello @sayakpaul, I trained the first distillation model yesterday. Unfortunately, it didn't perform well, though it is trying to learn (not all predicted tokens are random). I am trying to change the initialization strategy and some hyperparameters to get it working.

- teacher: https://tfhub.dev/vasudevgupta7/wav2vec2-960h/1
- student: a smaller version of the same architecture
- loss: `alpha * KL-divergence loss + (1 - alpha) * CTC loss`
- script: https://github.com/vasudevgupta7/compressed-wav2vec2/blob/part_2/src/train_distilled.py
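For reference, the combined loss above can be sketched as follows (a minimal numpy illustration of the `alpha`-weighted objective; the CTC term is assumed to come from the framework's CTC implementation, and the temperature parameter is an assumption, not necessarily what the training script uses):

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def distill_loss(student_logits: np.ndarray, teacher_logits: np.ndarray,
                 ctc_loss: float, alpha: float = 0.5,
                 temperature: float = 1.0) -> float:
    """alpha * KL(teacher || student) + (1 - alpha) * ctc_loss.

    `ctc_loss` is assumed to be computed elsewhere against the
    ground-truth transcripts; only the KL term is shown here.
    """
    t_logp = log_softmax(teacher_logits / temperature)
    s_logp = log_softmax(student_logits / temperature)
    # Per-frame KL divergence over the vocabulary, averaged over batch/time.
    kl = np.mean(np.sum(np.exp(t_logp) * (t_logp - s_logp), axis=-1))
    return alpha * kl + (1.0 - alpha) * ctc_loss

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 50, 32))  # (batch, time, vocab)
# When the student matches the teacher exactly, the KL term vanishes,
# leaving only the weighted CTC term.
same = distill_loss(logits, logits, ctc_loss=1.0, alpha=0.5)
```

Setting `alpha=1.0` drops the labeled signal entirely and trains on the teacher's soft targets alone, which is the variant discussed below.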

sayakpaul commented on May 27, 2024

Are you training the student for longer? How's the training progress?

What happens if we only use KL-divergence and completely get rid of the labeled signal?

thevasudevgupta commented on May 27, 2024

Currently only for 10 epochs (logs: https://wandb.ai/7vasudevgupta/wav2vec2-distillation/runs/2h82mhgc?workspace=user-7vasudevgupta). I need to play around with alpha. Will do these experiments today.
