Comments (11)
I think to keep it tidy we could use this repo and, once we have settled on something, incorporate it into the GSoC repo. WDYT?
I will check the results tomorrow and share my comments.
Gotcha. Thank you.
@vasudevgupta7 I get a 404 after clicking on the above-mentioned link.
I think we need to design an augmentation pipeline to regularize the student training so that it is able to match the underlying teacher. The FunMatch paper circumvents this by using an aggressive form of MixUp and a much longer training schedule to compensate.
Translating that to speech is difficult, I agree, and this is where I believe we have opportunities. It might be worth taking a look at AugLy, an open-source framework that provides augmentation transformations for different data modalities, including audio. This might help us curate an augmentation pipeline for our purpose; a rough sketch of the idea follows below.
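For concreteness, here is a minimal NumPy sketch of the kind of waveform-level pipeline we could start from. The gain, SNR, and mask ranges are made-up placeholders, not tuned values; AugLy provides ready-made (and more principled) versions of such transforms.

```python
import numpy as np

def augment_waveform(wave: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Chain a few simple random waveform-level augmentations."""
    # Random gain: scale the amplitude by up to +/- 6 dB.
    gain_db = rng.uniform(-6.0, 6.0)
    wave = wave * 10.0 ** (gain_db / 20.0)
    # Additive Gaussian noise at a random SNR between 15 and 30 dB.
    snr_db = rng.uniform(15.0, 30.0)
    signal_power = float(np.mean(wave ** 2)) + 1e-8
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    wave = wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    # Time masking: zero out one random span of up to 5% of the samples.
    mask_len = int(rng.uniform(0.0, 0.05) * wave.shape[0])
    start = int(rng.integers(0, wave.shape[0] - mask_len + 1))
    wave[start:start + mask_len] = 0.0
    return wave

rng = np.random.default_rng(0)
dummy = rng.normal(size=16000).astype(np.float32)  # 1 s of fake audio at 16 kHz
augmented = augment_waveform(dummy, rng)
```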
On the other hand, your last thought on this comment also seems like a pretty good direction. If we do try to figure out that mapping (two conv blocks from the teacher = one conv block in the student, for example), I think we could introduce a bottleneck layer to help make that transfer learnable.
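To make the bottleneck idea concrete, here is a minimal TensorFlow sketch. The hidden sizes (384 for the student, 768 for the teacher) and the MSE feature-matching term are assumptions for illustration; under the mapping above, `teacher_hidden` would be the output of every second teacher conv block.

```python
import tensorflow as tf

# Hypothetical hidden sizes: the student's features are projected up to the
# teacher's width so that the feature-matching term has something learnable.
STUDENT_DIM, TEACHER_DIM = 384, 768
bottleneck = tf.keras.layers.Dense(TEACHER_DIM, name="distill_bottleneck")

def feature_matching_loss(student_hidden, teacher_hidden):
    # student_hidden: (batch, time, STUDENT_DIM)
    # teacher_hidden: (batch, time, TEACHER_DIM); gradients are stopped since
    # the teacher stays frozen during distillation.
    projected = bottleneck(student_hidden)
    return tf.reduce_mean(tf.square(projected - tf.stop_gradient(teacher_hidden)))
```

The extra Dense layer would be discarded after training; it only exists so the student isn't forced to match the teacher's width directly.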
> I think to keep it tidy we could use this repo and, once we have settled on something, incorporate it into the GSoC repo. WDYT?

Yeah! That would be good.
@vasudevgupta7 seems like the training is now done? The training progress (loss-wise) looks good to me.
Also, just for my own reference, this is in regard to distilling the wav2vec2 model fine-tuned for speech recognition, correct?
Wanted to know a bit more about the student architectures. Could you provide brief overviews?
The above experiments are just normal fine-tuning of wav2vec2 on 100h of LibriSpeech data. Since training on 960h takes a lot of time, I want to establish some kind of baseline on a small amount of data so that further experiments can start from it. (We will definitely train on the 960h data in the end; this is just to cut experimentation time for now, as the 100h model also gives a reasonable WER.)
Further, since the experiments involve two-stage training, I wanted to check whether we can follow only stage 1 for further experimentation.
I will post brief overviews of every training experiment (in the table) by tonight!
I am going to do the distillation training today.
Got it. But didn't we have models fine-tuned on the LibriSpeech dataset (100h) already?
> Further, since the experiments involve two-stage training, I wanted to check whether we can follow only stage 1 for further experimentation.

By two-stage, do you mean training both the student and the teacher models? In any case, I think when it's applicable we should be able to use the pre-trained (fine-tuned) models as teachers.

> I want to establish some kind of baseline on a small amount of data so that further experiments can start from it.
Perfectly fine.
> Got it. But didn't we have models fine-tuned on the LibriSpeech dataset (100h) already?
No, I directly trained on 960h earlier.
> By two-stage, do you mean training both the student and the teacher models? In any case, I think when it's applicable we should be able to use the pre-trained (fine-tuned) models as teachers.

By two stages, I mean this: #17 (comment)
Hello @sayakpaul, I trained the first distillation model yesterday. Unfortunately, it didn't perform well. It is trying to learn (not all predicted tokens are random). I am trying to change the initialization strategy and some hyperparameters to get it working.
- teacher: https://tfhub.dev/vasudevgupta7/wav2vec2-960h/1
- student: a smaller version of the same architecture
- loss: alpha * (KL-divergence loss) + (1 - alpha) * (CTC loss), sketched below
- script: https://github.com/vasudevgupta7/compressed-wav2vec2/blob/part_2/src/train_distilled.py
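For reference, here is a minimal TensorFlow sketch of that objective. The real implementation lives in the linked script; the temperature scaling, `blank_index=0`, and dense integer labels are my assumptions, not details taken from the script.

```python
import tensorflow as tf

def distillation_loss(student_logits, teacher_logits, labels,
                      label_length, logit_length,
                      alpha=0.5, temperature=2.0):
    # Soft-target term: KL divergence between the temperature-scaled
    # teacher and student output distributions.
    t_probs = tf.nn.softmax(teacher_logits / temperature, axis=-1)
    s_log_probs = tf.nn.log_softmax(student_logits / temperature, axis=-1)
    kl = tf.reduce_mean(tf.reduce_sum(
        t_probs * (tf.math.log(t_probs + 1e-8) - s_log_probs), axis=-1))
    # Hard-target term: standard CTC loss against the transcriptions.
    ctc = tf.reduce_mean(tf.nn.ctc_loss(
        labels=labels, logits=student_logits,
        label_length=label_length, logit_length=logit_length,
        logits_time_major=False, blank_index=0))
    return alpha * kl + (1.0 - alpha) * ctc
```

Setting `alpha = 1.0` here removes the labeled (CTC) signal entirely, which is the KL-only variant discussed below.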
Are you training the student for longer? How's the training progress?
What happens if we only use KL-divergence and completely get rid of the labeled signal?
Currently only for 10 epochs (logs: https://wandb.ai/7vasudevgupta/wav2vec2-distillation/runs/2h82mhgc?workspace=user-7vasudevgupta). I need to play around with alpha. Will do these experiments today.