Comments (2)
We need to use interleave_datasets for streaming datasets. Here we do not know the length of each dataset a priori, so we mix them on the fly according to the sampling probabilities that we define, potentially truncating individual datasets once one of them has been fully iterated over (see "stopping strategies" in the docs).
In contrast, we use concatenate_datasets for non-streaming datasets: since we know the length of each dataset a priori, we can mix them in their entirety. See docs.
from community-events.
Ideally, this is the kind of logic that we want to implement, borrowed from the Distil-Whisper training code: https://github.com/huggingface/distil-whisper/blob/914dcdf3919552d5a3826a9d5db99b059ddcc16e/training/run_distillation.py#L600
Related Issues (20)
- How to use the resulting whisper checkpoints when finetuning HOT 9
- Fine-tuned Whisper models perform worse than OpenAI HOT 8
- Super large number of epoch HOT 1
- Using finetuned whisper checkpoints for inference HOT 1
- Whisper parameters HOT 1
- Add Keras Dreambooth notebook to this repo
- Lambda platform doesn't support Tensorflow-Gpu HOT 1
- Visualize ControlNet results using Weights & Biases Tables HOT 11
- WhisperPositionalEmbedding HOT 1
- this is more question than an issue HOT 2
- Padding conflict in loss computation HOT 2
- Whisper finetune HOT 1
- common_voice map error in the notebook of fine-tune-whisper-non-streaming HOT 1
- huggan.pytorch.lightweight_gan.lightweight_gan.LightweightGAN _from_pretrained requires use_auth_token but this is not passed by the from_pretrained method inherited from ModelHubMixin HOT 5
- Increasing WER & Validation Loss During Whisper Fine-Tuning HOT 1
- Poor Real-Time Performance of Whisper Models Fine-Tuned on Synthetic Data HOT 1
- Colab runtime crash HOT 2
- How to prepare audio dataset for whisper fine-tuning with timestamps?
- Using rename_column and remove_column method for a IterableDataset object leads to its feature property become None --- in the Whisper Fine-Tuning Event HOT 1