Comments (1)
Good question.
The reason for this implementation is to balance random shuffling against GPU memory usage.
Actually, we ran an experiment a while ago (7 years ago) for ASR comparing utterance-level shuffling with batch-level shuffling, and the difference was marginal (although the two setups had different effective batch sizes, so the comparison could have been better controlled).
Also, some people even sort all utterances from short to long and report that it works better (due to curriculum learning effects).
So, entirely random shuffling may not be needed.
However, this is old experience.
Many techniques have changed since then, and we might reach different conclusions today. It's worth revisiting.
Also, we have started using fixed-length utterances (with padding) in some projects, where we can randomly shuffle all utterances.
It would be great if you could do some investigations.
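As a starting point for such an investigation, here is a minimal sketch of the trade-off described above. It is not ESPnet's actual sampler code: the function names (`chunk`, `batch_level_shuffle`, `utterance_level_shuffle`, `padded_frames`) and the synthetic utterance lengths are assumptions made purely for illustration. It compares how many padded frames each strategy would hold in memory when every batch is padded to its longest utterance.

```python
import random


def chunk(order, batch_size):
    """Split an ordered list of utterance indices into consecutive batches."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def batch_level_shuffle(lengths, batch_size, rng):
    """Sort by length, form batches, then shuffle only the order of the batches."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = chunk(order, batch_size)
    rng.shuffle(batches)
    return batches


def utterance_level_shuffle(lengths, batch_size, rng):
    """Shuffle individual utterances first, then form batches."""
    order = list(range(len(lengths)))
    rng.shuffle(order)
    return chunk(order, batch_size)


def padded_frames(batches, lengths):
    """Frames held in memory when each batch is padded to its longest utterance."""
    return sum(len(b) * max(lengths[i] for i in b) for b in batches)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic frame counts (hypothetical data, not from any real corpus).
    lengths = [rng.randint(50, 1500) for _ in range(64)]
    real = sum(lengths)
    strategies = [
        ("batch-level", batch_level_shuffle),
        ("utterance-level", utterance_level_shuffle),
    ]
    for name, make_batches in strategies:
        batches = make_batches(lengths, 8, rng)
        overhead = padded_frames(batches, lengths) / real
        print(f"{name}: padded/real frame ratio = {overhead:.2f}")
```

With a fixed number of utterances per batch, the length-sorted batches should never need more padding than the fully shuffled ones, which is the memory side of the balance. The sketch says nothing about the accuracy side, which would still need a real training comparison with matched effective batch sizes.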
from espnet.