Yes, this is intentional. There are basically two options that I think make sense for CTCLoss:
- (1) "mean": average across sequence length and batch (note that this is PyTorch's default behavior; PyTorch divides each utterance's loss by its target length and then averages over the batch).
- (2) Sum the losses over the sequence length, then average over the batch.
We found empirically that option (2) works best. Longer sequences do have a greater impact on the loss in this case, but keep in mind that in our setup: (1) we randomly shuffle examples and (2) we cap the maximum duration at 16.7 seconds.
But perhaps we should expose (1), "mean", as a configurable option.
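The two reductions can be compared on a toy batch. This is a minimal pure-Python sketch, not NeMo's code: the per-utterance losses are made-up numbers standing in for CTC losses already summed over each sequence, and the hypothetical `mean_reduction` helper mimics what PyTorch documents for `CTCLoss(reduction="mean")` (divide by target length, then average over the batch).

```python
from typing import Sequence


def mean_reduction(losses: Sequence[float], target_lengths: Sequence[int]) -> float:
    """Option (1), mimicking PyTorch's CTCLoss(reduction="mean"): divide each
    utterance's loss by its target length, then average over the batch."""
    per_token = [loss / length for loss, length in zip(losses, target_lengths)]
    return sum(per_token) / len(per_token)


def sum_then_batch_mean(losses: Sequence[float]) -> float:
    """Option (2): keep each utterance's loss summed over its sequence,
    then average over the batch only."""
    return sum(losses) / len(losses)


# Hypothetical per-utterance CTC losses (already summed over each sequence)
# for one short and one long utterance in the same batch.
losses = [10.0, 40.0]
target_lengths = [5, 20]

print(mean_reduction(losses, target_lengths))  # 2.0
print(sum_then_batch_mean(losses))             # 25.0
```

Under option (2) the long utterance supplies 40 of the 50 loss units (80%), whereas under "mean" both utterances contribute equally. In PyTorch, option (2) can be obtained with `torch.nn.CTCLoss(reduction="none")` followed by `.mean()` over the batch dimension.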
> (1) we randomly shuffle examples

Don't you sort by duration (so that durations are similar within a batch) by default?
https://github.com/NVIDIA/NeMo/blob/master/collections/nemo_asr/nemo_asr/parts/manifest.py#L129
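To see what sorting would mean for the loss weighting, here is a minimal sketch of duration-sorted batching with a maximum-duration cap. It is not NeMo's implementation (see the linked manifest.py for that): `bucket_by_duration` is a hypothetical helper, and the entries only assume a `duration` field in seconds, like the one in NeMo's JSON-lines ASR manifests.

```python
from typing import Any, Dict, List

# Hypothetical manifest entry: only the "duration" field (seconds) is used here.
Entry = Dict[str, Any]


def bucket_by_duration(entries: List[Entry], batch_size: int,
                       max_duration: float = 16.7) -> List[List[Entry]]:
    """Drop utterances longer than max_duration, sort the rest by duration,
    and cut them into consecutive batches of similar length."""
    kept = [e for e in entries if float(e["duration"]) <= max_duration]
    kept.sort(key=lambda e: float(e["duration"]))
    return [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]


entries = [{"duration": d} for d in [3.0, 15.0, 20.0, 4.0, 14.0]]
batches = bucket_by_duration(entries, batch_size=2)
print([[e["duration"] for e in b] for b in batches])  # [[3.0, 4.0], [14.0, 15.0]]
```

With sorting, each batch holds utterances of similar length, so option (2)'s length weighting acts mostly between batches (the 14 to 15 s batch yields a larger loss than the 3 to 4 s batch) rather than within a single batch.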
> But, perhaps, we should expose ("mean") as an option.
Yeah, I wonder whether longer sequences really provide more reliable gradients. If they don't, then raising the learning rate should have a roughly similar effect.
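The learning-rate argument rests on simple arithmetic: summing over T frames gives T times the gradient of averaging, so one plain SGD step on the summed loss equals one step on the averaged loss with the learning rate multiplied by T. The sketch below checks this on a toy per-frame quadratic loss; it is a stand-in chosen for illustration, not an actual CTC loss, and `frame_grads` and `sgd_step` are hypothetical helpers.

```python
from typing import List


def frame_grads(w: float, targets: List[float]) -> List[float]:
    """Per-frame gradients of a toy loss (w - y_t) ** 2 with respect to w."""
    return [2.0 * (w - y) for y in targets]


def sgd_step(w: float, grad: float, lr: float) -> float:
    """One plain SGD update (no momentum, no adaptive scaling)."""
    return w - lr * grad


w0 = 0.0
frames = [1.0, 2.0, 3.0, 4.0]  # one toy "sequence" with T = 4 frames
T = len(frames)
grads = frame_grads(w0, frames)

g_sum = sum(grads)       # gradient when frame losses are summed over time
g_mean = sum(grads) / T  # gradient when frame losses are averaged over time

lr = 0.0625
w_sum = sgd_step(w0, g_sum, lr)               # sum reduction, base learning rate
w_mean_scaled = sgd_step(w0, g_mean, lr * T)  # mean reduction, learning rate x T

print(w_sum, w_mean_scaled)  # 1.25 1.25
```

The equivalence only holds exactly when every sequence has the same length. In a batch of mixed lengths, the sum reduction weights each utterance by its own length, which no single learning rate reproduces; that is where the question of whether longer sequences give more reliable gradients actually matters. With adaptive optimizers such as Adam, a constant scale on the loss is largely cancelled by the second-moment normalization, so the learning-rate analogy is specific to SGD-style updates.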
Related Issues
- Prepare a dataset for speaker verification training
- Which container to use for nemofw-inference.
- Python 3.11 torch.jit.script segmentation fault
- Exception running inference with MCore Distributed Checkpoint with different TP setting than training
- Can't load MSDD model trained together with speaker embs extractor "speaker_model_cfg is not in struct"
- ASR model not accepting audio argument for direct transcription of numpy or PyTorch tensor
- using onnx model for language identification inference
- .nemo export to .onnx error after updating from tags/v1.21.0 to tags/v1.22.0
- fastconformer collapse after some improvment, seems to be irrelevant on the amount of data.
- LoRA training with FSDP has spike in train loss
- Typo this needs to be replace
- Training conformer interCTC with init_from_nemo_model argument is broke
- GPT fp8 pretrain not working on H20: Caught signal 8 (Floating point exception: integer divide by zero)
- Error: convert clip model to nemo!
- Speaker_Diarization_Inference.
- canary-1b not proccessing json files correctly
- Can't start to finetune and can't pull nvcr.io/nvidia/nemo:24.01 or nvcr.io/ea-bignlp/ga-participants/nemofw-training:24.01
- When I set PP_size > 1 I get DP errpr for llama2 7b PEFT experiment using P-tuning
- MSDD Bad performances in long form and clustering.
- Cannot install version 1.23.0 because of missing megatron-core version