Comments (6)
Your training seems to break down, as you are getting extremely large objective values.
@fmetze Florian once saw this issue on the Switchboard 3x setup. Could you shed some light on how you solved it?
It might help to go back to the original values, or even set num_sequence=10 and lower frame_num_limit starting from the iteration in which training breaks down. Can you try this and let us know what you find?
Florian Metze http://www.cs.cmu.edu/directory/florian-metze
Associate Research Professor
Carnegie Mellon University
I set the values back to the original and the outcome was the same: the objective values explode in the 14th iteration of training.
Deepak Vinayak Kadetotad
Ph.D. Student
School of Electrical, Computer and Energy Engineering
Arizona State University
Office: ISTB 541
email: [email protected]
My suggestion is to decrease frame_num_limit to a smaller value. This reduces the magnitude of the gradient updates at each step and may improve training stability. I normally set it to 25000.
If this still does not work, you may consider running train_ctc_parallel.sh (without frame skipping) for diagnosis/debugging.
These steps helped me a lot, and I am no longer seeing NaNs or huge values (-1e+30) for Obj(log[Pzx]) during training:
(1) Check for utterances where there are more targets than frames to align, especially if your setup is similar to the v2 scripts with frame subsampling. In CTC the label sequence is interleaved with blanks, giving an expanded sequence of length 2*num_targets+1; requiring at least that many frames, as the check below does, guarantees that an alignment exists.
I added this check in src/netbin/train-ctc-parallel.cc, after the existing check for too many frames in lines 152-155:
// Check that we have enough frames to align to the targets.
if (mat.NumRows() <= targets.size() * 2 + 1) {
  KALDI_WARN << utt << " does not have enough frames to align to its targets; ignoring:"
             << " no. frames " << mat.NumRows()
             << " <= 2 * no. targets + 1 = " << targets.size() * 2 + 1;
  continue;
}
This will also report the problematic sequences and give you an idea of whether your setup is OK. E.g., in my particular setup (German), subsampling by a factor of 3 was too aggressive and made a significant fraction of the sequences (>50%) too short; a factor of 2 worked much better.
(2) In the current Eesen version, only the LSTM layer class does gradient clipping. I found it helpful to also add gradient clipping to the AffineLayer class, which is usually used at the end of the network (see the sketch after this list).
(3) Try out my eesen branch with adaptive learning rates (Adagrad, RMSProp). I am achieving good results with RMSProp: it effectively scales the learning rate individually for each parameter of the network during the updates, which results in better/faster convergence and more stable training (a generic sketch of the update rule also follows below). I have also added the code for (2) to that branch.
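For (2), here is a minimal sketch of what the added clipping might look like inside the affine layer's parameter update. It assumes the ApplyFloor/ApplyCeiling operations of the EESEN/Kaldi CuMatrix classes; the member names (linearity_corr_ for the accumulated weight gradient, max_grad_ for the clipping threshold) are placeholders rather than the actual fields of the AffineLayer class:
// Sketch only -- adapt the placeholder names to the real AffineLayer members.
// Inside the layer's Update() method, after the gradient has been accumulated
// in linearity_corr_ and before the SGD step is applied:
if (max_grad_ > 0.0) {
  // Clamp every entry of the weight gradient to [-max_grad_, max_grad_],
  // analogous to what the LSTM layer already does, so that one bad utterance
  // cannot produce an update that destroys the weights.
  linearity_corr_.ApplyFloor(-max_grad_);
  linearity_corr_.ApplyCeiling(max_grad_);
  // The bias gradient can be clipped in the same way if desired.
}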
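And for (3), a generic, framework-independent sketch of the RMSProp update rule; the hyper-parameter values are only illustrative, and the actual branch of course operates on EESEN's GPU matrices rather than std::vector:
#include <cmath>
#include <vector>

// RMSProp: keep a running average of each parameter's squared gradient and
// scale the learning rate by the inverse square root of that average.
void RmsPropUpdate(std::vector<float> &params,
                   const std::vector<float> &grads,
                   std::vector<float> &sq_avg,  // running average of grad^2
                   float learn_rate,
                   float decay = 0.9f,
                   float epsilon = 1e-8f) {
  for (size_t i = 0; i < params.size(); ++i) {
    sq_avg[i] = decay * sq_avg[i] + (1.0f - decay) * grads[i] * grads[i];
    params[i] -= learn_rate * grads[i] / (std::sqrt(sq_avg[i]) + epsilon);
  }
}
Compared with plain SGD, parameters whose gradients have recently been large take smaller steps, which tends to damp the occasional very large gradient that can otherwise destabilize training.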
Benjamin, thanks for all of this - I wanted to look at this for a long time, but am swamped with other things right now. Improving stability and looking at your other improvements is on the list of things to do.
Regarding the above: I had added that same check at the script level (train_ctc_parallel_x3a.sh), just before doing the sub-sampling, and it had also worked; I simply drop utterances that are too short at that point. I think this may not have made it into the released version of the scripts either, but that is also something I want to work on before Christmas. In general, I'd rather handle this kind of thing at the script level than in the C++ code, because it is more transparent for the user. We'll see what works best.
Anyway, thanks for all your contributions!