Comments (12)
It's clearly still a problem with the memory. You can probably just remove the last mini-batch (utterance set), or the several longest utterances, from your training set. I am redesigning the data I/O, which will hopefully solve the issue more elegantly.
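Pruning the longest utterances can be scripted; a minimal sketch, assuming Kaldi-style feats.scp lists and the feat-to-len / filter_scp.pl tools from the toolkit (the paths and the helper name are illustrative):

```shell
# Keep all but the N longest utterances, given "utt-id num-frames"
# pairs on stdin; prints the utterance ids to keep.
drop_longest() {
    sort -k2,2n | head -n "-$1" | awk '{print $1}'
}

# Illustrative use with Kaldi-style features (feat-to-len is a Kaldi binary):
#   feat-to-len scp:data/train/feats.scp ark,t:- | drop_longest 100 > keep.list
#   utils/filter_scp.pl keep.list data/train/feats.scp > feats_pruned.scp
```

Note that `head -n -N` (drop the last N lines) needs GNU coreutils, which is fine on the Linux boxes discussed here.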
from eesen.
Hi @rightfront, are you using Linux? If so, could you please provide the output of the command "free -h" before and after executing the script train_ctc_parallel.sh (preferably just after rebooting the machine)?
I'm also experiencing memory errors with eesen, where RAM is allocated by the program but, oddly, not returned to the OS when one iteration of train-ctc-parallel finishes.
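One way to capture those before/after snapshots around a run (the training-script call is commented out as a placeholder; the helper name is just illustrative):

```shell
# Snapshot host RAM before and after training, so memory that is not
# returned to the OS shows up as a difference in the "used" column.
mem_used_mb() {
    # field 3 of the "Mem:" row of `free -m` is the used MiB
    awk '/^Mem:/{print $3}' "$1"
}

free -m > /tmp/mem_before.txt
# steps/train_ctc_parallel.sh ...   # run one training iteration here
free -m > /tmp/mem_after.txt
echo "$(( $(mem_used_mb /tmp/mem_after.txt) - $(mem_used_mb /tmp/mem_before.txt) )) MiB not returned"
```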
@jlerouge - I've actually removed the last few thousand utterances from my set as @yajiemiao suggested, and that seems to have 'fixed' my issue - I've been able to run several iterations of train-ctc-parallel now.
If it fails again, I will let you know the results of the free memory test.
@rightfront I have the same problem as you. I use only a little more than one hundred utterances, but I still get a memory allocation failure. Can you give me any suggestions or thoughts?
As a follow-up: there were updates last week which hopefully resolve this issue. The size of the buffer is now determined before (rather than after) adding the next utterance.
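The policy of that fix can be illustrated with a small sketch (the real change is in the C++ data loader; the function and variable names here are hypothetical):

```shell
# Illustrative sketch of the fixed buffering policy: check whether the
# next utterance still fits under the frame limit BEFORE adding it,
# instead of adding first and discovering the overflow afterwards.
# admit_utterances FRAME_LIMIT LEN1 LEN2 ...  -> prints how many fit.
admit_utterances() {
    limit=$1; shift
    total=0; count=0
    for len in "$@"; do
        if [ $((total + len)) -gt "$limit" ]; then
            break              # would exceed the limit: stop before adding
        fi
        total=$((total + len))
        count=$((count + 1))
    done
    echo "$count"
}

admit_utterances 10000 4000 3000 2500 2000   # prints 3: adding the 2000-frame
                                             # utterance would exceed the limit
```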
@jlerouge FYI kaldi-asr/kaldi#473
Thank you very much! This is exactly what I've encountered.
On June 7, 2016, at 05:01, "Feiteng Li" [email protected] wrote:
@jlerouge https://github.com/jlerouge FYI kaldi-asr/kaldi#473
Hi there, I am trying to run the TEDLIUM (v2) recipe on a g2.2 AWS instance (4GB RAM on a K520) and I get the same (or a similar) error. What is the memory requirement to run this recipe without issues?
VLOG1 After 35000 sequences (53.5094Hr): Obj(log[Pzx]) = -229.912 TokenAcc = 53.9965%
VLOG1 After 36000 sequences (56.0337Hr): Obj(log[Pzx]) = -235.598 TokenAcc = 54.5475%
WARNING (train-ctc-parallel:MallocInternal():cuda-device.cc:658) Allocation of 18620 rows, each of size 2560 bytes failed, releasing cached memory and retrying.
WARNING (train-ctc-parallel:MallocInternal():cuda-device.cc:665) Allocation failed for the second time. Printing device memory usage and exiting
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 4180164608 bytes.
ERROR (train-ctc-parallel:MallocInternal():cuda-device.cc:668) Memory allocation failure
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe copy-feats scp:exp/train_char_l5_c320/train_local.scp
I am running with the default settings of:
input_feat_dim=120 # dimension of the input features; we will use 40-dimensional fbanks with deltas and double deltas
lstm_layer_num=5 # number of LSTM layers
lstm_cell_dim=320 # number of memory cells in every LSTM layer
As I said, I have an AWS instance (g2.2) with 4GB RAM on a K520. I tried with both CUDA 6.5 and 7.5. I seem to recall there was some weird memory leak in CUDA 7... but I get the same error either way.
To try to get one iteration to run, I have changed the nnet topo to
input_feat_dim=120
lstm_layer_num=3
lstm_cell_dim=240
Any thoughts?
Thank you!
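For a rough sense of why the smaller topology helps, here is a back-of-envelope parameter count for a bidirectional LSTM stack (standard 4-gate formula; peepholes and projection are ignored, so eesen's exact counts will differ somewhat):

```shell
# Rough BLSTM parameter count (4-gate LSTM, two directions per layer).
# blstm_params INPUT_DIM NUM_LAYERS CELL_DIM
blstm_params() {
    in=$1; layers=$2; cell=$3
    total=0; l=1
    while [ "$l" -le "$layers" ]; do
        # per direction: 4 gates x cell x (input + recurrent + bias)
        per_dir=$((4 * cell * (in + cell + 1)))
        total=$((total + 2 * per_dir))
        in=$((2 * cell))   # next layer sees concatenated fwd+bwd outputs
        l=$((l + 1))
    done
    echo "$total"
}

blstm_params 120 5 320   # default topology: about 11.0M parameters
blstm_params 120 3 240   # reduced topology: about 3.5M parameters
```

Parameters are only part of the story, though: the activation buffers kept for long utterances usually dominate GPU memory during CTC training.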
@yajiemiao I was able to successfully run one iteration with the smaller network topology. So, maybe it's just that I need more than 4GB of RAM on the GPU to run the recipe. Do you have a "rule of thumb" on how much RAM is needed for a given task? For example, this is just about 200 hours of speech... what if I run with 2000 hours? What about 10,000 hours? Thanks for your help!
The RAM you need depends on the amount of data that you attempt to hold in memory for one mini-batch update, not on the total amount of data that you process. You should also play with the num_sequence and frame_num_limit (set to 10000, maybe) parameters, which determine how many utterances you process in parallel. By reducing these, you keep the model size the same but process fewer utterances in parallel, thus requiring less memory. Unfortunately we do not have a real rule of thumb for this, but we are also working successfully on AWS instances.
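A crude way to see how frame_num_limit drives the activation footprint (the per-frame constant of 8 state vectors is an assumption, so treat the results as orders of magnitude only):

```shell
# Crude estimate of per-minibatch activation memory for a BLSTM:
# frames held in memory x layers x 2 directions x ~8 state vectors
# of cell_dim floats x 4 bytes each. The "8" is an assumed constant.
# est_batch_mb FRAME_LIMIT CELL_DIM NUM_LAYERS
est_batch_mb() {
    frames=$1; cell=$2; layers=$3
    echo $(( frames * layers * 2 * 8 * cell * 4 / 1048576 ))
}

est_batch_mb 25000 320 5   # default frame limit: roughly 2.4 GB
est_batch_mb 10000 320 5   # reduced limit: roughly 1 GB
```

On a 4GB K520 the default limit leaves little headroom once weights and CUDA overhead are added, which is consistent with the failure reported above.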
@fmetze thank you for your response! I will give that a try in the next few days!
Yep, that worked. I used '--frame-num-limit 10000' instead of the 25000 used in the recipe by default. Thank you!