deepspeech.pytorch's Issues

What kind of RTF should one expect?

Hi there,

Just out of curiosity, what kind of RTF can one expect from your code?
I want to test it out, but I was curious to know ahead of time if the decoding is fairly fast.

Thank you!
Miguel

Unable to run an4.py from data/ directory

Dear friends,

When trying to run an4.py from ./data I get:

dlm@vm001nc6:~/code/deepspeech.pytorch/data$ python an4.py
Traceback (most recent call last):
  File "an4.py", line 8, in <module>
    from data.utils import create_manifest
ModuleNotFoundError: No module named 'data'

I guess that when an4.py is run from the data directory (as explained in the README.md), the data module cannot be resolved.

David
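
A possible workaround (my own sketch, not an official fix from the thread): run the script as a module from the repository root (python -m data.an4), or make the import tolerant of where the script is launched from:

try:
    from data.utils import create_manifest   # resolves when run from the repo root (e.g. python -m data.an4)
except ImportError:
    from utils import create_manifest        # resolves when run from inside data/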

RuntimeError: bool value of Variable objects containing non-empty torch.IntTensor is ambiguous

The error happens after the end of the first epoch.
What might be the cause of this?

Epoch: [1][40175/40178]	Time 1.165 (0.568)	Data 0.003 (0.002)	Loss 140.8571 (87.2619)	
Epoch: [1][40176/40178]	Time 1.227 (0.568)	Data 0.001 (0.002)	Loss 149.9813 (87.2635)	
Epoch: [1][40177/40178]	Time 1.282 (0.568)	Data 0.002 (0.002)	Loss 106.1354 (87.2639)	
Epoch: [1][40178/40178]	Time 0.895 (0.568)	Data 0.001 (0.002)	Loss 113.1760 (87.2641)	
Training Summary Epoch: [1]	Average Loss 87.265	
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/lintangsutawika/deeplearning/deepspeech.pytorch/train.py in <module>()
    320 
    321 if __name__ == '__main__':
--> 322     main()

/home/lintangsutawika/deeplearning/deepspeech.pytorch/train.py in main()
    261             sizes = Variable(input_percentages.mul_(int(seq_length)).int())
    262 
--> 263             decoded_output = decoder.decode(out.data, sizes)
    264             target_strings = decoder.process_strings(decoder.convert_to_strings(split_targets))
    265             wer, cer = 0, 0

/home/lintangsutawika/deeplearning/deepspeech.pytorch/decoder.pyc in decode(self, probs, sizes)
    140         """
    141         _, max_probs = torch.max(probs.transpose(0, 1), 2)
--> 142         strings = self.convert_to_strings(max_probs.view(max_probs.size(0), max_probs.size(1)), sizes)
    143         return self.process_strings(strings, remove_repetitions=True)

/home/lintangsutawika/deeplearning/deepspeech.pytorch/decoder.pyc in convert_to_strings(self, sequences, sizes)
     44         for x in xrange(len(sequences)):
     45             string = self.convert_to_string(sequences[x])
---> 46             string = string[0:int(sizes.data[x])] if sizes else string
     47             strings.append(string)
     48         return strings

/home/lintangsutawika/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.pyc in __bool__(self)
    119             return False
    120         raise RuntimeError("bool value of Variable objects containing non-empty " +
--> 121                            torch.typename(self.data) + " is ambiguous")
    122 
    123     __nonzero__ = __bool__

RuntimeError: bool value of Variable objects containing non-empty torch.IntTensor is ambiguous
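
For reference, a minimal sketch of a possible fix in decoder.py (mine, not necessarily how the repository resolved it): avoid the implicit truthiness test on the Variable and compare against None instead.

def convert_to_strings(self, sequences, sizes=None):
    strings = []
    for x in range(len(sequences)):
        string = self.convert_to_string(sequences[x])
        if sizes is not None:                      # explicit None check instead of bool() on a Variable
            string = string[0:int(sizes.data[x])]
        strings.append(string)
    return strings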

Continue_from checkpoint fails with --cuda

This is a reproduction of a traceback from a real run last week that I didn't save. It seems to happen only when the --cuda flag is enabled for both the initial run and the continuation; without CUDA for either, the run continues as expected. I made a quick, hacky workaround last week, but it involved hardcoding the last learning rate, and GPU utilization was severely affected.

aaron@...$ python train.py --train_manifest data/ted_train_manifest.csv --val_manifest data/ted_val_manifest.csv --checkpoint --checkpoint_per_batch 20 --cuda --continue_from models/deepspeech_checkpoint_epoch_1_iter_20.pth.tar
Directory already exists.
Loading checkpoint model models/deepspeech_checkpoint_epoch_1_iter_20.pth.tar
Traceback (most recent call last):
  File "train.py", line 321, in <module>
    main()
  File "train.py", line 137, in main
    model.load_state_dict(package['state_dict'])
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 331, in load_state_dict
    .format(name))
KeyError: 'unexpected key "conv.0.weight" in state_dict'
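
A hedged workaround sketch (assuming the mismatch is the "module." prefix that nn.DataParallel adds to parameter names, which is my guess at the cause, not a confirmed fix): normalize the keys before calling load_state_dict.

from collections import OrderedDict
import torch.nn as nn

def load_checkpoint_state(model, state_dict):
    # unwrap DataParallel if present, and strip any leading "module." from checkpoint keys
    target = model.module if isinstance(model, nn.DataParallel) else model
    cleaned = OrderedDict(
        (key[len('module.'):] if key.startswith('module.') else key, value)
        for key, value in state_dict.items())
    target.load_state_dict(cleaned)

# usage: load_checkpoint_state(model, package['state_dict'])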

Batch benchmarking script

To help determine what batch size to use, create a script that pushes dummy tensors through the entire training process and determines whether a given batch size will cause OOM issues.
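
A rough sketch of the idea, with hypothetical names and shapes (the real script would need to mirror the model's actual input dimensions): push a random batch shaped like the largest expected utterance through a full forward/backward pass and see whether it triggers a CUDA OOM.

import torch
from torch.autograd import Variable

def probe_batch_size(model, batch_size, freq_bins=161, max_seq_len=1000):
    # (batch, channel, freq, time) spectrogram-shaped dummy input on the GPU
    inputs = Variable(torch.randn(batch_size, 1, freq_bins, max_seq_len).cuda())
    try:
        out = model(inputs)
        out.mean().backward()   # stand-in for the real CTC loss, just to exercise memory
        return True
    except RuntimeError as error:   # typically "cuda runtime error ... out of memory"
        print('batch size %d failed: %s' % (batch_size, error))
        return False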

Training completes but TypeError: checkpoint() takes at least 5 arguments (4 given)

Epoch: [70][48/48]	Time 12.067 (9.480)	Data 0.003 (0.019)	Loss 0.2493 (6.2196)	
Training Summary Epoch: [70]	Average Loss 0.184	
Validation Summary Epoch: [70]	Average WER 25	Average CER 14	
Learning rate annealed to: 0.000000
Traceback (most recent call last):
  File "train.py", line 318, in <module>
    main()
  File "train.py", line 314, in main
    torch.save(checkpoint(model, optimizer, args, len(labels)), args.final_model_path)
TypeError: checkpoint() takes at least 5 arguments (4 given)

Segfault: warp-ctc pytorch bindings

Hello, I'm trying to use CTC for an OCR which uses a similar architecture, following the examples at https://github.com/SeanNaren/warp-ctc/blob/pytorch_bindings/pytorch_binding/README.md.

I was initially having a problem with the types (IntTensor/FloatTensor), which I believe I fixed with some trial and error, but right now I'm stuck at a segfault.

Can you tell me what probs_sizes and label_sizes are? I don't understand this line in deepspeech.pytorch:

 sizes = Variable(input_percentages.mul_(int(seq_length)).int(), requires_grad=False)
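
For context, here is a sketch of what those sizes appear to represent (made-up numbers, my interpretation rather than an authoritative answer): warp-ctc wants, per sample, the number of valid frames in probs and the number of label symbols. Since the batch is zero-padded to the longest utterance, input_percentages stores the fraction of the padded length that each sample actually uses, so multiplying by the padded length recovers the true frame counts.

import torch
from torch.autograd import Variable

seq_length = 500                                    # padded time dimension of the network output (example)
input_percentages = torch.FloatTensor([1.0, 0.6])   # sample 0 fills the batch, sample 1 uses 60% of it
probs_sizes = Variable(input_percentages.mul_(int(seq_length)).int())   # -> [500, 300] valid frames
label_sizes = Variable(torch.IntTensor([12, 7]))    # number of target characters per sample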

Noise injection - files used

What sorts of durations are folks using for injected noise files? Any ideas on whether it's likely to be problematic if the ~2000 files I'm using are all less than one second?

librosa.core.load is significantly slower than scipy.io.wavfile.read

I have observed that loading wav files through librosa.core.load is slower than with scipy's wavfile.read function.

Here are two examples (same setup: LibriSpeech train, batch size 64, two GTX 1080 GPUs, single-layer GRU with 1024 hidden units, 4 workers):

with librosa.core.load

Epoch: [1][800/4395]	Time 0.534 (0.571)	Data 0.002 (0.215)	Loss 177.7152 (132.5715)	
Epoch: [1][801/4395]	Time 2.908 (0.574)	Data 2.372 (0.218)	Loss 178.4938 (132.6288)	
Epoch: [1][802/4395]	Time 1.918 (0.576)	Data 1.382 (0.219)	Loss 170.5456 (132.6761)	
Epoch: [1][803/4395]	Time 1.359 (0.577)	Data 0.821 (0.220)	Loss 172.4971 (132.7257)	
Epoch: [1][804/4395]	Time 1.007 (0.577)	Data 0.473 (0.221)	Loss 173.2764 (132.7761)	
Epoch: [1][805/4395]	Time 2.924 (0.580)	Data 2.387 (0.223)	Loss 170.5575 (132.8231)	
Epoch: [1][806/4395]	Time 1.580 (0.581)	Data 0.993 (0.224)	Loss 171.3481 (132.8709)	
Epoch: [1][807/4395]	Time 1.711 (0.583)	Data 1.173 (0.225)	Loss 171.6878 (132.9190)	
Epoch: [1][808/4395]	Time 1.234 (0.584)	Data 0.684 (0.226)	Loss 180.7255 (132.9781)	
Epoch: [1][809/4395]	Time 1.770 (0.585)	Data 1.232 (0.227)	Loss 165.3421 (133.0181)	
Epoch: [1][810/4395]	Time 3.278 (0.588)	Data 2.740 (0.230)	Loss 166.3688 (133.0593)	
Epoch: [1][811/4395]	Time 1.039 (0.589)	Data 0.503 (0.231)	Loss 174.8942 (133.1109)	
Epoch: [1][812/4395]	Time 0.936 (0.589)	Data 0.398 (0.231)	Loss 173.7952 (133.1610)	
Epoch: [1][813/4395]	Time 2.226 (0.591)	Data 1.685 (0.233)	Loss 177.2127 (133.2152)	
Epoch: [1][814/4395]	Time 3.571 (0.595)	Data 3.034 (0.236)	Loss 167.6428 (133.2575)	
Epoch: [1][815/4395]	Time 0.548 (0.595)	Data 0.002 (0.236)	Loss 169.2704 (133.3017)	
Epoch: [1][816/4395]	Time 0.867 (0.595)	Data 0.330 (0.236)	Loss 166.2436 (133.3420)	
Epoch: [1][817/4395]	Time 2.086 (0.597)	Data 1.538 (0.237)	Loss 181.1502 (133.4006)	
Epoch: [1][818/4395]	Time 3.903 (0.601)	Data 3.295 (0.241)	Loss 171.9818 (133.4477)	
Epoch: [1][819/4395]	Time 0.543 (0.601)	Data 0.002 (0.241)	Loss 175.6589 (133.4993)	
Epoch: [1][820/4395]	Time 0.545 (0.601)	Data 0.002 (0.241)	Loss 174.9106 (133.5498)	
Epoch: [1][821/4395]	Time 2.316 (0.603)	Data 1.776 (0.242)	Loss 168.7184 (133.5926)	
Epoch: [1][822/4395]	Time 3.403 (0.607)	Data 2.852 (0.246)	Loss 174.2117 (133.6420)	
Epoch: [1][823/4395]	Time 0.700 (0.607)	Data 0.164 (0.246)	Loss 166.9638 (133.6825)	
Epoch: [1][824/4395]	Time 0.548 (0.607)	Data 0.002 (0.245)	Loss 183.0482 (133.7424)	
Epoch: [1][825/4395]	Time 2.556 (0.609)	Data 1.995 (0.247)	Loss 170.0922 (133.7865)

and with scipy.io.wavfile.read

Epoch: [1][800/4395]	Time 0.998 (0.486)	Data 0.002 (0.005)	Loss 182.8188 (137.1815)	
Epoch: [1][801/4395]	Time 0.988 (0.486)	Data 0.002 (0.005)	Loss 182.8905 (137.2385)	
Epoch: [1][802/4395]	Time 0.978 (0.487)	Data 0.002 (0.005)	Loss 177.0392 (137.2881)	
Epoch: [1][803/4395]	Time 0.966 (0.488)	Data 0.002 (0.005)	Loss 176.3289 (137.3368)	
Epoch: [1][804/4395]	Time 0.993 (0.488)	Data 0.002 (0.005)	Loss 177.1688 (137.3863)	
Epoch: [1][805/4395]	Time 0.951 (0.489)	Data 0.002 (0.005)	Loss 173.7921 (137.4315)	
Epoch: [1][806/4395]	Time 0.999 (0.490)	Data 0.002 (0.005)	Loss 176.1973 (137.4796)	
Epoch: [1][807/4395]	Time 0.995 (0.490)	Data 0.002 (0.005)	Loss 176.3397 (137.5278)	
Epoch: [1][808/4395]	Time 0.968 (0.491)	Data 0.002 (0.005)	Loss 184.6344 (137.5861)	
Epoch: [1][809/4395]	Time 0.997 (0.491)	Data 0.002 (0.005)	Loss 170.4413 (137.6267)	
Epoch: [1][810/4395]	Time 0.951 (0.492)	Data 0.002 (0.005)	Loss 169.0714 (137.6655)	
Epoch: [1][811/4395]	Time 1.005 (0.493)	Data 0.002 (0.005)	Loss 176.8274 (137.7138)	
Epoch: [1][812/4395]	Time 0.998 (0.493)	Data 0.002 (0.005)	Loss 178.3829 (137.7639)	
Epoch: [1][813/4395]	Time 1.010 (0.494)	Data 0.002 (0.005)	Loss 181.8083 (137.8181)	
Epoch: [1][814/4395]	Time 1.006 (0.494)	Data 0.002 (0.005)	Loss 168.5279 (137.8558)	
Epoch: [1][815/4395]	Time 0.990 (0.495)	Data 0.002 (0.005)	Loss 171.7180 (137.8973)	
Epoch: [1][816/4395]	Time 1.025 (0.496)	Data 0.002 (0.005)	Loss 169.9290 (137.9366)	
Epoch: [1][817/4395]	Time 1.006 (0.496)	Data 0.002 (0.005)	Loss 183.6840 (137.9926)	
Epoch: [1][818/4395]	Time 0.992 (0.497)	Data 0.002 (0.005)	Loss 176.9357 (138.0402)	
Epoch: [1][819/4395]	Time 1.038 (0.498)	Data 0.002 (0.005)	Loss 180.7055 (138.0923)	
Epoch: [1][820/4395]	Time 0.994 (0.498)	Data 0.002 (0.005)	Loss 179.9813 (138.1434)	
Epoch: [1][821/4395]	Time 1.000 (0.499)	Data 0.002 (0.005)	Loss 171.4475 (138.1839)	
Epoch: [1][822/4395]	Time 1.025 (0.499)	Data 0.002 (0.005)	Loss 180.0951 (138.2349)	
Epoch: [1][823/4395]	Time 0.989 (0.500)	Data 0.002 (0.005)	Loss 172.6573 (138.2767)	
Epoch: [1][824/4395]	Time 1.010 (0.501)	Data 0.002 (0.005)	Loss 185.8300 (138.3345)	
Epoch: [1][825/4395]	Time 1.040 (0.501)	Data 0.002 (0.005)	Loss 175.1313 (138.3791)	
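
For reference, a sketch of the faster loading path (assuming 16-bit PCM wavs already at the target sample rate, since scipy does no resampling the way librosa.load does):

from scipy.io import wavfile

def load_audio(path):
    sample_rate, sound = wavfile.read(path)
    sound = sound.astype('float32') / 32767.0   # scale int16 PCM to [-1, 1], roughly matching librosa
    if len(sound.shape) > 1:
        sound = sound.mean(axis=1)               # average the channels of multi-channel files
    return sound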

RuntimeError: CUDNN_STATUS_INTERNAL_ERROR

This happened when I set cuda=True:

Traceback (most recent call last):
  File "train.py", line 318, in <module>
    main()
  File "train.py", line 182, in main
    out = model(inputs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR

Batch_norm W, b gradients for RNN0 layer are None

I was in the process of adding TensorBoard logging to the training process when I noticed that some of the gradients were None, which led me to investigate further.

To investigate, I passed a single batch of data and ran a backward pass. I then iterated over the named parameters and checked for any None gradients. In doing so, I found this:

module.rnns.0.batch_norm.module.weight True
module.rnns.0.batch_norm.module.bias True 

True indicates that they have requires_grad set to True.

It doesn't seem like these gradients are meant to be None, so I've raised an issue. Can someone explain why this would be the case?
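
The check itself is just the following (a sketch; the parameter names come from my model dump and may differ for other configurations):

# after one forward/backward pass on a single batch:
for name, param in model.named_parameters():
    if param.requires_grad and param.grad is None:
        print(name, param.requires_grad)   # e.g. module.rnns.0.batch_norm.module.weight True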

Create release script for model

Currently, if a model is trained with CUDA, it has to be loaded with the cuda flag set to true because of DataParallel. Create a script that takes a CUDA-based model and saves a CPU-based model that can be loaded without it.
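
A minimal sketch of what such a script might do (assuming the checkpoint layout used in train.py, and that stripping the DataParallel "module." prefix is enough; untested):

import torch

def convert_to_cpu_model(cuda_model_path, cpu_model_path):
    # remap all storages to the CPU while unpickling
    package = torch.load(cuda_model_path, map_location=lambda storage, loc: storage)
    state_dict = package['state_dict']
    # drop the "module." prefix that nn.DataParallel adds to parameter names
    package['state_dict'] = dict(
        (key[len('module.'):] if key.startswith('module.') else key, value.cpu())
        for key, value in state_dict.items())
    torch.save(package, cpu_model_path)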

Validation loss increasing while WER decreases

(plot omitted: validation loss increasing while WER decreases)

I would like to believe that the model is overfitting but why would the WER keep decreasing if it was overfitting?

The architecture is as follows:
500 hidden size
5 RNN layers
default LR
1.001 annealing factor which I'm increasing by 0.001 every epoch.

I'm training using Librispeech train-clean-100.tar.gz and validating on dev-clean.tar.gz

Loss is 0.000 when using the --cuda parameter

The loss is always zero when I run train.py with --cuda:

Epoch: [1][1/14063]	Time 1.228 (1.228)	Data 0.105 (0.105)	Loss 0.0000 (0.0000)	
Epoch: [1][2/14063]	Time 0.135 (0.681)	Data 0.002 (0.053)	Loss 0.0000 (0.0000)	
Epoch: [1][3/14063]	Time 0.140 (0.501)	Data 0.002 (0.036)	Loss 0.0000 (0.0000)	
Epoch: [1][4/14063]	Time 0.138 (0.410)	Data 0.002 (0.028)	Loss 0.0000 (0.0000)	
Epoch: [1][5/14063]	Time 0.140 (0.356)	Data 0.002 (0.022)	Loss 0.0000 (0.0000)	
Epoch: [1][6/14063]	Time 0.145 (0.321)	Data 0.003 (0.019)	Loss 0.0000 (0.0000)	
Epoch: [1][7/14063]	Time 0.142 (0.295)	Data 0.000 (0.016)	Loss 0.0000 (0.0000)	
Epoch: [1][8/14063]	Time 0.151 (0.277)	Data 0.002 (0.015)	Loss 0.0000 (0.0000)	
Epoch: [1][9/14063]	Time 0.154 (0.264)	Data 0.002 (0.013)	Loss 0.0000 (0.0000)	
Epoch: [1][10/14063]	Time 0.156 (0.253)	Data 0.002 (0.012)	Loss 0.0000 (0.0000)

The default train.py (without --cuda) works normally:

Epoch: [1][1/14063]	Time 2.203 (2.203)	Data 0.100 (0.100)	Loss 113.9308 (113.9308)	
Epoch: [1][2/14063]	Time 2.132 (2.167)	Data 0.000 (0.050)	Loss 130.0348 (121.9828)	
Epoch: [1][3/14063]	Time 2.235 (2.190)	Data 0.002 (0.034)	Loss 118.3163 (120.7607)	
Epoch: [1][4/14063]	Time 2.307 (2.219)	Data 0.014 (0.029)	Loss 101.6445 (115.9816)	
Epoch: [1][5/14063]	Time 2.482 (2.272)	Data 0.020 (0.027)	Loss 75.0004 (107.7854)	
Epoch: [1][6/14063]	Time 2.604 (2.327)	Data 0.014 (0.025)	Loss 62.9188 (100.3076)

What could the problem be? Could this be due to a misconfiguration on my end?

flags --visdom and --continue_from break the code

I was in the process of introducing Tensorboard logging when I noticed bugs in how visdom was dealing with continuing from a checkpoint model.

Essentially, there are two bugs that I found and fixed.

  • The variable epoch is being incremented incorrectly inside the visdom logging section at line 289 of train.py.
  • When loading loss_results, wer_results, and cer_results from the checkpoint package, the current statement assigns the variables directly from the dictionary. However, those saved buffers only contain results up to the checkpoint epoch, so after the assignment the three variables are no longer of length args.epochs and eventually raise an IndexError when accessed (a sketch of a fix is included below).

I've fixed these bugs and also added tensorboard logging in my fork: https://github.com/SiddGururani/deepspeech.pytorch

Shall I submit a pull request and we can discuss the tensorboard logging part in further detail there?
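
A sketch of how the second fix could look (illustrative only, using train.py's variable names; the actual change in my fork may differ): keep the buffers at length args.epochs and copy the checkpointed values into the front, instead of replacing the buffers with the shorter saved ones.

# inside main(), when --continue_from is given:
loss_results = torch.Tensor(args.epochs)
wer_results = torch.Tensor(args.epochs)
cer_results = torch.Tensor(args.epochs)

previous_epochs = package['loss_results'].size(0)   # results only up to the checkpoint epoch
loss_results[:previous_epochs] = package['loss_results']
wer_results[:previous_epochs] = package['wer_results']
cer_results[:previous_epochs] = package['cer_results']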

Missing `<stdexcept>`

In order to build on Ubuntu 17.04 using gcc-4.7, I had to add this include:

torch/lib/THPP/Tensor.hpp:#include <stdexcept>

It may not have been the best place to put it, but at least you now know; without it I got many pages of warnings and errors.

Also, sudo was required to install to /usr folders.

KenLM integration (Beam search)

To get full DeepSpeech integration, there needs to be a beam search over a language model constrained to a dictionary. I know a few people have been working on this recently, and this issue will monitor progress!

In addition, there is C code for a KenLM beam search for TensorFlow that should be portable, from what I can see.

Memory leak

The training process has a memory leak that prevents the model from being trained on larger datasets for a long time.

I haven't dug into the issue deeply enough, but here are my observations (I've adapted the code to work with the VCTK dataset): each iteration increases the memory consumption of the main process by some amount (~1.5 GB for VCTK; I don't have the exact number), and on each iteration the worker processes double their memory requirements (they're fork()ed on each iteration).

Right now I think the problem is inside the data loader and/or its integration, but again, I haven't dug into it because of time constraints.

I'm able to train the model on VCTK for 9 iterations with default parameters. My current workaround is to resume training from the last checkpoint.

Language model to predict.py

I am very new to this field. I ran train.py on the LibriSpeech dataset, and after training completed I ran test.py as follows:

python test.py --model_path models/deepspeech_final.pth.tar --val_manifest data/libri_test_manifest.csv --cuda

I got the following output:
Validation Summary Average WER 3.972 Average CER 0.747

One more thing I noticed: the test.py argument is --val_manifest, i.e. the validation manifest file, but shouldn't it be --test_manifest?

Now I wanted to test the same model on unseen data using predict.py, but how do I include language model?

Pre-trained models tracker

On each of the datasets provided, we must train a Deepspeech model. The overall architecture and training setup are captured in this command:

python train.py  --rnn_type gru --hidden_size 800 --hidden_layers 5 --checkpoint --visdom --train_manifest /path/to/train_manifest.csv --val_manifest /path/to/val_manifest.csv --epochs 100 --num_workers $(nproc) --cuda

In the above command you must replace the manifest paths with the correct paths for the dataset. A few notes:

  • No noise injection for the pre-trained models, or augmentations
  • Train till convergence (should get a nice smooth training curve hopefully!)
  • For smaller datasets, you may need to reduce the learning rate annealing by adding the flag --learning_anneal and setting it to a smaller value, like 1.01. For larger datasets, the default is fine (up to around 4.5k hours, based on internal testing of the deepspeech.torch version)

A release containing the models will be cut from the DeepSpeech package, and a reference to the latest release will be added to the README so the latest models are easy to find!

Progress tracker for datasets:

  • AN4
  • TEDLium
  • LibriSpeech

Let me know if you plan on running any of these, and I'll update the ticket with details!

Convergence speed difference between pytorch and torch

I installed both the Torch and PyTorch versions of Deep Speech 2.
However, I noticed that the Torch version converges much faster than the PyTorch one on the AN4 dataset.

In the PyTorch version, I followed the default settings of the Torch version by using the following command:
python train.py --cuda --train_manifest data/an4_train_manifest.csv --val_manifest data/an4_val_manifest.csv --rnn_type rnn --hidden_size 1760 --hidden_layers 7 --noise_prob 0

I cannot figure out what causes such a big difference in convergence speed.

mismatch in supported batch size between benchmarking script and actual training script

Hi! I'm playing around with deepspeech with the Librispeech dataset. Initially, when I blindly ran the training script with a large batch size I ran into an out-of-memory error, as expected.

So I checked the largest sequence length in the LibriSpeech data (around 2900 for the default audio config) and ran the benchmarking script with --seconds 30. It turns out I can only support a batch size of 10. But when I let my model train using this batch size, I run into an OOM error in the final few batches.

Why is this the case? Since the benchmarking script is testing the exact same model configuration (I'm going with the default config) with random input data of size larger than any of the batches in Librispeech, shouldn't it run to completion?

What torch and pytorch versions are required? TypeError: forward() takes exactly 3 arguments (2 given) when training

DataParallel (
  (module): DeepSpeech (
    (conv): Sequential (
      (0): Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
      (2): Hardtanh (min_val=0, max_val=20, inplace)
      (3): Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1))
      (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
      (5): Hardtanh (min_val=0, max_val=20, inplace)
    )
    (rnns): Sequential (
      (0): BatchLSTM (
        (batch_norm): SequenceWise (
        Sequential (
          (0): Linear (672 -> 400)
          (1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
        ))
        (rnn): LSTM(672, 400, bias=False, bidirectional=True)
      )
      (1): BatchLSTM (
        (batch_norm): SequenceWise (
        Sequential (
          (0): Linear (400 -> 400)
          (1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
        ))
        (rnn): LSTM(400, 400, bias=False, bidirectional=True)
      )
      (2): BatchLSTM (
        (batch_norm): SequenceWise (
        Sequential (
          (0): Linear (400 -> 400)
          (1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
        ))
        (rnn): LSTM(400, 400, bias=False, bidirectional=True)
      )
      (3): BatchLSTM (
        (batch_norm): SequenceWise (
        Sequential (
          (0): Linear (400 -> 400)
          (1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
        ))
        (rnn): LSTM(400, 400, bias=False, bidirectional=True)
      )
    )
    (fc): Sequential (
      (0): SequenceWise (
      Sequential (
        (0): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True)
        (1): Linear (400 -> 29)
      ))
    )
  )
)
Traceback (most recent call last):
  File "train.py", line 263, in <module>
    main()
  File "train.py", line 145, in main
    out = model(inputs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 40, in forward
    return self.module(input.cuda(self.device_ids[0]))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/demobin/work/github/deepspeech.pytorch/model.py", line 94, in forward
    x = self.rnns(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 63, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/demobin/work/github/deepspeech.pytorch/model.py", line 48, in forward
    x, _ = self.rnn(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 210, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes exactly 3 arguments (2 given)

Error while executing training script

I got the following error:
ImportError: /usr/local/lib/python2.7/dist-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /home/nitin/warp-ctc/build/libwarpctc.so)

I installed pytorch from pip; my current gcc version is 4.9.

function augment_audio_with_sox check

In data_loader.py, in the function augment_audio_with_sox, I think

y = load_audio(path)

should be replaced with

y = load_audio(augmented_filename)

Am I right? And is the issue tracker the right place to post this kind of thing? (I am a newbie to GitHub.)
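
If that's right, the relevant part of the function would look something like this (a sketch from memory, not a copy of the repository code):

import os
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(suffix='.wav') as augmented_file:
    augmented_filename = augmented_file.name
    os.system('sox {} {} tempo {:.3f} gain {:.3f}'.format(path, augmented_filename, tempo, gain))
    # y = load_audio(path)                # bug: reads the original, unaugmented file
    y = load_audio(augmented_filename)    # fix: read the sox-processed temporary file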

Support Batch RNNs

The typical DeepSpeech architecture uses batch-normalized BRNNs. Implement this to stay true to the architecture.

Issue w/ TED Dataloader

It seems (at least in my setup) that something breaks at the end of the first epoch -- probably due to an issue in the dataloader(?). Haven't dug much into it yet.

Epoch: [1][4244/4549]	Time 0.584 (0.309)	Data 0.002 (0.004)	Loss 190.2870 (149.7884)
Epoch: [1][4245/4549]	Time 0.576 (0.309)	Data 0.002 (0.004)	Loss 220.6672 (149.8051)
Epoch: [1][4246/4549]	Time 0.582 (0.309)	Data 0.002 (0.004)	Loss 206.2761 (149.8184)
Epoch: [1][4247/4549]	Time 0.598 (0.309)	Data 0.002 (0.004)	Loss 194.7978 (149.8290)
Epoch: [1][4248/4549]	Time 0.603 (0.309)	Data 0.002 (0.004)	Loss 190.6617 (149.8386)
Epoch: [1][4249/4549]	Time 0.583 (0.309)	Data 0.002 (0.004)	Loss 189.1278 (149.8478)
Epoch: [1][4250/4549]	Time 0.601 (0.310)	Data 0.002 (0.004)	Loss 204.6577 (149.8607)
Epoch: [1][4251/4549]	Time 0.585 (0.310)	Data 0.001 (0.004)	Loss 247.6785 (149.8837)
Epoch: [1][4252/4549]	Time 0.616 (0.310)	Data 0.002 (0.004)	Loss 223.9250 (149.9012)
Epoch: [1][4253/4549]	Time 0.592 (0.310)	Data 0.002 (0.004)	Loss 201.2460 (149.9132)
Epoch: [1][4254/4549]	Time 0.590 (0.310)	Data 0.002 (0.004)	Loss 226.0778 (149.9311)
Epoch: [1][4255/4549]	Time 0.592 (0.310)	Data 0.002 (0.004)	Loss 183.1511 (149.9389)
Epoch: [1][4256/4549]	Time 0.601 (0.310)	Data 0.002 (0.004)	Loss 202.0417 (149.9512)
Epoch: [1][4257/4549]	Time 0.585 (0.310)	Data 0.001 (0.004)	Loss 192.3692 (149.9611)
Epoch: [1][4258/4549]	Time 0.617 (0.310)	Data 0.002 (0.004)	Loss 210.9887 (149.9755)
Epoch: [1][4259/4549]	Time 0.616 (0.310)	Data 0.002 (0.004)	Loss 222.2541 (149.9925)
Epoch: [1][4260/4549]	Time 0.605 (0.310)	Data 0.002 (0.004)	Loss 171.1521 (149.9974)
Epoch: [1][4261/4549]	Time 0.606 (0.310)	Data 0.002 (0.004)	Loss 234.3608 (150.0172)
Epoch: [1][4262/4549]	Time 0.608 (0.310)	Data 0.001 (0.004)	Loss 215.2132 (150.0325)
Traceback (most recent call last):
  File "train.py", line 318, in <module>
    main()
  File "train.py", line 169, in main
    for i, (data) in enumerate(train_loader, start=start_iter):
  File "/home/ryan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 210, in __next__
    return self._process_next_batch(batch)
  File "/home/ryan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 237, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/ryan/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 41, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ryan/devel/deepspeech.pytorch/data/data_loader.py", line 125, in _collate_fn
    targets = torch.IntTensor(targets)
RuntimeError: tried to construct a tensor from a int sequence, but found an item of type NoneType at index (2794)

Batch sizes

I'm trying to get some insight into batch sizes and whether or not the performance I'm seeing is expected. It seems that I can't set batch sizes much more than say, 32, w/ my dual Titan Xs. It's further my understanding that dataparallel will split that batch of 32 across the two GPUs for an effective batch size of 16 per gpu per batch. The model I'm training is all default: 4 LSTM layers w/ 400 hidden units. Now this is a fair amount different than many of the DeepSpeech 2 configurations in the paper, but I am seeing references to them having batch sizes of 512 spread over 8 Titan X's. This implies that whatever system they're running allows them to support batches of 64 per gpu. Seems to me we should be able to get closer to this number unless I'm missing something. Any thoughts?

visdom assert 'AssertionError: Y should be same size as X'

Epoch: [70][48/48]	Time 0.377 (0.251)	Data 0.002 (0.012)	Loss 0.1478 (4.6092)	
Training Summary Epoch: [70]	Average Loss 0.095	
Validation Summary Epoch: [70]	Average WER 25	Average CER 11	
Traceback (most recent call last):
  File "train.py", line 324, in <module>
    main()
  File "train.py", line 306, in main
    update='replace',
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 179, in result
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 603, in line
    append=update == 'append', opts=opts)
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 179, in result
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 461, in updateTrace
    assert Y.shape == X.shape, 'Y should be same size as X'
AssertionError: Y should be same size as X

Installation on Anaconda

Hi,

I have trouble installing the warp-ctc pytorch bindings using Anaconda. I'm using Ubuntu 14.04.3, gcc 5.4.1, and CUDA 8.
Everything works fine until I try to import the warp bindings in Python:

>>> from warpctc_pytorch import CTCLoss
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch/__init__.py", line 7, in <module>
    from ._warp_ctc import lib as _lib, ffi as _ffi
ImportError: /home/sorson/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/sorson/warp-ctc/build/libwarpctc.so)

Output of installation:

sorson@phoebe:~/warp-ctc/build$ cmake ..
-- The C compiler identification is GNU 5.4.1
-- The CXX compiler identification is GNU 5.4.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda-8.0 (found suitable version "8.0", minimum required is "6.5") 
-- cuda found TRUE
CMake Warning at CMakeLists.txt:48 (FIND_PACKAGE):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.

  Could not find a package configuration file provided by "Torch" with any of
  the following names:

    TorchConfig.cmake
    torch-config.cmake

  Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
  "Torch_DIR" to a directory containing one of the above files.  If "Torch"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Torch found Torch_DIR-NOTFOUND
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sorson/warp-ctc/build
sorson@phoebe:~/warp-ctc/build$ make
[ 25%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
[ 50%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_ctc_entrypoint.cu.o
Scanning dependencies of target warpctc
Linking CXX shared library libwarpctc.so
[ 50%] Built target warpctc
Scanning dependencies of target test_cpu
[ 75%] Building CXX object CMakeFiles/test_cpu.dir/tests/test_cpu.cpp.o
Linking CXX executable test_cpu
[ 75%] Built target test_cpu
[100%] Building NVCC (Device) object CMakeFiles/test_gpu.dir/tests/./test_gpu_generated_test_gpu.cu.o
Scanning dependencies of target test_gpu
Linking CXX executable test_gpu
[100%] Built target test_gpu
sorson@phoebe:~/warp-ctc/pytorch_binding$ python setup.py install
generating build/_warp_ctc.c
regenerated: 'build/_warp_ctc.c'
running install
running build
running build_py
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/warpctc_pytorch
copying warpctc_pytorch/__init__.py -> build/lib.linux-x86_64-3.6/warpctc_pytorch
running build_ext
building 'warpctc_pytorch._warp_ctc' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/build
creating build/temp.linux-x86_64-3.6/home
creating build/temp.linux-x86_64-3.6/home/sorson
creating build/temp.linux-x86_64-3.6/home/sorson/warp-ctc
creating build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding
creating build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding/src
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/sorson/warp-ctc/include -I/home/sorson/anaconda3/include/python3.6m -c build/_warp_ctc.c -o build/temp.linux-x86_64-3.6/build/_warp_ctc.o -std=c++11 -fPIC -DWARPCTC_ENABLE_GPU
cc1: warning: command line option ‘-std=c++11’ is valid for C++/ObjC++ but not for C
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/sorson/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/sorson/warp-ctc/include -I/home/sorson/anaconda3/include/python3.6m -c /home/sorson/warp-ctc/pytorch_binding/src/binding.cpp -o build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding/src/binding.o -std=c++11 -fPIC -DWARPCTC_ENABLE_GPU
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -shared -L/home/sorson/anaconda3/lib -Wl,-rpath=/home/sorson/anaconda3/lib,--no-as-needed build/temp.linux-x86_64-3.6/build/_warp_ctc.o build/temp.linux-x86_64-3.6/home/sorson/warp-ctc/pytorch_binding/src/binding.o -L/home/sorson/warp-ctc/build -L/home/sorson/anaconda3/lib -Wl,--enable-new-dtags,-R/home/sorson/warp-ctc/build -lwarpctc -lpython3.6m -o build/lib.linux-x86_64-3.6/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so
running install_lib
copying build/lib.linux-x86_64-3.6/warpctc_pytorch/_warp_ctc.cpython-36m-x86_64-linux-gnu.so -> /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch
copying build/lib.linux-x86_64-3.6/warpctc_pytorch/__init__.py -> /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch
byte-compiling /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch/__init__.py to __init__.cpython-36.pyc
running install_egg_info
Removing /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6.egg-info
Writing /home/sorson/anaconda3/lib/python3.6/site-packages/warpctc_pytorch-0.1-py3.6.egg-info

Error when importing warpctc_pytorch

Hello,
I am using Ubuntu 16.04 and pytorch 0.1.12_2. When I build warp-ctc following the README, I run into the following problem:

from warpctc_pytorch import CTCLoss
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-fdfcfb094b24> in <module>()
----> 1 from warpctc_pytorch import CTCLoss

/usr/local/lib/python2.7/dist-packages/warpctc_pytorch/__init__.pyc in <module>()
      5 from torch.nn.modules.loss import _assert_no_grad
      6 from torch.utils.ffi import _wrap_function
----> 7 from ._warp_ctc import lib as _lib, ffi as _ffi
      8
      9 __all__ = []

ImportError: /usr/local/lib/python2.7/dist-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /usr/local/warp-ctc/build/libwarpctc.so)

Can anyone help me?

'iteration' argument is not set when continuing from checkpoint

When I try to continue from the checkpoint deepspeech_6.pth.tar, I receive this:

Loading checkpoint model ./models/deepspeech_6.pth.tar
6
  File "train.py", line 324, in <module>
    main()
  File "train.py", line 147, in main
    start_iter = int(package.get('iteration', -1)) + 1
ValueError: invalid literal for int() with base 10: 'N/A'
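
A small sketch of a more tolerant load (not necessarily how the repo fixed it): final models store 'N/A' in the iteration field, so guard for that before converting to int.

iteration = package.get('iteration', None)
start_iter = 0 if iteration in (None, 'N/A') else int(iteration) + 1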

Build with Anaconda GCC: error building warp-ctc

Using Anaconda GCC, when I used the command make to build warp-ctc, I got this:

dlm@vm001nc6:~/code/warp-ctc/build$ make
CMake Warning at /home/dlm/anaconda3/share/cmake-3.6/Modules/FindCUDA.cmake:779 (message):
  Expecting to find librt for libcudart_static, but didn't find it.
Call Stack (most recent call first):
  CMakeLists.txt:20 (FIND_PACKAGE)


-- cuda found TRUE
-- Found Torch7 in /home/dlm/torch/install
-- Torch found /home/dlm/torch/install/share/cmake/torch
-- Building shared library with GPU support
-- Building Torch Bindings with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dlm/code/warp-ctc/build
[ 10%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
[ 20%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o
[ 30%] Linking CXX shared library libwarpctc.so
[ 30%] Built target warpctc
Scanning dependencies of target test_cpu
[ 40%] Building CXX object CMakeFiles/test_cpu.dir/tests/test_cpu.cpp.o
[ 50%] Linking CXX executable test_cpu
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::random_device::_M_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::runtime_error::runtime_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)@GLIBCXX_3.4.21'
/home/dlm/torch/install/lib/libTHC.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@GLIBCXX_3.4.21'
collect2: error: ld returned 1 exit status
CMakeFiles/test_cpu.dir/build.make:110: recipe for target 'test_cpu' failed
make[2]: *** [test_cpu] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/test_cpu.dir/all' failed
make[1]: *** [CMakeFiles/test_cpu.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
dlm@vm001nc6:~/code/warp-ctc/build$

But with Ubuntu GCC, everything is fine.

Output Layer Type

I'm working on implementing a beam decoder for this, and just realized that the output values do not appear to be posteriors. In model.py I see the output layer is a Linear layer. Why not a Softmax or LogSoftmax activation? I suppose such a layer is not strictly necessary when doing a Greedy decode (and less efficient), but will be necessary for more complicated decoders. Just wondering if there's a specific reason or if there's something I'm missing.

Thanks!
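
In case it helps anyone else hitting this, the normalization can be applied outside the model before decoding, e.g. (a sketch; out is assumed to be the (seq_len, batch, num_classes) tensor the model emits):

import torch.nn.functional as F

# flatten to 2D so log_softmax normalizes over the class dimension, then restore the shape
log_probs = F.log_softmax(out.view(-1, out.size(2))).view(out.size())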

Error when running train.py

I checked the previous issues regarding a similar matter, but the solutions don't seem to help.

Traceback (most recent call last):
  File "train.py", line 11, in <module>
    from data.data_loader import AudioDataLoader, SpectrogramDataset
  File "/home/lintangsutawika/deeplearning/deepspeech.pytorch/data/__init__.py", line 1, in <module>
    from . import data_loader
  File "/home/lintangsutawika/deeplearning/deepspeech.pytorch/data/data_loader.py", line 8, in <module>
    import torchaudio
  File "build/bdist.linux-x86_64/egg/torchaudio/__init__.py", line 5, in <module>
    
  File "build/bdist.linux-x86_64/egg/torchaudio/_ext/th_sox/__init__.py", line 3, in <module>
    
  File "build/bdist.linux-x86_64/egg/torchaudio/_ext/th_sox/_th_sox.py", line 7, in <module>
  File "build/bdist.linux-x86_64/egg/torchaudio/_ext/th_sox/_th_sox.py", line 6, in __bootstrap__
ImportError: /home/lintangsutawika/anaconda2/lib/python2.7/site-packages/torch/lib/libgomp.so.1: version `GOMP_4.0' not found (required by /usr/lib/x86_64-linux-gnu/libsox.so.2)

FP16 support

I've opened a branch called fp_16 that has a benchmark script for fp_16.

I haven't got official confirmation from anyone at NVIDIA, but it seems like consumer cards (post Titan X Maxwell) have been nerfed and thus perform poorly with FP16, unless the card is a Tesla P100 (to differentiate the two markets; more info here).

Measure the memory usage as well as the time taken for the benchmark script on various hardware at default settings. @ryanleary would it be possible to get benchmark times using this script? IIRC you have Titan X Maxwells?

deepspeech.Pytorch vs deepspeech.Torch Memory Usage

Dear Friends,

I would like to know if it is possible to optimize the memory usage of deepspeech.pytorch.

In my tests, deepspeech.pytorch uses about 9 GB while deepspeech.torch uses about 5 GB when training on AN4.

Thanks,

David


DEEPSPEECH.TORCH
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 7C1A:00:00.0     Off |                    0 |
| N/A   72C    P0   112W / 149W |   5006MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      8700    C   /home/dlm/torch/install/bin/luajit            5004MiB |
+-----------------------------------------------------------------------------+
DEEPSPEECH.PYTORCH
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 7C1A:00:00.0     Off |                    0 |
| N/A   70C    P0   115W / 149W |   9076MiB / 11439MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     10611    C   python                                        9072MiB |
+-----------------------------------------------------------------------------+

Unable to create an4 dataset using Python 3.6

Dear friends,

In order to run an4.py from inside the data dir (as described in the README.md), I fixed the data.utils import in an4.py with the following change:

#from data.utils import create_manifest
from utils import create_manifest

Using Python 3.6, I now get the following error:

Traceback (most recent call last):
  File "an4.py", line 84, in <module>
    main()
  File "an4.py", line 72, in main
    _format_data(root_path, 'train', name, 'an4_clstk')
  File "an4.py", line 31, in _format_data
    _format_files(file_ids, new_transcript_path, new_wav_path, transcripts, wav_path)
  File "an4.py", line 57, in _format_files
    file.write(extracted_transcript)
TypeError: a bytes-like object is required, not 'str'

David
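
A possible fix (my sketch, not verified against the repo): either encode the string before writing to the binary-mode file handle, or open that file in text mode and keep writing str objects.

file.write(extracted_transcript.encode('utf-8'))   # if the file stays opened in binary mode ('wb')
# ...or open the transcript file with mode 'w' instead of 'wb' and keep file.write(extracted_transcript)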

DeepSpeech.PyTorch stops working after installing Torch to use also DeepSpeech.Torch

Dear friends,

My DeepSpeech.PyTorch installation stopped working after I installed Torch in order to also use DeepSpeech.Torch. See the logs below. It is very similar to another issue in this repo, where the advice was to use a different gcc, but I am not sure exactly what the REAL problem is.

If I move the Torch installation directory away, DeepSpeech.PyTorch works again! If I move it back, DeepSpeech.PyTorch fails!

> dlm@vm001nc6:~/code/deepspeech.pytorch$
> dlm@vm001nc6:~/code/deepspeech.pytorch$
> dlm@vm001nc6:~/code/deepspeech.pytorch$ python train.py --train_manifest data/train_manifest.csv --val_manifest data/val_manifest.csv
> Traceback (most recent call last):
>   File "train.py", line 9, in <module>
>     from warpctc_pytorch import CTCLoss
>   File "/home/dlm/anaconda3/lib/python3.6/site-packages/warpctc_pytorch/__init__.py", line 7, in <module>
>     from ._warp_ctc import lib as _lib, ffi as _ffi
> ImportError: /home/dlm/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libgomp.so.1: version `GOMP_4.0' not found (required by /home/dlm/torch/install/lib/libwarpctc.so)
> dlm@vm001nc6:~/code/deepspeech.pytorch$
