parlance / ctcdecode

PyTorch CTC Decoder bindings

License: MIT License

Python 32.42% C++ 67.58%
machine-learning pytorch ctc ctc-loss beam-search decoder

ctcdecode's Introduction

ctcdecode

ctcdecode is an implementation of CTC (Connectionist Temporal Classification) beam search decoding for PyTorch. The C++ code is borrowed liberally from PaddlePaddle's DeepSpeech. It includes swappable scorer support, enabling standard beam search as well as KenLM-based decoding. If you are new to the concepts of CTC and beam search, please visit the Resources section, where we link a few tutorials explaining why they are needed.

Installation

The library is largely self-contained and requires only PyTorch. Building the C++ library requires gcc or clang. KenLM language modeling support is also optionally included, and enabled by default.

The installation below also works on Google Colab.

# get the code
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
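
As a quick, optional smoke test that the build succeeded (a minimal sketch; the labels and tensor shapes here are arbitrary placeholders):

# smoke test: confirm the extension built and a trivial decode runs
import torch
from ctcdecode import CTCBeamDecoder

labels = list("_abc")                                   # placeholder label set; index 0 is the blank
decoder = CTCBeamDecoder(labels, blank_id=0)
probs = torch.randn(2, 10, len(labels)).softmax(dim=2)  # BATCHSIZE x N_TIMESTEPS x N_LABELS
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)
print(beam_results.shape, out_lens.shape)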

How to Use

from ctcdecode import CTCBeamDecoder

decoder = CTCBeamDecoder(
    labels,
    model_path=None,
    alpha=0,
    beta=0,
    cutoff_top_n=40,
    cutoff_prob=1.0,
    beam_width=100,
    num_processes=4,
    blank_id=0,
    log_probs_input=False
)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)

Inputs to CTCBeamDecoder

  • labels are the tokens you used to train your model. They should be in the same order as your model's outputs. For example, if your tokens are the English letters and you used 0 as your blank token, then you would pass in list("_abcdefghijklmnopqrstuvwxyz") as your argument to labels.
  • model_path is the path to your external KenLM language model (LM). Default is None.
  • alpha Weight associated with the LM probabilities. A weight of 0 means the LM has no effect.
  • beta Weight associated with the number of words within our beam.
  • cutoff_top_n Cutoff number in pruning. Only the top cutoff_top_n characters with the highest probability in the vocab will be used in beam search.
  • cutoff_prob Cutoff probability in pruning. 1.0 means no pruning.
  • beam_width This controls how broad the beam search is. Higher values are more likely to find top beams, but they will also make your beam search exponentially slower. Furthermore, the longer your outputs, the more time large beams will take. This is an important parameter that represents a tradeoff you need to make based on your dataset and needs.
  • num_processes Parallelize the batch using num_processes workers. You probably want to pass the number of CPUs your computer has. You can find this in Python with import multiprocessing and then n_cpus = multiprocessing.cpu_count(). Default 4.
  • blank_id This should be the index of the CTC blank token (probably 0).
  • log_probs_input If your outputs have passed through a softmax and represent probabilities, this should be False; if they have passed through a LogSoftmax and represent negative log likelihoods, you need to pass True. If you don't understand this, run print(output[0][0].sum()): if it's a negative number, you probably have negative log likelihoods and need to pass True; if it sums to ~1.0, you should pass False. Default False. See the sketch below this list.
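
For example, a minimal helper for the check described in the last bullet (a sketch; output is assumed to be the activations tensor described under "Inputs to the decode method"):

def infer_log_probs_input(output):
    # output: BATCHSIZE x N_TIMESTEPS x N_LABELS activations you are about to decode
    total = output[0][0].sum().item()
    # rows of a softmax sum to ~1.0; LogSoftmax rows sum to something clearly negative
    return abs(total - 1.0) > 1e-3   # True -> pass log_probs_input=True, False -> log_probs_input=False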

Inputs to the decode method

  • output should be the output activations from your model. If your output has passed through a SoftMax layer, you shouldn't need to alter it (except maybe to transpose), but if your output represents negative log likelihoods (e.g. LogSoftmax output), you either need to pass it through an additional torch.nn.functional.softmax or pass log_probs_input=True to the decoder. Your output should be BATCHSIZE x N_TIMESTEPS x N_LABELS, so you may need to transpose it before passing it to the decoder (see the sketch below). Note that if you pass things in the wrong order, the beam search will probably still run; you'll just get back nonsense results.
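
As an illustration, a sketch of the reshaping step (assuming your network emits a N_TIMESTEPS x BATCHSIZE x N_LABELS tensor of raw scores, a common layout for PyTorch RNNs; names here are placeholders):

import torch

def prepare_for_decode(logits):
    # logits: N_TIMESTEPS x BATCHSIZE x N_LABELS raw network outputs (assumed layout)
    probs = torch.nn.functional.softmax(logits, dim=2)   # turn raw scores into probabilities
    return probs.transpose(0, 1).contiguous()            # BATCHSIZE x N_TIMESTEPS x N_LABELS for decode()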

Outputs from the decode method

4 things get returned from decode

  1. beam_results - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS. A batch containing the series of characters (these are ints; you still need to decode them back to your text) representing results from a given beam search. Note that the beams are almost always shorter than the total number of timesteps, and the additional data is nonsensical, so to see the top beam (as int labels) from the first item in the batch, you need to run beam_results[0][0][:out_lens[0][0]].
  2. beam_scores - Shape: BATCHSIZE x N_BEAMS. A batch with the approximate CTC score of each beam (look at the code here for more info). Since these are approximately negative log likelihoods, you can get the model's confidence that the beam is correct with p = 1/np.exp(beam_score).
  3. timesteps - Shape: BATCHSIZE x N_BEAMS. The timestep at which the nth output character has peak probability. Can be used as a rough alignment between the audio and the transcript (see the sketch after this list).
  4. out_lens - Shape: BATCHSIZE x N_BEAMS. out_lens[i][j] is the length of the jth beam_result for item i of your batch.
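
For example, a rough alignment sketch combining the outputs above (assuming labels is the list passed to CTCBeamDecoder and that timesteps holds one frame index per output character, as the description in item 3 implies; converting frames to seconds depends on your model's stride):

b, k = 0, 0                                    # first item in the batch, top beam
n = int(out_lens[b][k])
chars = [labels[int(i)] for i in beam_results[b][k][:n]]
frames = timesteps[b][k][:n]                   # frame at which each character peaked
for ch, fr in zip(chars, frames):
    print(f"{ch!r} around frame {int(fr)}")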

Online decoding

import torch
import ctcdecode
from ctcdecode import OnlineCTCBeamDecoder

decoder = OnlineCTCBeamDecoder(
    labels,
    model_path=None,
    alpha=0,
    beta=0,
    cutoff_top_n=40,
    cutoff_prob=1.0,
    beam_width=100,
    num_processes=4,
    blank_id=0,
    log_probs_input=False
)

state1 = ctcdecode.DecoderState(decoder)

probs_seq = torch.FloatTensor([probs_seq])
beam_results, beam_scores, timesteps, out_seq_len = decoder.decode(probs_seq[:, :2], [state1], [False])
beam_results, beam_scores, timesteps, out_seq_len = decoder.decode(probs_seq[:, 2:], [state1], [True])

The online decoder mirrors the CTCBeamDecoder interface, but its decode method additionally takes a sequence of states and a sequence of is_eos_s flags.

States are used to accumulate sequences of chunks, each corresponding to one data source. is_eos_s tells the decoder whether chunks have stopped being pushed to the corresponding state.
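
A minimal streaming sketch under these assumptions: decoder and probs_seq are as in the snippet above (one stream, probs_seq shaped 1 x N_TIMESTEPS x N_LABELS), and chunk_size is an arbitrary choice:

import ctcdecode

state = ctcdecode.DecoderState(decoder)        # one state per stream / data source
chunk_size = 16
n_frames = probs_seq.size(1)
for start in range(0, n_frames, chunk_size):
    chunk = probs_seq[:, start:start + chunk_size]
    is_last = start + chunk_size >= n_frames   # True only for the final chunk of this stream
    beam_results, beam_scores, timesteps, out_lens = decoder.decode(chunk, [state], [is_last])

print(beam_results[0][0][:out_lens[0][0]])     # best hypothesis after the last chunk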

More examples

Get the top beam for the first item in your batch: beam_results[0][0][:out_lens[0][0]]

Get the top 50 beams for the first item in your batch:

for i in range(50):
    print(beam_results[0][i][:out_lens[0][i]])

Note that these will be lists of ints that need decoding. You likely already have a function to decode from int to text, but if not you can do something like "".join(labels[n] for n in beam_results[0][0][:out_lens[0][0]]) using the labels you passed to CTCBeamDecoder.
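
If you don't already have such a function, a minimal sketch (assuming labels is the list you passed to CTCBeamDecoder):

def beam_to_string(beam_result, out_len, labels):
    # beam_result: 1-D tensor of int label indices for one beam; out_len: its true length
    return "".join(labels[int(n)] for n in beam_result[:out_len])

# e.g. the top beam of the first batch item:
# text = beam_to_string(beam_results[0][0], out_lens[0][0], labels)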

Resources

ctcdecode's People

Contributors

annagrr, brennv, bstriner, ctogle, joemathai, joshemorris, karimtarabishy, nikhilnagaraj, rbracco, reuben, ryanleary, seannaren, stas6626, stefanocortinovis, unixnme


ctcdecode's Issues

segfault src/path_trie.cpp: No such file or directory.

Hello,

When I use a language model in binary format, I get a segfault. I tried to run it in gdb and it seems that path_trie.cpp is missing. What could the problem be?

Thread 24 "python" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fff45fff700 (LWP 9951)] PathTrie::get_path_trie (this=this@entry=0x7fff45ffebe0, new_char=new_char@entry=1, new_timestep=new_timestep@entry=0, reset=reset@entry=true) at /tmp/pip-qveo70c7-build/ctcdecode/src/path_trie.cpp:56 56 /tmp/pip-qveo70c7-build/ctcdecode/src/path_trie.cpp: No such file or directory.

Could you provide some examples/tutorials about "how to decode with a pre-trained language model"?

Hi, I recently tried to reproduce the experimental results on handwriting recognition from some papers. Fortunately, I found your implementation of CTC decoding very helpful. However, I cannot find any examples or tutorials about decoding with a pre-trained language model using your implementation. I am confused about the following issues:

  • How to pre-train a language model if the model is purely character-based?
  • How to decode with a pre-trained language model using your implementation (ctcdecode)?

I would be grateful if you could provide any suggestions/examples.
Thanks.
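
Not an answer from the maintainers, but a minimal sketch of what LM-assisted decoding usually looks like with this API; the model path, alpha, and beta values below are placeholders you would tune on a held-out set:

from ctcdecode import CTCBeamDecoder

labels = list("_abcdefghijklmnopqrstuvwxyz ")  # blank first; must match your model's output order
decoder = CTCBeamDecoder(
    labels,
    model_path="my_lm.binary",   # hypothetical KenLM model (ARPA or binary)
    alpha=0.5,                   # LM weight (placeholder value)
    beta=1.0,                    # word-count weight (placeholder value)
    beam_width=100,
    blank_id=0,
)
# beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)  # probs as in the README above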

Add support for time-alignments

Return a matrix of time-alignment information to indicate at what index in the original output a character occurred. This will provide the capability of roughly aligning the audio to the output text.

Potential memory leak

Check bindings to ensure that all dynamically allocated memory is properly freed when no longer needed.

ModuleNotFoundError: No module named 'ctcdecode.ctcdecode._ext.ctc_decode._ctc_decode'

Hi,
after installation, I get a ModuleNotFoundError when I call
from ctcdecode.ctcdecode import CTCBeamDecoder

File "/home-nfs/xx/speech/model_wsj_3layers.py", line 9, in <module>
    from ctcdecode.ctcdecode import CTCBeamDecoder
  File "/home-nfs/xx/speech/ctcdecode/ctcdecode/__init__.py", line 1, in <module>
    from ._ext import ctc_decode
  File "/home-nfs/xx/speech/ctcdecode/ctcdecode/_ext/ctc_decode/__init__.py", line 3, in <module>
    from ._ctc_decode import lib as _lib, ffi as _ffi
ModuleNotFoundError: No module named 'ctcdecode.ctcdecode._ext.ctc_decode._ctc_decode'

I don't see _ctc_decode in the directory. How should I solve this problem? Thanks!

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-j5vlyP/

When I git clone and type 'pip install .', I get the following error:

How can I resolve this? Thanks!

Complete output from command python setup.py egg_info:
zip_safe flag not set; analyzing archive contents...

Installed /tmp/pip-req-build-j5vlyP/.eggs/wget-3.2-py2.7.egg
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-j5vlyP/setup.py", line 55, in <module>
    os.path.join(this_file, "build.py:ffi")
  File "/home/byuns9334/anaconda2/lib/python2.7/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/home/byuns9334/anaconda2/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/byuns9334/anaconda2/lib/python2.7/site-packages/setuptools/dist.py", line 372, in __init__
    _Distribution.__init__(self, attrs)
  File "/home/byuns9334/anaconda2/lib/python2.7/distutils/dist.py", line 287, in __init__
    self.finalize_options()
  File "/home/byuns9334/anaconda2/lib/python2.7/site-packages/setuptools/dist.py", line 528, in finalize_options
    ep.load()(self, ep.name, value)
  File "/home/byuns9334/anaconda2/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 204, in cffi_modules
    add_cffi_module(dist, cffi_module)
  File "/home/byuns9334/anaconda2/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 49, in add_cffi_module
    execfile(build_file_name, mod_vars)
  File "/home/byuns9334/anaconda2/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 25, in execfile
    exec(code, glob, glob)
  File "/tmp/pip-req-build-j5vlyP/build.py", line 22, in <module>
    'third_party/openfst-1.6.3.tar.gz')
  File "/tmp/pip-req-build-j5vlyP/build.py", line 15, in download_extract
    tar = tarfile.open(dl_path)
  File "/home/byuns9334/anaconda2/lib/python2.7/tarfile.py", line 1680, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-j5vlyP/

Import Error

Hi !

I can't import ctcdecode :

tbelos2@asus:~/socr$ python3
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctcdecode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tbelos2/anaconda3/lib/python3.6/site-packages/ctcdecode/__init__.py", line 1, in <module>
    from ._ext import ctc_decode
  File "/home/tbelos2/anaconda3/lib/python3.6/site-packages/ctcdecode/_ext/ctc_decode/__init__.py", line 3, in <module>
    from ._ctc_decode import lib as _lib, ffi as _ffi
ImportError: /home/tbelos2/anaconda3/lib/python3.6/site-packages/ctcdecode/_ext/ctc_decode/_ctc_decode.abi3.so: undefined symbol: _Z17paddle_get_scorerddPKcS0_i
>>> 

certificate verify failed while install pip .

When I type 'pip install .', it causes error below :
I am using python 3.6, ubuntu 14.04.
Any idea how to resolve this?

kenkim@node10:/data3/kenkim/deepspeech.pytorch/ctcdecode$ pip install .
Processing /data3/kenkim/deepspeech.pytorch/ctcdecode
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/home/kenkim/anaconda3/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/kenkim/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/kenkim/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/kenkim/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/home/kenkim/anaconda3/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/home/kenkim/anaconda3/lib/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "/home/kenkim/anaconda3/lib/python3.6/ssl.py", line 401, in wrap_socket
    _context=self, _session=session)
  File "/home/kenkim/anaconda3/lib/python3.6/ssl.py", line 808, in __init__
    self.do_handshake()
  File "/home/kenkim/anaconda3/lib/python3.6/ssl.py", line 1061, in do_handshake
    self._sslobj.do_handshake()
  File "/home/kenkim/anaconda3/lib/python3.6/ssl.py", line 683, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-w0kw5634-build/setup.py", line 55, in <module>
    os.path.join(this_file, "build.py:ffi")
  File "/home/kenkim/anaconda3/lib/python3.6/distutils/core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/kenkim/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 318, in __init__
  File "/home/kenkim/anaconda3/lib/python3.6/distutils/dist.py", line 281, in __init__
    self.finalize_options()
  File "/home/kenkim/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 375, in finalize_options
  File "/home/kenkim/anaconda3/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 187, in cffi_modules
    add_cffi_module(dist, cffi_module)
  File "/home/kenkim/anaconda3/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 49, in add_cffi_module
    execfile(build_file_name, mod_vars)
  File "/home/kenkim/anaconda3/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 25, in execfile
    exec(code, glob, glob)
  File "/tmp/pip-w0kw5634-build/build.py", line 24, in <module>
    'third_party/boost_1_63_0.tar.gz')
  File "/tmp/pip-w0kw5634-build/build.py", line 14, in download_extract
    out=dl_path)
  File "/home/kenkim/anaconda3/lib/python3.6/site-packages/wget-3.2-py3.6.egg/wget.py", line 526, in download
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/home/kenkim/anaconda3/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-w0kw5634-build/

_ctc_decode import issue

Hello,

I am still experiencing the _ctc_decode import issue, with the latest pytorch-ctc checkout from the Git repository.

The relevant error lines are:

python

Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import torch
import pytorch_ctc
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/pytorch_ctc/init.py", line 4, in
from ._ctc_decode import lib as _lib, ffi as _ffi
SystemError: dynamic module not initialized properly

Remove `merge_repeated` option

The merge_repeated behavior is incorrect when True (it does two passes of merging, causing incorrect results). The way the decoder works already implicitly merges any repeated characters (see the sketch below), so this option should be removed.
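
To illustrate why an extra pass is redundant, here is a toy sketch of the standard CTC collapse rule (not the library's actual C++ code): adjacent repeats are merged as part of the decode itself, then blanks are dropped.

def ctc_collapse(indices, blank_id=0):
    # merge adjacent repeats, then drop blanks -- the standard CTC output rule
    out, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank_id:
            out.append(idx)
        prev = idx
    return out

# ctc_collapse([1, 1, 0, 1, 2, 2, 0]) == [1, 1, 2]  (the blank keeps the two 1s distinct)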

Parallelize Decode

Use threads to parallelize decoder either at utterance (ie item in batch) and/or beam level.

ImportError in mac

Python 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 18:37:05)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

import ctcdecode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/marxia/anaconda3/envs/py2/lib/python2.7/site-packages/ctcdecode/__init__.py", line 1, in <module>
    from ._ext import ctc_decode
  File "/Users/marxia/anaconda3/envs/py2/lib/python2.7/site-packages/ctcdecode/_ext/ctc_decode/__init__.py", line 3, in <module>
    from ._ctc_decode import lib as _lib, ffi as _ffi
ImportError: dlopen(/Users/marxia/anaconda3/envs/py2/lib/python2.7/site-packages/ctcdecode/_ext/ctc_decode/_ctc_decode.so, 2): Symbol not found: __ZNSt12future_errorD1Ev
  Referenced from: /Users/marxia/anaconda3/envs/py2/lib/python2.7/site-packages/ctcdecode/_ext/ctc_decode/_ctc_decode.so
  Expected in: flat namespace
 in /Users/marxia/anaconda3/envs/py2/lib/python2.7/site-packages/ctcdecode/_ext/ctc_decode/_ctc_decode.so

Add dictionary-only scorer

Support decodes based on a dictionary lexicon. Should be able to leverage the trie data structure and eliminate the LM.

Support for PyTorch 0.4

I'm trying to decode using a KenLM language model with pytorch 0.4 and I'm getting a seg fault (core dumped), probably because of the new tensor syntax.

What are the plans for pytorch 0.4 support?

Best,
Miguel

Making the vocabulary trie

@ryanleary I was trying to work with your fork at https://github.com/ryanleary/ctcdecode.
The vanilla decoder works like a charm, but I can't figure out how the trie is being made using the function you mentioned in the README.

import pytorch_ctc
 
lexicon = '~/language_modelling/Jaderberg_90k_lexicon.txt'
output_path = '~/tries/4gram_JaderbergLexicon/'
kenlm_path = '~/language_modelling/lm_4gram_on_lob_and_brown.klm'
labels = '_0123456789abcdefghijklmnopqrstuvwxyz '

pytorch_ctc.generate_lm_trie(lexicon, kenlm_path, output_path, labels, 0, 37)

Above is my script to generate the trie. The script runs without any errors, but nothing is being created at the specified output path.

Could you please tell me if I am doing it right?

Why is there a constant score for OOV?

This line gives a score of -1000 (which is declared here) to any n-gram that contains an OOV. Is this the right way to approach it? Isn't it possible to get the score for <unk> tokens from the LM and use that instead of a hardcoded score?
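
For reference, the standalone kenlm Python bindings (separate from ctcdecode's internal C++ scorer) will report the model's own estimate for unknown tokens, e.g.:

import kenlm

model = kenlm.Model("lm.arpa")   # placeholder path to any KenLM model
# log10 probability the LM itself assigns to an out-of-vocabulary token, rather than a hardcoded constant
print(model.score("<unk>", bos=False, eos=False))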

SSL Error

I updated the certificate on my Linux machine, but it did not work. Any ideas?

$ pip install .
Processing /home/jennifer/git/ctcdecode
Complete output from command python setup.py egg_info:
zip_safe flag not set; analyzing archive contents...

Installed /tmp/pip-_3xikmb0-build/.eggs/wget-3.2-py3.6.egg
Traceback (most recent call last):
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/home/jennifer/anaconda3/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/jennifer/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/jennifer/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/jennifer/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/home/jennifer/anaconda3/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/home/jennifer/anaconda3/lib/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "/home/jennifer/anaconda3/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/home/jennifer/anaconda3/lib/python3.6/ssl.py", line 814, in __init__
    self.do_handshake()
  File "/home/jennifer/anaconda3/lib/python3.6/ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "/home/jennifer/anaconda3/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-_3xikmb0-build/setup.py", line 55, in <module>
    os.path.join(this_file, "build.py:ffi")
  File "/home/jennifer/anaconda3/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/home/jennifer/anaconda3/lib/python3.6/distutils/core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/jennifer/anaconda3/lib/python3.6/site-packages/setuptools/dist.py", line 372, in __init__
    _Distribution.__init__(self, attrs)
  File "/home/jennifer/anaconda3/lib/python3.6/distutils/dist.py", line 281, in __init__
    self.finalize_options()
  File "/home/jennifer/anaconda3/lib/python3.6/site-packages/setuptools/dist.py", line 528, in finalize_options
    ep.load()(self, ep.name, value)
  File "/home/jennifer/anaconda3/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 204, in cffi_modules
    add_cffi_module(dist, cffi_module)
  File "/home/jennifer/anaconda3/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 49, in add_cffi_module
    execfile(build_file_name, mod_vars)
  File "/home/jennifer/anaconda3/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 25, in execfile
    exec(code, glob, glob)
  File "/tmp/pip-_3xikmb0-build/build.py", line 24, in <module>
    'third_party/boost_1_63_0.tar.gz')
  File "/tmp/pip-_3xikmb0-build/build.py", line 14, in download_extract
    out=dl_path)
  File "/tmp/pip-_3xikmb0-build/.eggs/wget-3.2-py3.6.egg/wget.py", line 526, in download
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/home/jennifer/anaconda3/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-_3xikmb0-build/

Num_time_steps calculation for batch inputs is wrong

When using ctcdecode with sequential data of variable output lengths, the smaller outputs are generally padded with zeros to compensate for the extra size of the largest sample. So, logically, when ctc_beam_search_decoder loops through the timesteps of probs_seq (see the linked code), it should stop at the timestep corresponding to the actual size of that sample's output instead of the length of probs_seq, since probs_seq also has extra padding in batch mode. This causes ctcdecode to add extra garbage characters at the end of its actual output.

Examples of such outputs are:

Example#1:
Prediction: didn't do before a ooooooh i o o t h o e l l e e e e e e e e e e e e e e o a o o ghx xxx xxx eee e

Reference: didn't god before
Example#2:
Prediction: and it may be a lot of things that are kind of true ornette e e l e e e e e e e e e e e a u n ghx xxx eee et

Reference: and it may be a lot of things that are kind of truer

I am using ctcdecode with the outputs from deepspeech.pytorch

I can think of two possible solutions for this:

  1. Pass the num_time_steps to ctc_beam_search_decoder as an argument:
    i.e. instead of
    size_t num_time_steps = probs_seq.size(); at line, it should be
    size_t num_time_steps = size # which is passed as an argument

  2. Add a check for some impossible probability output, such as -1, and break the loop whenever it's true.
    I am currently using this hack in our system, and it seems to work! You can find it here
    For this to work, the outputs of the DeepSpeech model are changed a bit. The extra timestep values are intentionally set to -1. The changes are here

The transcripts for the same examples, after using the second hacky method, are:

Example#1:
Prediction: didn't do before
Reference: didn't god before
Example#2:
Prediction: and it may be a lot of things that are kind of true or
Reference: and it may be a lot of things that are kind of truer
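
Until the decoder accepts sequence lengths directly, a user-side workaround sketch (assuming seq_lens holds the true number of frames for each item in the padded output batch, and decoder is a CTCBeamDecoder) is to decode each item on its unpadded slice:

results = []
for i, length in enumerate(seq_lens):
    # a one-item "batch" trimmed to its real length, so padded frames never reach the beam search
    beam_results, beam_scores, timesteps, out_lens = decoder.decode(output[i:i + 1, :length])
    results.append(beam_results[0][0][:out_lens[0][0]])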

make PyPI package

It would be helpful if this could just be installed via pip install pytorch-ctc or so.

Update Documentation

The API has changed to enable initialization of the scorer and decoders once for multiple decodings. It also adds KenLM support.

ToDos:

  • Document new API
  • Add acknowledgements
  • Document new scorers/installation requirements

Incorrect maintaining of states for words?

One of our colleagues got in touch with @willfrey and he mentioned an issue with the pytorch-ctc implementation:

He pointed out that the scorer in pytorch-ctc wasn't maintaining state between words properly, rendering it a "spell checker" (only scoring unigrams basically)

I'll do some investigation into this claim and report back!

Make KenLM optional

The dependency adds a fair amount of compilation time (order: seconds) and may not be necessary for all people.

Multiple characters in a label cause a segfault

When I try to use a multiple-character string as a label, I get a segfault.
For example:

import torch
import ctcdecode
labels = ["_", "SIL", "A"]
decoder = ctcdecode.CTCBeamDecoder(labels, blank_id=0)
decoder.decode(torch.randn(3,3,3))

triggers a segfault, whereas

import torch
import ctcdecode
labels = ["_", "A", "B"]
decoder = ctcdecode.CTCBeamDecoder(labels, blank_id=0)
decoder.decode(torch.randn(3,3,3))

does not. Is it possible to do this somehow?

Pip install fails

Processing /workspace/speech_recognition/ctcdecode
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-ffdgasnw/setup.py", line 55, in <module>
        os.path.join(this_file, "build.py:ffi")
      File "/opt/conda/lib/python3.6/site-packages/setuptools/__init__.py", line 140, in setup
        return distutils.core.setup(**attrs)
      File "/opt/conda/lib/python3.6/distutils/core.py", line 108, in setup
        _setup_distribution = dist = klass(attrs)
      File "/opt/conda/lib/python3.6/site-packages/setuptools/dist.py", line 370, in __init__
        k: v for k, v in attrs.items()
      File "/opt/conda/lib/python3.6/distutils/dist.py", line 281, in __init__
        self.finalize_options()
      File "/opt/conda/lib/python3.6/site-packages/setuptools/dist.py", line 529, in finalize_options
        ep.load()(self, ep.name, value)
      File "/opt/conda/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 204, in cffi_modules
        add_cffi_module(dist, cffi_module)
      File "/opt/conda/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 49, in add_cffi_module
        execfile(build_file_name, mod_vars)
      File "/opt/conda/lib/python3.6/site-packages/cffi/setuptools_ext.py", line 25, in execfile
        exec(code, glob, glob)
      File "/tmp/pip-req-build-ffdgasnw/build.py", line 9, in <module>
        from torch.utils.ffi import create_extension
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
        raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
    ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-ffdgasnw/

I followed the exact instructions given in the README, with a recursive clone.

Confusion about trie files

I'm unclear as to what to expect from the generated trie files. I've run generate_lm_trie.py from deepspeech.pytorch, as well as directly within a python shell, by importing pytorch_ctc. With the former, the process takes around one second (tested with: 17GB/50k-vocab/5-gram, 2.5GB/50k-vocab/3-gram, and 6.2GB/100K-vocab/5-gram KenLM binaries), producing <3kB trie files with exactly 869 lines of mostly -1s (and 0s on every third line up to line 74), with no error messages. With the latter, the process takes some 20-30 minutes, producing ~10MB files, with 3.45M lines (still of mostly -1s) for the two aforementioned 50k-vocab binaries.

What are the expected formats and sizes of the trie files, and is there some reference against which I might compare mine?

Secondly, using the parameters quoted in the table here (beam_width 100, lm_alpha 4.0, lm_beta1 0.0, lm_beta2 5.0) increases WER/CER on a test set I'm using from 26.86/10.35 with greedy or argmax decoders to 100.00/29.63. I have not gridsearched, but I haven't found any configurations that improve WER or CER.

Is there something obvious I'm missing?

Thanks!

Rename to ctcdecode

This library may add bindings to other learning systems, so rename it to properly reflect that it is, first and foremost, a C++ CTC decoding implementation.

Support for explicit word separators and explicit dictionary for use in handwriting recognition

Hi,
I have a question about how to use the decoder when there is an explicit word separator besides the blank symbol. Some background: I'm trying to use the decoder for neural handwriting recognition. As you may be aware, this application is similar to neural speech recognition, and the same technology is suitable to a large extent. However, there is one issue. In neural handwriting recognition, the common practice is to keep the word separator symbols that are in the training material, and let the model reproduce them in addition to the "normal" symbols.
(See for example https://arxiv.org/abs/1312.4569, section IV C.)

When not using the language model, this is fine, and you can get output like this (using "|" as the special word separator symbol):

Without language model:

evaluate_mdrnn - output: ""|BeTle|asd|Robbe|Mamnygard|.|"|"|Whati's|he|ben"
reference: ""|Better|ask|Robbie|Munyard|.|"|"|What|'s|he|been" --- wrong
evaluate_mdrnn - output: "Comuon|rerlet|,|wse|should|nok|be|elle|to"
reference: "Common|Market|,|we|should|not|be|able|to" --- wrong

However, when using the language model, it is not clear how to integrate the special word separator symbol (not the same as the CTC blank symbol!). When training the language model on "normal" text, such as the LOB [http://ota.ox.ac.uk/desc/0167] or Brown corpus, the word separator symbol won't be present obviously, and hence the decoder won't produce it.

With language model:

evaluate_mdrnn - output: "" Bethea Robbie Munyard . " what she ben"
reference: ""|Better|ask|Robbie|Munyard|.|"|"|What|'s|he|been" --- wrong
evaluate_mdrnn - output: "Common relative should not be elle to"
reference: "Common|Market|,|we|should|not|be|able|to" --- wrong

This is likely to harm performance, since the "|" symbol is still produced by the model, and needs to be "consumed" by the decoder somehow.

One hack I attempted is to train the language model with semi-artificial data, in which I add a separator between every word, for example:

gold-hunting | Kennedy | shocks | Dr | A | .
Germany | must | pay | .
offer | of | +357 | m | is | too | small | .
President | Kennedy | is | ready | to | get | tough | over | West | Germany's | cash | offer | to | help | America's | balance | of | payments | position | .

However, this also has undesired side effects, such as leading to problems with Kneser-Ney discounting during language model training.

I think that in decoders that use finite state transducers, the finite state model is sometimes tailored with special states or transitions to deal with this problem. Perhaps this issue never occurs in speech, but I think it might actually occur if you explicitly mark long pauses, for example (similar to explicit separators between words).

Do you have any suggestion how I might deal with this while using ctcdecode?
Neither using a language model trained on the original data, which cannot produce the word separator symbols, nor hacking the language model training data seems to be a very effective solution so far...

Another important and somewhat related issue seems to be the fact that there is no explicit vocabulary used in the decoder, only the language model. If one would like to restrict the vocabulary to, say, the 50K most frequent words, would the (only) way be to change the language model training data, replacing all the words not in the 50K most frequent words with an INFREQUENT_WORD symbol or something? (This could work but again seems like quite an ugly hack, which I would rather avoid if there is a way to provide an explicit vocabulary to the decoder.)

Thanks in advance for your help!

Gideon

Improve error handling

Throw an exception when files do not exist rather than exiting the Python interpreter.

Import issue with _ctc_decode

I'm running into errors trying to generate a trie, having tried in existing (deepspeech.pytorch) and clean conda environments, as well as virtualenvs with python 2.7 and 3.5. They all seem to point to a dependency issue with importing _ctc_decode via pytorch_ctc.

(pytorch) ubuntu@ds-worker:~/deepspeech.pytorch$ python generate_lm_trie.py -h
Traceback (most recent call last):
  File "generate_lm_trie.py", line 1, in <module>
    import pytorch_ctc
  File "/home/ubuntu/miniconda2/envs/pytorch/lib/python2.7/site-packages/pytorch_ctc/__init__.py", line 4, in <module>
    from ._ctc_decode import lib as _lib, ffi as _ffi
SystemError: dynamic module not initialized properly

directly importing in python shell:

>>> import _ctc_decode                                                                                                                                                   
Traceback (most recent call last):                                                                                                                                       
  File "<stdin>", line 1, in <module>                                                                                                                                    
ImportError: /home/ubuntu/pytorch-ctc/build/lib.linux-x86_64-3.5/pytorch_ctc/_ctc_decode.so: undefined symbol: THIntTensor_set2d            
>>>

Full build log:

generating build/ctc_decode/_ctc_decode.c
(already up-to-date)
running install
running build
running build_py
running build_ext
building 'pytorch_ctc._ctc_decode' extension
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c build/ctc_decode/_ctc_decode.c -o build/temp.linux-x86_64-3.5/build/ctc_decode/_ctc_decode.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/pytorch_ctc/src/cpu_binding.cpp -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/pytorch_ctc/src/cpu_binding.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/pytorch_ctc/src/util/status.cpp -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/pytorch_ctc/src/util/status.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/parallel_read.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/parallel_read.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/mmap.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/mmap.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/string_piece.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/string_piece.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/exception.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/exception.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/file_piece.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/file_piece.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/bit_packing.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/bit_packing.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/ersatz_progress.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/ersatz_progress.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/integer_to_string.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/integer_to_string.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/float_to_string.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/float_to_string.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/scoped.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/scoped.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/murmur_hash.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/murmur_hash.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/read_compressed.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/read_compressed.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/usage.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/usage.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/spaces.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/spaces.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/file.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/file.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/pool.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/pool.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/config.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/config.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/virtual_interface.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/virtual_interface.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/bhiksha.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/bhiksha.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/search_trie.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/search_trie.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/binary_format.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/binary_format.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/value_build.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/value_build.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/read_arpa.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/read_arpa.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/trie_sort.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/trie_sort.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/trie.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/trie.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/sizes.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/sizes.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/quantize.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/quantize.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/search_hashed.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/search_hashed.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/lm_exception.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/lm_exception.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/vocab.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/vocab.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/lm/model.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/model.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/bignum-dtoa.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/bignum-dtoa.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/diy-fp.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/diy-fp.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/cached-powers.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/cached-powers.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/fast-dtoa.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/fast-dtoa.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/fixed-dtoa.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/fixed-dtoa.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/double-conversion.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/double-conversion.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/strtod.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/strtod.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/gcc-5 -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include -I/home/ubuntu/ctc_py3/lib/python3.5/site-packages/torch/utils/ffi/../../lib/include/TH -Ithird_party/eigen3 -Ithird_party/utf8 -Ithird_party/kenlm -I/usr/include/python3.5m -I/home/ubuntu/ctc_py3/include/python3.5m -c /home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/bignum.cc -o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/bignum.o -std=c++11 -fPIC -w -O3 -DNDEBUG -DHAVE_ZLIB -DHAVE_BZLIB -DHAVE_XZLIB -DINCLUDE_KENLM -DKENLM_MAX_ORDER=6
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/bin/g++-5 -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/build/ctc_decode/_ctc_decode.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/pytorch_ctc/src/cpu_binding.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/pytorch_ctc/src/util/status.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/parallel_read.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/mmap.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/string_piece.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/exception.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/file_piece.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/bit_packing.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/ersatz_progress.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/integer_to_string.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/float_to_string.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/scoped.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/murmur_hash.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/read_compressed.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/usage.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/spaces.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/file.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/pool.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/config.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/virtual_interface.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/bhiksha.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/search_trie.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/binary_format.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/value_build.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/read_arpa.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/trie_sort.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/trie.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/sizes.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/quantize.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/search_hashed.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/lm_exception.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/vocab.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/lm/model.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/bignum-dtoa.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/diy-fp.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/cached-powers.o 
build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/fast-dtoa.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/fixed-dtoa.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/double-conversion.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/strtod.o build/temp.linux-x86_64-3.5/home/ubuntu/pytorch-ctc/third_party/kenlm/util/double-conversion/bignum.o -lstdc++ -lz -lbz2 -llzma -o build/lib.linux-x86_64-3.5/pytorch_ctc/_ctc_decode.cpython-35m-x86_64-linux-gnu.so
running install_lib
copying build/lib.linux-x86_64-3.5/pytorch_ctc/_ctc_decode.so -> /home/ubuntu/ctc_py3/lib/python3.5/site-packages/pytorch_ctc
copying build/lib.linux-x86_64-3.5/pytorch_ctc/_ctc_decode.cpython-35m-x86_64-linux-gnu.so -> /home/ubuntu/ctc_py3/lib/python3.5/site-packages/pytorch_ctc
running install_egg_info
Removing /home/ubuntu/ctc_py3/lib/python3.5/site-packages/pytorch_ctc-0.1.egg-info
Writing /home/ubuntu/ctc_py3/lib/python3.5/site-packages/pytorch_ctc-0.1.egg-info
not modified: 'build/ctc_decode/_ctc_decode.c'
/usr/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
/usr/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'setup_requires'
  warnings.warn(msg)

How to understand the output scores?

Hi,

During decoding, I used top-5 best paths. When I look at the output scores returned by the decoder, I found 2 things that I don't understand:

  1. The scores are in ascending order, starting from the smallest one. Since it's the "top" N best, why is the first score the smallest?
  2. Some scores (the largest ones) are greater than 1. Since they are log probabilities, shouldn't they be less than or equal to 0?

So, how should I interpret the scores?

Thank you so much for any help
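
A minimal sketch of how one might sanity-check the returned scores, under the assumption (not confirmed in this thread) that each beam score is a negative log-likelihood: smaller would then mean more probable, values above 1 would be normal, and exp(-score) would recover an approximate probability. The helper name and indexing below are illustrative only.

import torch

# Assumption: beam_scores are negative log-likelihoods, shape (batch, n_beams).
def inspect_scores(beam_scores, batch_item=0):
    scores = beam_scores[batch_item]              # scores for one batch item
    approx_probs = torch.exp(-scores)             # recover p under the NLL assumption
    for rank, (s, p) in enumerate(zip(scores.tolist(), approx_probs.tolist())):
        print("beam %d: score=%.4f  approx p=%.6f" % (rank, s, p))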

IOError: [Errno socket error] [Errno 101] Network is unreachable

When I git clone the repo and run 'pip install .', I get the IOError below: "Network is unreachable".
Any solution would be appreciated.

root@hxh:/home/hxh/common_use/ctcdecode# python setup.py install
zip_safe flag not set; analyzing archive contents...

Installed /home/hxh/common_use/ctcdecode/.eggs/wget-3.2-py2.7.egg
Traceback (most recent call last):
  File "setup.py", line 55, in <module>
    os.path.join(this_file, "build.py:ffi")
  File "/home/hxh/anaconda2/lib/python2.7/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/home/hxh/anaconda2/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/hxh/anaconda2/lib/python2.7/site-packages/setuptools/dist.py", line 333, in __init__
    _Distribution.__init__(self, attrs)
  File "/home/hxh/anaconda2/lib/python2.7/distutils/dist.py", line 287, in __init__
    self.finalize_options()
  File "/home/hxh/anaconda2/lib/python2.7/site-packages/setuptools/dist.py", line 476, in finalize_options
    ep.load()(self, ep.name, value)
  File "/home/hxh/anaconda2/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 193, in cffi_modules
    add_cffi_module(dist, cffi_module)
  File "/home/hxh/anaconda2/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 49, in add_cffi_module
    execfile(build_file_name, mod_vars)
  File "/home/hxh/anaconda2/lib/python2.7/site-packages/cffi/setuptools_ext.py", line 25, in execfile
    exec(code, glob, glob)
  File "build.py", line 22, in <module>
    'third_party/openfst-1.6.7.tar.gz')
  File "build.py", line 14, in download_extract
    out=dl_path)
  File "build/bdist.linux-x86_64/egg/wget.py", line 526, in download
  File "/home/hxh/anaconda2/lib/pyt``hon2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/home/hxh/anaconda2/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/home/hxh/anaconda2/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/home/hxh/anaconda2/lib/python2.7/urllib.py", line 443, in open_https
    h.endheaders(data)
  File "/home/hxh/anaconda2/lib/python2.7/httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "/home/hxh/anaconda2/lib/python2.7/httplib.py", line 882, in _send_output
    self.send(msg)
  File "/home/hxh/anaconda2/lib/python2.7/httplib.py", line 844, in send
    self.connect()
  File "/home/hxh/anaconda2/lib/python2.7/httplib.py", line 1255, in connect
    HTTPConnection.connect(self)
  File "/home/hxh/anaconda2/lib/python2.7/httplib.py", line 821, in connect
    self.timeout, self.source_address)
  File "/home/hxh/anaconda2/lib/python2.7/socket.py", line 575, in create_connection
    raise err
IOError: [Errno socket error] [Errno 101] Network is unreachable
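
A quick, hedged sanity check (not part of ctcdecode) before re-running the install: the build downloads third-party archives over HTTPS, so first confirm the machine can reach the outside world at all. The URL below is just an example endpoint; any reachable HTTPS host would do.

# Minimal connectivity probe; raises if HTTPS access is blocked.
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2
urlopen("https://github.com", timeout=10)
print("HTTPS access looks fine; the failure is probably a proxy/firewall issue.")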

Support gzip for models/tries

The binary LMs and ASCII tries are very large. Loading would be faster, and they would take less space on disk, if they were gzip'd. Gzip support can/should be optional, depending on the libraries installed on the system building the plugin.
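
Until native gzip support exists, a hedged workaround sketch in Python: decompress the gzip'd model to a temporary file and hand that plain path to the decoder. The helper below is illustrative, not part of the library.

import gzip
import shutil
import tempfile

def ungzip_to_tempfile(gz_path, suffix=".klm"):
    # Decompress a gzip'd model so its plain path can be passed as model_path.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    with gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, tmp)
    tmp.close()
    return tmp.name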

word_count_weight and valid_word_count_weight

Hi, I still can't understand word_count_weight and valid_word_count_weight. How do they affect the scoring? I found that their default values are 0. and 1. in ctc_beam_scorer_klm.h.
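
For intuition only, a hedged sketch of how word-count style bonuses commonly enter an LM-augmented beam score. This is the usual formulation, not a transcription of ctc_beam_scorer_klm.h, and the parameter names are reused purely for illustration.

def combined_beam_score(log_p_ctc, log_p_lm, num_words, num_valid_words,
                        lm_weight, word_count_weight, valid_word_count_weight):
    # A larger word_count_weight favours hypotheses with more words;
    # valid_word_count_weight additionally rewards words found in the LM vocabulary.
    return (log_p_ctc
            + lm_weight * log_p_lm
            + word_count_weight * num_words
            + valid_word_count_weight * num_valid_words)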

import pytorch_ctc error

I have installed pytorch_ctc, but I get the following import error.

from pytorch_ctc import CTCBeamDecoder as CTCBD
Traceback (most recent call last):
File "", line 1, in
File "/home/bliu/anaconda3/lib/python3.5/site-packages/pytorch_ctc/init.py", line 4, in
from ._ctc_decode import lib as _lib, ffi as _ffi
ImportError: /home/bliu/anaconda3/lib/python3.5/site-packages/pytorch_ctc/_ctc_decode.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE
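
The missing __cxx11 symbol usually points at a C++11 ABI mismatch between the PyTorch wheel and the locally compiled extension rather than a broken install. A hedged diagnostic sketch (the ABI query below exists in newer torch builds; very old ones may not have it):

import torch

# If this prints False, the PyTorch wheel was built with the old GCC ABI and the
# extension should be rebuilt with -D_GLIBCXX_USE_CXX11_ABI=0 (and vice versa).
print(torch.compiled_with_cxx11_abi())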

CPU RAM memory leak when using the beam search decoder

Hi,

I used the CTC beam decoder from this link: https://github.com/joshemorris/pytorch-ctc. However, I found that after decoding one utterance, the decoder does not release RAM. After decoding more and more sentences, RAM fills up completely. This is especially noticeable with a large beam width such as 100, in which case RAM usage quickly blows up.

My code looks like this:

import pytorch_ctc
from pytorch_ctc import Scorer

decoder = pytorch_ctc.CTCBeamDecoder(Scorer(), labels, top_paths=1, beam_width=100,
                                     blank_index=0, space_index=-1, merge_repeated=False)

for i in range(total_num_utterances):
    decoded, _, out_seq_len = decoder.decode(prob_tensor_i, seq_len_i)

Does anyone have ideas on how to fix this issue?

Thank you very much.
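
Until the leak itself is fixed, a hedged workaround sketch: build the decoder inside a short-lived worker process so the operating system reclaims whatever memory was leaked when the worker exits. The chunking helper below is illustrative and assumes the same pytorch_ctc API as the snippet above.

import multiprocessing as mp

def _decode_chunk(args):
    # Construct the decoder inside the worker; any leaked memory dies with the process.
    labels, probs, seq_lens = args
    import pytorch_ctc
    from pytorch_ctc import Scorer
    decoder = pytorch_ctc.CTCBeamDecoder(Scorer(), labels, top_paths=1, beam_width=100,
                                         blank_index=0, space_index=-1, merge_repeated=False)
    return [decoder.decode(p, l) for p, l in zip(probs, seq_lens)]

def decode_all(labels, probs, seq_lens, chunk_size=50):
    results = []
    for i in range(0, len(probs), chunk_size):
        args = (labels, probs[i:i + chunk_size], seq_lens[i:i + chunk_size])
        pool = mp.Pool(processes=1, maxtasksperchild=1)
        try:
            results.extend(pool.map(_decode_chunk, [args])[0])
        finally:
            pool.close()
            pool.join()
    return results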

what is timestep?

I'm wondering why some of the solutions returned by beam search have identical timesteps.

For example, in the unit test TextDecoders.test_beam_search_decoder_1() in tests/test.py, we can get the top 20 solutions from beam search, but the top 8 of them have identical timesteps. Some of the other solutions also share timesteps. Is this expected?
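
For reference, a hedged sketch of how one might unpack a single beam together with its timesteps, to see exactly which frame each emitted token is attributed to. The indexing assumes the batch x beams x timesteps output layout and is illustrative only; the function name is not part of the library.

def beam_with_frames(beam_results, timesteps, out_lens, labels, b=0, k=0):
    # Pair each emitted token of beam k (batch item b) with the frame index
    # at which the decoder reports it was emitted.
    n = int(out_lens[b][k])
    tokens = [labels[idx] for idx in beam_results[b][k][:n].tolist()]
    frames = timesteps[b][k][:n].tolist()
    return list(zip(tokens, frames))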
