
dsalign's Introduction

DSAlign

DeepSpeech based forced alignment tool

Installation

It is recommended to use this tool from within a virtual environment. After cloning and changing to the root of the project, there is a script for creating one with all requirements in the git-ignored dir venv:

$ bin/createenv.sh
$ ls venv
bin  include  lib  lib64  pyvenv.cfg  share

bin/align.sh will automatically use it.

Internally, DSAlign uses the DeepSpeech STT engine. To function, it requires a few files that are specific to the language of the speech data you want to align. If you want to align English, there is a helper script that will download and prepare all required data:

$ bin/getmodel.sh 
[...]
$ ls models/en/
alphabet.txt  lm.binary  output_graph.pb  output_graph.pbmm  output_graph.tflite  trie

Overview and documentation

A typical application of the aligner is done in three phases:

  1. Preparing the data. Although most of this has to be done individually, there are some tools for data preparation, statistics and maintenance. All involved file formats are described here.
  2. Aligning the data using the alignment tool and its algorithm.
  3. Exporting aligned data using the data-set exporter.

Quickstart example

Example data

There is a script for downloading and preparing some public domain speech and transcript data. It requires ffmpeg for some sample conversion.

$ bin/gettestdata.sh
$ ls data
test1  test2

Alignment using example data

Now the aligner can be called either "manually" (specifying all involved files directly):

$ bin/align.sh --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/transcript.log

Or "automatically" by specifying a so-called catalog file that bundles all involved paths:

$ bin/align.sh --catalog data/test1.catalog
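
The catalog format is described in the project's file-format documentation; as a rough sketch (the key names below are assumptions and should be checked against those docs), a catalog is a JSON list of entries bundling the per-sample paths, which could be written like this:

import json

# Hypothetical sketch of a catalog file; verify the key names against
# DSAlign's file-format documentation before relying on them.
entries = [
    {
        "audio": "data/test1/audio.wav",
        "script": "data/test1/transcript.txt",
        "tlog": "data/test1/transcript.log",
        "aligned": "data/test1/aligned.json",
    }
]

with open("data/test1.catalog", "w", encoding="utf-8") as catalog_file:
    json.dump(entries, catalog_file, indent=4)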

dsalign's People

Contributors

bonegoat, lunixbochs, tilmankamp


dsalign's Issues

handling long sentences

I have to ask how to handle long sentences.
Sometimes, depending on the speaker, a long sentence spoken by one person in a single go is aligned as one long fragment, e.g. 30 seconds and about 100 words, all in one transcript entry. It would be helpful if such fragments could be broken up into smaller sentences. Can you guide me on how to resolve this, or is there any way to control the number of words per aligned fragment?

/bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/audio.wav --script data/transcript.txt --aligned data/result.json --tlog data/result.log --output-pretty --text-meaningful-newlines --align-phrase-snap-factor 2 --stt-max-duration 1000

VAD splitting: 0it [00:00, ?it/s]INFO:root:Fragment 0: Audio too long for STT
INFO:root:Fragment 1: Audio too long for STT

and the text above this limit is missing from the output

INFO:root:Fragment 11: character error rate (CER) too high

Running test1 to check the alignment gives a high character error rate, possibly due to the current DeepSpeech model shipped with the project. Will there be an updated model available? I can provide trained weights for the tensorflow/deepspeech2 implementation if you want to integrate that with the project.

Phoneme-level alignment

According to #6, word-level alignment (instead of utterance-level alignment) should be possible, but is not yet implemented. Are there any plans for phoneme-level alignment, so that DSAlign could be used as a replacement for the Montreal Forced Aligner?

inconsistencies

There are some inconsistencies in the new update. For example, tqdm was removed from the requirements, while align.py still imports it. Also:

/align.py", line 364, in progress
    return it if args.no_progress else log_progress(it, interval=args.progress_interval, total=total)
NameError: name 'log_progress' is not defined

Also, a question: with this change, is there any improvement in the handling of the fragment / transcript log?
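
Regarding the log_progress NameError above: not the upstream fix, but a stopgap could be a minimal shim near the top of align.py that simply delegates to tqdm (assuming tqdm stays installed), for example:

from tqdm import tqdm

def log_progress(iterable, interval=1, total=None):
    # Minimal stand-in for the missing helper: delegate to tqdm.
    # 'interval' is mapped onto tqdm's mininterval (seconds between updates).
    return tqdm(iterable, mininterval=interval, total=total)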

assigning speaker

Dear Sir,

I have a plain text file together with a wav file. The text was produced by the DeepSpeech STT engine. Now I want to convert it into a .script file in order to assign speakers, e.g. speaker1, speaker2, or speaker names. I tried following your readme / tutorial guide but was unsuccessful.
Can you guide me on the command line? Please help. Thank you!
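
For reference, a .script file is a JSON document listing text fragments; a rough, hypothetical sketch of generating one from plain lines, with the speaker stored as per-fragment metadata, could look like the following (the field names "text", "meta" and "speaker" are assumptions here and should be checked against DSAlign's file-format documentation):

import json

# Hypothetical sketch: build a .script-style JSON from plain text lines,
# attaching the speaker as fragment metadata. Verify the schema against
# the file-format docs before using this.
lines = [
    ("speaker1", "hello and welcome to the show"),
    ("speaker2", "thanks for having me"),
]

fragments = [{"text": text, "meta": {"speaker": speaker}} for speaker, text in lines]

with open("data/my.script", "w", encoding="utf-8") as f:
    json.dump(fragments, f, ensure_ascii=False, indent=4)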

Doesn't use GPU

I am testing DSAlign with the example data. Everything is installed and configured exactly as in the readme file. However, it doesn't seem to use the GPU. How do I make sure it uses the GPU? Thanks!

trying to run the example in the readme.md

$ bin/align.sh --output-max-cer 15 --loglevel 10 data/test1/audio.wav data/test1/transcript.txt data/test1/aligned.json

align.py: error: unrecognized arguments: data/test1/audio.wav data/test1/transcript.txt data/test1/aligned.json

It seems the app has been rewritten to require flags that are not present in the example. I've tried adding --audio and --script, but it's unclear which flag to put before the third argument (the JSON one). Also, is there a step missing somewhere that would have generated the JSON file?

raising StopIteration causes a RuntimeError on Python 3.8

From the Python docs,

Changed in version 3.3: Added value attribute and the ability for generator functions to use it to return a value.

Changed in version 3.5: Introduced the RuntimeError transformation via from __future__ import generator_stop, see PEP 479.

Changed in version 3.7: Enable PEP 479 for all code by default: a StopIteration error raised in a generator is transformed into a RuntimeError.

I'm running into this now that I've been moving to 3.8. Seeking clarification on what the author is trying to do here so I can patch the code on my end. I'm guessing (I'm not a Python programmer) that we're trying to bail out of the iteration, but on Python versions other than 3.6 it doesn't behave well.

Traceback: in split_match
raise StopIteration
StopIteration

Edit: I was able to change the raise StopIteration lines to return and that seems to have done the trick.
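
For context, that is exactly the PEP 479-compliant fix: on Python 3.7+ a generator is ended with a plain return rather than raise StopIteration. A minimal illustration of the pattern (not the project's actual code):

def take_until(items, stop_marker):
    # Old style: 'raise StopIteration' here surfaces as a RuntimeError on 3.7+.
    # PEP 479 style: a bare return ends the generator cleanly.
    for item in items:
        if item == stop_marker:
            return
        yield item

print(list(take_until([1, 2, 3, 4], 3)))  # [1, 2]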

Adapt for DeepSpeech 0.6.0

I'm trying to adapt this code for the new release of DeepSpeech. After some minor modifications in align/wavTranscriber.py, mostly CreateModel and enableDecoderWithLM, I'm running into the following error:

bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/log.txt --stt-model-dir models/en/deepspeech-0.6.0-models --alphabet models/en/alphabet.txt
DEBUG:root:Start
DEBUG:root:Loading alphabet from "models/en/alphabet.txt"...
DEBUG:root:Looking for model files in "models/en/deepspeech-0.6.0-models"...
DEBUG:root:Loading acoustic model from "models/en/deepspeech-0.6.0-models/output_graph.pb", alphabet from "models/en/alphabet.txt" and language model from "data/test1/transcript.txt.lm"...
DEBUG:root:Transcribing VAD segments...
VAD splitting: 3464it [00:01, 2274.26it/s]
Transcribing:   0%|                                                                                                                                                                                                  | 0/3464 [00:00<?, ?it/s]DEBUG:root:Process 28330: Loaded models
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.0-0-g6d43e21
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-12-19 12:58:48.519864: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Error: Trie file version mismatch (4 instead of expected 5). Update your trie file.
libc++abi.dylib: terminating with uncaught exception of type int
pip list
Package      Version
------------ -------
deepspeech   0.6.0
numpy        1.17.4
pip          19.0.3
pydub        0.23.1
setuptools   40.8.0
six          1.13.0
sox          1.3.7
textdistance 4.1.5
tqdm         4.40.2
webrtcvad    2.0.10

I have replaced generate_trie with the 0.6.0 version from native client. I can get a bit further by not generating specific LMs but then I run into this problem:

bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/log.txt --stt-model-dir models/en/deepspeech-0.6.0-models --alphabet models/en/alphabet.txt --stt-no-own-lm
DEBUG:root:Start
DEBUG:root:Loading alphabet from "models/en/alphabet.txt"...
DEBUG:root:Looking for model files in "models/en/deepspeech-0.6.0-models"...
DEBUG:root:Loading acoustic model from "models/en/deepspeech-0.6.0-models/output_graph.pb", alphabet from "models/en/alphabet.txt", trie from "models/en/deepspeech-0.6.0-models/trie" and language model from "models/en/deepspeech-0.6.0-models/lm.binary"...
DEBUG:root:Transcribing VAD segments...
VAD splitting: 3464it [00:01, 2264.24it/s]
Transcribing:   0%|                                                                                                                                                                                                  | 0/3464 [00:00<?, ?it/s]DEBUG:root:Process 31650: Loaded models
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.0-0-g6d43e21
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-12-19 14:19:34.495576: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
DEBUG:root:Process 31650: Transcribing...
DEBUG:root:Process 31650: Transcribing...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/tobias/dev/git/DSAlign/align/align.py", line 74, in stt
    transcript = wavTranscriber.stt(model, audio, sample_rate)
  File "/Users/tobias/dev/git/DSAlign/align/wavTranscriber.py", line 40, in stt
    output = ds.stt(audio, fs)
  File "/Users/tobias/dev/git/DSAlign/venv/lib/python3.7/site-packages/deepspeech/__init__.py", line 93, in stt
    return deepspeech.impl.SpeechToText(self._impl, *args, **kwargs)
TypeError: SpeechToText() takes at most 2 arguments (3 given)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/tobias/dev/git/DSAlign/align/align.py", line 672, in <module>
DEBUG:root:Process 31650: Transcribing...
    main()
  File "/Users/tobias/dev/git/DSAlign/align/align.py", line 631, in main
    for time_start, time_end, segment_transcript in transcripts:
  File "/Users/tobias/dev/git/DSAlign/venv/lib/python3.7/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
DEBUG:root:Process 31650: Transcribing...
TypeError: SpeechToText() takes at most 2 arguments (3 given)
DEBUG:root:Process 31650: Transcribing...
Transcribing:   0%|

I'm sure these are just minor problems and adapting DSAlign to DeepSpeech 0.6.0 won't be that difficult.
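
For what it's worth, the final TypeError matches the 0.6.0 API change: the sample-rate argument was removed from the STT call, so the wrapper in align/wavTranscriber.py has to pass only the audio buffer. A minimal sketch of an adapted wrapper (an assumption-laden local patch, not the upstream fix):

def stt(ds, audio, fs=None):
    # DeepSpeech 0.6.x: Model.stt() takes only the 16-bit PCM audio buffer;
    # the sample-rate parameter of 0.5.x is gone. 'fs' is kept in the
    # signature so existing callers keep working, but it is not forwarded.
    return ds.stt(audio)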

Fix for Mac OSX

I'm running this on a Mac. Please don't hate me. readlink doesn't work on a Mac, so I made some changes here: BoneGoat@f62b239
It's not portable, but if there is interest we can fix that.

word level alignment

Instead of fragments, is there any easy way to get word-by-word (word-level) alignment? Please guide and help.

Unable to generate document specific language model

I cannot seem to generate a document-specific language model and perform speech alignment with the command below:
bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/14001-voice.wav --script data/14001.txt --aligned data/14001.aligned.json --tlog data/14001.log.txt

I have just installed and configured everything following the instructions. The two examples provided seem to work fine. However, when I switch to my own data, the error below occurs. Is this because my script file does not contain enough words?

DEBUG:root:Start
DEBUG:root:Looking for model files in "models/en"...
DEBUG:root:Loading alphabet from "models/en/alphabet.txt"...
=== 1/5 Counting and sorting n-grams ===
Reading /mnt/sdb/Tools/DSAlign/data/14001.txt.clean
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100


Unigram tokens 3370 types 698
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:8376 2:31559028736 3:59173183488 4:94677090304 5:138070753280
/mnt/sdb/Tools/DSAlign/dependencies/kenlm/lm/builder/adjust_counts.cc:52 in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because `s.n[j] == 0'.
Could not calculate Kneser-Ney discounts for 5-grams with adjusted count 4 because we didn't observe any 5-grams with adjusted count 3; Is this small or artificial data?
Try deduplicating the input. To override this error for e.g. a class-based model, rerun with --discount_fallback

Traceback (most recent call last):
  File "/mnt/sdb/Tools/DSAlign/align/align.py", line 653, in <module>
    main()
  File "/mnt/sdb/Tools/DSAlign/align/align.py", line 563, in main
    '5'
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['dependencies/kenlm/build/bin/lmplz', '--text', 'data/14001.txt.clean', '--arpa', 'data/14001.txt.arpa', '--o', '5']' died with <Signals.SIGABRT: 6>.
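
The lmplz error message itself suggests the workaround for small inputs: rerun with --discount_fallback. A sketch of what the corresponding subprocess call could look like with that flag added (a local workaround for small scripts, not necessarily a good default):

import subprocess

# Same lmplz invocation as in the traceback, with the --discount_fallback
# flag the error message recommends for small or artificial inputs.
subprocess.check_call([
    'dependencies/kenlm/build/bin/lmplz',
    '--text', 'data/14001.txt.clean',
    '--arpa', 'data/14001.txt.arpa',
    '--o', '5',
    '--discount_fallback',
])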

Is there any way to speed up the alignment process?

Thanks for providing such an excellent tool for aligning sentences and audio!

I have a question: is there a way to speed up the alignment process? I currently have about 80k pairs of audio files and transcripts to process. There is an RTX 2080 in my server, but running bin/align.sh --catalog demo.catalog only uses about 153 MB of the total 11 GB of memory.

alphabet issues

I have found this tool very interesting so far. I have to ask about an issue which could be a JSON or Python character issue. If you can help:
The original text has special characters like ä or é, and they are also in the alphabet file. However, the transcription shows up in the JSON output as "\u00e4". Can you help me understand why this is and how to resolve it? Is this a typical Unicode issue when exporting JSON?
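
The \u00e4 sequences are plain JSON escaping of non-ASCII characters, not data loss: json.dump escapes non-ASCII by default. If human-readable output is wanted, the writing side can pass ensure_ascii=False, as in this small sketch (not the project's actual export code):

import json

fragment = {"aligned": "käse é"}

# Default: non-ASCII characters are escaped as \uXXXX (still valid JSON).
print(json.dumps(fragment))                      # {"aligned": "k\u00e4se \u00e9"}

# With ensure_ascii=False the characters are written as-is (UTF-8).
print(json.dumps(fragment, ensure_ascii=False))  # {"aligned": "käse é"}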

Excluded {} empty transcripts error

Traceback (most recent call last):
  File "align/align.py", line 673, in <module>
    main()
  File "align/align.py", line 640, in main
    logging.debug('Excluded {} empty transcripts'.format(len(transcripts) - len(fragments)))
TypeError: object of type 'IMapIterator' has no len()
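
The error indicates that the result of multiprocessing's imap() (an IMapIterator, which has no length) reaches len() before being consumed. A self-contained sketch of the kind of workaround that avoids this by materializing the iterator first (not the upstream fix):

import logging
from multiprocessing import Pool

logging.basicConfig(level=logging.DEBUG)

def transcribe(x):
    # Toy stand-in for an STT call; an empty string models an empty transcript.
    return '' if x % 2 else 'text %d' % x

if __name__ == '__main__':
    with Pool(2) as pool:
        # imap() returns an IMapIterator, which has no len();
        # materialize it into a list before counting.
        transcripts = list(pool.imap(transcribe, range(10)))
    fragments = [t for t in transcripts if t]
    logging.debug('Excluded {} empty transcripts'.format(len(transcripts) - len(fragments)))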

TaskCluster Download Issue

On a Linux system, the bin/lm-dependencies.sh script fails. Only four commands are needed to reproduce the issue:

git clone https://github.com/mozilla/DSAlign.git
cd DSAlign/
bin/createenv.sh
bin/lm-dependencies.sh

lm-dependencies.sh compiles fine, but then fails at the downloading step:

Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client.v0.7.1.cpu/artifacts/public/native_client.tar.xz ...
Traceback (most recent call last):
  File "/home/kaleko/DSAlign/bin/taskcluster.py", line 153, in <module>
    main()
  File "/home/kaleko/DSAlign/bin/taskcluster.py", line 147, in main
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "/home/kaleko/DSAlign/bin/taskcluster.py", line 57, in maybe_download_tc
    urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/kaleko/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

MemoryError when calling align on a large audio file (~1gb mp3)

> Traceback (most recent call last):
>   File "DSAlign/align/align.py", line 653, in <module>
>     main()
>   File "DSAlign/align/align.py", line 593, in main
>     segments, rate, audio_length = wavSplit.vad_segment_generator(audio, aggressiveness)
>   File "DSAlign/align/wavSplit.py", line 122, in vad_segment_generator
>     audio = (AudioSegment.from_file(audio_file)
>   File "DSAlign/venv/lib/python3.6/site-packages/pydub/audio_segment.py", line 706, in from_file
>     p_out = bytearray(p_out)
> 

I was just curious to see how DSAlign would handle a large input. Perhaps instead of generating all the segments at once and then processing them, we could generate and consume segments as needed. I'm not sure how that would work in practice or whether it would be too complicated a change. Also, is it OK to file issues like these, or am I being a nuisance?
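
One possible direction, sketched under the assumption that the input can be converted to WAV up front (not how DSAlign currently reads audio), is to stream the file in fixed-size chunks with the standard-library wave module instead of decoding the whole recording into memory with pydub:

import wave

def iter_frames(path, frame_ms=30):
    # Stream PCM from a WAV file in frame_ms-sized chunks instead of
    # loading the entire recording into memory at once.
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        frames_per_chunk = int(rate * frame_ms / 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data, rate

# for chunk, rate in iter_frames('audio.wav'):
#     feed each chunk to VAD / STT here instead of buffering everything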

aligned output json doesn't have unicode text

Instead of showing the aligned text as Unicode strings, the output contains strings that look like "\uXXXX\uYYYY....". Is there any way to tell the tool to output Unicode correctly?

Edit: something like "".join(s) is the desired output instead of the s in the output.
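
Worth noting that the \uXXXX sequences are standard JSON escapes and decode back to the original characters when the file is read with json.load; writing with ensure_ascii=False (as sketched under the alphabet issue above) avoids them in the output file altogether. A one-line check:

import json

assert json.loads('"\\u00e4"') == 'ä'  # escapes round-trip to the real character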

Alphabet is not defined despite being loaded.

I ran a small example and it doesn't produce the align.json file.

"""
Traceback (most recent call last):
  File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/raisintoastllc/MachineLearning/Projects/SpeechSynthesis/DataProcessing/DSAlign/align/align.py", line 85, in align
    tc = read_script(script)
  File "/Users/raisintoastllc/MachineLearning/Projects/SpeechSynthesis/DataProcessing/DSAlign/align/align.py", line 47, in read_script
    tc = TextCleaner(alphabet,
NameError: name 'alphabet' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/raisintoastllc/MachineLearning/Projects/SpeechSynthesis/DataProcessing/DSAlign/align/align.py", line 682, in <module>
    main()
  File "/Users/raisintoastllc/MachineLearning/Projects/SpeechSynthesis/DataProcessing/DSAlign/align/align.py", line 521, in main
    for aligned_file, file_total_fragments, file_dropped_fragments, file_reasons in \
  File "/Users/raisintoastllc/MachineLearning/Projects/SpeechSynthesis/DataProcessing/DSAlign/align/utils.py", line 73, in log_progress
    for global_step, obj in enumerate(it, 1):
  File "/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
NameError: name 'alphabet' is not defined
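
One plausible cause on macOS with Python 3.8 is that multiprocessing now defaults to the 'spawn' start method there, so module-level globals assigned inside main() (like alphabet) are not inherited by the pool workers. A hedged workaround sketch, assuming the pool is created after this call and that 'fork' is acceptable on macOS:

import multiprocessing

if __name__ == '__main__':
    # Python 3.8 switched the default start method on macOS to 'spawn',
    # which re-imports the module in each worker and does not carry over
    # globals assigned in main(). Forcing 'fork' restores the old behaviour
    # (with the usual caveats about fork on macOS).
    multiprocessing.set_start_method('fork')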

Is the transcript.txt in data/test1 correct?

I ran ./bin/gettestdata.sh and the transcript.txt file downloaded in data/test1 does not contain the transcript of the wav file downloaded with it.

Instead, transcript.txt only contains Google legal disclaimer "Google. This is a digital copy of a book that was preserved for generations on library shelves before it was carefully scanned by Google as part of a project [etc.]".

Did I miss something or do anything wrong?

Part of aligned text gets shifted to the next segment

I've seen this in DS 0.6.x and the current DS 0.7.x fork I'm working on but I cannot figure out why it's happening. The first aligned audio segment contains "av juli" but the aligned text does not.

   {
        "start": 77880.0,
        "end": 81480.0,
        "transcript": "en första preliminär rapport ska vara klar i mitten av juli", <= "av juli" is shifted to the next segment
        "text-start": 1630,
        "text-end": 1682,
        "meta": {},
        "aligned-raw": "En första preliminär rapport ska vara klar i mitten ", <= "av juli" is missing and there is a space here
        "aligned": "en första preliminär rapport ska vara klar i mitten "
    },
    {
        "start": 81570.0,
        "end": 86100.0,
        "transcript": "och den kan följas om en formell utredning om brott mot eus finansregler",
        "text-start": 1682,
        "text-end": 1761,
        "meta": {},
        "aligned-raw": "av juli. Den kan följas av en formell utredning om brott mot EU:s finansregler.", <= "av juli." should not be there
        "aligned": "av juli den kan följas av en formell utredning om brott mot eus finansregler"
    }

./bin/lm-dependencies.sh is failing because some files are missing

Error:

Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client.v0.6.0.cpu/artifacts/public/native_client.tar.xz ...
Traceback (most recent call last):
  File "/home/ubuntu/DSAlign/bin/taskcluster.py", line 153, in <module>
    main()
  File "/home/ubuntu/DSAlign/bin/taskcluster.py", line 147, in main
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "/home/ubuntu/DSAlign/bin/taskcluster.py", line 57, in maybe_download_tc
    urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/ubuntu/miniconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Tensorflow warning

When I am aligning my audio and transcript, it gets aligned successfully, but here is my log.

INFO:root:VAD splitting
2 (elapsed: 00:00:00, speed: 358.30 it/s)
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.1-0-g2e9c281
INFO:root:Transcribing
2020-12-02 11:15:00.031793: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
1 of 2 : 50.00% (elapsed: 00:00:03, speed: 0.29 it/s, ETA: 00:00:03)
2 of 2 : 100.00% (elapsed: 00:00:03, speed: 2.07 it/s, ETA: 00:00:00)
INFO:root:Aligning
1 of 1 : 100.00% (elapsed: 00:00:00, speed: 5.17 it/s, ETA: 00:00:00)
INFO:root:Aligned 2 fragments

Should I be concerned about the highlighted warning, or is it fine? Why is this occurring?

Duplicating audio/text pairs

Hello @tilmankamp,

I'm using transcribe.py with the catalog file type to align my audio files, but it is generating duplicated transcriptions.

Audio type
Very long audio files with about 1 h of pure speech each.

Text type
Very long texts containing the correct transcript in sequential order, with no punctuation, pure text. The text was reviewed manually by a professional, so it is 98%+ accurate to the audio.

I'm using everything at its defaults, but it still duplicates even if I play with the configuration. I always use the same catalog file for both the aligning and the cutting process.

I've verified that the duplicated text segment appears only once in the whole source text to cut.

Thanks.

Large catalog spawns many processes which won't die

Hi,

I have a catalog of over 30k files, and align.py spawns additional processes until it crashes with a "too many open files" error:

OSError: SoX failed! [Errno 24] Too many open files
ERROR:sox:OSError: SoX failed! [Errno 24] Too many open files
Traceback (most recent call last):
  File "/home/tobias/dev/git/DSAlign/align/align.py", line 689, in <module>
    main()
  File "/home/tobias/dev/git/DSAlign/align/align.py", line 499, in main
    samples = list(progress(pre_filter(), desc='VAD splitting'))
  File "/home/tobias/dev/git/DSAlign/venv/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
    for obj in iterable:
  File "/home/tobias/dev/git/DSAlign/align/align.py", line 488, in pre_filter
    for i, segment in enumerate(segments):
  File "/home/tobias/dev/git/DSAlign/align/audio.py", line 225, in vad_split
    for frame_index, frame in enumerate(audio_frames):
  File "/home/tobias/dev/git/DSAlign/align/audio.py", line 200, in read_frames_from_file
    with AudioFile(audio_path, audio_format=audio_format) as wav_file:
  File "/home/tobias/dev/git/DSAlign/align/audio.py", line 173, in __enter__
    convert_audio(self.audio_path, self.tmp_file_path, file_type='wav', audio_format=self.audio_format)
  File "/home/tobias/dev/git/DSAlign/align/audio.py", line 134, in convert_audio
    transformer.build(src_audio_path, dst_audio_path)
  File "/home/tobias/dev/git/DSAlign/venv/lib/python3.7/site-packages/sox/transform.py", line 441, in build
    "Stdout: {}\nStderr: {}".format(out, err)
sox.core.SoxError: Stdout: None
Stderr: None
Every 2.0s: ps aux |grep align.py                                                                                                                                           nasapa: Sat Feb 29 19:30:43 2020

tobias   17405  1.2  0.7 2868040 127720 pts/1  Sl+  19:12   0:13 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   17432  6.2  1.7 1163416 284784 pts/1  Sl+  19:12   1:09 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   17662  6.6  1.7 1164664 287544 pts/1  Sl+  19:13   1:08 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   17801  4.1  1.7 1161720 285876 pts/1  Sl+  19:14   0:40 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   17924  8.5  1.7 1237952 291292 pts/1  Sl+  19:14   1:21 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18072  9.9  1.7 1458364 290144 pts/1  Sl+  19:16   1:26 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18212  1.6  1.7 1679032 288132 pts/1  Sl+  19:17   0:13 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18331  7.1  1.7 1900204 293768 pts/1  Sl+  19:17   0:56 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18465  1.9  1.7 2121456 289700 pts/1  Sl+  19:18   0:14 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18583 10.0  1.8 2342652 294884 pts/1  Sl+  19:18   1:13 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18719 14.2  1.8 2567492 301520 pts/1  Sl+  19:19   1:34 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   18872 11.7  1.8 2786148 296664 pts/1  Sl+  19:21   1:08 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19008 13.6  1.8 2810460 297716 pts/1  Sl+  19:22   1:10 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19145 17.2  1.8 2835508 301320 pts/1  Sl+  19:23   1:18 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19282  8.8  1.8 2859920 299984 pts/1  Sl+  19:24   0:34 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19402 25.4  1.8 2884740 302236 pts/1  Sl+  19:24   1:30 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19553 21.7  1.8 2909136 304040 pts/1  Sl+  19:26   1:00 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19677  7.7  1.8 2933740 302824 pts/1  Sl+  19:26   0:17 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19812 43.2  1.8 2959768 308860 pts/1  Sl+  19:27   1:29 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   19837  0.2  0.0  15204  3652 pts/0    S+   19:27   0:00 watch ps aux |grep align.py
tobias   20066 77.1  1.8 2983964 307360 pts/1  Sl+  19:28   1:39 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   20395 70.2  1.8 3008568 307056 pts/1  Sl+  19:30   0:29 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   20566  113  1.8 3033172 310400 pts/1  Sl+  19:30   0:18 python /home/tobias/dev/git/DSAlign/align/align.py --output-max-cer 15 --loglevel 10 --stt-model-dir models/sv --stt-workers 1 --catalog /m
tobias   20611  0.0  0.0  15204  1156 pts/0    S+   19:30   0:00 watch ps aux |grep align.py
tobias   20612  0.0  0.0   4628   920 pts/0    S+   19:30   0:00 sh -c ps aux |grep align.py
tobias   20614  0.0  0.0  14680  1028 pts/0    S+   19:30   0:00 grep align.py

Best regards

Dataset/catalog access

Hi @tilmankamp and thank you for sharing this !
I assume you have used DSAlign to generate large datasets from sources like librivox.
Do you plan to share them with the community?
Best regards

seems to hang if transcript contains no alphabet characters

If you give DSAlign an mp3 and a transcript without any letters, like 1 + 2 = 3 or 안녕하세요 when transcribing English, it hangs somewhere during transcription with ~0% CPU usage.

I ran this on my Mac and it's repeatedly crashing in _wrap_SpeechToText -> DS_SpeechToText, maybe the LM or trie is totally broken?

It makes an lm.arpa like this:

\data\
ngram 1=2
ngram 2=0
ngram 3=0
ngram 4=0
ngram 5=0

\1-grams:
0	<unk>	0
0	<s>	0

\2-grams:

\3-grams:

\4-grams:

\5-grams:

\end\
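
A cheap guard against this (a hypothetical sketch, not a patch that exists in DSAlign) would be to skip per-document LM generation when the cleaned transcript contains no non-space characters from the model's alphabet, since the resulting ARPA model only has the <unk> and <s> unigrams shown above:

def has_alphabet_chars(text, alphabet_path):
    # Hypothetical pre-check: only build a document-specific LM if the
    # cleaned transcript contains at least one non-space character from
    # the model's alphabet.
    with open(alphabet_path, encoding='utf-8') as f:
        alphabet = {line.rstrip('\n') for line in f if not line.startswith('#')}
    return any(ch in alphabet and not ch.isspace() for ch in text)

# Example: has_alphabet_chars('1 + 2 = 3', 'models/en/alphabet.txt') would be
# False for an English alphabet, so LM generation could fail early instead of
# producing a degenerate lm.arpa and hanging later.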

StopIteration error while aligning

Hello,
After launching this command:
(DSAlign) (deepspeech-gpu-venv) lerner@m148:~/DSAlign$ bin/align.sh --output-max-cer 15 --loglevel 10 --audio data/test2/asyoulikeit_0_shakespeare_64kb.mp3 --script data/test2/transcript.script --aligned data/test2/aligned.json --tlog data/test2/tlog.tlog --stt-workers 1 --stt-model-dir deepspeech-0.5.1-models --stt-no-own-lm

The text cleaning and STT went fine; however, when the alignment starts, I get a StopIteration error in align.py. Complete output:

DEBUG:root:Start
DEBUG:root:Looking for model files in "deepspeech-0.5.1-models"...
DEBUG:root:Loading alphabet from "deepspeech-0.5.1-models/alphabet.txt"...
DEBUG:root:Loading acoustic model from "deepspeech-0.5.1-models/output_graph.pb", alphabet from "deepspeech-0.5.1-models/alphabet.txt" and language model from "deepspeech-0.5.1-models/lm.binary"...
DEBUG:root:Transcribing VAD segments...
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', 'data/test2/asyoulikeit_0_shakespeare_64kb.mp3', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
VAD splitting: 55it [00:00, 1000.69it/s]
DEBUG:root:Process 41639: Loaded models
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.1-0-g4b29b78
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-10-07 12:35:37.356386: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Transcribing:   0%|                                                                                                                                        | 0/55 [00:00<?, ?it/s]2019-10-07 12:35:38.310379: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-10-07 12:35:38.310440: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-10-07 12:35:38.310458: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-10-07 12:35:38.310672: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
DEBUG:root:Process 41639: Transcribing...
2019-10-07 12:35:43.184433: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-10-07 12:35:43.666503: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-10-07 12:35:44.440064: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-10-07 12:35:44.542822: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-10-07 12:35:44.646766: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
DEBUG:root:Process 41639: as you like it by william shakespeare
DEBUG:root:Process 41639: Transcribing...
Transcribing:   2%|██▎                                                                                                                             | 1/55 [00:10<09:17, 10.32s/it]DEBUG:root:Process 41639: this is a liberator ing
DEBUG:root:Process 41639: Transcribing...
Transcribing:   4%|████▋                                                                                                                           | 2/55 [00:11<06:44,  7.64s/it]DEBUG:root:Process 41639: all over but recording or in the public domain
DEBUG:root:Process 41639: Transcribing...
Transcribing:   5%|██████▉                                                                                                                         | 3/55 [00:13<05:11,  5.99s/it]DEBUG:root:Process 41639: for more information or to volunteer
DEBUG:root:Process 41639: Transcribing...
Transcribing:   7%|█████████▎                                                                                                                      | 4/55 [00:15<04:03,  4.78s/it]DEBUG:root:Process 41639: is it liberal
DEBUG:root:Process 41639: Transcribing...
Transcribing:   9%|███████████▋                                                                                                                    | 5/55 [00:17<03:05,  3.72s/it]DEBUG:root:Process 41639: 
DEBUG:root:Process 41639: Transcribing...
Transcribing:  11%|█████████████▉                                                                                                                  | 6/55 [00:17<02:18,  2.83s/it]DEBUG:root:Process 41639: dramatis personae
DEBUG:root:Process 41639: Transcribing...
Transcribing:  13%|████████████████▎                                                                                                               | 7/55 [00:19<01:54,  2.39s/it]DEBUG:root:Process 41639: du seen the red by heavy
DEBUG:root:Process 41639: Transcribing...
Transcribing:  15%|██████████████████▌                                                                                                             | 8/55 [00:21<01:54,  2.43s/it]DEBUG:root:Process 41639: to frederick
DEBUG:root:Process 41639: Transcribing...
Transcribing:  16%|████████████████████▉                                                                                                           | 9/55 [00:22<01:32,  2.01s/it]DEBUG:root:Process 41639: by
DEBUG:root:Process 41639: Transcribing...
Transcribing:  18%|███████████████████████                                                                                                        | 10/55 [00:23<01:13,  1.64s/it]DEBUG:root:Process 41639: sugah
DEBUG:root:Process 41639: Transcribing...
Transcribing:  20%|█████████████████████████▍                                                                                                     | 11/55 [00:24<01:03,  1.45s/it]DEBUG:root:Process 41639: he means read by cecilia prior
DEBUG:root:Process 41639: Transcribing...
Transcribing:  22%|███████████████████████████▋                                                                                                   | 12/55 [00:26<01:11,  1.66s/it]DEBUG:root:Process 41639: jack was
DEBUG:root:Process 41639: Transcribing...
Transcribing:  24%|██████████████████████████████                                                                                                 | 13/55 [00:27<01:01,  1.47s/it]DEBUG:root:Process 41639: read by elizabeth let
DEBUG:root:Process 41639: Transcribing...
Transcribing:  25%|████████████████████████████████▎                                                                                              | 14/55 [00:29<00:59,  1.44s/it]DEBUG:root:Process 41639: labo
DEBUG:root:Process 41639: Transcribing...
Transcribing:  27%|██████████████████████████████████▋                                                                                            | 15/55 [00:29<00:51,  1.29s/it]DEBUG:root:Process 41639: red by simon lover
DEBUG:root:Process 41639: Transcribing...
Transcribing:  29%|████████████████████████████████████▉                                                                                          | 16/55 [00:31<00:55,  1.42s/it]DEBUG:root:Process 41639: charles
DEBUG:root:Process 41639: Transcribing...
Transcribing:  31%|███████████████████████████████████████▎                                                                                       | 17/55 [00:32<00:46,  1.22s/it]DEBUG:root:Process 41639: read by me on my
DEBUG:root:Process 41639: Transcribing...
Transcribing:  33%|█████████████████████████████████████████▌                                                                                     | 18/55 [00:33<00:47,  1.28s/it]DEBUG:root:Process 41639: and even
DEBUG:root:Process 41639: Transcribing...
Transcribing:  35%|███████████████████████████████████████████▊                                                                                   | 19/55 [00:34<00:41,  1.15s/it]DEBUG:root:Process 41639: red bateiseki
DEBUG:root:Process 41639: Transcribing...
Transcribing:  36%|██████████████████████████████████████████████▏                                                                                | 20/55 [00:36<00:44,  1.28s/it]DEBUG:root:Process 41639: jake was the air
DEBUG:root:Process 41639: Transcribing...
Transcribing:  38%|████████████████████████████████████████████████▍                                                                              | 21/55 [00:37<00:43,  1.28s/it]DEBUG:root:Process 41639: red by david lawrence
DEBUG:root:Process 41639: Transcribing...
Transcribing:  40%|██████████████████████████████████████████████████▊                                                                            | 22/55 [00:38<00:43,  1.31s/it]DEBUG:root:Process 41639: part of orlando
DEBUG:root:Process 41639: Transcribing...
Transcribing:  42%|█████████████████████████████████████████████████████                                                                          | 23/55 [00:40<00:42,  1.33s/it]DEBUG:root:Process 41639: by m b
Transcribing:  44%|███████████████████████████████████████████████████████▍                                                                       | 24/55 [00:41<00:42,  1.36s/it]DEBUG:root:Process 41639: Transcribing...
DEBUG:root:Process 41639: adam
DEBUG:root:Process 41639: Transcribing...
Transcribing:  45%|█████████████████████████████████████████████████████████▋                                                                     | 25/55 [00:42<00:34,  1.17s/it]DEBUG:root:Process 41639: the papeete
DEBUG:root:Process 41639: Transcribing...
Transcribing:  47%|████████████████████████████████████████████████████████████                                                                   | 26/55 [00:44<00:39,  1.37s/it]DEBUG:root:Process 41639: denis
DEBUG:root:Process 41639: Transcribing...
Transcribing:  49%|██████████████████████████████████████████████████████████████▎                                                                | 27/55 [00:45<00:32,  1.17s/it]DEBUG:root:Process 41639: red by rosemont
DEBUG:root:Process 41639: Transcribing...
Transcribing:  51%|████████████████████████████████████████████████████████████████▋                                                              | 28/55 [00:46<00:32,  1.20s/it]DEBUG:root:Process 41639: touched down played by mark smith
DEBUG:root:Process 41639: Transcribing...
Transcribing:  53%|██████████████████████████████████████████████████████████████████▉                                                            | 29/55 [00:48<00:38,  1.48s/it]DEBUG:root:Process 41639: line for sir oliver mar text read by rondelet
DEBUG:root:Process 41639: Transcribing...
Transcribing:  55%|█████████████████████████████████████████████████████████████████████▎                                                         | 30/55 [00:51<00:51,  2.07s/it]DEBUG:root:Process 41639: saint louis missouri
DEBUG:root:Process 41639: Transcribing...
Transcribing:  56%|███████████████████████████████████████████████████████████████████████▌                                                       | 31/55 [00:53<00:43,  1.82s/it]DEBUG:root:Process 41639: corin read by beladen
DEBUG:root:Process 41639: Transcribing...
Transcribing:  58%|█████████████████████████████████████████████████████████████████████████▉                                                     | 32/55 [00:55<00:43,  1.90s/it]DEBUG:root:Process 41639: out of
DEBUG:root:Process 41639: Transcribing...
Transcribing:  60%|████████████████████████████████████████████████████████████████████████████▏                                                  | 33/55 [00:56<00:35,  1.61s/it]DEBUG:root:Process 41639: sylvie
DEBUG:root:Process 41639: Transcribing...
Transcribing:  62%|██████████████████████████████████████████████████████████████████████████████▌                                                | 34/55 [00:56<00:28,  1.37s/it]DEBUG:root:Process 41639: red by
DEBUG:root:Process 41639: Transcribing...
Transcribing:  64%|████████████████████████████████████████████████████████████████████████████████▊                                              | 35/55 [00:57<00:24,  1.22s/it]DEBUG:root:Process 41639: david's nickel
DEBUG:root:Process 41639: Transcribing...
Transcribing:  65%|███████████████████████████████████████████████████████████████████████████████████▏                                           | 36/55 [00:58<00:21,  1.11s/it]DEBUG:root:Process 41639: william read by even but in our
Transcribing:  67%|█████████████████████████████████████████████████████████████████████████████████████▍                                         | 37/55 [01:01<00:27,  1.55s/it]DEBUG:root:Process 41639: Transcribing...
DEBUG:root:Process 41639: i mind read by lorella anderson
DEBUG:root:Process 41639: Transcribing...
Transcribing:  69%|███████████████████████████████████████████████████████████████████████████████████████▋                                       | 38/55 [01:03<00:31,  1.84s/it]DEBUG:root:Process 41639: the part of rosalind
DEBUG:root:Process 41639: Transcribing...
Transcribing:  71%|██████████████████████████████████████████████████████████████████████████████████████████                                     | 39/55 [01:05<00:27,  1.73s/it]DEBUG:root:Process 41639: read by rosalind will
DEBUG:root:Process 41639: Transcribing...
Transcribing:  73%|████████████████████████████████████████████████████████████████████████████████████████████▎                                  | 40/55 [01:06<00:24,  1.63s/it]DEBUG:root:Process 41639: sea read by felipe
DEBUG:root:Process 41639: Transcribing...
Transcribing:  75%|██████████████████████████████████████████████████████████████████████████████████████████████▋                                | 41/55 [01:08<00:22,  1.63s/it]DEBUG:root:Process 41639: see
DEBUG:root:Process 41639: Transcribing...
Transcribing:  76%|████████████████████████████████████████████████████████████████████████████████████████████████▉                              | 42/55 [01:08<00:17,  1.36s/it]DEBUG:root:Process 41639: red by charlie veemeth
DEBUG:root:Process 41639: Transcribing...
Transcribing:  78%|███████████████████████████████████████████████████████████████████████████████████████████████████▎                           | 43/55 [01:10<00:17,  1.46s/it]DEBUG:root:Process 41639: are
DEBUG:root:Process 41639: Transcribing...
Transcribing:  80%|█████████████████████████████████████████████████████████████████████████████████████████████████████▌                         | 44/55 [01:11<00:14,  1.29s/it]DEBUG:root:Process 41639: tad by mandy eh
DEBUG:root:Process 41639: Transcribing...
Transcribing:  82%|███████████████████████████████████████████████████████████████████████████████████████████████████████▉                       | 45/55 [01:13<00:13,  1.38s/it]DEBUG:root:Process 41639: first lord
DEBUG:root:Process 41639: Transcribing...
Transcribing:  84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▏                    | 46/55 [01:14<00:11,  1.30s/it]DEBUG:root:Process 41639: red by ananus
DEBUG:root:Process 41639: Transcribing...
Transcribing:  85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                  | 47/55 [01:15<00:11,  1.38s/it]DEBUG:root:Process 41639: second lord
DEBUG:root:Process 41639: Transcribing...
Transcribing:  87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                | 48/55 [01:16<00:08,  1.27s/it]DEBUG:root:Process 41639: red by david lawrence
DEBUG:root:Process 41639: Transcribing...
Transcribing:  89%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏             | 49/55 [01:18<00:08,  1.33s/it]DEBUG:root:Process 41639: first page read by ruth golding
DEBUG:root:Process 41639: Transcribing...
Transcribing:  91%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍           | 50/55 [01:20<00:08,  1.64s/it]DEBUG:root:Process 41639: second page
DEBUG:root:Process 41639: Transcribing...
Transcribing:  93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊         | 51/55 [01:21<00:05,  1.48s/it]DEBUG:root:Process 41639: read by david a canal
DEBUG:root:Process 41639: Transcribing...
Transcribing:  95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████       | 52/55 [01:23<00:04,  1.44s/it]DEBUG:root:Process 41639: the forester played by jack in
DEBUG:root:Process 41639: Transcribing...
Transcribing:  96%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍    | 53/55 [01:25<00:03,  1.65s/it]DEBUG:root:Process 41639: stage directions read by marian waldon
Transcribing:  98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋  | 54/55 [01:27<00:01,  1.96s/it]DEBUG:root:Process 41639: Transcribing...
DEBUG:root:Process 41639: and a dramatis persona
Transcribing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [01:29<00:00,  1.63s/it]
DEBUG:root:Excluded 0 empty transcripts
DEBUG:root:Writing transcription log to file "data/test2/tlog.tlog"...
DEBUG:root:Loading script from data/test2/transcript.script...
Aligning:   0%|                                                                                                                                             | 0/1 [00:00<?, ?it/s]DEBUG:root:Loading transcription log from data/test2/tlog.tlog...
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/people/lerner/DSAlign/align/text.py", line 163, in ngrams
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/people/lerner/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/people/lerner/DSAlign/align/align.py", line 146, in align
    matched_fragments = list(filter(lambda f: f is not None, matched_fragments))
  File "/people/lerner/DSAlign/align/align.py", line 136, in split_match
    for f in split_match(fragments[0:index], start=start, end=match_start):
  File "/people/lerner/DSAlign/align/align.py", line 136, in split_match
    for f in split_match(fragments[0:index], start=start, end=match_start):
  File "/people/lerner/DSAlign/align/align.py", line 136, in split_match
    for f in split_match(fragments[0:index], start=start, end=match_start):
  File "/people/lerner/DSAlign/align/align.py", line 129, in split_match
    match = search.find_best(fragment['transcript'], start=start, end=end)
  File "/people/lerner/DSAlign/align/search.py", line 88, in find_best
    for i, ngram in enumerate(ngrams(' ' + look_for + ' ', 3)):
RuntimeError: generator raised StopIteration
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/people/lerner/DSAlign/align/align.py", line 653, in <module>
    main()
  File "/people/lerner/DSAlign/align/align.py", line 639, in main
    total=len(to_align)):
  File "/people/lerner/DSAlign/venv/lib/python3.7/site-packages/tqdm/std.py", line 1081, in __iter__
    for obj in iterable:
  File "/people/lerner/anaconda3/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
RuntimeError: generator raised StopIteration
Aligning:   0%|                                                                                                                                             | 0/1 [00:00<?, ?it/s]

Thanks in advance for your help, this tool looks very promising :)
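
The traceback points at a raise StopIteration inside the ngrams generator in align/text.py, which PEP 479 turns into a RuntimeError on Python 3.7+. A possible PEP 479-safe version of such a helper, sketched to be consistent with the call ngrams(' ' + look_for + ' ', 3) seen above but not necessarily identical to the original:

def ngrams(s, size):
    # Yield successive character n-grams of the given size.
    # Returning (instead of raising StopIteration) ends the generator
    # cleanly under PEP 479 when the string is shorter than the n-gram size.
    if len(s) < size:
        return
    for i in range(len(s) - size + 1):
        yield s[i:i + size]

print(list(ngrams(' abc ', 3)))  # [' ab', 'abc', 'bc ']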
