
abhirooptalasila / autosub


A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui

License: MIT License

Python 96.17% Dockerfile 1.55% Shell 2.28%
speech-to-text ffmpeg sox deepspeech python asr mozilla-deepspeech autosub subtitle srt

autosub's People

Contributors

abhirooptalasila, k-shrey, kylemaas, milahu, nightscape, qlchan24, sethfalco, shasheene, shravanshetty1, vnq, xfim, yash-fn


autosub's Issues

Create vtt, srt, txt files by default (drop the --vtt option)

Currently AutoSub has the following command-line interface:

usage: main.py [-h] --file FILE [--vtt]

AutoSub

optional arguments:
  -h, --help   show this help message and exit
  --file FILE  Input video file
  --vtt        Output a vtt file with cue points for individual words instead
               of a srt file

The --vtt option switches the output from an SRT file to a VTT file.

But since it is the inference that takes the bulk of the execution time, while writing the output subtitle file takes little time and very little disk space, rerunning the entire inference just to get a VTT file is inefficient. It makes more sense to create both the VTT and the SRT by default.

Also, AutoSub is well-placed to output a transcript of the input at the same time. Indeed, I saw an AutoSub fork that did just that.

So, I suggest replacing --vtt with a --format option with which the user can restrict the generated file formats; by default it should create all formats: VTT, SRT, and a TXT transcript.
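
For illustration, a minimal argparse sketch of the proposed interface (the option name and defaults here are my suggestion, not the current CLI):

import argparse

parser = argparse.ArgumentParser(description="AutoSub")
parser.add_argument("--file", required=True, help="Input video file")
# Replaces --vtt: restrict the generated formats, or get all of them by default.
parser.add_argument("--format", nargs="+", choices=["srt", "vtt", "txt"],
                    default=["srt", "vtt", "txt"],
                    help="Subtitle/transcript formats to generate (default: all)")
args = parser.parse_args()
print(args.format)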

I am happy to do the work and make a Pull Request. Are you happy with this approach?

Generating caption for Spanish video

I tested this with a Spanish video about 1 min 30 s long, and it generated seemingly English "transliterated" text from the Spanish words instead of writing the Spanish words themselves.

Using the Spanish models from here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#language-models-and-checkpoints

python3 autosub/main.py --model /home/cyberquarks/AutoSub/output_graph_es.pbmm --scorer /home/cyberquarks/AutoSub/kenlm_es.scorer --file /mnt/c/temp/test.mp4

Also, each subtitle line looks like this:

1
00:00:08,75 --> 00:01:30,30

So the subtitle covers the whole video when played in VLC.

Docker build broken

Running docker build -t autosub . fails with:

Step 11/13 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in bdf3fc44f538
Collecting cycler==0.10.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting stt==1.0.0 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement stt==1.0.0 (from -r requirements.txt (line 3)) (from versions: 0.10.0a5, 0.10.0a6, 0.10.0a8, 0.10.0a9, 0.10.0a10)
No matching distribution found for stt==1.0.0 (from -r requirements.txt (line 3))
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

How to install on Windows?

Hello, could you let me know how to install and run your program on Windows? I am at the step where I ran "pip3 install -r requirements.txt" and got the following error.

        ERROR: Cannot install -r requirements.txt (line 4) and numpy==1.18.1 because these package versions have conflicting 
        dependencies.
        
        The conflict is caused by:
            The user requested numpy==1.18.1
            deepspeech 0.8.2 depends on numpy<=1.17.0 and >=1.14.5
        
        To fix this you could try to:
        1. loosen the range of package versions you've specified
        2. remove package versions to allow pip attempt to solve the dependency conflict
        
        ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Broken logging module import

Traceback (most recent call last):
  File "autosub/main.py", line 8, in <module>
    from . import logger
ImportError: cannot import name 'logger'

ImportError: attempted relative import with no known parent package

Hi, my config is Win10 x64, Python 3.8. When I execute $ C:/Soft/Autosub/sub/Scripts/python autosub/main.py --file D:/Work/video.mkv, it gives me this error:

Traceback (most recent call last):
  File "autosub/main.py", line 8, in <module>
    from . import logger
ImportError: attempted relative import with no known parent package

Info: User@Computer MINGW64 /c/Soft/Autosub (master)
$ pip list
Package Version


absl-py 1.0.0
astunparse 1.6.3
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.12
cycler 0.10.0
deepspeech-gpu 0.9.3
distlib 0.3.4
ffmpeg 1.4
filelock 3.6.0
gast 0.3.3
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.44.0
h5py 2.10.0
idna 3.3
importlib-metadata 4.11.3
joblib 0.16.0
Keras-Preprocessing 1.1.2
kiwisolver 1.2.0
Markdown 3.3.6
numpy 1.22.3
oauthlib 3.2.0
opt-einsum 3.3.0
pip 19.2.3
platformdirs 2.5.1
protobuf 3.19.4
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydub 0.23.1
pyparsing 2.4.7
python-dateutil 2.8.1
requests 2.27.1
requests-oauthlib 1.3.1
rsa 4.8
scikit-learn 1.0.2
scipy 1.4.1
setuptools 41.2.0
six 1.15.0
stt 1.0.0
tensorboard 2.2.2
tensorboard-plugin-wit 1.8.1
tensorflow-gpu 2.2.0
tensorflow-gpu-estimator 2.2.0
termcolor 1.1.0
threadpoolctl 3.1.0
tqdm 4.44.1
urllib3 1.26.9
virtualenv 20.13.3
Werkzeug 2.0.3
wheel 0.37.1
wrapt 1.14.0
zipp 3.7.0
WARNING: You are using pip version 19.2.3, however version 22.0.4 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

Issue in running model

Command I run: python3 autosub/main.py --file video.mp4

[INFO] ARGS: Namespace(dry_run=False, engine='stt', file='video.mp4', format='srt', model=None, scorer=None, split_duration=5)
[INFO] Model: /media/ravneet/SSD2/TMN_Tasks/AutoSub/model.tflite
[INFO] Scorer: /media/ravneet/SSD2/TMN_Tasks/AutoSub/deepspeech-0.9.3-models.scorer
[INFO] Input file: video.mp4
[INFO] Extracted audio to audio/video.wav
[INFO] Splitting on silent parts in audio file
[INFO] Running inference...
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.10-0-g9b517632
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2022-08-15 23:53:27.304189: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Data loss: Can't parse /media/ravneet/SSD2/TMN_Tasks/AutoSub/model.tflite as binary proto
[ERROR] Invalid model file

Problem using Chinese Acoustic models

Hey Abhiroop, great work on AutoSub. It works flawlessly with the English acoustic models. However, I was trying to use the experimental Chinese acoustic models that have been released and ran into problems. I guess there is an issue with the encoding when the subtitle file is being written. Can you please check?
(screenshot: AutoSub error)
I have tried to change the encoding to utf-8 but it hasn't helped.


Here are the steps to reduce big chunks of text - SRT

Hi,

I did this manually, but maybe someone can improve it and write a script for it:

  1. Check for text that is longer than 7 words.

  2. Add a line break after every 7 words, so no line is longer than 7 words.

  3. Count the lines you got, e.g.:
    blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah
    blah blah blah blah blah

There are 4 lines.

  4. Take the initial and final time of the cue:

13 <<< SRT subtitle position
e.g.: 00:00:25,90 --> 00:00:35,25

  5. Subtract them and divide the result by the number of lines:
    35,25 - 25,90 = 9,35
    4 lines of max 7 words each
    9,35 / 4 = 2,33

  6. Give each line a slot of 2,33, starting each new cue 0,01 after the previous one ends, e.g.:

13 <<< SRT subtitle position
00:00:25,90 --> 00:00:28,23
blah blah blah blah blah blah blah

14
00:00:28,24 --> 00:00:30,57
blah blah blah blah blah blah blah

15
00:00:30,58 --> 00:00:32,91
blah blah blah blah blah blah blah

16
00:00:32,92 --> 00:00:35,25
blah blah blah blah blah

14 <<<< WARNING >>>> update all the other positions, in this case to 17 (see step 7 below)
blah blah blah blah

  7. Update the SRT subtitle positions that follow: in this case we finished at 16, so the old 14 becomes 17, and the same goes for all the other numbers. Note: update from top to bottom so the counter increments.

That's it.

Anyone? :)
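
For anyone interested, here is a rough Python sketch of the steps above. It operates on a single cue given as (index, start_seconds, end_seconds, text); renumbering the cues that follow is left to the caller, and the rounding of the times is a simplification:

def split_cue(index, start, end, text, max_words=7):
    """Split one long SRT cue into cues of at most max_words words,
    distributing the original time range evenly across the new cues."""
    words = text.split()
    lines = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    slot = (end - start) / len(lines)
    cues = []
    for n, line in enumerate(lines):
        cue_start = start + n * slot + (0.01 if n else 0)  # start 0,01 after the previous cue ends
        cue_end = start + (n + 1) * slot
        cues.append((index + n, round(cue_start, 2), round(cue_end, 2), line))
    return cues

# The example from above: one 9,35 s cue with 26 words becomes 4 cues.
for cue in split_cue(13, 25.90, 35.25, " ".join(["blah"] * 26)):
    print(cue)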

No matching distribution found for stt==1.0.0

When I install the package, I'm getting

> pip install -r requirements.txt
Collecting cycler==0.10.0
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting numpy
  Using cached numpy-1.22.2.zip (11.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
ERROR: Could not find a version that satisfies the requirement stt==1.0.0 (from versions: none)
ERROR: No matching distribution found for stt==1.0.0

UPDATE
Looks like an issue with stt

> pip install stt==1.2.0
ERROR: Could not find a version that satisfies the requirement stt==1.2.0 (from versions: none)
ERROR: No matching distribution found for stt==1.2.0

Illegal instruction when trying to perform an operation on a mp4 file

I'm getting an error when trying to run AutoSub on an mp4 file:

python3 main.py --model "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/autosub/deepspeech-0.8.2-models.pbmm" --scorer "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/autosub/deepspeech-0.8.2-models.scorer" --file "test.mp4"

The error message:
Illegal instruction (core dumped)

I think my computer is too old. I have an Intel integrated graphics card. Maybe that could be the reason? I'm buying a new video card next month and I'll try to use it again if that's the problem.
If it works, I'll consider making a kivy GUI for it.

Doesn't even run

I keep getting:

Extracted audio to audio/input.wav
Splitting on silent parts in audio file
Traceback (most recent call last):
  File "./autosub/main.py", line 130, in <module>
    main()
  File "./autosub/main.py", line 112, in main
    silenceRemoval(audio_file_name)
  File "/Users/kelvin/Downloads/AutoSub-master/autosub/segmentAudio.py", line 194, in silenceRemoval
    raise Exception("Input audio file not found!")
Exception: Input audio file not found!

Cue points for individual words

YouTube's latest speech recognition creates cue points for individual words, which become visible at the moment they are spoken.

YT is transmitting their subtitles in a format that looks like this (which I do not recognize):

{
  "wireMagic": "pb3",
  "pens": [ {
  
  } ],
  "wsWinStyles": [ {
  
  }, {
    "mhModeHint": 2,
    "juJustifCode": 0,
    "sdScrollDir": 3
  } ],
  "wpWinPositions": [ {
  
  }, {
    "apPoint": 6,
    "ahHorPos": 20,
    "avVerPos": 100,
    "rcRows": 2,
    "ccCols": 40
  } ],
  "events": [ {
    "tStartMs": 0,
    "dDurationMs": 2795440,
    "id": 1,
    "wpWinPosId": 1,
    "wsWinStyleId": 1
  }, {
    "tStartMs": 80,
    "dDurationMs": 3119,
    "wWinId": 1,
    "segs": [ {
      "utf8": "hey",
      "acAsrConf": 255
    }, {
      "utf8": " everybody",
      "tOffsetMs": 160,
      "acAsrConf": 255
    }, {
      "utf8": " how's",
      "tOffsetMs": 480,
      "acAsrConf": 255

However, the VTT subtitle format also supports cues for individual words (cf. karaoke-style text), although this is not yet supported natively by the Firefox version I tested.

Timing information is available through DeepSpeech's sttWithMetadata() and is easily transformed from character timings to word timings using their client.py.

The request is to have an output file that allows us to do it like YT: display individual words as they are spoken.
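
For illustration, a minimal sketch of how word-level timings (e.g. collapsed from the character timings that sttWithMetadata() returns) could be rendered as karaoke-style VTT cue text; the (word, start_seconds) input format here is an assumption on my part:

def vtt_timestamp(seconds):
    # Format seconds as an HH:MM:SS.mmm WebVTT timestamp.
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "%02d:%02d:%06.3f" % (h, m, s)

def karaoke_cue(words):
    # words: list of (text, start_seconds) tuples for one cue.
    # An inline timestamp before each word lets supporting players
    # highlight the words at the moment they are spoken.
    parts = [words[0][0]]
    for text, start in words[1:]:
        parts.append("<%s><c> %s</c>" % (vtt_timestamp(start), text))
    return "".join(parts)

print(karaoke_cue([("hey", 0.08), ("everybody", 0.24), ("how's", 0.56)]))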

Some words are missing

Hi, thanks for the great project!

I have a problem with some words missing from the transcript.
But if I transcribe the same audio using only the deepspeech project (not autosub with the ds engine), there are no missing words.

Are there any tweaks that can be done via parameters, or is it because of the silent-segment removal process?

Here is the txt output from autosub with ds engine

biggest . 

people make when larry english and probably one of the most common miss. 

people think that they. 





don't study. 

live in. 

an out let me explain what i . 

one does studying men and how do people usually approach this pro. 

and how do people. 

And here is the deepspeech output.

the biggest mistake people make when morning english and probably one of the most common misconceptions is that people think that they need to study english and usedn't study english live english an outlet explain what i mean one does studying men and how do people 

As you can see, some words are missing on autosub output.

I am using the same deepspeech 0.9.3 version and model for both autosub and deepspeech.

Add a Dockerfile?

Would it be feasible to create a Dockerfile and put it on Docker Hub for more convenient usage?

This would allow users, and even developers, to simply pull the image and start using AutoSub immediately without worrying about the runtime or dependencies, and with less configuration.

Amazing! Although the results are not that accurate.


How can I improve the accuracy? Many thanks.

https://www.youtube.com/watch?v=TOQwUISm6fw

1
00:00:00,10 --> 00:00:00,95
one

2
00:00:01,55 --> 00:00:07,10
this visconti to learn how to go at the vices

3
00:00:08,20 --> 00:00:14,80
who to open that automatic aridius in

4
00:00:14,95 --> 00:00:21,50
and this idea will guide us how to add indicator automatical

5
00:00:21,95 --> 00:00:26,50
and we will go and newton

6
00:00:26,90 --> 00:00:34,0:
and we were used in piraguas moreale indication

7
00:00:34,40 --> 00:00:43,45
starting first of all we need to add one more premature

8
00:00:44,25 --> 00:00:53,80
the primates named yes bury in the morning added coelesti special the period wooing happy

9
00:00:54,15 --> 00:00:59,70
and we will get the primitive party by using dislike

10
00:01:02,60 --> 00:01:07,25
and it the new eenamost cat indicated handle

11
00:01:08,25 --> 00:01:10,80
to get hard

12
00:01:12,25 --> 00:01:15,15
i remember very honour

13
00:01:15,75 --> 00:01:20,0:
if the specific indicator doesn't because

14
00:01:20,75 --> 00:01:30,15
then his deep we create a new one force and then returned the handle of the new indicator

15
00:01:30,50 --> 00:01:31,90
if

16
00:01:32,40 --> 00:01:43,50
the specific indicator has existed then this peril return the handle of the specifically indicate

17
00:01:46,30 --> 00:01:55,50
and handles into idea holes to hide in fine the chatterment

18
00:01:55,70 --> 00:01:59,30
so hand is very important information

19
00:01:59,55 --> 00:02:05,70
we will stop the return better to this very

20
00:02:06,40 --> 00:02:17,80
don't forget to go estimable after wives lorrainese how it works

21
00:02:18,10 --> 00:02:19,40
by ronald

22
00:02:24,20 --> 00:02:26,35
we just need to copy

.............

[Feature] GPU support

Love the project! Would it be too difficult to add GPU support? Planning on using this in production!

Malformed SRT/VTT file (extra colon characters)

Thanks for the promising program. I really believe in this work, so I will become an active contributor.

The Python code that writes subtitle timestamps is buggy when the millisecond section ends in zeroes. I have fixed this (see the associated commit), but for the sake of completeness here is a description of the issue.

Here's an excerpt of some generated SRT output that can't be loaded in VLC, mpv, and other programs:

3
00:00:12,95 --> 00:00:14,0:
but you but there

4
00:00:14,60 --> 00:00:15,30
the

And the same excerpt from the VTT output:

00:00:12.95 --> 00:00:14.0:  align:start position:0%
but you but there
<c> but</c><0:00:13.330000><c> you</c><0:00:13.450000><c> but</c><0:00:13.670000><c> there</c>

00:00:14.60 --> 00:00:15.30  align:start position:0%
the
<c> the</c>

The root cause is a broken try/except block in the source code, which I have fixed.

I should note that users with sed installed can run commands like sed -i 's_: -->_ -->_g' filename.srt to repair the issue, but my attached fix will prevent the issue from occurring in the first place.
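
For completeness, one way to format SRT timestamps so the millisecond field always has a fixed width (a sketch of the general idea, not the actual writeToFile.py code):

def srt_timestamp(seconds):
    # SRT expects HH:MM:SS,mmm; always emitting three millisecond digits
    # avoids broken values such as 00:00:14,0: for a cue ending at 14.0 s.
    millis = int(round(seconds * 1000))
    h, rem = divmod(millis, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

print(srt_timestamp(12.95))  # 00:00:12,950
print(srt_timestamp(14.0))   # 00:00:14,000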

Cannot import logger

Running docker run --volume=`pwd`input:/input --name autosub autosub --file /input/video.mp4 encounters this error:

Traceback (most recent call last):
  File "autosub/main.py", line 8, in <module>
    from . import logger
ImportError: cannot import name 'logger'

fix imports to autosub module

#! /bin/sh
sed -i 's/import logger/from . import logger/' autosub/main.py autosub/utils.py autosub/audioProcessing.py autosub/segmentAudio.py
sed -i 's/from utils import \*/from .utils import */' autosub/main.py
sed -i 's/from writeToFile import write_to_file/from .writeToFile import write_to_file/' autosub/main.py
sed -i 's/from audioProcessing import extract_audio/from .audioProcessing import extract_audio/' autosub/main.py
sed -i 's/from segmentAudio import remove_silent_segments/from .segmentAudio import remove_silent_segments/' autosub/main.py
sed -i 's/import trainAudio as TA/from . import trainAudio as TA/' autosub/segmentAudio.py
sed -i 's/import featureExtraction as FE/from . import featureExtraction as FE/' autosub/segmentAudio.py

error was

ModuleNotFoundError: No module named 'logger'
ModuleNotFoundError: No module named 'utils'
ModuleNotFoundError: No module named 'writeToFile'
ModuleNotFoundError: No module named 'audioProcessing'
ModuleNotFoundError: No module named 'segmentAudio'
ModuleNotFoundError: No module named 'trainAudio'
ModuleNotFoundError: No module named 'featureExtraction'

stream not stack = write result to disk more often

Write results to disk more often, not just once at the end of the process.

Example code snippet from my srtgen:

import os
import tempfile

tempdir = tempfile.gettempdir()
output_file_path = None
output_file_handle = None

def log(*args):
    print("[srtgen]", *args)

# output goes to stdout and to the file
def out(*args, **kwargs):
    print(*args, **kwargs)
    if output_file_handle:
        log(f"writing to {output_file_path}")
        kwargs["file"] = output_file_handle
        print(*args, **kwargs)
        output_file_handle.flush()  # flush after every cue so partial results survive a crash
    else:
        log("not writing to output_file_path")  # this should not happen

def transcribe_file(input_video_path):
    """Transcribe the given video file."""

    global output_file_path
    global output_file_handle

    output_file_path = os.path.join(tempdir, "output_file.srt")
    output_file_handle = open(output_file_path, "w")

    for ...
      out("... result ...")

This would also allow pausing and resuming the process.

Older CPUs get Illegal instruction with the DeepSpeech binary due to missing AVX instructions; I have recompiled and replaced the binary...

So, this is a long-standing issue/non-issue due to upstream (TensorFlow) decisions to require AVX extensions when packaging the binaries. Anyway, two hours later I have recompiled DeepSpeech and verified it works on my machine.

But even after replacing the binary AutoSub/sub/bin/deepspeech, I still get the illegal instruction.

Where exactly can I replace the binary to use the one I compiled that will work on my cpu? Thank you.

flac not wav

pro: flac needs less disk space
con: wav is easier to process

-[INFO] Extracted audio to audio/video-file-name.wav
+[INFO] Extracted audio to audio/video-file-name.flac

To read FLAC files, we can use the pydub library.
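
A minimal sketch with pydub (which shells out to ffmpeg for non-WAV formats); the file name is a placeholder:

from pydub import AudioSegment

# ffmpeg must be on PATH for pydub to decode FLAC.
segment = AudioSegment.from_file("audio/video-file-name.flac", format="flac")
print(len(segment) / 1000.0, "seconds")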

Create temporary files in temporary directories

The /audio directory is never cleaned up automatically. However, this is temporary data and should live in a TemporaryDirectory. Failing to clean the directory can accumulate huge amounts of uncompressed audio data, and it also causes problems when re-running the tool on a file carrying the same name as before: ffmpeg prompts for an overwrite, and in the end the tool tries to feed sound data from previous runs into DeepSpeech and errors out:

(sub) user@machine:~/git/AutoSub$ python3 autosub/main.py --model ~/deepspeech/de/output_graph.pbmm --scorer ~/deepspeech/de/kenlm.scorer --file ~/deepspeech/de/geteiltes_polen.wav
AutoSub v0.1

TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-01-31 15:59:44.865056: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Input file: /home/ajcay/deepspeech/de/geteiltes_polen.wav
Guessed Channel Layout for Input Stream #0.0 : mono
File '/home/ajcay/git/AutoSub/audio/geteiltes_polen.wav' already exists. Overwrite ? [y/N] y
Extracted audio to audio/geteiltes_polen.wav
Splitting on silent parts in audio file

Running inference:
 85%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                        | 82/96 [03:23<00:27,  1.93s/it]
Traceback (most recent call last):
  File "autosub/main.py", line 180, in <module>
    main()
  File "autosub/main.py", line 174, in main
    ds_process_audio(ds, audio_segment_path, file_handle)
  File "autosub/main.py", line 117, in ds_process_audio
    write_to_file(file_handle, infered_text, line_count, limits)
  File "/home/ajcay/git/AutoSub/autosub/writeToFile.py", line 18, in write_to_file
    d = str(datetime.timedelta(seconds=float(limits[0])))
ValueError: could not convert string to float: 'hoerfilm16k'
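
A minimal sketch of the suggested change, assuming the extracted and split audio is only needed for the duration of a single run (the ffmpeg options are illustrative):

import os
import subprocess
import tempfile

# Keep extracted/split audio in a per-run temporary directory instead of ./audio,
# so nothing is left behind and re-runs never collide on file names.
with tempfile.TemporaryDirectory(prefix="autosub-") as workdir:
    wav_path = os.path.join(workdir, "extracted.wav")
    subprocess.run(["ffmpeg", "-y", "-i", "input.mp4",
                    "-ac", "1", "-ar", "16000", wav_path], check=True)
    # ... split on silence and run inference on files inside workdir ...
# The directory and everything in it is deleted automatically here.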

I don't get it

After running pip3 install -r requirements.txt:

(sub) 1sm23@liushimeHacmini AutoSub % pip3 install -r requirements.txt
Requirement already satisfied: cycler==0.10.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 1)) (0.10.0)
Requirement already satisfied: Cython==0.29.21 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 2)) (0.29.21)
Collecting numpy==1.18.1
  Using cached numpy-1.18.1-cp38-cp38-macosx_10_9_x86_64.whl (15.2 MB)
Requirement already satisfied: deepspeech==0.8.2 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (0.8.2)
Requirement already satisfied: joblib==0.16.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (0.16.0)
Requirement already satisfied: kiwisolver==1.2.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 6)) (1.2.0)
Requirement already satisfied: pydub==0.23.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 7)) (0.23.1)
Requirement already satisfied: pyparsing==2.4.7 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 8)) (2.4.7)
Requirement already satisfied: python-dateutil==2.8.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 9)) (2.8.1)
Collecting scikit-learn==0.21.3
  Using cached scikit-learn-0.21.3.tar.gz (12.2 MB)
Requirement already satisfied: scipy==1.4.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 11)) (1.4.1)
Requirement already satisfied: six==1.15.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 12)) (1.15.0)
Collecting tqdm==4.44.1
  Using cached tqdm-4.44.1-py2.py3-none-any.whl (60 kB)
Using legacy 'setup.py install' for scikit-learn, since package 'wheel' is not installed.
Installing collected packages: numpy, scikit-learn, tqdm
  Attempting uninstall: numpy
    Found existing installation: numpy 1.17.3
    Uninstalling numpy-1.17.3:
      Successfully uninstalled numpy-1.17.3
    Running setup.py install for scikit-learn ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"'; __file__='"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-record-tolv31lh/install-record.txt --single-version-externally-managed --compile --install-headers /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/include/site/python3.8/scikit-learn
         cwd: /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/
    Complete output (51 lines):
    Partial import of sklearn during the build process.
    C compiler: xcrun -sdk macosx clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64
    
    compile options: '-c'
    extra options: '-fopenmp'
    xcrun: test_openmp.c
    clang: error: unsupported option '-fopenmp'
    clang: error: unsupported option '-fopenmp'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 290, in <module>
        setup_package()
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 286, in setup_package
        setup(**metadata)
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/core.py", line 137, in setup
        config = configuration()
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 174, in configuration
        config.add_subpackage('sklearn')
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
        config_list = self.get_subpackage(subpackage_name, subpackage_path,
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
        config = self._get_configuration_from_setup_py(
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "sklearn/setup.py", line 76, in configuration
        maybe_cythonize_extensions(top_path, config)
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/sklearn/_build_utils/__init__.py", line 42, in maybe_cythonize_extensions
        with_openmp = check_openmp_support()
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/sklearn/_build_utils/openmp_helpers.py", line 140, in check_openmp_support
        raise CompileError(err_message)
    distutils.errors.CompileError:
                        ***
    
    It seems that scikit-learn cannot be built with OpenMP support.
    
    - Make sure you have followed the installation instructions:
    
        https://scikit-learn.org/dev/developers/advanced_installation.html
    
    - If your compiler supports OpenMP but the build still fails, please
      submit a bug report at:
    
        https://github.com/scikit-learn/scikit-learn/issues
    
    - If you want to build scikit-learn without OpenMP support, you can set
      the environment variable SKLEARN_NO_OPENMP and rerun the build
      command. Note however that some estimators will run in sequential
      mode and their `n_jobs` parameter will have no effect anymore.
    
                        ***
    
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"'; __file__='"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-record-tolv31lh/install-record.txt --single-version-externally-managed --compile --install-headers /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/include/site/python3.8/scikit-learn Check the logs for full command output.

I think I have completely installed it, but I need help. I am very...

How-to example

Make sure the model and scorer files are in the root directory. They are automatically loaded.
After following the installation instructions, you can run autosub/main.py as given below. The --file argument is the video file for which the SRT file is to be generated.

$ python3 autosub/main.py --file ~/movie.mp4

1. I don't understand: what is the root directory? Is it *****/autosub, or somewhere else?

2. After I downloaded deepspeech, I thought I could finally install this program, but I got so many errors.

It's hard to explain, but it looks like this program tries to subtitle your tut document.

What am I doing wrong here?

Failed to initialize memory mapped model

  File ".../deepspeech/__init__.py", line 38, in __init__
    raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status), status))
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)

processing mandarin Chinese video error

Hi dear author,

When I process audio to SRT with a Mandarin Chinese video, the following errors occurred.
First time with the same video:

Traceback (most recent call last):
  File "C:\Python\AutoSub\autosub\main.py", line 139, in <module>
    main()
  File "C:\Python\AutoSub\autosub\main.py", line 129, in main
    ds_process_audio(ds, audio_segment_path, file_handle)
  File "C:\Python\AutoSub\autosub\main.py", line 68, in ds_process_audio
    write_to_file(file_handle, infered_text, line_count, limits)
  File "C:\Python\AutoSub\autosub\writeToFile.py", line 24, in write_to_file
    d = str(datetime.timedelta(seconds=float(limits[1])))
IndexError: list index out of range

Second time with the same video:

Traceback (most recent call last):
  File "C:\Python\AutoSub\autosub\main.py", line 139, in <module>
    main()
  File "C:\Python\AutoSub\autosub\main.py", line 129, in main
    ds_process_audio(ds, audio_segment_path, file_handle)
  File "C:\Python\AutoSub\autosub\main.py", line 68, in ds_process_audio
    write_to_file(file_handle, infered_text, line_count, limits)
  File "C:\Python\AutoSub\autosub\writeToFile.py", line 32, in write_to_file
    file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\udce4' in position 18: illegal multibyte sequence

It works fine with an English video; I can get the SRT.

system: ubuntu 20.04
python: 3.7.1 3.8.8 and 3.9.4

Thank you very much for your reply.
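
For reference, one possible mitigation for the 'gbk' encode error above is to open the subtitle file with an explicit UTF-8 encoding and a tolerant error handler, so the platform default code page is never used. This is only a sketch, not the actual writeToFile.py code:

inferred_text = "\u4f60\u597d"  # placeholder; the real text comes from the model
# errors="replace" keeps a stray surrogate character from aborting the whole run.
with open("output.srt", "w", encoding="utf-8", errors="replace") as file_handle:
    file_handle.write(inferred_text + "\n\n")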

Problem running on Mac OS X Monterey

I installed everything using commit 0d38535a7511d81a126dcd33e4b8e0922585b011 and created a virtualenv project as suggested. When I tried to run autosub/main.py I got:

Traceback (most recent call last):
  File "/Users/user/python virtualenvs/AutoSub/AutoSub/autosub/main.py", line 8, in <module>
	from . import logger
ImportError: attempted relative import with no known parent package

I changed from . import logger to import logger and got:
Traceback (most recent call last):
  File "/Users/user/python virtualenvs/AutoSub/AutoSub/autosub/main.py", line 11, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

I deactivated the virtualenv and tried to install numpy (python3 -m pip install --user numpy) and got:

Requirement already satisfied: numpy in /usr/local/lib/python3.10/site-packages (1.23.2)

I verified that numpy is in that directory.

At this point there is no point in continuing. I see there are instructions for building and running autosub using Docker. This is not an option for me, so how can I proceed on OS X?

Is it impossible to recognize in another language?


I hope the caption file comes out in Japanese.

I fed in a video of a conversation in Japanese and the output came out in English. Is there a way to change this?

Or, are there any Japanese model files?
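
For reference, the CLI already accepts alternative acoustic models and scorers (see the Spanish-model issue above), so if Japanese DeepSpeech-compatible model files exist they could be passed the same way; the file names below are placeholders:

python3 autosub/main.py --file conversation.mp4 --model japanese.pbmm --scorer japanese.scorer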

Error on installing

Hi my friend. I'm getting the following error when installing. I think the Cython package is missing from your requirements.

ERROR: Command errored out with exit status 1:
     command: /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jddk896v/install-record.txt --single-version-externally-managed --compile --install-headers /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/include/site/python3.8/scikit-learn
         cwd: /tmp/pip-install-91ax7yu0/scikit-learn/
    Complete output (28 lines):
    Partial import of sklearn during the build process.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 290, in <module>
        setup_package()
      File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 286, in setup_package
        setup(**metadata)
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/core.py", line 137, in setup
        config = configuration()
      File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 174, in configuration
        config.add_subpackage('sklearn')
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
        config_list = self.get_subpackage(subpackage_name, subpackage_path,
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
        config = self._get_configuration_from_setup_py(
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "sklearn/setup.py", line 62, in configuration
        config.add_subpackage('utils')
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
        config_list = self.get_subpackage(subpackage_name, subpackage_path,
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
        config = self._get_configuration_from_setup_py(
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "sklearn/utils/setup.py", line 8, in configuration
        from Cython import Tempita
    ModuleNotFoundError: No module named 'Cython'
    ----------------------------------------
ERROR: Command errored out with exit status 1: /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jddk896v/install-record.txt --single-version-externally-managed --compile --install-headers /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/include/site/python3.8/scikit-learn Check the logs for full command output.

.tflite files support

After the Mozilla layoffs, the DeepSpeech team forked the DeepSpeech repo and founded the company Coqui AI (https://github.com/coqui-ai/STT), where they continue development, and AFAIK they now only export models as .tflite files. It theoretically should work with the old code, but for me it didn't.

When I try to run it like this:

python3 autosub/main.py --file /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3 --split-duration 8

with a .tflite file in the main folder and NO language model.

Then I get:

AutoSub

['autosub/main.py', '--file', '/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', '--split-duration', '8']
ARGS: Namespace(dry_run=False, file='/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', format=['srt', 'vtt', 'txt'], model=None, scorer=None, split_duration=8.0)
Warning no models specified via --model and none found in local directory. Please run getmodel.sh convenience script from autosub repo to get some.
Error: Must have pbmm model. Exiting

Have I done anything wrong here, or does AutoSub not support .tflite files?

I tested it on macOS and installed ffmpeg via Homebrew.
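
For what it's worth, the Coqui STT Python package itself can load .tflite models directly, so support mainly depends on how AutoSub selects the model file. A minimal sketch of the underlying API (the paths are placeholders, and models typically expect 16 kHz mono 16-bit PCM):

import numpy as np
from stt import Model

model = Model("model.tflite")               # Coqui STT accepts .tflite directly
model.enableExternalScorer("kenlm.scorer")  # optional external language model
audio = np.zeros(16000, dtype=np.int16)     # one second of silence as dummy input
print(model.stt(audio))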

Split overly long transcript segments

Currently it's easily possible to receive cues with 340 characters or more. Amara.org suggests a maximum of 42 characters per line.

DeepSpeech provides timing data for each individual character, so the start and duration of each word can be calculated (see client.py) and a clean split is possible (possibly breaking the sentence, but still better than 340 characters, and better than having to support different grammars).

Here is what their sample script already provides:

{
  "transcripts": [
    {
      "confidence": -44.99164581298828,
      "words": [
        {
          "word": "ja",
          "start_time": 0.36,
          "duration": 0.3
        },
        {
          "word": "meine",
          "start_time": 0.7,
          "duration": 0.22
        },
        {
          "word": "sehr",
          "start_time": 0.96,
          "duration": 0.14
        },
        {
          "word": "verehrten",
          "start_time": 1.12,
          "duration": 0.32
        },
        {
          "word": "damen",
          "start_time": 1.48,
          "duration": 0.22
        },
        {
          "word": "und",
          "start_time": 1.74,
          "duration": 0.12
        },
        {
          "word": "herren",
          "start_time": 1.9,
          "duration": 0.5
        },
        {
          "word": "liebe",
          "start_time": 2.48,
          "duration": 0.32
        },
        {
          "word": "frau",
          "start_time": 2.86,
          "duration": 0.48
        },
        {
          "word": "versorgen",
          "start_time": 3.38,
          "duration": 0.42
        }
      ]
    },
    {
      "confidence": -45.583431243896484,
      "words": [
        {
          "word": "ja",
          "start_time": 0.36,
          "duration": 0.3
        },

The idea is to find a good splitting point, which can be a weighted decision: the closer we get to the desired splitting point, the less pause between words is required to trigger a split (if a split is required/desired).
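
A rough sketch of that weighted decision, assuming the word list shown above (word, start_time, duration) and a 42-character target; the thresholds are illustrative only:

def split_words(words, max_chars=42):
    # Greedily split a list of {"word", "start_time", "duration"} dicts into
    # cues whose text stays around max_chars, preferring to break where the
    # pause before the next word is long.
    cues, current, length = [], [], 0
    for i, w in enumerate(words):
        current.append(w)
        length += len(w["word"]) + 1
        nxt = words[i + 1] if i + 1 < len(words) else None
        pause = (nxt["start_time"] - (w["start_time"] + w["duration"])) if nxt else 0.0
        # Break at the hard limit, or earlier once past half the limit if there is a clear pause.
        if nxt is None or length >= max_chars or (length >= max_chars // 2 and pause > 0.3):
            start = current[0]["start_time"]
            end = w["start_time"] + w["duration"]
            cues.append((start, end, " ".join(x["word"] for x in current)))
            current, length = [], 0
    return cues

sample = [{"word": "ja", "start_time": 0.36, "duration": 0.3},
          {"word": "meine", "start_time": 0.7, "duration": 0.22}]
print(split_words(sample))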

Creating GitHub Organization and possibly changing project name

Ideally this project would be developed under a GitHub organization (but it's definitely not required).

The Autosub organization (https://github.com/autosub) namespace is already taken, and as you may know there's an unrelated (abandoned) project called "autosub" that uses the same name: https://github.com/agermanidis/autosub

Ideally you'd get that organization namespace but I don't think it's possible given there's no contact information.

How about renaming this project "AutoSubs" (or "AutoSubtitler") and registering https://github.com/AutoSubs (or https://github.com/AutoSubtitler)? I prefer "AutoSubs". Another option would be to keep the name "AutoSub" but register a different organization.

I think the GitHub organization domain would help with the growth of the project. But again, the change is definitely not required.

(Feel free to close this issue as WONTFIX, if you'd like)

Broken Docker build

$ docker build -t autosub .
Sending build context to Docker daemon  113.2kB
Step 1/13 : ARG BASEIMAGE=ubuntu:18.04
Step 2/13 : FROM ${BASEIMAGE}
 ---> b67d6ac264e4
Step 3/13 : ARG DEPSLIST=requirements.txt
 ---> Using cache
 ---> 0ae6d0d02403
Step 4/13 : ENV PYTHONUNBUFFERED 1
 ---> Using cache
 ---> 63c984eb9ae5
Step 5/13 : RUN DEBIAN_FRONTEND=noninteractive apt update &&     apt -y install ffmpeg libsm6 libxext6 python3 python3-pip &&     apt -y clean && 	rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 7e2214cd96b4
Step 6/13 : COPY $DEPSLIST ./requirements.txt
 ---> Using cache
 ---> 3f437c1a2f3c
Step 7/13 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in 37765d138851
Collecting cycler==0.10.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting stt==1.0.0 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement stt==1.0.0 (from -r requirements.txt (line 3)) (from versions: 0.10.0a5, 0.10.0a6, 0.10.0a8, 0.10.0a9, 0.10.0a10)
No matching distribution found for stt==1.0.0 (from -r requirements.txt (line 3))
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1
