
abhirooptalasila / autosub


A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui

License: MIT License

Python 96.17% Dockerfile 1.55% Shell 2.28%
speech-to-text ffmpeg sox deepspeech python asr mozilla-deepspeech autosub subtitle srt

autosub's People

Contributors

abhirooptalasila, k-shrey, kylemaas, milahu, nightscape, qlchan24, sethfalco, shasheene, shravanshetty1, vnq, xfim, yash-fn


autosub's Issues

Create vtt, srt, txt files by default (drop the --vtt option)

Currently AutoSub has the following command-line interface:

usage: main.py [-h] --file FILE [--vtt]

AutoSub

optional arguments:
  -h, --help   show this help message and exit
  --file FILE  Input video file
  --vtt        Output a vtt file with cue points for individual words instead
               of a srt file

The --vtt option switches the output from an SRT file to a VTT file.

But since it is the inference that takes the bulk of the execution time, while writing the output subtitle file takes little time and very little disk space, rerunning the entire inference just to get a VTT file is inefficient. It makes more sense to create both the VTT and the SRT by default.

Also, AutoSub is well-placed to output a transcript of the input at the same time. Indeed, I saw an AutoSub fork that did just that.

So, I suggest replacing --vtt with a --format option with which the user can restrict the generated file formats; by default it should create all formats: VTT, SRT, and a TXT transcript.
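
For illustration, a minimal argparse sketch of the proposed interface (the option name and defaults here are my suggestion, not the current CLI):

import argparse

parser = argparse.ArgumentParser(description="AutoSub")
parser.add_argument("--file", required=True, help="Input video file")
# Replaces --vtt: restrict the generated formats, or get all of them by default.
parser.add_argument("--format", nargs="+", choices=["srt", "vtt", "txt"],
                    default=["srt", "vtt", "txt"],
                    help="Subtitle/transcript formats to generate (default: all)")
args = parser.parse_args()
print(args.format)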

I am happy to do the work and make a Pull Request. Are you happy with this approach?

Generating caption for Spanish video

I tested this with a Spanish video about 1 min 30 s long, and it generated seemingly English "transliterated" text from the Spanish words instead of writing the Spanish words themselves.

Using the Spanish models from here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#language-models-and-checkpoints

python3 autosub/main.py --model /home/cyberquarks/AutoSub/output_graph_es.pbmm --scorer /home/cyberquarks/AutoSub/kenlm_es.scorer --file /mnt/c/temp/test.mp4

Also, each subtitle line looks like this:

1
00:00:08,75 --> 00:01:30,30

So the subtitle covers the whole video when played in VLC.

Docker build broken

Running docker build -t autosub . fails with:

Step 11/13 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in bdf3fc44f538
Collecting cycler==0.10.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting stt==1.0.0 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement stt==1.0.0 (from -r requirements.txt (line 3)) (from versions: 0.10.0a5, 0.10.0a6, 0.10.0a8, 0.10.0a9, 0.10.0a10)
No matching distribution found for stt==1.0.0 (from -r requirements.txt (line 3))
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

How to install on Windows?

Hello, could you let me know how to install and run your program on Windows? I am at the step where I ran "pip3 install -r requirements.txt" and got the following error.

        ERROR: Cannot install -r requirements.txt (line 4) and numpy==1.18.1 because these package versions have conflicting 
        dependencies.
        
        The conflict is caused by:
            The user requested numpy==1.18.1
            deepspeech 0.8.2 depends on numpy<=1.17.0 and >=1.14.5
        
        To fix this you could try to:
        1. loosen the range of package versions you've specified
        2. remove package versions to allow pip attempt to solve the dependency conflict
        
        ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Broken logging module import

Traceback (most recent call last):
  File "autosub/main.py", line 8, in <module>
    from . import logger
ImportError: cannot import name 'logger'

ImportError: attempted relative import with no known parent package

Hi, my config is Win10 x64, Python 3.8. When I execute $ C:/Soft/Autosub/sub/Scripts/python autosub/main.py --file D:/Work/video.mkv, it gives me this error:

Traceback (most recent call last):
  File "autosub/main.py", line 8, in <module>
    from . import logger
ImportError: attempted relative import with no known parent package

Info: User@Computer MINGW64 /c/Soft/Autosub (master)
$ pip list
Package Version


absl-py 1.0.0
astunparse 1.6.3
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.12
cycler 0.10.0
deepspeech-gpu 0.9.3
distlib 0.3.4
ffmpeg 1.4
filelock 3.6.0
gast 0.3.3
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.44.0
h5py 2.10.0
idna 3.3
importlib-metadata 4.11.3
joblib 0.16.0
Keras-Preprocessing 1.1.2
kiwisolver 1.2.0
Markdown 3.3.6
numpy 1.22.3
oauthlib 3.2.0
opt-einsum 3.3.0
pip 19.2.3
platformdirs 2.5.1
protobuf 3.19.4
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydub 0.23.1
pyparsing 2.4.7
python-dateutil 2.8.1
requests 2.27.1
requests-oauthlib 1.3.1
rsa 4.8
scikit-learn 1.0.2
scipy 1.4.1
setuptools 41.2.0
six 1.15.0
stt 1.0.0
tensorboard 2.2.2
tensorboard-plugin-wit 1.8.1
tensorflow-gpu 2.2.0
tensorflow-gpu-estimator 2.2.0
termcolor 1.1.0
threadpoolctl 3.1.0
tqdm 4.44.1
urllib3 1.26.9
virtualenv 20.13.3
Werkzeug 2.0.3
wheel 0.37.1
wrapt 1.14.0
zipp 3.7.0
WARNING: You are using pip version 19.2.3, however version 22.0.4 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

Issue in running model

Command I run: python3 autosub/main.py --file video.mp4

[INFO] ARGS: Namespace(dry_run=False, engine='stt', file='video.mp4', format='srt', model=None, scorer=None, split_duration=5)
[INFO] Model: /media/ravneet/SSD2/TMN_Tasks/AutoSub/model.tflite
[INFO] Scorer: /media/ravneet/SSD2/TMN_Tasks/AutoSub/deepspeech-0.9.3-models.scorer
[INFO] Input file: video.mp4
[INFO] Extracted audio to audio/video.wav
[INFO] Splitting on silent parts in audio file
[INFO] Running inference...
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.10-0-g9b517632
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2022-08-15 23:53:27.304189: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Data loss: Can't parse /media/ravneet/SSD2/TMN_Tasks/AutoSub/model.tflite as binary proto
[ERROR] Invalid model file

Problem using Chinese Acoustic models

Hey Abhiroop, great work on AutoSub. It works flawlessly with the English acoustic models. However, I was trying to use the experimental Chinese acoustic models that have been released and ran into problems. I guess there is an issue with the encoding when the subtitle file is being written. Can you please check?
(screenshot: AutoSub error)
I have tried to change the encoding to utf-8 but it hasn't helped.


Here are the steps to reduce big chunks of text - SRT

Hi,

I did this manually, but maybe someone can improve it and write a script for it:

  1. Check for text that is longer than 7 words.

  2. Add a line break after every 7 words, so no line is longer than 7 words.

  3. Count the lines you got, e.g.:
    blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah
    blah blah blah blah blah

There are 4 lines.

  4. Take the initial and final time of the cue:

13 <<< SRT subtitle position
e.g.: 00:00:25,90 --> 00:00:35,25

  5. Subtract them and divide the result by the number of lines:
    35,25 - 25,90 = 9,35
    4 lines of max 7 words each
    9,35 / 4 = 2,33

  6. Give each line a slot of 2,33, starting each new cue 0,01 after the previous one ends, e.g.:

13 <<< SRT subtitle position
00:00:25,90 --> 00:00:28,23
blah blah blah blah blah blah blah

14
00:00:28,24 --> 00:00:30,57
blah blah blah blah blah blah blah

15
00:00:30,58 --> 00:00:32,91
blah blah blah blah blah blah blah

16
00:00:32,92 --> 00:00:35,25
blah blah blah blah blah

14 <<<< WARNING >>>> update all the other positions, in this case to 17 (see step 7 below)
blah blah blah blah

  7. Update the SRT subtitle positions that follow: in this case we finished at 16, so the old 14 becomes 17, and the same goes for all the other numbers. Note: update from top to bottom so the counter increments.

That's it.

Anyone? :)
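
For anyone interested, here is a rough Python sketch of the steps above. It operates on a single cue given as (index, start_seconds, end_seconds, text); renumbering the cues that follow is left to the caller, and the rounding of the times is a simplification:

def split_cue(index, start, end, text, max_words=7):
    """Split one long SRT cue into cues of at most max_words words,
    distributing the original time range evenly across the new cues."""
    words = text.split()
    lines = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    slot = (end - start) / len(lines)
    cues = []
    for n, line in enumerate(lines):
        cue_start = start + n * slot + (0.01 if n else 0)  # start 0,01 after the previous cue ends
        cue_end = start + (n + 1) * slot
        cues.append((index + n, round(cue_start, 2), round(cue_end, 2), line))
    return cues

# The example from above: one 9,35 s cue with 26 words becomes 4 cues.
for cue in split_cue(13, 25.90, 35.25, " ".join(["blah"] * 26)):
    print(cue)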

No matching distribution found for stt==1.0.0

When I install the package, I'm getting

> pip install -r requirements.txt
Collecting cycler==0.10.0
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting numpy
  Using cached numpy-1.22.2.zip (11.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
ERROR: Could not find a version that satisfies the requirement stt==1.0.0 (from versions: none)
ERROR: No matching distribution found for stt==1.0.0

UPDATE
Looks like an issue with stt

> pip install stt==1.2.0
ERROR: Could not find a version that satisfies the requirement stt==1.2.0 (from versions: none)
ERROR: No matching distribution found for stt==1.2.0

Illegal instruction when trying to perform an operation on a mp4 file

I'm getting an error when trying to run AutoSub on an mp4 file:

python3 main.py --model "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/autosub/deepspeech-0.8.2-models.pbmm" --scorer "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/autosub/deepspeech-0.8.2-models.scorer" --file "test.mp4"

The error message:
Illegal instruction (core dumped)

I think my computer is too old. I have an Intel integrated graphics card. Maybe that could be the reason? I'm buying a new video card next month and I'll try to use it again if that's the problem.
If it works, I'll consider making a kivy GUI for it.

Doesn't even run

I keep getting:

Extracted audio to audio/input.wav
Splitting on silent parts in audio file
Traceback (most recent call last):
  File "./autosub/main.py", line 130, in <module>
    main()
  File "./autosub/main.py", line 112, in main
    silenceRemoval(audio_file_name)
  File "/Users/kelvin/Downloads/AutoSub-master/autosub/segmentAudio.py", line 194, in silenceRemoval
    raise Exception("Input audio file not found!")
Exception: Input audio file not found!

Cue points for individual words

YouTube's latest speech recognition creates cue points for individual words, which become visible at the moment they are spoken.

YT is transmitting their subtitles in a format that looks like this (which I do not recognize):

{
  "wireMagic": "pb3",
  "pens": [ {
  
  } ],
  "wsWinStyles": [ {
  
  }, {
    "mhModeHint": 2,
    "juJustifCode": 0,
    "sdScrollDir": 3
  } ],
  "wpWinPositions": [ {
  
  }, {
    "apPoint": 6,
    "ahHorPos": 20,
    "avVerPos": 100,
    "rcRows": 2,
    "ccCols": 40
  } ],
  "events": [ {
    "tStartMs": 0,
    "dDurationMs": 2795440,
    "id": 1,
    "wpWinPosId": 1,
    "wsWinStyleId": 1
  }, {
    "tStartMs": 80,
    "dDurationMs": 3119,
    "wWinId": 1,
    "segs": [ {
      "utf8": "hey",
      "acAsrConf": 255
    }, {
      "utf8": " everybody",
      "tOffsetMs": 160,
      "acAsrConf": 255
    }, {
      "utf8": " how's",
      "tOffsetMs": 480,
      "acAsrConf": 255

However, the VTT subtitle format also supports cues for individual words (cf. karaoke-style text), although this is not yet supported natively by the Firefox version I tested.

Timing information is available through DeepSpeech's sttWithMetadata() and is easily transformed from character timings to word timings using their client.py.

The request is to have an output file that allows us to do it like YT: display individual words as they are spoken.
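
For illustration, a minimal sketch of how word-level timings (e.g. collapsed from the character timings that sttWithMetadata() returns) could be rendered as karaoke-style VTT cue text; the (word, start_seconds) input format here is an assumption on my part:

def vtt_timestamp(seconds):
    # Format seconds as an HH:MM:SS.mmm WebVTT timestamp.
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "%02d:%02d:%06.3f" % (h, m, s)

def karaoke_cue(words):
    # words: list of (text, start_seconds) tuples for one cue.
    # An inline timestamp before each word lets supporting players
    # highlight the words at the moment they are spoken.
    parts = [words[0][0]]
    for text, start in words[1:]:
        parts.append("<%s><c> %s</c>" % (vtt_timestamp(start), text))
    return "".join(parts)

print(karaoke_cue([("hey", 0.08), ("everybody", 0.24), ("how's", 0.56)]))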

Some words are missing

Hi, thanks for the great project!

I have a problem with some words missing from the transcript.
But if I transcribe the same audio using only the deepspeech project (not autosub with the ds engine), there are no missing words.

Are there any tweaks that can be done via parameters, or is it because of the silent-segment removal process?

Here is the txt output from autosub with ds engine

biggest . 

people make when larry english and probably one of the most common miss. 

people think that they. 





don't study. 

live in. 

an out let me explain what i . 

one does studying men and how do people usually approach this pro. 

and how do people. 

And here is the deepspeech output.

the biggest mistake people make when morning english and probably one of the most common misconceptions is that people think that they need to study english and usedn't study english live english an outlet explain what i mean one does studying men and how do people 

As you can see, some words are missing on autosub output.

I am using the same deepspeech 0.9.3 version and model for both autosub and deepspeech.

Add a Dockerfile?

Would it be feasible to create a Dockerfile and put it on Docker Hub for more convenient usage?

This would allow users, and even developers, to simply pull the image and start using AutoSub immediately without worrying about the runtime or dependencies, and with less configuration.

Amazing! Although the results are not that accurate.


How can I improve the accuracy? Many thanks.

https://www.youtube.com/watch?v=TOQwUISm6fw

1
00:00:00,10 --> 00:00:00,95
one

2
00:00:01,55 --> 00:00:07,10
this visconti to learn how to go at the vices

3
00:00:08,20 --> 00:00:14,80
who to open that automatic aridius in

4
00:00:14,95 --> 00:00:21,50
and this idea will guide us how to add indicator automatical

5
00:00:21,95 --> 00:00:26,50
and we will go and newton

6
00:00:26,90 --> 00:00:34,0:
and we were used in piraguas moreale indication

7
00:00:34,40 --> 00:00:43,45
starting first of all we need to add one more premature

8
00:00:44,25 --> 00:00:53,80
the primates named yes bury in the morning added coelesti special the period wooing happy

9
00:00:54,15 --> 00:00:59,70
and we will get the primitive party by using dislike

10
00:01:02,60 --> 00:01:07,25
and it the new eenamost cat indicated handle

11
00:01:08,25 --> 00:01:10,80
to get hard

12
00:01:12,25 --> 00:01:15,15
i remember very honour

13
00:01:15,75 --> 00:01:20,0:
if the specific indicator doesn't because

14
00:01:20,75 --> 00:01:30,15
then his deep we create a new one force and then returned the handle of the new indicator

15
00:01:30,50 --> 00:01:31,90
if

16
00:01:32,40 --> 00:01:43,50
the specific indicator has existed then this peril return the handle of the specifically indicate

17
00:01:46,30 --> 00:01:55,50
and handles into idea holes to hide in fine the chatterment

18
00:01:55,70 --> 00:01:59,30
so hand is very important information

19
00:01:59,55 --> 00:02:05,70
we will stop the return better to this very

20
00:02:06,40 --> 00:02:17,80
don't forget to go estimable after wives lorrainese how it works

21
00:02:18,10 --> 00:02:19,40
by ronald

22
00:02:24,20 --> 00:02:26,35
we just need to copy

.............

[Feature] GPU support

Love the project! Would it be too difficult to add GPU support? Planning on using this in production!

Malformed SRT/VTT file (extra colon characters)

Thanks for the promising program. I really believe in this work, so I will become an active contributor.

The Python code that writes subtitle timestamps is buggy when the millisecond section ends in zeroes. I have fixed this (see the associated commit), but for the sake of completeness here is a description of the issue.

Here's an excerpt of some generated SRT output that can't be loaded in VLC, mpv, and other programs:

3
00:00:12,95 --> 00:00:14,0:
but you but there

4
00:00:14,60 --> 00:00:15,30
the

And the same excerpt from the VTT output:

00:00:12.95 --> 00:00:14.0:  align:start position:0%
but you but there
<c> but</c><0:00:13.330000><c> you</c><0:00:13.450000><c> but</c><0:00:13.670000><c> there</c>

00:00:14.60 --> 00:00:15.30  align:start position:0%
the
<c> the</c>

The root cause is a broken try/except block in the source code, which I have fixed.

I should note that users with sed installed can run commands like sed -i 's_: -->_ -->_g' filename.srt to repair the issue, but my attached fix will prevent the issue from occurring in the first place.
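
For completeness, one way to format SRT timestamps so the millisecond field always has a fixed width (a sketch of the general idea, not the actual writeToFile.py code):

def srt_timestamp(seconds):
    # SRT expects HH:MM:SS,mmm; always emitting three millisecond digits
    # avoids broken values such as 00:00:14,0: for a cue ending at 14.0 s.
    millis = int(round(seconds * 1000))
    h, rem = divmod(millis, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

print(srt_timestamp(12.95))  # 00:00:12,950
print(srt_timestamp(14.0))   # 00:00:14,000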

Cannot import logger

Running docker run --volume=`pwd`input:/input --name autosub autosub --file /input/video.mp4 encounters this error:

Traceback (most recent call last):
  File "autosub/main.py", line 8, in <module>
    from . import logger
ImportError: cannot import name 'logger'

fix imports to autosub module

#! /bin/sh
sed -i 's/import logger/from . import logger/' autosub/main.py autosub/utils.py autosub/audioProcessing.py autosub/segmentAudio.py
sed -i 's/from utils import \*/from .utils import */' autosub/main.py
sed -i 's/from writeToFile import write_to_file/from .writeToFile import write_to_file/' autosub/main.py
sed -i 's/from audioProcessing import extract_audio/from .audioProcessing import extract_audio/' autosub/main.py
sed -i 's/from segmentAudio import remove_silent_segments/from .segmentAudio import remove_silent_segments/' autosub/main.py
sed -i 's/import trainAudio as TA/from . import trainAudio as TA/' autosub/segmentAudio.py
sed -i 's/import featureExtraction as FE/from . import featureExtraction as FE/' autosub/segmentAudio.py

error was

ModuleNotFoundError: No module named 'logger'
ModuleNotFoundError: No module named 'utils'
ModuleNotFoundError: No module named 'writeToFile'
ModuleNotFoundError: No module named 'audioProcessing'
ModuleNotFoundError: No module named 'segmentAudio'
ModuleNotFoundError: No module named 'trainAudio'
ModuleNotFoundError: No module named 'featureExtraction'

stream not stack = write result to disk more often

Write results to disk more often, not just once at the end of the process.

Example code snippet from my srtgen:

import os
import tempfile

tempdir = tempfile.gettempdir()
output_file_path = None
output_file_handle = None

def log(*args):
    print("[srtgen]", *args)

# output goes to stdout and to the file
def out(*args, **kwargs):
    print(*args, **kwargs)
    if output_file_handle:
        log(f"writing to {output_file_path}")
        kwargs["file"] = output_file_handle
        print(*args, **kwargs)
        output_file_handle.flush()  # flush after every cue so partial results survive a crash
    else:
        log("not writing to output_file_path")  # this should not happen

def transcribe_file(input_video_path):
    """Transcribe the given video file."""

    global output_file_path
    global output_file_handle

    output_file_path = os.path.join(tempdir, "output_file.srt")
    output_file_handle = open(output_file_path, "w")

    for ...
      out("... result ...")

This would also allow pausing and resuming the process.

Older CPUs get Illegal instruction with the DeepSpeech binary due to missing AVX instructions; I have recompiled and replaced the binary...

So, this is a long-standing issue/non-issue due to upstream (TensorFlow) decisions to require AVX extensions when packaging the binaries. Anyway, two hours later I have recompiled DeepSpeech and verified it works on my machine.

But even after replacing the binary AutoSub/sub/bin/deepspeech, I still get the illegal instruction.

Where exactly can I replace the binary to use the one I compiled that will work on my cpu? Thank you.

flac not wav

pro: flac needs less disk space
con: wav is easier to process

-[INFO] Extracted audio to audio/video-file-name.wav
+[INFO] Extracted audio to audio/video-file-name.flac

To read FLAC files, we can use the pydub library.
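
A minimal sketch with pydub (which shells out to ffmpeg for non-WAV formats); the file name is a placeholder:

from pydub import AudioSegment

# ffmpeg must be on PATH for pydub to decode FLAC.
segment = AudioSegment.from_file("audio/video-file-name.flac", format="flac")
print(len(segment) / 1000.0, "seconds")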

Create temporary files in temporary directories

The /audio directory is never cleaned up automatically. However, this is temporary data and should live in a TemporaryDirectory. Failing to clean the directory can accumulate huge amounts of uncompressed audio data, and it also causes problems when re-running the tool on a file carrying the same name as before: ffmpeg prompts for an overwrite, and in the end the tool tries to feed sound data from previous runs into DeepSpeech and errors out:

(sub) user@machine:~/git/AutoSub$ python3 autosub/main.py --model ~/deepspeech/de/output_graph.pbmm --scorer ~/deepspeech/de/kenlm.scorer --file ~/deepspeech/de/geteiltes_polen.wav
AutoSub v0.1

TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-01-31 15:59:44.865056: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Input file: /home/ajcay/deepspeech/de/geteiltes_polen.wav
Guessed Channel Layout for Input Stream #0.0 : mono
File '/home/ajcay/git/AutoSub/audio/geteiltes_polen.wav' already exists. Overwrite ? [y/N] y
Extracted audio to audio/geteiltes_polen.wav
Splitting on silent parts in audio file

Running inference:
 85%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                        | 82/96 [03:23<00:27,  1.93s/it]
Traceback (most recent call last):
  File "autosub/main.py", line 180, in <module>
    main()
  File "autosub/main.py", line 174, in main
    ds_process_audio(ds, audio_segment_path, file_handle)
  File "autosub/main.py", line 117, in ds_process_audio
    write_to_file(file_handle, infered_text, line_count, limits)
  File "/home/ajcay/git/AutoSub/autosub/writeToFile.py", line 18, in write_to_file
    d = str(datetime.timedelta(seconds=float(limits[0])))
ValueError: could not convert string to float: 'hoerfilm16k'
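
A minimal sketch of the suggested change, assuming the extracted and split audio is only needed for the duration of a single run (the ffmpeg options are illustrative):

import os
import subprocess
import tempfile

# Keep extracted/split audio in a per-run temporary directory instead of ./audio,
# so nothing is left behind and re-runs never collide on file names.
with tempfile.TemporaryDirectory(prefix="autosub-") as workdir:
    wav_path = os.path.join(workdir, "extracted.wav")
    subprocess.run(["ffmpeg", "-y", "-i", "input.mp4",
                    "-ac", "1", "-ar", "16000", wav_path], check=True)
    # ... split on silence and run inference on files inside workdir ...
# The directory and everything in it is deleted automatically here.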

I don't get it

After running pip3 install -r requirements.txt:

(sub) 1sm23@liushimeHacmini AutoSub % pip3 install -r requirements.txt
Requirement already satisfied: cycler==0.10.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 1)) (0.10.0)
Requirement already satisfied: Cython==0.29.21 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 2)) (0.29.21)
Collecting numpy==1.18.1
  Using cached numpy-1.18.1-cp38-cp38-macosx_10_9_x86_64.whl (15.2 MB)
Requirement already satisfied: deepspeech==0.8.2 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (0.8.2)
Requirement already satisfied: joblib==0.16.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (0.16.0)
Requirement already satisfied: kiwisolver==1.2.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 6)) (1.2.0)
Requirement already satisfied: pydub==0.23.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 7)) (0.23.1)
Requirement already satisfied: pyparsing==2.4.7 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 8)) (2.4.7)
Requirement already satisfied: python-dateutil==2.8.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 9)) (2.8.1)
Collecting scikit-learn==0.21.3
  Using cached scikit-learn-0.21.3.tar.gz (12.2 MB)
Requirement already satisfied: scipy==1.4.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 11)) (1.4.1)
Requirement already satisfied: six==1.15.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 12)) (1.15.0)
Collecting tqdm==4.44.1
  Using cached tqdm-4.44.1-py2.py3-none-any.whl (60 kB)
Using legacy 'setup.py install' for scikit-learn, since package 'wheel' is not installed.
Installing collected packages: numpy, scikit-learn, tqdm
  Attempting uninstall: numpy
    Found existing installation: numpy 1.17.3
    Uninstalling numpy-1.17.3:
      Successfully uninstalled numpy-1.17.3
    Running setup.py install for scikit-learn ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"'; __file__='"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-record-tolv31lh/install-record.txt --single-version-externally-managed --compile --install-headers /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/include/site/python3.8/scikit-learn
         cwd: /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/
    Complete output (51 lines):
    Partial import of sklearn during the build process.
    C compiler: xcrun -sdk macosx clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64
    
    compile options: '-c'
    extra options: '-fopenmp'
    xcrun: test_openmp.c
    clang: error: unsupported option '-fopenmp'
    clang: error: unsupported option '-fopenmp'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 290, in <module>
        setup_package()
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 286, in setup_package
        setup(**metadata)
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/core.py", line 137, in setup
        config = configuration()
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 174, in configuration
        config.add_subpackage('sklearn')
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
        config_list = self.get_subpackage(subpackage_name, subpackage_path,
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
        config = self._get_configuration_from_setup_py(
      File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "sklearn/setup.py", line 76, in configuration
        maybe_cythonize_extensions(top_path, config)
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/sklearn/_build_utils/__init__.py", line 42, in maybe_cythonize_extensions
        with_openmp = check_openmp_support()
      File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/sklearn/_build_utils/openmp_helpers.py", line 140, in check_openmp_support
        raise CompileError(err_message)
    distutils.errors.CompileError:
                        ***
    
    It seems that scikit-learn cannot be built with OpenMP support.
    
    - Make sure you have followed the installation instructions:
    
        https://scikit-learn.org/dev/developers/advanced_installation.html
    
    - If your compiler supports OpenMP but the build still fails, please
      submit a bug report at:
    
        https://github.com/scikit-learn/scikit-learn/issues
    
    - If you want to build scikit-learn without OpenMP support, you can set
      the environment variable SKLEARN_NO_OPENMP and rerun the build
      command. Note however that some estimators will run in sequential
      mode and their `n_jobs` parameter will have no effect anymore.
    
                        ***
    
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"'; __file__='"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-record-tolv31lh/install-record.txt --single-version-externally-managed --compile --install-headers /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/include/site/python3.8/scikit-learn Check the logs for full command output.

I think I have completely installed it, but I need help. I am very...

How-to example

Make sure the model and scorer files are in the root directory. They are automatically loaded.
After following the installation instructions, you can run autosub/main.py as given below. The --file argument is the video file for which the SRT file is to be generated.

$ python3 autosub/main.py --file ~/movie.mp4

1. I don't understand: what is the root directory? Is it *****/autosub, or somewhere else?

2. After I downloaded deepspeech, I thought I could finally install this program, but I got so many errors.

It's hard to explain, but it looks like this program tries to subtitle your tut document.

What am I doing wrong here?

Failed to initialize memory mapped model

  File ".../deepspeech/__init__.py", line 38, in __init__
    raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status), status))
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)

processing mandarin Chinese video error

Hi dear author,

When I process audio to SRT with a Mandarin Chinese video, the following errors occurred.
First time with the same video:

Traceback (most recent call last):
  File "C:\Python\AutoSub\autosub\main.py", line 139, in <module>
    main()
  File "C:\Python\AutoSub\autosub\main.py", line 129, in main
    ds_process_audio(ds, audio_segment_path, file_handle)
  File "C:\Python\AutoSub\autosub\main.py", line 68, in ds_process_audio
    write_to_file(file_handle, infered_text, line_count, limits)
  File "C:\Python\AutoSub\autosub\writeToFile.py", line 24, in write_to_file
    d = str(datetime.timedelta(seconds=float(limits[1])))
IndexError: list index out of range

Second time with the same video:

Traceback (most recent call last):
  File "C:\Python\AutoSub\autosub\main.py", line 139, in <module>
    main()
  File "C:\Python\AutoSub\autosub\main.py", line 129, in main
    ds_process_audio(ds, audio_segment_path, file_handle)
  File "C:\Python\AutoSub\autosub\main.py", line 68, in ds_process_audio
    write_to_file(file_handle, infered_text, line_count, limits)
  File "C:\Python\AutoSub\autosub\writeToFile.py", line 32, in write_to_file
    file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\udce4' in position 18: illegal multibyte sequence

It works fine with an English video; I can get the SRT.

system: ubuntu 20.04
python: 3.7.1 3.8.8 and 3.9.4

Thank you very much for your reply.
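
For reference, one possible mitigation for the 'gbk' encode error above is to open the subtitle file with an explicit UTF-8 encoding and a tolerant error handler, so the platform default code page is never used. This is only a sketch, not the actual writeToFile.py code:

inferred_text = "\u4f60\u597d"  # placeholder; the real text comes from the model
# errors="replace" keeps a stray surrogate character from aborting the whole run.
with open("output.srt", "w", encoding="utf-8", errors="replace") as file_handle:
    file_handle.write(inferred_text + "\n\n")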

Problem running on Mac OS X Monterey

I installed everything using commit 0d38535a7511d81a126dcd33e4b8e0922585b011 and created a virtualenv project as suggested. When I tried to run autosub/main.py I got:

Traceback (most recent call last):
  File "/Users/user/python virtualenvs/AutoSub/AutoSub/autosub/main.py", line 8, in <module>
	from . import logger
ImportError: attempted relative import with no known parent package

I changed from . import logger to import logger and got:
Traceback (most recent call last):
  File "/Users/user/python virtualenvs/AutoSub/AutoSub/autosub/main.py", line 11, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

I deactivated the virtualenv and tried to install numpy (python3 -m pip install --user numpy) and got:

Requirement already satisfied: numpy in /usr/local/lib/python3.10/site-packages (1.23.2)

I verified that numpy is in that directory.

At this point there is no point in continuing. I see there are instructions for building and running autosub using Docker. This is not an option for me, so how can I proceed on OS X?

Is it impossible to recognize in another language?


I hope the caption file comes out in Japanese.

I fed in a video of a conversation in Japanese and the output came out in English. Is there a way to change this?

Or, are there any Japanese model files?
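
For reference, the CLI already accepts alternative acoustic models and scorers (see the Spanish-model issue above), so if Japanese DeepSpeech-compatible model files exist they could be passed the same way; the file names below are placeholders:

python3 autosub/main.py --file conversation.mp4 --model japanese.pbmm --scorer japanese.scorer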

Error on installing

Hi my friend. I'm getting the following error when installing. I think the Cython package is missing from your requirements.

ERROR: Command errored out with exit status 1:
     command: /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jddk896v/install-record.txt --single-version-externally-managed --compile --install-headers /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/include/site/python3.8/scikit-learn
         cwd: /tmp/pip-install-91ax7yu0/scikit-learn/
    Complete output (28 lines):
    Partial import of sklearn during the build process.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 290, in <module>
        setup_package()
      File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 286, in setup_package
        setup(**metadata)
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/core.py", line 137, in setup
        config = configuration()
      File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 174, in configuration
        config.add_subpackage('sklearn')
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
        config_list = self.get_subpackage(subpackage_name, subpackage_path,
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
        config = self._get_configuration_from_setup_py(
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "sklearn/setup.py", line 62, in configuration
        config.add_subpackage('utils')
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
        config_list = self.get_subpackage(subpackage_name, subpackage_path,
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
        config = self._get_configuration_from_setup_py(
      File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "sklearn/utils/setup.py", line 8, in configuration
        from Cython import Tempita
    ModuleNotFoundError: No module named 'Cython'
    ----------------------------------------
ERROR: Command errored out with exit status 1: /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jddk896v/install-record.txt --single-version-externally-managed --compile --install-headers /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/include/site/python3.8/scikit-learn Check the logs for full command output.

.tflite files support

After the Mozilla layoffs, the DeepSpeech team forked the DeepSpeech repo and founded the company Coqui AI (https://github.com/coqui-ai/STT), where they continue development, and AFAIK they now only export models as .tflite files. It theoretically should work with the old code, but for me it didn't.

When I try to run it like this:

python3 autosub/main.py --file /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3 --split-duration 8

with a .tflite file in the main folder and NO language model.

Then I get:

AutoSub

['autosub/main.py', '--file', '/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', '--split-duration', '8']
ARGS: Namespace(dry_run=False, file='/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', format=['srt', 'vtt', 'txt'], model=None, scorer=None, split_duration=8.0)
Warning no models specified via --model and none found in local directory. Please run getmodel.sh convenience script from autosub repo to get some.
Error: Must have pbmm model. Exiting

Have I done anything wrong here, or does AutoSub not support .tflite files?

I tested it on macOS and installed ffmpeg via Homebrew.
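
For what it's worth, the Coqui STT Python package itself can load .tflite models directly, so support mainly depends on how AutoSub selects the model file. A minimal sketch of the underlying API (the paths are placeholders, and models typically expect 16 kHz mono 16-bit PCM):

import numpy as np
from stt import Model

model = Model("model.tflite")               # Coqui STT accepts .tflite directly
model.enableExternalScorer("kenlm.scorer")  # optional external language model
audio = np.zeros(16000, dtype=np.int16)     # one second of silence as dummy input
print(model.stt(audio))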

Split overly long transcript segments

Currently it's easily possible to receive cues with 340 characters or more. Amara.org suggests a maximum of 42 characters per line.

DeepSpeech provides timing data for each individual character, so the start and duration of each word can be calculated (see client.py) and a clean split is possible (possibly breaking the sentence, but still better than 340 characters, and better than having to support different grammars).

Here is what their sample script already provides:

{
  "transcripts": [
    {
      "confidence": -44.99164581298828,
      "words": [
        {
          "word": "ja",
          "start_time": 0.36,
          "duration": 0.3
        },
        {
          "word": "meine",
          "start_time": 0.7,
          "duration": 0.22
        },
        {
          "word": "sehr",
          "start_time": 0.96,
          "duration": 0.14
        },
        {
          "word": "verehrten",
          "start_time": 1.12,
          "duration": 0.32
        },
        {
          "word": "damen",
          "start_time": 1.48,
          "duration": 0.22
        },
        {
          "word": "und",
          "start_time": 1.74,
          "duration": 0.12
        },
        {
          "word": "herren",
          "start_time": 1.9,
          "duration": 0.5
        },
        {
          "word": "liebe",
          "start_time": 2.48,
          "duration": 0.32
        },
        {
          "word": "frau",
          "start_time": 2.86,
          "duration": 0.48
        },
        {
          "word": "versorgen",
          "start_time": 3.38,
          "duration": 0.42
        }
      ]
    },
    {
      "confidence": -45.583431243896484,
      "words": [
        {
          "word": "ja",
          "start_time": 0.36,
          "duration": 0.3
        },

The idea is to find a good splitting point, which can be a weighted decision: the closer we get to the desired splitting point, the less pause between words is required to trigger a split (if a split is required/desired).
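
A rough sketch of that weighted decision, assuming the word list shown above (word, start_time, duration) and a 42-character target; the thresholds are illustrative only:

def split_words(words, max_chars=42):
    # Greedily split a list of {"word", "start_time", "duration"} dicts into
    # cues whose text stays around max_chars, preferring to break where the
    # pause before the next word is long.
    cues, current, length = [], [], 0
    for i, w in enumerate(words):
        current.append(w)
        length += len(w["word"]) + 1
        nxt = words[i + 1] if i + 1 < len(words) else None
        pause = (nxt["start_time"] - (w["start_time"] + w["duration"])) if nxt else 0.0
        # Break at the hard limit, or earlier once past half the limit if there is a clear pause.
        if nxt is None or length >= max_chars or (length >= max_chars // 2 and pause > 0.3):
            start = current[0]["start_time"]
            end = w["start_time"] + w["duration"]
            cues.append((start, end, " ".join(x["word"] for x in current)))
            current, length = [], 0
    return cues

sample = [{"word": "ja", "start_time": 0.36, "duration": 0.3},
          {"word": "meine", "start_time": 0.7, "duration": 0.22}]
print(split_words(sample))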

Creating GitHub Organization and possibly changing project name

Ideally this project would be developed under a GitHub organization (but it's definitely not required).

The Autosub organization (https://github.com/autosub) namespace is already taken, and as you may know there's an unrelated (abandoned) project called "autosub" that uses the same name: https://github.com/agermanidis/autosub

Ideally you'd get that organization namespace but I don't think it's possible given there's no contact information.

How about renaming this project "AutoSubs" (or "AutoSubtitler") and registering https://github.com/AutoSubs (or https://github.com/AutoSubtitler)? I prefer "AutoSubs". Another option would be to keep the name "AutoSub" but register a different organization.

I think the GitHub organization domain would help with the growth of the project. But again, the change is definitely not required.

(Feel free to close this issue as WONTFIX, if you'd like)

Broken Docker build

$ docker build -t autosub .
Sending build context to Docker daemon  113.2kB
Step 1/13 : ARG BASEIMAGE=ubuntu:18.04
Step 2/13 : FROM ${BASEIMAGE}
 ---> b67d6ac264e4
Step 3/13 : ARG DEPSLIST=requirements.txt
 ---> Using cache
 ---> 0ae6d0d02403
Step 4/13 : ENV PYTHONUNBUFFERED 1
 ---> Using cache
 ---> 63c984eb9ae5
Step 5/13 : RUN DEBIAN_FRONTEND=noninteractive apt update &&     apt -y install ffmpeg libsm6 libxext6 python3 python3-pip &&     apt -y clean && 	rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 7e2214cd96b4
Step 6/13 : COPY $DEPSLIST ./requirements.txt
 ---> Using cache
 ---> 3f437c1a2f3c
Step 7/13 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in 37765d138851
Collecting cycler==0.10.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting stt==1.0.0 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement stt==1.0.0 (from -r requirements.txt (line 3)) (from versions: 0.10.0a5, 0.10.0a6, 0.10.0a8, 0.10.0a9, 0.10.0a10)
No matching distribution found for stt==1.0.0 (from -r requirements.txt (line 3))
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1
