abhirooptalasila / autosub
A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui
License: MIT License
autosub uses only 100% CPU when it could use 400% on a quad-core CPU.
The task should be easy to parallelize by splitting the audio into N segments for N CPU cores.
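A minimal sketch of the parallel approach, assuming a hypothetical infer_segment() stand-in for running STT on one chunk. In practice the chunk boundaries should come from the silent parts AutoSub already detects, since a fixed split can cut through a word:

```python
from multiprocessing import Pool
from typing import List, Tuple

def split_ranges(total_ms: int, n: int) -> List[Tuple[int, int]]:
    """Split [0, total_ms) into n contiguous (start, end) chunks."""
    step = total_ms // n
    bounds = [i * step for i in range(n)] + [total_ms]
    return list(zip(bounds[:-1], bounds[1:]))

def infer_segment(segment: Tuple[int, int]) -> str:
    """Hypothetical stand-in for running STT on one audio chunk."""
    start, end = segment
    return f"transcript for {start}-{end}"

def transcribe_parallel(total_ms: int, workers: int = 4) -> List[str]:
    """One worker per chunk; Pool.map preserves input order,
    so the partial transcripts come back in subtitle order."""
    with Pool(workers) as pool:
        return pool.map(infer_segment, split_ranges(total_ms, workers))
```

Each worker would need its own model instance, since DeepSpeech/Coqui model objects are not safely shared across processes.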
Currently AutoSub has the following command-line interface:
usage: main.py [-h] --file FILE [--vtt]
AutoSub
optional arguments:
-h, --help show this help message and exit
--file FILE Input video file
--vtt Output a vtt file with cue points for individual words instead
of a srt file
The --vtt option switches the output from an SRT file to a VTT file.
But given that it is the inference part that takes the bulk of the execution time, while writing the output subtitle file is quick and uses very little disk space, rerunning the entire inference just to get a VTT file is inefficient. It makes more sense to create both the VTT and SRT by default.
Also, AutoSub is well-placed to output a transcript of the input at the same time. Indeed, I saw an AutoSub fork that did just that.
So, I suggest replacing --vtt with a --format option that lets the user restrict the generated file formats; by default it should create all formats: VTT, SRT, and a TXT transcript.
I am happy to do the work and make a Pull Request. Are you happy with this approach?
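The proposed interface could look like this hypothetical argparse sketch (the --format semantics are the proposal here, not AutoSub's current CLI):

```python
import argparse

# Hypothetical sketch: --format accepts any subset of the output
# types and defaults to generating all of them.
parser = argparse.ArgumentParser(prog="AutoSub")
parser.add_argument("--file", required=True, help="Input video file")
parser.add_argument(
    "--format",
    nargs="+",
    choices=["srt", "vtt", "txt"],
    default=["srt", "vtt", "txt"],
    help="Output formats to generate (default: all)",
)

# e.g. restrict output to VTT and TXT only:
args = parser.parse_args(["--file", "movie.mp4", "--format", "vtt", "txt"])
```

Omitting --format entirely would then produce all three files.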
I tested this with a Spanish video about 1:30 long, and it generated seemingly English "transliterated" text from the Spanish audio instead of writing the Spanish words themselves.
Using the Spanish models from here: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot#language-models-and-checkpoints
python3 autosub/main.py --model /home/cyberquarks/AutoSub/output_graph_es.pbmm --scorer /home/cyberquarks/AutoSub/kenlm_es.scorer --file /mnt/c/temp/test.mp4
Also, each subtitle entry looks like this:
1
00:00:08,75 --> 00:01:30,30
so the subtitle covers the whole video when played in VLC.
docker build -t autosub .
The build cannot complete and fails with:
Step 11/13 : RUN pip3 install --no-cache-dir -r requirements.txt
---> Running in bdf3fc44f538
Collecting cycler==0.10.0 (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting stt==1.0.0 (from -r requirements.txt (line 3))
Could not find a version that satisfies the requirement stt==1.0.0 (from -r requirements.txt (line 3)) (from versions: 0.10.0a5, 0.10.0a6, 0.10.0a8, 0.10.0a9, 0.10.0a10)
No matching distribution found for stt==1.0.0 (from -r requirements.txt (line 3))
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1
The error says that `source` is an unknown command.
People said that's because it's a Unix command and I use Windows 10.
So how do I solve this problem?
Thank you so much.
Hello, could you let me know how to install and run your program on Windows? I am at the step where I ran "pip3 install -r requirements.txt" and got the following error.
ERROR: Cannot install -r requirements.txt (line 4) and numpy==1.18.1 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested numpy==1.18.1
deepspeech 0.8.2 depends on numpy<=1.17.0 and >=1.14.5
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
Traceback (most recent call last):
File "autosub/main.py", line 8, in <module>
from . import logger
ImportError: cannot import name 'logger'
Hi, my config is Win10_x64, Python 3.8. When I execute:
$ C:/Soft/Autosub/sub/Scripts/python autosub/main.py --file D:/Work/video.mkv
it gives me the error:
Traceback (most recent call last):
File "autosub/main.py", line 8, in <module>
from . import logger
Info: `User@Computer MINGW64 /c/Soft/Autosub (master)
$ pip list
Package Version
absl-py 1.0.0
astunparse 1.6.3
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.12
cycler 0.10.0
deepspeech-gpu 0.9.3
distlib 0.3.4
ffmpeg 1.4
filelock 3.6.0
gast 0.3.3
google-auth 1.35.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.44.0
h5py 2.10.0
idna 3.3
importlib-metadata 4.11.3
joblib 0.16.0
Keras-Preprocessing 1.1.2
kiwisolver 1.2.0
Markdown 3.3.6
numpy 1.22.3
oauthlib 3.2.0
opt-einsum 3.3.0
pip 19.2.3
platformdirs 2.5.1
protobuf 3.19.4
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydub 0.23.1
pyparsing 2.4.7
python-dateutil 2.8.1
requests 2.27.1
requests-oauthlib 1.3.1
rsa 4.8
scikit-learn 1.0.2
scipy 1.4.1
setuptools 41.2.0
six 1.15.0
stt 1.0.0
tensorboard 2.2.2
tensorboard-plugin-wit 1.8.1
tensorflow-gpu 2.2.0
tensorflow-gpu-estimator 2.2.0
termcolor 1.1.0
threadpoolctl 3.1.0
tqdm 4.44.1
urllib3 1.26.9
virtualenv 20.13.3
Werkzeug 2.0.3
wheel 0.37.1
wrapt 1.14.0
zipp 3.7.0
WARNING: You are using pip version 19.2.3, however version 22.0.4 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.`
Command I run: python3 autosub/main.py --file video.mp4
[INFO] ARGS: Namespace(dry_run=False, engine='stt', file='video.mp4', format='srt', model=None, scorer=None, split_duration=5)
[INFO] Model: /media/ravneet/SSD2/TMN_Tasks/AutoSub/model.tflite
[INFO] Scorer: /media/ravneet/SSD2/TMN_Tasks/AutoSub/deepspeech-0.9.3-models.scorer
[INFO] Input file: video.mp4
[INFO] Extracted audio to audio/video.wav
[INFO] Splitting on silent parts in audio file
[INFO] Running inference...
TensorFlow: v2.3.0-6-g23ad988fcde
Coqui STT: v0.10.0-alpha.10-0-g9b517632
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2022-08-15 23:53:27.304189: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Data loss: Can't parse /media/ravneet/SSD2/TMN_Tasks/AutoSub/model.tflite as binary proto
[ERROR] Invalid model file
Hey Abhiroop, great work on AutoSub. It works flawlessly with the English acoustic models. However, I was trying to use the experimental Chinese acoustic models which have been released, and I faced some issues. I guess there is an issue with the encoding when the subtitle file is being written. Can you please check?
I have tried changing the encoding to utf-8, but it hasn't helped.
Hi,
I did this manually, but maybe someone can improve it and write a script for it:
1. Check for text longer than 7 words.
2. Add a line break after every 7 words.
3. Count the number of lines you get, e.g.:
blah blah blah blah blah blah blah
blah blah blah blah blah blah blah
blah blah blah blah blah blah blah
blah blah blah blah blah
There are 4 lines
4. Take the cue's initial and final times, e.g.:
13 <<< SRT subtitle position
00:00:25,90 --> 00:00:35,25
5. Subtract them and divide by the number of lines:
35,25 - 25,90 = 9,35
4 lines of max 7 words each
9,35 / 4 = 2,33
6. Give each line a 2,33-second window, starting each cue 0,01 after the previous one ends, e.g.:
13 <<< SRT Subtitle position
00:00:25,90 --> 00:00:28,23
blah blah blah blah blah blah blah
14
00:00:28,24 --> 00:00:30,57
blah blah blah blah blah blah blah
15
00:00:30,58 --> 00:00:32,91
blah blah blah blah blah blah blah
16
00:00:32,92 --> 00:00:35,25
blah blah blah blah blah
7. Renumber the cue that used to be number 14 ("blah blah blah blah") and all the following cues; in this case it becomes 17.
That's it.
Anyone? :)
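The steps above can be sketched like this; a rough illustration of the manual procedure, not AutoSub code (note that standard SRT timestamps use three millisecond digits, unlike the two-digit examples above):

```python
import datetime

MAX_WORDS = 7

def fmt(seconds: float) -> str:
    """Format seconds as a standard SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def split_cue(text: str, start: float, end: float, index: int):
    """Split one long cue into chunks of at most MAX_WORDS words,
    dividing the cue's time window evenly between the chunks."""
    words = text.split()
    chunks = [" ".join(words[i:i + MAX_WORDS])
              for i in range(0, len(words), MAX_WORDS)]
    span = (end - start) / len(chunks)
    cues = []
    for i, chunk in enumerate(chunks):
        c_start = start + i * span
        c_end = start + (i + 1) * span
        cues.append(f"{index + i}\n{fmt(c_start)} --> {fmt(c_end)}\n{chunk}\n")
    return cues
```

A full script would still have to renumber every cue that follows the split one, as in step 7.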
When I install the package, I'm getting
> pip install -r requirements.txt
Collecting cycler==0.10.0
Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting numpy
Using cached numpy-1.22.2.zip (11.4 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
ERROR: Could not find a version that satisfies the requirement stt==1.0.0 (from versions: none)
ERROR: No matching distribution found for stt==1.0.0
UPDATE
Looks like an issue with stt
> pip install stt==1.2.0
ERROR: Could not find a version that satisfies the requirement stt==1.2.0 (from versions: none)
ERROR: No matching distribution found for stt==1.2.0
OpenAI just released probably the best model that there is for speech recognition right now.
It would be great to incorporate this into this project!
More info: https://openai.com/blog/whisper/
Hi,
I just want to know how we can train models ourselves, and whether we need to modify the code if we want to use a GPU.
Hi, after I installed AutoSub, I always get this error message when I try to run the program: "ImportError: DLL load failed while importing _impl: A dynamic link library (DLL) initialization routine failed." Could anyone tell me how to fix this?
Given an incorrect subtitle, would it be possible to supply a corrected one and retrain on it?
I'm getting an error when trying to run autosub on an MP4 file:
python3 main.py --model "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/autosub/deepspeech-0.8.2-models.pbmm" --scorer "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/autosub/deepspeech-0.8.2-models.scorer" --file "test.mp4"
The error message:
Illegal instruction (core dumped)
I think my computer is too old. I have an Intel integrated graphics card. Maybe that could be the reason? I'm buying a new video card next month and I'll try to use it again if that's the problem.
If it works, I'll consider making a kivy GUI for it.
It's annoying to have to wait so long for the files to be copied into the image because a small change invalidates the cache at an earlier step.
I keep getting:
Extracted audio to audio/input.wav
Splitting on silent parts in audio file
Traceback (most recent call last):
File "./autosub/main.py", line 130, in <module>
main()
File "./autosub/main.py", line 112, in main
silenceRemoval(audio_file_name)
File "/Users/kelvin/Downloads/AutoSub-master/autosub/segmentAudio.py", line 194, in silenceRemoval
raise Exception("Input audio file not found!")
Exception: Input audio file not found!
YouTube's latest speech recognition creates cue points for individual words, which become visible at the moment they are spoken.
YT transmits its subtitles in a format that looks like this (which I do not recognize):
{
"wireMagic": "pb3",
"pens": [ {
} ],
"wsWinStyles": [ {
}, {
"mhModeHint": 2,
"juJustifCode": 0,
"sdScrollDir": 3
} ],
"wpWinPositions": [ {
}, {
"apPoint": 6,
"ahHorPos": 20,
"avVerPos": 100,
"rcRows": 2,
"ccCols": 40
} ],
"events": [ {
"tStartMs": 0,
"dDurationMs": 2795440,
"id": 1,
"wpWinPosId": 1,
"wsWinStyleId": 1
}, {
"tStartMs": 80,
"dDurationMs": 3119,
"wWinId": 1,
"segs": [ {
"utf8": "hey",
"acAsrConf": 255
}, {
"utf8": " everybody",
"tOffsetMs": 160,
"acAsrConf": 255
}, {
"utf8": " how's",
"tOffsetMs": 480,
"acAsrConf": 255
However, the VTT subtitle format also supports cues for individual words (cf. karaoke-style text), although this is not yet supported natively by the Firefox I tested in.
Timing information is available through DeepSpeech's sttWithMetadata() and is easily transformed from character timings to word timings using their client.py.
The request is for an output file that lets us do it like YT: display individual words as they are spoken.
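A rough sketch of the character-to-word conversion. The Token class here is a stand-in for DeepSpeech's TokenMetadata (real tokens carry text and start_time fields), and the VTT karaoke formatting is a simplified assumption, not AutoSub code:

```python
from typing import List, NamedTuple, Tuple

class Token(NamedTuple):
    """Stand-in for DeepSpeech's TokenMetadata (text, start_time)."""
    text: str
    start_time: float

def words_with_times(tokens: List[Token]) -> List[Tuple[str, float]]:
    """Group character-level tokens into (word, start_time) pairs,
    splitting on space tokens."""
    words, current, start = [], "", 0.0
    for tok in tokens:
        if tok.text == " ":
            if current:
                words.append((current, start))
            current = ""
        else:
            if not current:
                start = tok.start_time
            current += tok.text
    if current:
        words.append((current, start))
    return words

def to_vtt_karaoke(words: List[Tuple[str, float]]) -> str:
    """Emit one VTT cue payload with per-word <timestamp><c> tags."""
    def ts(t: float) -> str:
        m, s = divmod(t, 60)
        return f"{int(m):02d}:{s:06.3f}"
    return "".join(f"<{ts(t)}><c>{w}</c>" for w, t in words)
```

This mirrors what client.py does when it converts character timings into word timings before printing.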
Hi, thanks for the great project!
I have a problem with some words missing from the transcript.
But if I transcribe the same audio using only the deepspeech project (not autosub with the ds engine), there are no missing words.
Are there any tweaks that can be done via parameters, or is it because of the silent-segment removal process?
Here is the txt output from autosub with the ds engine:
biggest .
people make when larry english and probably one of the most common miss.
people think that they.
don't study.
live in.
an out let me explain what i .
one does studying men and how do people usually approach this pro.
and how do people.
And here is the deepspeech output:
the biggest mistake people make when morning english and probably one of the most common misconceptions is that people think that they need to study english and usedn't study english live english an outlet explain what i mean one does studying men and how do people
As you can see, some words are missing from the autosub output.
I am using the same deepspeech 0.9.3 version and model for both autosub and deepspeech.
Would it be feasible to create a Dockerfile and publish it on Docker Hub for more convenient usage?
This would allow users, and even developers, to simply pull and start using AutoSub immediately without worrying about the runtime or dependencies, and with less configuration.
Any thoughts?
https://github.com/tyiannak/pyAudioAnalysis says it can detect unknown sounds and has speaker diarization.
Can it use speaker diarization to extract only vocals and ignore other sources of sound?
Also, if two people are speaking, having speaker diarization would be great.
Amazing! Although the results are not that accurate.
How can I improve the accuracy? Many thanks.
https://www.youtube.com/watch?v=TOQwUISm6fw
1
00:00:00,10 --> 00:00:00,95
one
2
00:00:01,55 --> 00:00:07,10
this visconti to learn how to go at the vices
3
00:00:08,20 --> 00:00:14,80
who to open that automatic aridius in
4
00:00:14,95 --> 00:00:21,50
and this idea will guide us how to add indicator automatical
5
00:00:21,95 --> 00:00:26,50
and we will go and newton
6
00:00:26,90 --> 00:00:34,0:
and we were used in piraguas moreale indication
7
00:00:34,40 --> 00:00:43,45
starting first of all we need to add one more premature
8
00:00:44,25 --> 00:00:53,80
the primates named yes bury in the morning added coelesti special the period wooing happy
9
00:00:54,15 --> 00:00:59,70
and we will get the primitive party by using dislike
10
00:01:02,60 --> 00:01:07,25
and it the new eenamost cat indicated handle
11
00:01:08,25 --> 00:01:10,80
to get hard
12
00:01:12,25 --> 00:01:15,15
i remember very honour
13
00:01:15,75 --> 00:01:20,0:
if the specific indicator doesn't because
14
00:01:20,75 --> 00:01:30,15
then his deep we create a new one force and then returned the handle of the new indicator
15
00:01:30,50 --> 00:01:31,90
if
16
00:01:32,40 --> 00:01:43,50
the specific indicator has existed then this peril return the handle of the specifically indicate
17
00:01:46,30 --> 00:01:55,50
and handles into idea holes to hide in fine the chatterment
18
00:01:55,70 --> 00:01:59,30
so hand is very important information
19
00:01:59,55 --> 00:02:05,70
we will stop the return better to this very
20
00:02:06,40 --> 00:02:17,80
don't forget to go estimable after wives lorrainese how it works
21
00:02:18,10 --> 00:02:19,40
by ronald
22
00:02:24,20 --> 00:02:26,35
we just need to copy
.............
Love the project. Would it be too difficult to add GPU support? I'm planning on using this in production!
Thanks for the promising program. I really believe in this work, so I will become an active contributor.
The Python code for subtitles whose millisecond section ends in zeroes is buggy. I have fixed this (see the associated commit), but for the sake of completeness, here is a description of the issue.
Here's an excerpt of some generated SRT output that can't be loaded in VLC, mpv, and other programs:
3
00:00:12,95 --> 00:00:14,0:
but you but there
4
00:00:14,60 --> 00:00:15,30
the
And the same excerpt from the VTT output:
00:00:12.95 --> 00:00:14.0: align:start position:0%
but you but there
<c> but</c><0:00:13.330000><c> you</c><0:00:13.450000><c> but</c><0:00:13.670000><c> there</c>
00:00:14.60 --> 00:00:15.30 align:start position:0%
the
<c> the</c>
The root cause is a broken try/except block in the source code which I have fixed.
I should note that users with sed installed can run a command like sed -i 's_: -->_ -->_g' filename.srt to repair the issue, but my attached fix will prevent the issue from occurring in the first place.
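For reference, the underlying pitfall is that str(datetime.timedelta) omits the fractional part entirely when it is zero, which is how truncated cues like those above can arise; formatting the fields explicitly always yields a fixed-width timestamp. A minimal sketch (not the attached fix itself):

```python
def srt_timestamp(seconds: float) -> str:
    """Always produce HH:MM:SS,mmm, even when the fraction is zero.
    (str(datetime.timedelta(seconds=14.0)) gives '0:00:14' with no
    fractional part, so naive slicing of it yields broken cues.)"""
    total_ms = int(round(seconds * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Timestamps built this way load cleanly in VLC and mpv regardless of trailing zeroes.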
docker run --volume=`pwd`input:/input --name autosub autosub --file /input/video.mp4
I encounter the error:
Traceback (most recent call last):
File "autosub/main.py", line 8, in
from . import logger
ImportError: cannot import name 'logger'
pip3 install -r requirements.txt
Collecting deepspeech==0.9.3 (from -r requirements.txt (line 3))
ERROR: Could not find a version that satisfies the requirement deepspeech==0.9.3 (from -r requirements.txt (line 3)) (from versions: none)
ERROR: No matching distribution found for deepspeech==0.9.3 (from -r requirements.txt (line 3))
#!/bin/sh
sed -i 's/import logger/from . import logger/' autosub/main.py autosub/utils.py autosub/audioProcessing.py autosub/segmentAudio.py
sed -i 's/from utils import \*/from .utils import */' autosub/main.py
sed -i 's/from writeToFile import write_to_file/from .writeToFile import write_to_file/' autosub/main.py
sed -i 's/from audioProcessing import extract_audio/from .audioProcessing import extract_audio/' autosub/main.py
sed -i 's/from segmentAudio import remove_silent_segments/from .segmentAudio import remove_silent_segments/' autosub/main.py
sed -i 's/import trainAudio as TA/from . import trainAudio as TA/' autosub/segmentAudio.py
sed -i 's/import featureExtraction as FE/from . import featureExtraction as FE/' autosub/segmentAudio.py
The errors were:
ModuleNotFoundError: No module named 'logger'
ModuleNotFoundError: No module named 'utils'
ModuleNotFoundError: No module named 'writeToFile'
ModuleNotFoundError: No module named 'audioProcessing'
ModuleNotFoundError: No module named 'segmentAudio'
ModuleNotFoundError: No module named 'trainAudio'
ModuleNotFoundError: No module named 'featureExtraction'
Write results to disk more often, not just once at the end of the process.
Example code snippet from my srtgen:
import os
import sys
import tempfile

tempdir = tempfile.gettempdir()  # directory for the output file

def log(msg):
    print(msg, file=sys.stderr)

output_file_path = None
output_file_handle = None

# output goes to stdout and file
def out(*args, **kwargs):
    print(*args, **kwargs)
    if output_file_handle:
        log(f"writing to {output_file_path}")
        kwargs["file"] = output_file_handle
        print(*args, **kwargs)
        output_file_handle.flush()  # flush so partial results survive a crash
    else:
        log("not writing to output_file_path")  # this should not happen

def transcribe_file(input_video_path):
    """Transcribe the given video file."""
    global output_file_path
    global output_file_handle
    output_file_path = os.path.join(tempdir, "output_file.srt")
    output_file_handle = open(output_file_path, "w")
    for ...  # loop over inferred segments (elided in the original snippet)
        out("... result ...")
This could also allow pausing and resuming the process.
So, this is a long-standing issue/non-issue caused by the upstream (TensorFlow) decision to require AVX extensions when packaging the binaries. Anyway, two hours later I have recompiled DeepSpeech and verified it working on my machine.
But even after replacing the binary AutoSub/sub/bin/deepspeech, I still get the illegal instruction.
Where exactly can I replace the binary with the one I compiled so that it works on my CPU? Thank you.
Pro: FLAC needs less disk space.
Con: WAV is easier to process.
-[INFO] Extracted audio to audio/video-file-name.wav
+[INFO] Extracted audio to audio/video-file-name.flac
To read FLAC files, we can use the pydub library.
The /audio directory is never cleaned up automatically. However, this is temporary data and should live in a TemporaryDirectory. Failing to clean the directory can accumulate huge amounts of uncompressed audio data, and it also causes issues when re-running the tool on a file with the same name as one used before: ffmpeg prompts for an overwrite, and in the end the tool tries to feed sound data from previous runs into DeepSpeech and errors out:
(sub) user@machine:~/git/AutoSub$ python3 autosub/main.py --model ~/deepspeech/de/output_graph.pbmm --scorer ~/deepspeech/de/kenlm.scorer --file ~/deepspeech/de/geteiltes_polen.wav
AutoSub v0.1
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-01-31 15:59:44.865056: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Input file: /home/ajcay/deepspeech/de/geteiltes_polen.wav
Guessed Channel Layout for Input Stream #0.0 : mono
File '/home/ajcay/git/AutoSub/audio/geteiltes_polen.wav' already exists. Overwrite ? [y/N] y
Extracted audio to audio/geteiltes_polen.wav
Splitting on silent parts in audio file
Running inference:
85%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 82/96 [03:23<00:27, 1.93s/it]Traceback (most recent call last):
File "autosub/main.py", line 180, in <module>
main()
File "autosub/main.py", line 174, in main
ds_process_audio(ds, audio_segment_path, file_handle)
File "autosub/main.py", line 117, in ds_process_audio
write_to_file(file_handle, infered_text, line_count, limits)
File "/home/ajcay/git/AutoSub/autosub/writeToFile.py", line 18, in write_to_file
d = str(datetime.timedelta(seconds=float(limits[0])))
ValueError: could not convert string to float: 'hoerfilm16k'
85%|████████████████████████████████████████
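A minimal sketch of the TemporaryDirectory approach described above; extract_audio is a hypothetical stand-in for the ffmpeg extraction step, not AutoSub's actual function:

```python
import os
import tempfile

def run_with_temp_audio(video_path: str) -> str:
    """Extract audio into a per-run temporary directory that is deleted
    automatically, so re-runs never see stale files or ffmpeg prompts."""
    with tempfile.TemporaryDirectory(prefix="autosub-") as tmpdir:
        audio_path = os.path.join(
            tmpdir, os.path.basename(video_path) + ".wav")
        # extract_audio(video_path, audio_path)  # hypothetical ffmpeg step
        open(audio_path, "wb").close()           # stand-in for extraction
        # ... run segmentation and inference on audio_path here ...
        return tmpdir  # the directory no longer exists after the with-block
```

Because every run gets a fresh directory, two input files with the same basename can no longer collide.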
After running pip3 install -r requirements.txt:
(sub) 1sm23@liushimeHacmini AutoSub % pip3 install -r requirements.txt
Requirement already satisfied: cycler==0.10.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 1)) (0.10.0)
Requirement already satisfied: Cython==0.29.21 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 2)) (0.29.21)
Collecting numpy==1.18.1
Using cached numpy-1.18.1-cp38-cp38-macosx_10_9_x86_64.whl (15.2 MB)
Requirement already satisfied: deepspeech==0.8.2 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (0.8.2)
Requirement already satisfied: joblib==0.16.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (0.16.0)
Requirement already satisfied: kiwisolver==1.2.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 6)) (1.2.0)
Requirement already satisfied: pydub==0.23.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 7)) (0.23.1)
Requirement already satisfied: pyparsing==2.4.7 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 8)) (2.4.7)
Requirement already satisfied: python-dateutil==2.8.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 9)) (2.8.1)
Collecting scikit-learn==0.21.3
Using cached scikit-learn-0.21.3.tar.gz (12.2 MB)
Requirement already satisfied: scipy==1.4.1 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 11)) (1.4.1)
Requirement already satisfied: six==1.15.0 in ./sub/lib/python3.8/site-packages (from -r requirements.txt (line 12)) (1.15.0)
Collecting tqdm==4.44.1
Using cached tqdm-4.44.1-py2.py3-none-any.whl (60 kB)
Using legacy 'setup.py install' for scikit-learn, since package 'wheel' is not installed.
Installing collected packages: numpy, scikit-learn, tqdm
Attempting uninstall: numpy
Found existing installation: numpy 1.17.3
Uninstalling numpy-1.17.3:
Successfully uninstalled numpy-1.17.3
Running setup.py install for scikit-learn ... error
ERROR: Command errored out with exit status 1:
command: /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"'; __file__='"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-record-tolv31lh/install-record.txt --single-version-externally-managed --compile --install-headers /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/include/site/python3.8/scikit-learn
cwd: /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/
Complete output (51 lines):
Partial import of sklearn during the build process.
C compiler: xcrun -sdk macosx clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64
compile options: '-c'
extra options: '-fopenmp'
xcrun: test_openmp.c
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 290, in <module>
setup_package()
File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 286, in setup_package
setup(**metadata)
File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/core.py", line 137, in setup
config = configuration()
File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py", line 174, in configuration
config.add_subpackage('sklearn')
File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "sklearn/setup.py", line 76, in configuration
maybe_cythonize_extensions(top_path, config)
File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/sklearn/_build_utils/__init__.py", line 42, in maybe_cythonize_extensions
with_openmp = check_openmp_support()
File "/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/sklearn/_build_utils/openmp_helpers.py", line 140, in check_openmp_support
raise CompileError(err_message)
distutils.errors.CompileError:
***
It seems that scikit-learn cannot be built with OpenMP support.
- Make sure you have followed the installation instructions:
https://scikit-learn.org/dev/developers/advanced_installation.html
- If your compiler supports OpenMP but the build still fails, please
submit a bug report at:
https://github.com/scikit-learn/scikit-learn/issues
- If you want to build scikit-learn without OpenMP support, you can set
the environment variable SKLEARN_NO_OPENMP and rerun the build
command. Note however that some estimators will run in sequential
mode and their `n_jobs` parameter will have no effect anymore.
***
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"'; __file__='"'"'/private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-install-xijjs7wh/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0c/hhcsy8ws0dv5rbfl9w3kt9940000gp/T/pip-record-tolv31lh/install-record.txt --single-version-externally-managed --compile --install-headers /Users/1sm23/Documents/GitHub.nosync/AutoSub/sub/include/site/python3.8/scikit-learn Check the logs for full command output.
How-to example
Make sure the model and scorer files are in the root directory. They are automatically loaded
After following the installation instructions, you can run autosub/main.py as given below. The --file argument is the video file for which the SRT file is to be generated.
$ python3 autosub/main.py --file ~/movie.mp4
1. I don't understand what the root directory is. Is it *****/autosub, or somewhere else?
2. After I downloaded DeepSpeech, I thought I could finally install this program, but I got so many errors.
It's hard to explain, but it looks like the program is trying to subtitle your tutorial document.
What am I doing wrong here?
File "deepspeech/__init__.py", line 38, in __init__
raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status))
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)
How do I install this on Windows?
Hi dear author:
When I process audio to SRT from a Mandarin Chinese video, the following errors occurred.
First time with the video:
Traceback (most recent call last):
File "C:\Python\AutoSub\autosub\main.py", line 139, in
main()
File "C:\Python\AutoSub\autosub\main.py", line 129, in main
ds_process_audio(ds, audio_segment_path, file_handle)
File "C:\Python\AutoSub\autosub\main.py", line 68, in ds_process_audio
write_to_file(file_handle, infered_text, line_count, limits)
File "C:\Python\AutoSub\autosub\writeToFile.py", line 24, in write_to_file
d = str(datetime.timedelta(seconds=float(limits[1])))
IndexError: list index out of range
Second time with the same video:
Traceback (most recent call last):
File "C:\Python\AutoSub\autosub\main.py", line 139, in
main()
File "C:\Python\AutoSub\autosub\main.py", line 129, in main
ds_process_audio(ds, audio_segment_path, file_handle)
File "C:\Python\AutoSub\autosub\main.py", line 68, in ds_process_audio
write_to_file(file_handle, infered_text, line_count, limits)
File "C:\Python\AutoSub\autosub\writeToFile.py", line 32, in write_to_file
file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\udce4' in position 18: illegal multibyte sequence
It works well with an English video; I can get the SRT.
system: Ubuntu 20.04
python: 3.7.1, 3.8.8 and 3.9.4
Thank you very much for your reply.
During the installation of requirements I get,
deepspeech 0.9.3 has requirement numpy<=1.17.0,>=1.14.5, but you'll have numpy 1.20.0 which is incompatible.
I installed everything using commit 0d38535a7511d81a126dcd33e4b8e0922585b011 and created a virtualenv project as suggested. When I tried to run autosub/main.py, I got:
Traceback (most recent call last):
File "/Users/user/python virtualenvs/AutoSub/AutoSub/autosub/main.py", line 8, in <module>
from . import logger
ImportError: attempted relative import with no known parent package
I changed "from . import logger" to "import logger" and got:
Traceback (most recent call last):
File "/Users/user/python virtualenvs/AutoSub/AutoSub/autosub/main.py", line 11, in
import numpy as np
ModuleNotFoundError: No module named 'numpy'
I deactivated the virtualenv project and tried to install numpy
(python3 -m pip install --user numpy) and got:
Requirement already satisfied: numpy in /usr/local/lib/python3.10/site-packages (1.23.2)
I verified that numpy is in that directory.
At this point there is no point in continuing. I see there are instructions for building and running autosub using Docker. This is not an option for me, so how can I proceed on OS X?
Is it possible to recognize another language?
I hope the caption file can come out in Japanese.
I put in a video of a conversation in Japanese, and the output came out in English; is there a way to change this?
Or are there any Japanese model files?
Hi my friend. I'm getting the following error when installing. I think you're missing the Cython package in your requirements.
ERROR: Command errored out with exit status 1:
command: /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jddk896v/install-record.txt --single-version-externally-managed --compile --install-headers /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/include/site/python3.8/scikit-learn
cwd: /tmp/pip-install-91ax7yu0/scikit-learn/
Complete output (28 lines):
Partial import of sklearn during the build process.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 290, in <module>
setup_package()
File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 286, in setup_package
setup(**metadata)
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/core.py", line 137, in setup
config = configuration()
File "/tmp/pip-install-91ax7yu0/scikit-learn/setup.py", line 174, in configuration
config.add_subpackage('sklearn')
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "sklearn/setup.py", line 62, in configuration
config.add_subpackage('utils')
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 1033, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 999, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/lib/python3.8/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "sklearn/utils/setup.py", line 8, in configuration
from Cython import Tempita
ModuleNotFoundError: No module named 'Cython'
----------------------------------------
ERROR: Command errored out with exit status 1: /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-91ax7yu0/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jddk896v/install-record.txt --single-version-externally-managed --compile --install-headers /media/segundohd/app_repo/autosub_deepspeech_mozilla/AutoSub/sub/include/site/python3.8/scikit-learn Check the logs for full command output.
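For reference, the failing import is scikit-learn's build doing from Cython import Tempita. A quick pre-flight check (illustrative, not part of AutoSub):

```python
# Check whether Cython is importable before installing requirements that
# need it at build time (the module name matches the traceback above).
import importlib.util

have_cython = importlib.util.find_spec("Cython") is not None
print("Cython available" if have_cython else "run: pip3 install cython")
```

Installing Cython into the virtualenv first (pip3 install cython) and then re-running pip3 install -r requirements.txt should let the scikit-learn build proceed.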
After the Mozilla layoffs, the DeepSpeech team forked the DeepSpeech repo and founded the company Coqui AI (https://github.com/coqui-ai/STT), where they continue development. AFAIK they now only allow exporting models as .tflite files. In theory it should work with the old code, but for me it didn't.
When I try to run it like this:
python3 autosub/main.py --file /Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3 --split-duration 8
with a .tflite file in the main folder and NO language model.
Then I get:
AutoSub
['autosub/main.py', '--file', '/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', '--split-duration', '8']
ARGS: Namespace(dry_run=False, file='/Users/sgrotz/Downloads/kp193-hejma-auxtomatigo.mp3', format=['srt', 'vtt', 'txt'], model=None, scorer=None, split_duration=8.0)
Warning no models specified via --model and none found in local directory. Please run getmodel.sh convenience script from autosub repo to get some.
Error: Must have pbmm model. Exiting
Have I done anything wrong here, or doesn't AutoSub support .tflite files?
I tested it on MacOS and installed ffmpeg via homebrew.
Any thoughts?
https://github.com/tyiannak/pyAudioAnalysis says it can detect unknown sounds and has speaker diarization.
Can it be used to extract only vocals and ignore other sources of sound?
Also, if two people are speaking, having speaker diarization would be great.
Currently it's easy to end up with cues of 340 characters or more. Amara.org suggests a maximum of 42 characters per line.
DeepSpeech provides timing data for each individual character, so the start and duration of each word can be calculated (see client.py) and a clean split is possible (possibly breaking a sentence, but still better than 340-character cues, and better than having to support different grammars).
Here is what their sample script already provides:
{
"transcripts": [
{
"confidence": -44.99164581298828,
"words": [
{
"word": "ja",
"start_time": 0.36,
"duration": 0.3
},
{
"word": "meine",
"start_time": 0.7,
"duration": 0.22
},
{
"word": "sehr",
"start_time": 0.96,
"duration": 0.14
},
{
"word": "verehrten",
"start_time": 1.12,
"duration": 0.32
},
{
"word": "damen",
"start_time": 1.48,
"duration": 0.22
},
{
"word": "und",
"start_time": 1.74,
"duration": 0.12
},
{
"word": "herren",
"start_time": 1.9,
"duration": 0.5
},
{
"word": "liebe",
"start_time": 2.48,
"duration": 0.32
},
{
"word": "frau",
"start_time": 2.86,
"duration": 0.48
},
{
"word": "versorgen",
"start_time": 3.38,
"duration": 0.42
}
]
},
{
"confidence": -45.583431243896484,
"words": [
{
"word": "ja",
"start_time": 0.36,
"duration": 0.3
},
The idea is to find a good splitting point, which can be a weighted decision: the closer we get to the desired splitting point, the less time between words is required to justify a split, if a split is required or desired.
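As a sketch of that idea (my own illustration, not AutoSub code), cues can be cut greedily from the word timings so no cue text exceeds Amara's 42-character suggestion; a weighted, gap-based choice of cut point could later replace the greedy rule:

```python
# Split DeepSpeech word timings into cues whose text stays within a
# per-line character limit. Words are taken from the sample output above.
MAX_CHARS = 42  # Amara.org's suggested per-line maximum

words = [
    {"word": "ja", "start_time": 0.36, "duration": 0.3},
    {"word": "meine", "start_time": 0.7, "duration": 0.22},
    {"word": "sehr", "start_time": 0.96, "duration": 0.14},
    {"word": "verehrten", "start_time": 1.12, "duration": 0.32},
    {"word": "damen", "start_time": 1.48, "duration": 0.22},
    {"word": "und", "start_time": 1.74, "duration": 0.12},
    {"word": "herren", "start_time": 1.9, "duration": 0.5},
    {"word": "liebe", "start_time": 2.48, "duration": 0.32},
    {"word": "frau", "start_time": 2.86, "duration": 0.48},
]

def split_cues(words, max_chars=MAX_CHARS):
    cues, current = [], []
    for w in words:
        text_len = len(" ".join(x["word"] for x in current + [w]))
        if current and text_len > max_chars:
            cues.append(current)  # cut before the word that would overflow
            current = []
        current.append(w)
    if current:
        cues.append(current)
    # Each cue keeps its own start and end from the word timings.
    return [{"text": " ".join(w["word"] for w in c),
             "start": c[0]["start_time"],
             "end": c[-1]["start_time"] + c[-1]["duration"]} for c in cues]

for cue in split_cues(words):
    print(f'{cue["start"]:.2f}-{cue["end"]:.2f}: {cue["text"]}')
# 0.36-2.40: ja meine sehr verehrten damen und herren
# 2.48-3.34: liebe frau
```

The weighted variant would, near the character limit, prefer cutting at the word boundary with the largest inter-word silence rather than strictly at the first overflow.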
Ideally this project would be developed under a GitHub organization (but it's definitely not required).
The Autosub organization (https://github.com/autosub) namespace is already taken, and as you may know there's an unrelated (abandoned) project called "autosub" that uses the same name: https://github.com/agermanidis/autosub
Ideally you'd get that organization namespace, but I don't think that's possible given there's no contact information.
How about renaming this project "AutoSubs" (or "AutoSubtitler") and registering https://github.com/AutoSubs (or https://github.com/AutoSubtitler)? I prefer "AutoSubs". Another option would be to keep the name "AutoSub" but register a different organization.
I think the GitHub organization domain would help with the growth of the project. But again, the change is definitely not required.
(Feel free to close this issue as WONTFIX, if you'd like)
$ docker build -t autosub .
Sending build context to Docker daemon 113.2kB
Step 1/13 : ARG BASEIMAGE=ubuntu:18.04
Step 2/13 : FROM ${BASEIMAGE}
---> b67d6ac264e4
Step 3/13 : ARG DEPSLIST=requirements.txt
---> Using cache
---> 0ae6d0d02403
Step 4/13 : ENV PYTHONUNBUFFERED 1
---> Using cache
---> 63c984eb9ae5
Step 5/13 : RUN DEBIAN_FRONTEND=noninteractive apt update && apt -y install ffmpeg libsm6 libxext6 python3 python3-pip && apt -y clean && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 7e2214cd96b4
Step 6/13 : COPY $DEPSLIST ./requirements.txt
---> Using cache
---> 3f437c1a2f3c
Step 7/13 : RUN pip3 install --no-cache-dir -r requirements.txt
---> Running in 37765d138851
Collecting cycler==0.10.0 (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
Collecting stt==1.0.0 (from -r requirements.txt (line 3))
Could not find a version that satisfies the requirement stt==1.0.0 (from -r requirements.txt (line 3)) (from versions: 0.10.0a5, 0.10.0a6, 0.10.0a8, 0.10.0a9, 0.10.0a10)
No matching distribution found for stt==1.0.0 (from -r requirements.txt (line 3))
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1
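The build fails because requirements.txt pins stt==1.0.0 while the index only offers the pre-releases listed in the log. A small sketch (names are illustrative) of picking the newest version actually offered:

```python
# Choose the newest of the stt versions the pip log says are available,
# comparing by numeric components so "a10" sorts after "a9".
import re

available = ["0.10.0a5", "0.10.0a6", "0.10.0a8", "0.10.0a9", "0.10.0a10"]

def key(version):
    return [int(n) for n in re.findall(r"\d+", version)]

latest = max(available, key=key)
print(latest)  # 0.10.0a10
```

Relaxing the pin in requirements.txt to a version the index actually offers (e.g. stt==0.10.0a10) would let the Docker build proceed, assuming that release is API-compatible.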