antiboredom / audiogrep
Creates audio supercuts.
Home Page: http://antiboredom.github.io/audiogrep
License: MIT License
I have thousands of 16k mono wav files, which I think is what audiogrep wants to have in order to perform transcription. However, when I point audiogrep at these files, it converts them again. It seems like the convert_to_wav function does not perform any checks on the file type, but just checks for the presence of specific file names.
I can understand that it can be hard to detect some media formats with a high degree of precision, but I did not think wav was one that was hard to detect. If I'm wrong, then I wonder if maybe a '--skip-conversion' flag or similar could be added so the user can tell audiogrep 'Hey, these files are already in your expected format -- please don't convert these files to the same format they've already been converted to'.
If this feature would be welcome but nobody has time to do it, let me know. If I've missed something that makes this unnecessary, let me know how I can stop conversion from happening!
Thanks.
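Here is a minimal sketch of the check being asked for, using Python's standard wave module. It reads the WAV header and reports whether a file is already 16 kHz, mono, 16-bit PCM, which is the format ffmpeg produces in the conversion logs further down this page. The function name is_ready_wav, and the idea of filtering inputs with it, are hypothetical. audiogrep's convert_to_wav does not do this.

```python
import wave

# Assumed target format, matching the ffmpeg output seen in the logs
# on this page: pcm_s16le, 16000 Hz, mono.
TARGET_RATE = 16000
TARGET_CHANNELS = 1
TARGET_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit


def is_ready_wav(path):
    """Return True if path is a PCM WAV already in the target format.

    Hypothetical helper, not part of audiogrep.
    """
    try:
        with wave.open(path, "rb") as w:
            return (
                w.getframerate() == TARGET_RATE
                and w.getnchannels() == TARGET_CHANNELS
                and w.getsampwidth() == TARGET_SAMPWIDTH
            )
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, truncated, or a WAV encoding the wave
        # module cannot read; treat it as needing conversion.
        return False
```

A caller could then convert only the files that fail the check, for example [f for f in files if not is_ready_wav(f)]. A --skip-conversion flag would simply bypass the check altogether.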
$ audiogrep
Traceback (most recent call last):
File "/usr/bin/audiogrep", line 6, in <module>
audiogrep.main()
AttributeError: module 'audiogrep' has no attribute 'main'
On Mac OS X Yosemite, Homebrew Python 2.7.9:
python audiogrep.py --input /Users/harris/Documents/Books/The\ War\ of\ Art\ -\ Steven\ Pressfield/02\ -\ Book\ One\ -\ Resistance\,\ Defining\ the\ Enemy/22\ -\ Resistance\ and\ Sex.mp3 --transcribe
ffmpeg version 2.5.4 Copyright (c) 2000-2015 the FFmpeg developers
built on Feb 16 2015 16:34:55 with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
configuration: --prefix=/usr/local/Cellar/ffmpeg/2.5.4 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libmp3lame --enable-libvo-aacenc --enable-libxvid --enable-libvorbis --enable-libvpx --enable-vda
libavutil 54. 15.100 / 54. 15.100
libavcodec 56. 13.100 / 56. 13.100
libavformat 56. 15.102 / 56. 15.102
libavdevice 56. 3.100 / 56. 3.100
libavfilter 5. 2.103 / 5. 2.103
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
[mp3 @ 0x7ff26200da00] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/Users/harris/Documents/Books/The War of Art - Steven Pressfield/02 - Book One - Resistance, Defining the Enemy/22 - Resistance and Sex.mp3':
Metadata:
album : The War of Art
genre : Other
album_artist : Steven Pressfield
track : 22
artist : Steven Pressfield
Duration: 00:01:05.78, start: 0.000000, bitrate: 201 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
Stream #0:1: Video: mjpeg, yuvj444p(pc, bt470bg/unknown/unknown), 680x680, 90k tbr, 90k tbn, 90k tbc
Metadata:
title :
comment : Other
Output #0, wav, to '/Users/harris/Documents/Books/The War of Art - Steven Pressfield/02 - Book One - Resistance, Defining the Enemy/22 - Resistance and Sex.mp3.temp.wav':
Metadata:
IPRD : The War of Art
IGNR : Other
album_artist : Steven Pressfield
IPRT : 22
IART : Steven Pressfield
ISFT : Lavf56.15.102
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc56.13.100 pcm_s16le
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
size= 2056kB time=00:01:05.77 bitrate= 256.0kbits/s
video:0kB audio:2056kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.007316%
1/1 Transcribing /Users/harris/Documents/Books/The War of Art - Steven Pressfield/02 - Book One - Resistance, Defining the Enemy/22 - Resistance and Sex.mp3.temp.wav
Traceback (most recent call last):
File "audiogrep.py", line 208, in <module>
transcribe(files)
File "audiogrep.py", line 35, in transcribe
transcript = subprocess.check_output(['pocketsphinx_continuous', '-infile', f, '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', str(pre), '-vad_postspeech', str(post)])
File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 566, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
This happens with any .mp3 file I attempt to transcribe.
The Windows version of Pocketsphinx requires the model files to be specified manually, for whatever reason. Adding these parameters makes it work:
'-hmm', 'model/en-us/en-us', '-lm', 'model/en-us/en-us.lm.dmp', '-dict', 'model/en-us/cmudict-en-us.dict'
I suppose that at some point, you'd want the user to be able to select different recognition models anyway?
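One way to make the extra model arguments optional instead of hard-coded is sketched below in Python. It assumes the paths from the comment above are correct for your install. build_pocketsphinx_cmd is a hypothetical helper. Its base arguments mirror the pocketsphinx_continuous call in the tracebacks, with one change: /dev/null is replaced by os.devnull, which is nul on Windows. That swap assumes pocketsphinx accepts nul as a -logfn target.

```python
import os

# Model arguments copied from the comment above. They are relative to the
# current working directory and are an assumption about your install layout.
WINDOWS_MODEL_ARGS = [
    "-hmm", "model/en-us/en-us",
    "-lm", "model/en-us/en-us.lm.dmp",
    "-dict", "model/en-us/cmudict-en-us.dict",
]


def build_pocketsphinx_cmd(wav_path, pre=10, post=50, model_args=None):
    """Build a pocketsphinx_continuous argument list like the one in the
    tracebacks, optionally followed by explicit model arguments."""
    cmd = [
        "pocketsphinx_continuous",
        "-infile", wav_path,
        "-time", "yes",
        # os.devnull is "/dev/null" on POSIX and "nul" on Windows
        # (assumes pocketsphinx can open "nul" as its log file).
        "-logfn", os.devnull,
        "-vad_prespeech", str(pre),
        "-vad_postspeech", str(post),
    ]
    if model_args:
        cmd.extend(model_args)
    return cmd
```

You can pass the resulting list straight to subprocess.check_output, which is how audiogrep runs the command. Exposing model_args as a command-line option would also let users pick a different recognition model.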
I get the following message instantly:
root@Pocketsphinx28:~/audiogrep# audiogrep --input dnc-2004-speech.mp3 --search 'freedom'
No results for "freedom"
Does this currently support transcribing using multiple CPUs? If not, any plans to?
Awesome app!
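The sketch below shows one way to spread transcription across CPUs. It assumes each file is transcribed independently by a function that shells out to pocketsphinx_continuous. transcribe_parallel is a hypothetical helper and is not part of audiogrep. Threads are sufficient here because the heavy work happens in the child processes, and the OS can schedule each of those on its own core.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def transcribe_parallel(files, transcribe_one, workers=None):
    """Apply transcribe_one to every path in files concurrently.

    Results come back in the same order as files. If transcribe_one runs
    pocketsphinx_continuous via subprocess, each call is a separate OS
    process, so several decodes can run on different cores at once.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcribe_one, files))
```

In practice, transcribe_one would wrap the same subprocess.check_output call shown in the tracebacks on this page.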
If you install audiogrep using pip, it doesn't give you the --extract functionality. The package on pip must be out of date.
We used audiogrep in a project to extract transcripts. We noticed that some of the transcripts contain timestamps that go backwards. Is there a reason why this would happen? Can we change anything in the config/settings to avoid this?
i 1627.530 1628.000 0.453576
ain't 1628.010 1628.300 0.023120
<sil> 1628.310 1628.710 0.988861
[SPEECH] 1628.720 1629.210 0.952747
</s> 1629.220 1629.670 1.000000
<s> 1617.430 1617.450 0.995510
it's 1617.460 1617.770 0.366955
your(2) 1617.780 1617.930 0.049122
Thank you.
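The following small Python sketch locates the jumps in a transcript like the one above. It assumes each word line has the four fields shown (word, start, end, confidence). It flags any line that starts before the previous line ended. It only detects the problem and does not explain why pocketsphinx emits it. find_backward_jumps is a hypothetical name.

```python
def find_backward_jumps(lines):
    """Return (line_index, previous_end, start) for every word line whose
    start time is earlier than the end time of the word line before it.

    Assumes "word start end confidence" lines, as in the output above.
    """
    jumps = []
    prev_end = None
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or otherwise malformed lines
        start, end = float(parts[1]), float(parts[2])
        if prev_end is not None and start < prev_end:
            jumps.append((i, prev_end, start))
        prev_end = end
    return jumps
```

On the excerpt above, this reports one jump: the <s> line that starts at 1617.430 right after </s> ended at 1629.670.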
Any chance there are other known ways around this error besides uninstalling and reinstalling cmu-*sphinx* stuff? I'm on MacOS 10.15 for what it's worth.
$ ./audiogrep.py --input ./Romeo_and_Juliet_Act_1_64kb.mp3 --transcribe
ffmpeg version 2.5.4-1 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.9.2 (Debian 4.9.2-10)
configuration: --prefix=/usr --extra-version=1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --shlibdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --enable-shared --disable-stripping --enable-avresample --enable-avisynth --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libshine --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libwavpack --enable-libwebp --enable-libxvid --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzvbi --enable-libzmq --enable-frei0r --enable-libvpx --enable-libx264 --enable-libsoxr --enable-gnutls --enable-openal --enable-libopencv --enable-librtmp --enable-libx265
libavutil 54. 15.100 / 54. 15.100
libavcodec 56. 13.100 / 56. 13.100
libavformat 56. 15.102 / 56. 15.102
libavdevice 56. 3.100 / 56. 3.100
libavfilter 5. 2.103 / 5. 2.103
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
Input #0, mp3, from './Romeo_and_Juliet_Act_1_64kb.mp3':
Metadata:
title : Act 1
artist : William Shakespeare
album : Romeo and Juliet
track : 1
Duration: 00:46:15.25, start: 0.050113, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, mono, s16p, 64 kb/s
Metadata:
encoder : LAME3.96r
Side data:
replaygain: track gain - 7.300000, track peak - unknown, album gain - unknown, album peak - unknown,
File './Romeo_and_Juliet_Act_1_64kb.mp3.temp.wav' already exists. Overwrite ? [y/N] n
Not overwriting - exiting
1/1 Transcribing ./Romeo_and_Juliet_Act_1_64kb.mp3.temp.wav
Traceback (most recent call last):
File "./audiogrep.py", line 217, in <module>
transcribe(files)
File "./audiogrep.py", line 37, in transcribe
transcript = subprocess.check_output(['pocketsphinx_continuous', '-infile', f, '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', str(pre), '-vad_postspeech', str(post)])
File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['pocketsphinx_continuous', '-infile', './Romeo_and_Juliet_Act_1_64kb.mp3.temp.wav', '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', '10', '-vad_postspeech', '50']' returned non-zero exit status 255
I'm less clear on what the error is, since it happens on the far side of the subprocess call.
The script should instruct the user to install pocketsphinx if it isn't installed rather than error out.
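A check like that could be sketched with shutil.which from the Python 3 standard library. require_tool is a hypothetical helper name, and the hint text is up to the caller.

```python
import shutil


def require_tool(name, hint):
    """Return the full path of executable name, or exit with a readable
    message instead of a raw OSError traceback."""
    path = shutil.which(name)
    if path is None:
        raise SystemExit(f"Error: {name} was not found on PATH. {hint}")
    return path
```

For example, you could call require_tool("pocketsphinx_continuous", "Please install pocketsphinx to transcribe files.") before the first subprocess call. That would turn the OSError: [Errno 2] tracebacks above into a one-line message.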
The transcription.txt for the audio file seems OK. Where should I start?
Logs:
27/02/15 17:08:05,630 apsd[1298]: Unrecognized leaf certificate
27/02/15 17:08:18,832 Console[75959]: setPresentationOptions called with NSApplicationPresentationFullScreen when there is no visible fullscreen window; this call will be ignored.
27/02/15 17:08:38,519 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
27/02/15 17:08:38,538 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
27/02/15 17:08:40,214 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
27/02/15 17:08:41,561 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
./audiogrep.py --input ./Romeo_and_Juliet_Act_1_64kb.mp3 --transcribe
import: unable to grab mouse `': Resource temporarily unavailable @ error/xwindow.c/XSelectWindow/9199.
from: can't read /var/mail/pydub
./audiogrep.py: 17: ./audiogrep.py: Syntax error: "(" unexpected
According to htop, the command is hanging on import sys. In the audiogrep directory, I have a 23 MB file called sys. While running --transcribe, I get a different cursor, and something is trying to grab my mouse input.
I am running Debian 8.0 with ffmpeg and pocketsphinx installed.
This error seems super weird, so I'm happy to dig deeper into the problem.
Audiogrep transcribes the audio files seemingly correctly, and when creating a supercut it finds many instances of the searched word, but the final supercut.mp3 is 0 bytes.
Currently running this on Linux.
Fresh install: macOS Mojave 10.14.4, Python 2.7.
I uninstalled and reinstalled pocketsphinx following the Mac instructions and got the same error, although the continuous test seems to work fine.
audiogrep --input test.mp3 --transcribe
test.mp3.temp.wav
ffmpeg version 4.1.3 Copyright (c) 2000-2019 the FFmpeg developers
built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-11.0.2.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-11.0.2.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
libavutil 56. 22.100 / 56. 22.100
libavcodec 58. 35.100 / 58. 35.100
libavformat 58. 20.100 / 58. 20.100
libavdevice 58. 5.100 / 58. 5.100
libavfilter 7. 40.101 / 7. 40.101
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 3.100 / 5. 3.100
libswresample 3. 3.100 / 3. 3.100
libpostproc 55. 3.100 / 55. 3.100
Input #0, mp3, from 'test.mp3':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
encoder : Lavf57.56.101
Duration: 00:05:06.94, start: 0.025057, bitrate: 192 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 192 kb/s
Metadata:
encoder : Lavc57.64
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'test.mp3.temp.wav':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
ISFT : Lavf58.20.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.35.100 pcm_s16le
size= 9591kB time=00:05:06.89 bitrate= 256.0kbits/s speed= 456x
video:0kB audio:9591kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000794%
1/1 Transcribing test.mp3.temp.wav
ERROR: "cmd_ln.c", line 942: Unknown argument: -alignctl
File "/usr/local/bin/audiogrep", line 6, in <module>
audiogrep.main()
File "/Library/Python/2.7/site-packages/audiogrep/audiogrep.py", line 392, in main
transcribe(files)
File "/Library/Python/2.7/site-packages/audiogrep/audiogrep.py", line 43, in transcribe
transcript = subprocess.check_output(['pocketsphinx_continuous', '-infile', f, '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', str(pre), '-vad_postspeech', str(post)])
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 567, in check_output
output, unused_err = process.communicate()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 791, in communicate
stdout = _eintr_retry_call(self.stdout.read)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
return func(*args)
It seems that when constructing a 'franken' sentence, the resulting audio file is always twice as long as its audible part.
I'm trying to run audiogrep inside a virtualenv that has pocketsphinx installed, but audiogrep still reports that it's not installed :(
It's clearly coming from this line: https://github.com/antiboredom/audiogrep/blob/master/audiogrep/audiogrep.py#L387
That line refers to pocketsphinx_continuous, which pip doesn't know about, but changing it to plain pocketsphinx returns the same error. Giving Popen the path of pocketsphinx.py in the virtualenv's site-packages directory gives OSError: [Errno 13] Permission denied.
(audiogrep):~/Documents/coding/audiogrep$ pip show audiogrep
---
Name: audiogrep
Version: 0.1.2
Location: /home/Documents/coding/audiogrep/lib/python2.7/site-packages
Requires: pydub
(audiogrep):~/Documents/coding/audiogrep$ pip show pocketsphinx
---
Name: pocketsphinx
Version: 0.1.3
Location: /home/Documents/coding/audiogrep/lib/python2.7/site-packages
Requires:
(audiogrep):~/Documents/coding/audiogrep$ audiogrep --input ../mbmbam01.mp3 --transcribe
Error: Please install pocketsphinx to transcribe files.
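The report suggests that two different things are being conflated. One is the pocketsphinx Python package that pip installs. The other is the pocketsphinx_continuous executable that audiogrep calls through subprocess (see the linked line). Here is a minimal Python 3 sketch that checks each one separately. pocketsphinx_status is a hypothetical helper name.

```python
import importlib.util
import shutil


def pocketsphinx_status():
    """Report two things that are easy to conflate:
    - python_module: whether the pocketsphinx Python package is importable
      (what pip installs into the virtualenv), and
    - cli_on_path: whether the pocketsphinx_continuous command is on PATH
      (what a subprocess call needs).
    """
    return {
        "python_module": importlib.util.find_spec("pocketsphinx") is not None,
        "cli_on_path": shutil.which("pocketsphinx_continuous") is not None,
    }
```

Suppose python_module is True but cli_on_path is False. Then installing the Python package alone will not satisfy a check for the command-line tool. The pocketsphinx command-line programs have to be installed and on PATH as well.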
If audiogrep could be used with movies and series that have subtitle files (.srt), there would be a very large source of word matches. Because the subtitles specify where to look for a given word, this would be less processing-intensive than scanning the whole audio file.
Example from Better Call Saul subs:
…
14
00:00:57,369 --> 00:00:58,651
It hurts so bad.
15
00:01:00,166 --> 00:01:02,218
Look at that. Yeah, it's this one.
…
Scanning the file for "bad" and then using audiogrep on the range [00:00:57, 00:00:58] should identify the spoken word, right?!
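Here is a Python sketch of the subtitle lookup described above. It parses SRT cues and returns the time range of every cue that contains a word. find_word_in_srt is hypothetical, and it assumes comma-separated milliseconds as in the excerpt. audiogrep has no SRT input, so the ranges would still have to be handed to whatever does the cutting.

```python
import re

# Matches an SRT timing line such as "00:00:57,369 --> 00:00:58,651".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert SRT time fields (as strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def find_word_in_srt(srt_text, word):
    """Return (start_s, end_s, text) for every cue whose text contains
    word as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                # Everything after the timing line is the cue text.
                text = " ".join(l.strip() for l in lines[i + 1:])
                if pattern.search(text):
                    g = m.groups()
                    hits.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return hits
```

With the Better Call Saul excerpt above, searching for "bad" returns cue 14's range, roughly 57.369 to 58.651 seconds. Only that short window would then need to be transcribed or cut.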
I'm trying to package audiogrep for conda-forge, however packaging on Windows fails. I can't quite make sense of that failure, any ideas?
It currently creates them in file order, one input file at a time.
Hi @antiboredom,
I am trying to split a long audio file into all the words that are said, so I want to cut it wherever a silence occurs, but I still haven't figured out how.
What is the command for that?
What parameters are available?
In return for your help, I can make a pull request adding this documentation to the audiogrep README.
I will keep looking for a way to do it with ffmpeg too.
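pydub, which audiogrep already requires (see the pip show output above), provides pydub.silence.split_on_silence, with min_silence_len and silence_thresh parameters for this job. The standard-library-only sketch below illustrates what such a function does. It assumes 16-bit mono WAV input, like audiogrep's *.temp.wav files. It also assumes an RMS threshold in raw sample units, which you would tune by ear. nonsilent_spans and read_mono16 are hypothetical names.

```python
import array
import math
import sys
import wave


def read_mono16(path):
    """Read a 16-bit mono PCM WAV into (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono WAV")
        samples = array.array("h", w.readframes(w.getnframes()))
        rate = w.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def nonsilent_spans(samples, rate, threshold, window_ms=10, min_silence_ms=200):
    """Return (start_s, end_s) for each stretch of sound. Stretches are
    separated by at least min_silence_ms of windows whose RMS < threshold."""
    win = max(1, rate * window_ms // 1000)  # samples per analysis window
    loud = []
    for i in range(0, len(samples), win):
        chunk = samples[i:i + win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        loud.append(rms >= threshold)

    min_gap = max(1, math.ceil(min_silence_ms / window_ms))  # in windows
    spans, start, quiet = [], None, 0
    for idx, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = idx
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # Close the span at the first silent window of this gap.
                spans.append((start, idx - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        spans.append((start, len(loud) - quiet))  # drop trailing silence
    return [(a * win / rate, b * win / rate) for a, b in spans]
```

Silence-based splitting tends to produce phrases rather than single words, because words within a phrase often run together without a pause. Lowering min_silence_ms gives finer cuts but also more false splits.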
Any way to use a different acoustic model?
Is anyone working on porting this to a phonetic search solution? Translation using speech-to-text is rudimentary at best on files containing varying patterns of speech. I've been looking into open-source engines for phonetic speech translation. I was extremely excited when this project was created, as I had been using Soundbites for my speech recognition projects until then. I plan to contribute an API for it if I can find an open-source engine.
Not really an issue, but...
I am looking to strip the "middle bit" out of some podcasts I listen to. These are usually bookended with a certain tune. Does anyone know of any projects similar to this one that can search based on an audio sample and output a timestamp, or make a supercut?