antiboredom / audiogrep

Creates audio supercuts.

Home Page: http://antiboredom.github.io/audiogrep

License: MIT License

Python 100.00%

audiogrep's Issues

audiogrep converts files to wav that are already wav?

I have thousands of 16k mono wav files, which I think is what audiogrep wants to have in order to perform transcription. However, when I point audiogrep at these files, it converts them again. It seems like the convert_to_wav function does not perform any checks on the file type, but just checks for the presence of specific file names.

I can understand that it can be hard to detect some media formats with a high degree of precision, but I did not think wav was one that was hard to detect. If I'm wrong, then I wonder if maybe a '--skip-conversion' flag or similar could be added so the user can tell audiogrep 'Hey, these files are already in your expected format -- please don't convert these files to the same format they've already been converted to'.

If this feature would be welcome but nobody has time to do it, let me know. If I've missed something that makes this unnecessary, let me know how I can stop conversion from happening!

Thanks.
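A minimal sketch of the format check described above, using Python's standard wave module. It assumes the target format is 16 kHz, mono, 16-bit PCM, matching the "pcm_s16le, 16000 Hz, mono" output of the ffmpeg step in the logs on this page; is_ready_wav is a hypothetical helper, not part of audiogrep.

```python
import wave

# Assumed target format: 16 kHz, mono, 16-bit PCM, matching the
# "pcm_s16le, 16000 Hz, mono" output of the ffmpeg step in the logs here.
TARGET_RATE = 16000
TARGET_CHANNELS = 1
TARGET_SAMPWIDTH = 2  # bytes per sample (16-bit)


def is_ready_wav(path):
    """Return True if `path` is already a PCM WAV in the target format.

    Hypothetical helper: a converter could skip files for which this is True.
    """
    try:
        with wave.open(path, "rb") as w:
            return (
                w.getframerate() == TARGET_RATE
                and w.getnchannels() == TARGET_CHANNELS
                and w.getsampwidth() == TARGET_SAMPWIDTH
            )
    except (wave.Error, EOFError):
        # Not a WAV file the standard library can parse (e.g. an MP3):
        # treat it as needing conversion.
        return False
```

In a conversion loop, files for which is_ready_wav(f) is true could be handed straight to pocketsphinx instead of being re-encoded to a .temp.wav copy.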

Not able to run audiogrep on Arch Linux

$ audiogrep
Traceback (most recent call last):
  File "/usr/bin/audiogrep", line 6, in <module>
    audiogrep.main()
AttributeError: module 'audiogrep' has no attribute 'main'

Transcribe Error

On Mac OS X Yosemite, with Homebrew Python 2.7.9:

python audiogrep.py --input /Users/harris/Documents/Books/The\ War\ of\ Art\ -\ Steven\ Pressfield/02\ -\ Book\ One\ -\ Resistance\,\ Defining\ the\ Enemy/22\ -\ Resistance\ and\ Sex.mp3 --transcribe
ffmpeg version 2.5.4 Copyright (c) 2000-2015 the FFmpeg developers
  built on Feb 16 2015 16:34:55 with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/2.5.4 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-libx264 --enable-libmp3lame --enable-libvo-aacenc --enable-libxvid --enable-libvorbis --enable-libvpx --enable-vda
  libavutil      54. 15.100 / 54. 15.100
  libavcodec     56. 13.100 / 56. 13.100
  libavformat    56. 15.102 / 56. 15.102
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
[mp3 @ 0x7ff26200da00] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/Users/harris/Documents/Books/The War of Art - Steven Pressfield/02 - Book One - Resistance, Defining the Enemy/22 - Resistance and Sex.mp3':
  Metadata:
    album           : The War of Art
    genre           : Other
    album_artist    : Steven Pressfield
    track           : 22
    artist          : Steven Pressfield
  Duration: 00:01:05.78, start: 0.000000, bitrate: 201 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
    Stream #0:1: Video: mjpeg, yuvj444p(pc, bt470bg/unknown/unknown), 680x680, 90k tbr, 90k tbn, 90k tbc
    Metadata:
      title           :
      comment         : Other
Output #0, wav, to '/Users/harris/Documents/Books/The War of Art - Steven Pressfield/02 - Book One - Resistance, Defining the Enemy/22 - Resistance and Sex.mp3.temp.wav':
  Metadata:
    IPRD            : The War of Art
    IGNR            : Other
    album_artist    : Steven Pressfield
    IPRT            : 22
    IART            : Steven Pressfield
    ISFT            : Lavf56.15.102
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc56.13.100 pcm_s16le
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
size=    2056kB time=00:01:05.77 bitrate= 256.0kbits/s
video:0kB audio:2056kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.007316%
1/1 Transcribing /Users/harris/Documents/Books/The War of Art - Steven Pressfield/02 - Book One - Resistance, Defining the Enemy/22 - Resistance and Sex.mp3.temp.wav
Traceback (most recent call last):
  File "audiogrep.py", line 208, in <module>
    transcribe(files)
  File "audiogrep.py", line 35, in transcribe
    transcript = subprocess.check_output(['pocketsphinx_continuous', '-infile', f, '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', str(pre), '-vad_postspeech', str(post)])
  File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

This happens with any .mp3 file I attempt to transcribe.

Windows support

The Windows version of Pocketsphinx requires the model files to be specified manually, for whatever reason. Adding these parameters makes it work:

'-hmm', 'model/en-us/en-us', '-lm', 'model/en-us/en-us.lm.dmp', '-dict', 'model/en-us/cmudict-en-us.dict'

I suppose that at some point, you'd want the user to be able to select different recognition models anyway?
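A sketch of how the command could take those arguments, with a hook for choosing a different model. The base argument list is copied from the check_output call in the tracebacks on this page; pocketsphinx_command and its model_args parameter are hypothetical, and the model paths are the relative ones quoted in this issue, so adjust them to your install. The sketch also swaps the hard-coded /dev/null for os.devnull, which resolves to nul on Windows.

```python
import os
import sys

# Model arguments quoted in this issue. The paths are relative to the
# working directory; adjust them to where your models are installed.
WINDOWS_MODEL_ARGS = [
    "-hmm", "model/en-us/en-us",
    "-lm", "model/en-us/en-us.lm.dmp",
    "-dict", "model/en-us/cmudict-en-us.dict",
]


def pocketsphinx_command(infile, pre, post, model_args=None, plat=sys.platform):
    """Build a pocketsphinx_continuous argument list (hypothetical helper).

    The base arguments mirror the check_output call in the tracebacks on this
    page. If `model_args` is given it is always appended; otherwise the
    Windows model arguments are appended only when running on Windows.
    """
    cmd = [
        "pocketsphinx_continuous",
        "-infile", infile,
        "-time", "yes",
        "-logfn", os.devnull,  # "/dev/null" on POSIX, "nul" on Windows
        "-vad_prespeech", str(pre),
        "-vad_postspeech", str(post),
    ]
    if model_args is None and plat.startswith("win"):
        model_args = WINDOWS_MODEL_ARGS
    if model_args:
        cmd.extend(model_args)
    return cmd
```

The resulting list can be passed to subprocess.check_output unchanged; a user-selected model would simply be passed as model_args.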

Search option does not appear to work

I get the following message instantly:

 root@Pocketsphinx28:~/audiogrep# audiogrep --input dnc-2004-speech.mp3 --search 'freedom'
 No results for "freedom"

Support for multiple CPUs?

Does this currently support transcribing using multiple CPUs? If not, any plans to?

Awesome app!
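Whether audiogrep does this is not answered here, but since each transcription is an external pocketsphinx_continuous process, one way to use several CPUs is to run several of those processes at once. A minimal sketch; transcribe_all is hypothetical, and transcribe_one stands in for whatever per-file function actually calls pocketsphinx.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def transcribe_all(files, transcribe_one, workers=None):
    """Run `transcribe_one(path)` over `files` concurrently.

    `transcribe_one` is a hypothetical per-file function (for example one that
    calls pocketsphinx_continuous via subprocess). Threads are enough here
    because the heavy lifting happens in child processes, which the OS can
    schedule on separate CPUs. Results come back in the same order as `files`.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcribe_one, files))
```

Threads rather than processes are a deliberate choice: a thread waiting on a subprocess releases the GIL, so the Python side adds little overhead.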

Python 3 support

--extract option not present in pip package

If you install audiogrep using pip, it doesn't give you the --extract option. The package on PyPI appears to be out of date.

Backward timestamps

We used audiogrep in a project to extract transcripts, and noticed that some of them contain timestamps that go backwards. Is there a reason why this would happen? Can we change anything in the config/settings to avoid it?

i 1627.530 1628.000 0.453576
ain't 1628.010 1628.300 0.023120
<sil> 1628.310 1628.710 0.988861
[SPEECH] 1628.720 1629.210 0.952747
</s> 1629.220 1629.670 1.000000
<s> 1617.430 1617.450 0.995510
it's 1617.460 1617.770 0.366955
your(2) 1617.780 1617.930 0.049122

Thank you.
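A small check for this, assuming transcript lines have the four whitespace-separated fields shown above (word, start, end, confidence). In the sample, the jump happens right after </s>, where a new <s> starts at an earlier time, which suggests it may be tied to utterance boundaries; that is an observation from the sample, not a confirmed cause. parse_transcript and backward_jumps are hypothetical helpers.

```python
def parse_transcript(text):
    """Parse lines of the form `word start end confidence` into tuples."""
    words = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        word, start, end, conf = parts
        words.append((word, float(start), float(end), float(conf)))
    return words


def backward_jumps(words):
    """Return indices i where word i starts before word i - 1 does."""
    return [i for i in range(1, len(words)) if words[i][1] < words[i - 1][1]]
```

Running backward_jumps over a full transcript shows how often and where this happens, which is useful detail for a bug report. Sorting by start time would hide the symptom but not explain it.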

"ERROR: "cmd_ln.c", line 942: Unknown argument: -alignctl"

Are there any other known ways around this error besides uninstalling and reinstalling the cmu-*sphinx* packages? I'm on macOS 10.15, for what it's worth.

subprocess.CalledProcessError ... returned non-zero exit status 255

» ./audiogrep.py --input ./Romeo_and_Juliet_Act_1_64kb.mp3 --transcribe
ffmpeg version 2.5.4-1 Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 4.9.2 (Debian 4.9.2-10)
  configuration: --prefix=/usr --extra-version=1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --shlibdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --enable-shared --disable-stripping --enable-avresample --enable-avisynth --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libshine --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libwavpack --enable-libwebp --enable-libxvid --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzvbi --enable-libzmq --enable-frei0r --enable-libvpx --enable-libx264 --enable-libsoxr --enable-gnutls --enable-openal --enable-libopencv --enable-librtmp --enable-libx265
  libavutil      54. 15.100 / 54. 15.100
  libavcodec     56. 13.100 / 56. 13.100
  libavformat    56. 15.102 / 56. 15.102
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mp3, from './Romeo_and_Juliet_Act_1_64kb.mp3':
  Metadata:
    title           : Act 1
    artist          : William Shakespeare
    album           : Romeo and Juliet
    track           : 1
  Duration: 00:46:15.25, start: 0.050113, bitrate: 64 kb/s
    Stream #0:0: Audio: mp3, 22050 Hz, mono, s16p, 64 kb/s
    Metadata:
      encoder         : LAME3.96r
    Side data:
      replaygain: track gain - 7.300000, track peak - unknown, album gain - unknown, album peak - unknown, 
File './Romeo_and_Juliet_Act_1_64kb.mp3.temp.wav' already exists. Overwrite ? [y/N] n
Not overwriting - exiting
1/1 Transcribing ./Romeo_and_Juliet_Act_1_64kb.mp3.temp.wav
Traceback (most recent call last):
  File "./audiogrep.py", line 217, in <module>
    transcribe(files)
  File "./audiogrep.py", line 37, in transcribe
    transcript = subprocess.check_output(['pocketsphinx_continuous', '-infile', f, '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', str(pre), '-vad_postspeech', str(post)])
  File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['pocketsphinx_continuous', '-infile', './Romeo_and_Juliet_Act_1_64kb.mp3.temp.wav', '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', '10', '-vad_postspeech', '50']' returned non-zero exit status 255

It's less clear to me what the error is, since it happens on the far side of the subprocess call.
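One way to see what failed is to keep the child's stderr instead of discarding it. Note that the command also passes -logfn /dev/null, which sends pocketsphinx's log to /dev/null; dropping that argument while debugging may surface its error messages. The log above also shows ffmpeg declining to overwrite the existing .temp.wav and exiting (ffmpeg's -y flag overwrites without asking), so pocketsphinx ran on whatever that older file contains. A minimal Python 3 sketch; run_and_report is a hypothetical helper.

```python
import subprocess


def run_and_report(cmd):
    """Run `cmd` and return its stdout; on failure, raise with its stderr.

    Unlike a bare check_output call, this keeps the child's stderr so the
    error from the far side of the subprocess call is visible.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    if result.returncode != 0:
        raise RuntimeError(
            "{} exited with status {}:\n{}".format(
                cmd[0], result.returncode, result.stderr.strip()
            )
        )
    return result.stdout
```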

Notify user if pocketsphinx isn't installed

The script should instruct the user to install pocketsphinx if it isn't installed rather than error out.
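A sketch of such a check, using shutil.which (Python 3.3+). When the program is missing, Popen fails with "OSError: [Errno 2] No such file or directory", as in the Yosemite traceback on this page, which is easy to misread. missing_executable_message is a hypothetical helper; the program name is the one audiogrep calls in those tracebacks.

```python
import shutil


def missing_executable_message(name="pocketsphinx_continuous"):
    """Return an install hint if `name` is not on PATH, else None."""
    if shutil.which(name) is not None:
        return None
    return (
        "Error: '{}' was not found on your PATH. Please install "
        "pocketsphinx to transcribe files.".format(name)
    )
```

Before transcribing, the script could call sys.exit(msg) whenever msg = missing_executable_message() is not None.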

I get a 261-byte supercut.mp3 that is empty

The transcription.txt for the audio file seems OK. Where should I start?

Logs:

27/02/15 17:08:05,630 apsd[1298]: Unrecognized leaf certificate
27/02/15 17:08:18,832 Console[75959]: setPresentationOptions called with NSApplicationPresentationFullScreen when there is no visible fullscreen window; this call will be ignored.
27/02/15 17:08:38,519 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
27/02/15 17:08:38,538 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
27/02/15 17:08:40,214 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".
27/02/15 17:08:41,561 Finder[1299]: FIXME: IOUnserialize has detected a string that is not valid UTF-8, "��{�".

Syntax error: "(" unexpected

./audiogrep.py --input ./Romeo_and_Juliet_Act_1_64kb.mp3 --transcribe
import: unable to grab mouse `': Resource temporarily unavailable @ error/xwindow.c/XSelectWindow/9199.
from: can't read /var/mail/pydub
./audiogrep.py: 17: ./audiogrep.py: Syntax error: "(" unexpected

According to htop, the command is hanging on import sys. In the audiogrep directory, I have a 23 MB file called sys. While running --transcribe, my cursor changes and something tries to grab my mouse input.

I am running Debian 8.0 with ffmpeg and pocketsphinx installed.

This error seems super weird, so I'm happy to dig deeper into the problem.
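The output suggests the file is being run by /bin/sh rather than Python: "import: unable to grab mouse" comes from ImageMagick's import screenshot tool (which would explain the changed cursor and the 23 MB file named sys), and "from: can't read /var/mail/pydub" from the from mail utility. That most likely means the first line of audiogrep.py is not a Python shebang; running it as python audiogrep.py sidesteps the problem. A quick check, with has_python_shebang as a hypothetical helper:

```python
def has_python_shebang(path):
    """Return True if the file's very first line is a shebang invoking Python."""
    with open(path, "rb") as fh:
        # The kernel only honours "#!" as the first two bytes of the file,
        # so leading whitespace is not stripped.
        first = fh.readline().rstrip()
    return first.startswith(b"#!") and b"python" in first
```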

Output supercut files are 0 bytes

Audiogrep seems to transcribe the audio files correctly, and when creating a supercut it finds many instances of the searched word, but the final supercut.mp3 is 0 bytes.
I'm currently running this on Linux.
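A 0-byte output usually means no audio frames reached the writer, for example because the segment timestamps fall outside the file. The pip package depends on pydub; this sketch swaps in the standard-library wave module so the segment logic can be tested in isolation, and it fails loudly instead of writing an empty file. write_supercut is hypothetical, and it assumes (start, end) pairs in seconds taken from the transcript.

```python
import wave


def write_supercut(src_path, segments, out_path):
    """Concatenate [start, end) second ranges of a WAV file into out_path.

    Returns the number of frames written; raises instead of producing an
    empty file when none of the segments contain audio.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        chunks = []
        for start, end in segments:
            first = max(0, int(start * rate))
            last = min(total, int(end * rate))
            if last <= first:
                continue  # segment is empty or lies outside the file
            src.setpos(first)
            chunks.append(src.readframes(last - first))
    frames = b"".join(chunks)
    if not frames:
        raise ValueError("no audio in the requested segments; check the timestamps")
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is patched on write
        out.writeframes(frames)
    return len(frames) // (params.sampwidth * params.nchannels)
```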

--transcribe fails

Fresh install, macOS Mojave 10.14.4, Python 2.7.
I uninstalled and reinstalled pocketsphinx following the Mac instructions and got the same error; the continuous test seems to work fine.

audiogrep --input test.mp3 --transcribe


test.mp3.temp.wav
ffmpeg version 4.1.3 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-11.0.2.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-11.0.2.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, mp3, from 'test.mp3':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf57.56.101
  Duration: 00:05:06.94, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 192 kb/s
    Metadata:
      encoder         : Lavc57.64
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'test.mp3.temp.wav':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    ISFT            : Lavf58.20.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc58.35.100 pcm_s16le
size=    9591kB time=00:05:06.89 bitrate= 256.0kbits/s speed= 456x
video:0kB audio:9591kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000794%
1/1 Transcribing test.mp3.temp.wav
ERROR: "cmd_ln.c", line 942: Unknown argument: -alignctl

File "/usr/local/bin/audiogrep", line 6, in <module>
    audiogrep.main()
  File "/Library/Python/2.7/site-packages/audiogrep/audiogrep.py", line 392, in main
    transcribe(files)
  File "/Library/Python/2.7/site-packages/audiogrep/audiogrep.py", line 43, in transcribe
    transcript = subprocess.check_output(['pocketsphinx_continuous', '-infile', f, '-time', 'yes', '-logfn', '/dev/null', '-vad_prespeech', str(pre), '-vad_postspeech', str(post)])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 567, in check_output
    output, unused_err = process.communicate()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 791, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)

'franken' supercut mp3 length always double

It seems that when constructing a 'franken' sentence, the resulting audio file is always twice as long as its audible part.

Audiogrep complains pocketsphinx isn't installed when it is

I'm trying to run audiogrep inside a virtualenv that has pocketsphinx installed, but audiogrep still reports that it's not installed :(

It's clearly coming from this line: https://github.com/antiboredom/audiogrep/blob/master/audiogrep/audiogrep.py#L387

That line refers to pocketsphinx_continuous, which pip doesn't know about, but changing it to plain pocketsphinx produces the same error. Passing Popen the path of pocketsphinx.py in the virtualenv's site-packages directory raises OSError: [Errno 13] Permission denied.

(audiogrep):~/Documents/coding/audiogrep$ pip show audiogrep

---
Name: audiogrep
Version: 0.1.2
Location: /home/Documents/coding/audiogrep/lib/python2.7/site-packages
Requires: pydub
(audiogrep):~/Documents/coding/audiogrep$ pip show pocketsphinx

---
Name: pocketsphinx
Version: 0.1.3
Location: /home/Documents/coding/audiogrep/lib/python2.7/site-packages
Requires: 
(audiogrep):~/Documents/coding/audiogrep$ audiogrep --input ../mbmbam01.mp3 --transcribe
Error: Please install pocketsphinx to transcribe files.

[Idea] Using Movie Subs to identify word matches

If audiogrep were used with movies or series that have subtitle files (.srt), those would be a really big source of word matches. Since the subtitles specify where to look for a given word, this would be less processing-intensive than scanning the whole audio file.

Example from Better Call Saul Subs:

…
14
00:00:57,369 --> 00:00:58,651
It hurts so bad.

15
00:01:00,166 --> 00:01:02,218
Look at that. Yeah, it's this one.
…

Scanning the subtitle file for "bad" and then running audiogrep on the range [00:00:57, 00:00:58] should identify the spoken word, right?
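A sketch of the subtitle lookup using only the standard library. find_word_in_srt is a hypothetical helper; it assumes well-formed SRT cues like the ones above and matches whole words case-insensitively. Subtitle timings are approximate, so a real tool would probably pad each window slightly before handing it to audiogrep or pocketsphinx.

```python
import re

# One SRT timing line, e.g. "00:00:57,369 --> 00:00:58,651".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def find_word_in_srt(srt_text, word):
    """Return (start, end) in seconds for every cue whose text contains `word`."""
    pattern = re.compile(r"\b{}\b".format(re.escape(word)), re.IGNORECASE)
    hits = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                text = " ".join(lines[i + 1:])  # cue text follows the timing line
                if pattern.search(text):
                    g = m.groups()
                    hits.append((_seconds(*g[:4]), _seconds(*g[4:])))
                break
    return hits
```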

Windows installation fails

I'm trying to package audiogrep for conda-forge, but packaging on Windows fails. I can't quite make sense of the failure. Any ideas?

Question: is there a way to randomize the order of the clips?

It currently orders them by input file, one input file at a time.
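Whether audiogrep has a flag for this isn't covered here, but given the list of matched clips, shuffling it before the clips are joined is enough. Representing clips as (file, start, end) tuples and the shuffled_clips helper are assumptions for illustration.

```python
import random


def shuffled_clips(clips, seed=None):
    """Return a shuffled copy of `clips`, leaving the original list untouched.

    Passing a seed makes the order reproducible between runs.
    """
    rng = random.Random(seed)
    out = list(clips)
    rng.shuffle(out)
    return out
```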

Command for splitting on silences

Hi @antiboredom,
I am trying to split a long audio file into all the words that are said, so I want to split it wherever a silence occurs, but I haven't figured out how yet.

What is the command for that?
What parameters are available?

In return for your help, I can open a pull request with documentation for the audiogrep README.

I will also keep looking for a way to do it with ffmpeg.
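audiogrep's options for this aren't covered in this thread, so here is a standalone sketch of a simple energy-based splitter (a generic technique, not audiogrep's). It assumes 16-bit mono WAV input such as the .temp.wav files audiogrep writes; the function names and every default threshold are illustrative guesses to tune by ear.

```python
import array
import wave


def read_mono16(path):
    """Load a 16-bit mono WAV file as (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        return array.array("h", w.readframes(w.getnframes())), w.getframerate()


def speech_regions(samples, rate, frame_ms=20, threshold=500, min_silence_ms=200):
    """Return (start, end) second ranges of sound separated by silence.

    A frame is silent when its peak absolute amplitude is below `threshold`;
    a region ends once `min_silence_ms` of consecutive silent frames follow it.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    n_frames = (len(samples) + frame_len - 1) // frame_len
    regions = []
    start = None  # frame index where the current region began
    silent_run = 0
    for f in range(n_frames):
        chunk = samples[f * frame_len:(f + 1) * frame_len]
        if max(abs(s) for s in chunk) >= threshold:
            if start is None:
                start = f
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                end = f - silent_run + 1  # first silent frame after the region
                regions.append((start * frame_len / rate, end * frame_len / rate))
                start, silent_run = None, 0
    if start is not None:  # audio ended mid-region
        end = min((n_frames - silent_run) * frame_len, len(samples))
        regions.append((start * frame_len / rate, end / rate))
    return regions
```

If you'd rather stay in ffmpeg, its silencedetect filter (for example ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null -) prints silence start and end times that can be used the same way.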

Different Acoustic Models

Any way to use a different acoustic model?

Phonetic Search Technology (PST)

Is anyone working on porting this to a phonetic search solution? Speech-to-text translation is rudimentary at best on files containing varying patterns of speech. I've been looking into open-source (OSS) options for phonetic speech translation. I was extremely excited when this project was created, as I had been using Soundbites until then for my speech recognition projects. I plan to contribute an API for it if I can find an OSS engine.

Audio grep for tunes.

Not really an issue, but...
I am looking to strip the "middle bit" out of some podcasts I listen to. These are usually bookended with a certain tune. Does anyone know of any projects similar to this one that can search based on an audio sample and output a timestamp, or make a supercut?
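One technique for this is cross-correlation: slide the tune's samples along the podcast and look for the offset where they line up best. This is a generic signal-processing approach, not something audiogrep does; best_match is a hypothetical brute-force sketch over plain lists of samples.

```python
import math


def best_match(haystack, needle):
    """Return (offset, score) where `needle` best matches inside `haystack`.

    Brute-force normalized cross-correlation: each window is compared with
    `needle` after removing both means, so a perfect (scaled) match scores 1.0.
    """
    m = len(needle)
    if m == 0 or m > len(haystack):
        raise ValueError("needle must be non-empty and no longer than haystack")
    n_mean = sum(needle) / m
    n0 = [x - n_mean for x in needle]
    n_norm = math.sqrt(sum(x * x for x in n0))
    best = (0, float("-inf"))
    for off in range(len(haystack) - m + 1):
        window = haystack[off:off + m]
        w_mean = sum(window) / m
        w0 = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w0))
        if w_norm == 0 or n_norm == 0:
            continue  # flat signal: correlation is undefined
        score = sum(a * b for a, b in zip(w0, n0)) / (w_norm * n_norm)
        if score > best[1]:
            best = (off, score)
    return best
```

Dividing the returned offset by the sample rate gives a timestamp. The loop is quadratic, so for real recordings you would want to downsample both signals first, or use an FFT-based routine such as scipy.signal.correlate.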
