maxstrange / audiosegment

Wrapper for pydub AudioSegment objects

License: MIT License

Python 99.56% Shell 0.22% PowerShell 0.22%

audiosegment's Introduction

AudioSegment

Wrapper for pydub AudioSegment objects. An audiosegment.AudioSegment object wraps a pydub.AudioSegment object, so any method or property the pydub object has is also available on the wrapper.
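As an illustration of how a wrapper can expose everything the wrapped object has, here is a minimal sketch of attribute forwarding. The class names Inner and Wrapper are hypothetical stand-ins, not the library's actual code, and the re-wrapping of results is an assumption about what makes such a wrapper convenient to chain.

```python
class Inner:
    """Stand-in for the wrapped object (hypothetical, for illustration only)."""

    def __init__(self, samples):
        self.samples = list(samples)

    def reversed_copy(self):
        return Inner(reversed(self.samples))


class Wrapper:
    """Forwards unknown attribute lookups to the wrapped object."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Only called when normal lookup on the wrapper itself fails.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def forwarded(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Re-wrap results of the inner type so chained calls stay wrapped.
            return Wrapper(result) if isinstance(result, Inner) else result

        return forwarded


wrapped = Wrapper(Inner([1, 2, 3]))
print(wrapped.samples)                # attribute comes from Inner
print(type(wrapped.reversed_copy()))  # result is wrapped again
```

With this pattern the wrapper only needs to define the extra methods it adds; everything else falls through to the wrapped object.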

Docs are hosted by GitHub Pages, but are currently hideous; I'll do something about them as soon as I find some time. You can also try Read The Docs, though the docs there don't seem to be building for some reason, which is also something I need to look into. Up-to-date docs are also built and pushed to the docs folder of this repository.

Notes

There is a hidden dependency on the command line program 'sox'. Pip will not install it for you, so you will have to install it yourself:

Debian/Ubuntu: sudo apt-get install sox
Mac OS X: brew install sox
Windows: choco install sox

Also, I use librosa and scipy for some of the functionality. These dependencies are hefty, so I have made them optional. If you do not install them, you may get warnings when using audiosegment.

So, a full installation on Debian/Ubuntu would look like this:

sudo apt-get install sox
pip3 install --user audiosegment

# To get scipy, you will need some lapack/blas resources:
sudo apt-get install libatlas-base-dev gfortran
pip3 install --user scipy

# To get librosa, you will need numba, which requires LLVMlite, which requires LLVM.
sudo apt-get install llvm
pip3 install --user librosa

Make suitable adjustments to fit your own OS's package management system.
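To confirm that the pieces above are in place after installing, something like the following sketch can help. It uses only the standard library; the function name check_environment is made up for this example, and it only checks that sox is on the PATH and that the optional modules can be located, not that they work.

```python
import importlib.util
import shutil


def check_environment(programs=("sox",), optional_modules=("scipy", "librosa")):
    """Report which external programs and optional modules can be found."""
    report = {}
    for program in programs:
        # shutil.which returns None when the program is not on PATH.
        report[program] = shutil.which(program) is not None
    for module in optional_modules:
        # find_spec returns None for a missing top-level module.
        report[module] = importlib.util.find_spec(module) is not None
    return report


for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```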

TODO

The following is the list of items I plan on implementing.

Finish implementing auditory scene analysis (a.k.a. blind source separation)
Add voice-pass filtering and make voice activity detection better
Add language classification for English and Chinese (and show how to do it for other languages)
Add more examples to README (especially filterbank)
Finish removing the SOX dependency

I am open to other suggestions. Open an issue if you have requests or, better yet, implement it yourself and open a pull request; I'll take a look and merge it if I think it makes sense.

Example Usage

Basic information

import audiosegment

print("Reading in the wave file...")
seg = audiosegment.from_file("whatever.wav")

print("Information:")
print("Channels:", seg.channels)
print("Bits per sample:", seg.sample_width * 8)
print("Sampling frequency:", seg.frame_rate)
print("Length:", seg.duration_seconds, "seconds")

Voice Detection

# ...
print("Detecting voice...")
seg = seg.resample(sample_rate_Hz=32000, sample_width=2, channels=1)
results = seg.detect_voice()
voiced = [tup[1] for tup in results if tup[0] == 'v']
unvoiced = [tup[1] for tup in results if tup[0] == 'u']

print("Reducing voiced segments to a single wav file 'voiced.wav'")
voiced_segment = voiced[0].reduce(voiced[1:])
voiced_segment.export("voiced.wav", format="WAV")

print("Reducing unvoiced segments to a single wav file 'unvoiced.wav'")
unvoiced_segment = unvoiced[0].reduce(unvoiced[1:])
unvoiced_segment.export("unvoiced.wav", format="WAV")
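The example above indexes voiced[0] and unvoiced[0], which raises an IndexError when no segment carries that label. A defensive variant might look like the sketch below. It assumes, as in the example, that detect_voice() returns (label, segment) tuples and that segments offer reduce(others); the helper names group_by_label and join_segments are hypothetical.

```python
def group_by_label(results):
    """Group (label, segment) tuples into {label: [segments, ...]}."""
    groups = {}
    for label, segment in results:
        groups.setdefault(label, []).append(segment)
    return groups


def join_segments(segments):
    """Concatenate segments via their reduce() method, or return None if empty."""
    if not segments:
        return None
    return segments[0].reduce(segments[1:])


# Intended use with the example above (not executed here):
# groups = group_by_label(seg.detect_voice())
# voiced_segment = join_segments(groups.get("v", []))
# if voiced_segment is not None:
#     voiced_segment.export("voiced.wav", format="WAV")
```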

Silence Removal

import matplotlib.pyplot as plt

# ...
print("Plotting before silence...")
plt.subplot(211)
plt.title("Before Silence Removal")
plt.plot(seg.get_array_of_samples())

seg = seg.filter_silence(duration_s=0.2, threshold_percentage=5.0)
outname_silence = "nosilence.wav"
seg.export(outname_silence, format="wav")

print("Plotting after silence...")
plt.subplot(212)
plt.title("After Silence Removal")

plt.tight_layout()
plt.plot(seg.get_array_of_samples())
plt.show()

FFT

import matplotlib.pyplot as plt
import numpy as np

#...
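# Do it just for the first 3 seconds of audio (slices are in milliseconds)
hist_bins, hist_vals = seg[:3000].fft()
# Magnitude of each bin, normalized by the number of bins (linear, not dB)
hist_vals_real_normed = np.abs(hist_vals) / len(hist_vals)
plt.plot(hist_bins / 1000, hist_vals_real_normed)
plt.xlabel("kHz")
plt.ylabel("Normalized magnitude")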
plt.show()
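Note that np.abs(hist_vals) / len(hist_vals) is a linear magnitude, not a decibel value. If a dB axis is wanted, one common convention is 20 * log10 of the magnitude relative to the largest bin. The sketch below shows that conversion with the standard library only; the function name magnitudes_to_db and the floor value are choices made for this example.

```python
import math


def magnitudes_to_db(magnitudes, floor=1e-12):
    """Convert linear magnitudes to dB relative to the largest magnitude."""
    reference = max(magnitudes)
    if reference <= 0:
        raise ValueError("need at least one positive magnitude")
    # 20*log10 is the amplitude form; the floor avoids log10(0).
    return [20.0 * math.log10(max(m, floor) / reference) for m in magnitudes]


print(magnitudes_to_db([1.0, 0.5, 0.1]))
```

With NumPy arrays the same idea can be written as 20 * np.log10(np.maximum(mags, floor) / mags.max()).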

Spectrogram

import matplotlib.pyplot as plt
import numpy as np

#...
freqs, times, amplitudes = seg.spectrogram(window_length_s=0.03, overlap=0.5)
# Convert to dB; the small offset avoids log10(0)
amplitudes = 10 * np.log10(amplitudes + 1e-9)

# Plot
plt.pcolormesh(times, freqs, amplitudes)
plt.xlabel("Time in Seconds")
plt.ylabel("Frequency in Hz")
plt.show()
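To get a feel for what window_length_s=0.03 and overlap=0.5 mean in samples, the arithmetic can be sketched as follows. It assumes the 32000 Hz rate used in the voice detection example and that overlap is a fraction of the window; the library's exact rounding may differ, and window_parameters is a name made up for this example.

```python
def window_parameters(frame_rate, window_length_s, overlap):
    """Return (samples per window, overlapping samples, hop size, Hz per bin)."""
    window_samples = int(round(frame_rate * window_length_s))
    overlap_samples = int(round(window_samples * overlap))
    hop_samples = window_samples - overlap_samples
    bin_width_hz = frame_rate / window_samples
    return window_samples, overlap_samples, hop_samples, bin_width_hz


print(window_parameters(32000, 0.03, 0.5))
```

A longer window gives narrower frequency bins but coarser time resolution, which is the usual trade-off when picking these two parameters.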

audiosegment's Issues

Incorrect segregation of voiced and unvoiced segments

Hello,

I would like to get the voiced segments from any audio file (.wav format) and plot them against the time series of the original audio. I modified your code a bit and ran it on a simple audio file. For instance, I recorded a simple audio file containing only my voice and tried to find the voiced segments, but the code gets the voiced segments wrong and classifies most of the actual human voice as "Unvoiced segments".

What should I do?

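import audiosegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import webrtcvad as wb  # assumption: "wb" is the webrtcvad module

audio_data, sampling_rate = librosa.load('try_voice.wav')
plt.figure(figsize=(14, 5))
librosa.display.waveplot(audio_data, sr=sampling_rate)

vad = wb.Vad()  # created but never used below
filename = 'try_voice.wav'
audio = audiosegment.from_file(filename)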

seg = audio.resample(sample_rate_Hz=32000, sample_width=2, channels=1)
results = seg.detect_voice()
voiced = [tup[1] for tup in results if tup[0] == 'v']
unvoiced = [tup[1] for tup in results if tup[0] == 'u']

voiced_segment = voiced[0].reduce(voiced[1:])
voiced_segment.export("voiced.wav", format="WAV")
voiced, sampling_rate_v= librosa.load('voiced.wav')

duration = len(voiced)/sampling_rate_v
time = np.arange(0,duration,1/sampling_rate_v) #time vector
plt.figure()
librosa.display.waveplot(voiced, sr=sampling_rate_v)
plt.show()

Is the library still being actively maintained?

I came across this library while implementing a certain use case with PyDub and was wondering whether it's still being actively maintained.

Also, I'd like to know if there's any specific reason for wrapping pydub.AudioSegment as opposed to subclassing it and implementing or overriding additional functionality as needed.

With subclassing, most of the dunder methods (like __getattribute__, __mul__, etc.) need not be re-implemented in the child class. It also conforms to the Open-Closed design principle (open for extension, closed for modification).
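The point about dunder methods can be seen in a small sketch: Python looks up special methods on the type, so a wrapper that forwards attributes through __getattr__ still has to define each dunder it wants to support. The classes below are hypothetical and only illustrate the language behavior; they are not the library's code.

```python
class Delegating:
    """Wrapper that forwards attribute lookups with __getattr__ only."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        return getattr(self._inner, name)


class DelegatingWithLen(Delegating):
    """Same wrapper, plus one explicitly forwarded special method."""

    def __len__(self):
        return len(self._inner)


plain = Delegating([1, 2, 3])
print(plain.count(2))  # ordinary methods are forwarded

try:
    len(plain)  # implicit special-method lookup skips __getattr__
except TypeError as err:
    print("len() failed:", err)

print(len(DelegatingWithLen([1, 2, 3])))
```

A subclass would inherit __len__ and the other dunders directly, which is the trade-off the question is pointing at.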

How do I install? With pip?

Error on noisy signal with silence_removal

First, thanks for your great work making pydub easy to use.

For a noisy signal (a wav file that contains more noise than voice), audiosegment's filter_silence gives an error. That makes sense, but it would be better to give a warning and return the original signal. The needed step is to check whether there is any silence below the threshold and, if not, return the original signal.

The error message is:
CouldntDecodeError: Couldn't find data header in wav data

The error goes away for noise-corrupted speech when I make the threshold smaller (e.g. 0.1), but a noisy signal still gives the error.
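Until the library handles this case itself, a caller-side workaround along the lines requested could look like the sketch below. The helper name filter_silence_or_original is made up, and catching Exception broadly is a deliberate simplification, since the exact error type may vary.

```python
import warnings


def filter_silence_or_original(seg, **kwargs):
    """Call seg.filter_silence(**kwargs); on failure, warn and return seg."""
    try:
        return seg.filter_silence(**kwargs)
    except Exception as err:  # deliberately broad: the exact error type may vary
        warnings.warn(f"filter_silence failed ({err!r}); returning the original segment")
        return seg


# Intended use (not executed here):
# seg = filter_silence_or_original(seg, duration_s=0.2, threshold_percentage=5.0)
```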

audiosegment.from_file cannot work with readers

I use argparse.FileType to properly enforce that file (paths) passed as parameters exist, are readable, ...
As such, I do not have a Path object to pass to from_file, but an _io.BufferedReader or somesuch.

Generally, it would be expected to be able to pass a reader, as we might be working with in-memory data & such.
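One possible workaround, while from_file expects a path, is to let argparse validate the path without opening it. The sketch below uses a custom type function instead of argparse.FileType; readable_file and build_parser are names invented for this example. It does not cover in-memory data, which would still need reader support in the library.

```python
import argparse
import os
from pathlib import Path


def readable_file(value):
    """argparse type: accept only paths to existing, readable files."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"not a file: {value}")
    if not os.access(path, os.R_OK):
        raise argparse.ArgumentTypeError(f"not readable: {value}")
    return path


def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical CLI")
    parser.add_argument("audio", type=readable_file)
    return parser


# Intended use (not executed here):
# args = build_parser().parse_args()
# seg = audiosegment.from_file(str(args.audio))
```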

Error loading a wav file with a Chinese name

File "vad-master/cut_speech.py", line 135, in <module>
    mul_detect(wavdir,outdir,outflag)
File "vad-master/cut_speech.py", line 96, in mul_detect
    sound = AudioSegment.from_wav(file_tmp)
File "/data/zyb/miniconda2/lib/python2.7/site-packages/pydub/audio_segment.py", line 728, in from_wav
    return cls.from_file(file, 'wav', parameters=parameters)
File "/data/zyb/miniconda2/lib/python2.7/site-packages/pydub/audio_segment.py", line 607, in from_file
    filename = fsdecode(file)
File "/data/zyb/miniconda2/lib/python2.7/site-packages/pydub/utils.py", line 208, in fsdecode
    return filename.decode(sys.getfilesystemencoding())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 24: ordinal not in range(128)
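The traceback suggests that Python 2.7 reported ASCII as the filesystem encoding, so the UTF-8 bytes of the Chinese file name could not be decoded. The sketch below reproduces the same kind of failure in Python 3 with a hypothetical file name. The likely remedies are assumptions, not confirmed fixes for this report: running under a UTF-8 locale, or using Python 3.

```python
def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# "\u6587\u4ef6.wav" is a hypothetical Chinese file name chosen for illustration.
raw_name = "\u6587\u4ef6.wav".encode("utf-8")

print(hex(raw_name[0]))                      # 0xe6, the byte named in the traceback
print(try_decode(raw_name, "ascii"))         # None: not valid ASCII
print(ascii(try_decode(raw_name, "utf-8")))  # decodes fine as UTF-8
```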