
:mega: Python library for audio augmentation

Home Page: https://superkogito.github.io/pydiogment/

License: BSD 3-Clause "New" or "Revised" License



πŸ”” Pydiogment


Pydiogment aims to simplify audio augmentation. Starting from a single mono audio file, it generates multiple augmented variants, e.g. sped-up, slowed-down, and pitch-shifted versions.

πŸ“₯ Installation

Dependencies

Pydiogment requires NumPy, SciPy, and FFmpeg:

On Linux

On Linux you can use the following commands to get the libraries:

  • Numpy: pip install numpy
  • Scipy: pip install scipy
  • FFmpeg: sudo apt install ffmpeg

On Windows

On Windows you can use the following commands and installation binaries:

  • Numpy: pip install numpy
  • Scipy: pip install scipy
  • FFmpeg: download a build from https://ffmpeg.org/download.html and add the folder containing ffmpeg.exe to your PATH

On MacOS

On macOS, use Homebrew to install the packages:

  • Numpy: brew install numpy --with-python3
  • Scipy: first install a Fortran compiler such as GFortran with brew install gfortran, then install SciPy with pip install scipy. For more information and guidelines, see https://github.com/scipy/scipy/blob/master/INSTALL.rst.txt#mac-os-x
  • FFmpeg: brew install ffmpeg

Installation

If you already have working installations of NumPy and SciPy, you can install Pydiogment using pip:

pip install pydiogment

To update an existing version of Pydiogment, use:

pip install -U pydiogment

πŸ’‘ How to use

  • Amplitude related augmentation

    • Apply a fade in and fade out effect

      from pydiogment.auga import fade_in_and_out
      
      test_file = "path/test.wav"
      fade_in_and_out(test_file)
    • Apply gain to file

      from pydiogment.auga import apply_gain
      
      test_file = "path/test.wav"
      apply_gain(test_file, -100)
      apply_gain(test_file, -50)
    • Add random Gaussian noise to a file based on a target SNR

      from pydiogment.auga import add_noise
      
      test_file = "path/test.wav"
      add_noise(test_file, 10)
  • Frequency related augmentation

    • Change file tone

      from pydiogment.augf import change_tone
      
      test_file = "path/test.wav"
      change_tone(test_file, 0.9)
      change_tone(test_file, 1.1)
  • Time related augmentation

    • Slow down / speed up a file

      from pydiogment.augt import slowdown, speed
      
      test_file = "path/test.wav"
      slowdown(test_file, 0.8)
      speed(test_file, 1.2)
    • Apply random cropping to the file

      from pydiogment.augt import random_cropping
      
      test_file = "path/test.wav"
      random_cropping(test_file, 1)
    • Shift the data along the time axis in a given direction

      from pydiogment.augt import shift_time
      
      test_file = "path/test.wav"
      shift_time(test_file, 1, "right")
      shift_time(test_file, 1, "left")
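The add_noise example above takes a target SNR in decibels. As a rough, generic sketch of that idea (not necessarily pydiogment's exact implementation), white noise can be scaled so the mix reaches the requested SNR:

```python
import numpy as np

def add_white_noise(signal, snr_db):
    """Mix white Gaussian noise into `signal` at the requested SNR (in dB)."""
    signal_power = np.mean(signal.astype(float) ** 2)
    noise = np.random.randn(len(signal))
    noise_power = np.mean(noise ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise); solve for the noise gain
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10.0)))
    return signal + gain * noise
```

Lower SNR values mean louder noise; 10 dB leaves the signal clearly dominant, while 0 dB gives signal and noise equal power.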
  • Audio files format

This library currently supports mono WAV files only.
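Since only mono WAV input is supported, stereo material has to be down-mixed first. A minimal sketch using SciPy (already a dependency); the `to_mono` helper name is ours, not part of pydiogment:

```python
import numpy as np
from scipy.io import wavfile

def to_mono(infile, outfile):
    """Down-mix a multi-channel WAV file to mono by averaging the channels."""
    rate, data = wavfile.read(infile)
    if data.ndim == 2:                      # shape: (samples, channels)
        data = data.mean(axis=1).astype(data.dtype)
    wavfile.write(outfile, rate, data)
```

FFmpeg can do the same from the command line with `ffmpeg -i input.wav -ac 1 mono.wav`.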

πŸ“‘ Documentation

Thorough documentation of the library is available at pydiogment.readthedocs.io.

πŸ‘· Contributing and bugs report

Contributions are welcome and encouraged. To learn more about how to contribute to Pydiogment, please refer to the Contributing guidelines.

To report bugs, request a feature, or ask for help, refer to the issues section. Before reporting a bug, please make sure it is not already addressed by an existing issue, and include your operating system type and version as well as the versions of the dependencies used.

πŸŽ‰ Acknowledgment and credits

pydiogment's People

Contributors

danielskatz, hmmalek, superkogito


pydiogment's Issues

[JOSS Review] unit tests and output files

reference review issue

While going through the "automated tests" section, I noticed a couple of things about the unit-test framework that could be improved.

  1. All tests seem to check only for the existence of the output file, which does not confirm correct behavior. For many of the tests it would be simple enough to detect, e.g., changes in length or gain. For others (e.g. IR) you might need to do regression testing against known correct outputs.
  2. Minor thing, but your tests don't clean up after themselves. If you're using something like Travis with build caching enabled, outputs from old runs could linger and lead to spurious passing tests (because the files exist from a previous run). Since you're using pytest already, it wouldn't be much work to use a tmp_path fixture to prevent this kind of behavior: https://docs.pytest.org/en/latest/tmpdir.html , or otherwise clean up old outputs after tests execute (pass or fail).
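A sketch of what addressing both points could look like, using a hypothetical stand-in for apply_gain (the real library's output naming differs):

```python
import numpy as np
from scipy.io import wavfile

def apply_gain_stub(infile, gain_db):
    """Hypothetical stand-in for pydiogment.auga.apply_gain.

    Writes '<name>_gained.wav' next to the input; pydiogment's actual
    output naming scheme differs.
    """
    rate, sig = wavfile.read(infile)
    out = (sig * 10 ** (gain_db / 20.0)).astype(sig.dtype)
    wavfile.write(infile.replace(".wav", "_gained.wav"), rate, out)

def test_gain_reduces_amplitude(tmp_path):
    # tmp_path is a fresh directory for every test run, so a leftover
    # output file from a cached previous run can never make this pass.
    infile = str(tmp_path / "test.wav")
    sig = (np.random.randn(8000) * 5000).astype(np.int16)
    wavfile.write(infile, 8000, sig)
    apply_gain_stub(infile, -20)
    _, out = wavfile.read(str(tmp_path / "test_gained.wav"))
    # behavioral assertion instead of a bare existence check
    assert np.abs(out.astype(int)).max() < np.abs(sig.astype(int)).max()
```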

Point (2) would be much easier if the API were extended to allow the user to specify a target output path (#11). I see that you use the output filename to encode the deformation parameters, and letting a user specify the path exactly might make that difficult. I have some thoughts about how you might be able to accomplish both things (i.e. via string interpolation), but that might be out of scope for the testing issue.
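One possible shape for that string-interpolation idea (the name and scheme below are hypothetical, not pydiogment API):

```python
import os

def build_output_path(infile, suffix, outdir=None):
    """Return '<stem>_<suffix><ext>' in outdir (default: next to the input).

    `suffix` encodes the deformation parameters, e.g. 'fade' or 'gain-20dB',
    so the parameter-carrying filenames are preserved while the destination
    directory stays user-controlled.
    """
    directory, name = os.path.split(infile)
    stem, ext = os.path.splitext(name)
    return os.path.join(outdir or directory, "%s_%s%s" % (stem, suffix, ext))
```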

[JOSS Review] Paper details, related work, experiments

review issue

I read through the paper, and have a few comments for improving it.

Related work

A few projects that aren't cited, but should be:

  • Mauch, Matthias, and Sebastian Ewert. "The audio degradation toolbox and its application to robustness evaluation." (2013). (And python port: https://github.com/sevagh/audio-degradation-toolbox )
  • SchlΓΌter, Jan, and Thomas Grill. "Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks." ISMIR. 2015.

More generally, it's not clear in the writeup how this project compares to the existing alternatives, functionality-wise. (Full disclosure, there's a bit of an awkward situation here as I'm the author of one of these alternative toolboxes, but I'll try to be objective 😁.) I know these papers are meant to be brief, but it's also important to properly establish context, and make clear to readers how this package differs from others.

(As an aside: I don't think the characterization of muda is entirely accurate: we use it for all sorts of things outside of music, notably environmental sound and bioacoustics.)

Experiment details

It's not clear in the writeup whether your experiment includes augmentation during testing, or only during training.

Paper review

Proposed modifications to the paper :

  • Page 1: Summary, line 6: "and most deformation..." -> "as most deformation..." (rather an explanation)
  • Page 1: Summary, line 11: "the scipy..." -> "the Scipy..."
  • Page 1: Amplitude based augmentation, Add Fade, line 1: "and a fade-out effects..." -> "and a fade-out effect..."
  • Page 2: Add Noise: move the entire equation onto a separate line.
  • Page 3, line 2: "a separating features..." -> "a separating feature..."
  • Page 3: Conclusion, line 2: "These strategies aims..." -> "These strategies aim..."

[JOSS review] supported audio formats

The README, docs, and docstrings don't specify which audio formats are supported. Please add this information to all three.

[JOSS review] Docstrings missing information for output parameters

The docstrings are missing information about what the augmentation functions actually return.

For example, the docs for fade_in_and_out say:

pydiogment.auga.fade_in_and_out(infile)
Add a fade in and out effect to the audio file.

Args:
infile (str) : input filename/path.

This doesn't explain where the processed audio goes (saved to disk? returned as numpy array? as some other data format?), and whether there are other output parameters. The same applies to the other augmentation functions.
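For example, a docstring along these lines would close the gap (the output behavior it describes must of course match the actual implementation):

```python
def fade_in_and_out(infile):
    """Add a fade-in and fade-out effect to the audio file.

    Args:
        infile (str): input filename/path.

    Returns:
        None. The processed audio is not returned; it is written to disk
        as a new WAV file (the naming scheme and destination directory
        should be stated here explicitly).
    """
```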

[JOSS review] installation instructions for macOS and Windows

The installation instructions in the README and the docs assume the user is on a Linux OS that supports apt install, which isn't the case for Windows and macOS.

Please add instructions for installing non-Python dependencies on macOS and Windows, unless this package only supports Linux, in which case that should be clearly stated in the README/docs.

I have an example of instructions for installing ffmpeg across all 3 OSs here:
https://github.com/justinsalamon/scaper/#non-python-dependencies

[JOSS review] fade_in_and_out doesn't work as expected

Just tried fade_in_and_out() and got unexpected behavior.

Here are my input/output files (the input is called input.wav): fade_bug.zip

The input is a 2-second, 16-bit, mono, WAV file. The output is heavily distorted and doesn't include the expected fade-in and fade-out behavior.

While on this function, also note that it would be convenient for the user to be able to choose the destination folder as well as the filename for the output, or at least a custom suffix, as opposed to the hard-coded output path and filename.
