An open-source Python library for audio time-scale modification.

License: GNU General Public License v3.0

PyTSMod

PyTSMod is an open-source library for Time-Scale Modification algorithms in Python 3. PyTSMod contains basic TSM algorithms such as Overlap-Add (OLA), Waveform-Similarity Overlap-Add (WSOLA), Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA), and Phase Vocoder (PV-TSM). We are also planning to add more TSM algorithms and pitch shifting algorithms.

Full documentation is available on https://pytsmod.readthedocs.io

The implementations of the algorithms are based on the following papers and libraries:

TSM Toolbox: MATLAB Implementations of Time-Scale Modification Algorithms.
Jonathan Driedger, Meinard Müller.
Proceedings of the 17th International Conference on Digital Audio Effects (DAFx-14), 2014.

A review of time-scale modification of music signals.
Jonathan Driedger, Meinard Müller.
Applied Sciences, 6(2), 57, 2016.

DAFX: digital audio effects
Udo Zölzer.
John Wiley & Sons, 2011.

Installing PyTSMod

PyTSMod is hosted on PyPI. To install, run the following command in your Python environment:

$ pip install pytsmod

Or if you use poetry, you can clone the repository and build the package through the following command:

$ poetry build

Requirements

The latest version of PyTSMod requires Python >= 3.8 and the following packages:

NumPy (>=1.20.0)
SciPy (>=1.8.0)
soundfile (>=0.10.0)

Using PyTSMod

Using OLA, WSOLA, and PV-TSM

OLA, WSOLA, and PV-TSM can be imported as a module and used directly in Python. The simplest call needs only two parameters: the input audio sequence x and the time stretching factor s. Here's a minimal example:

import numpy as np
import pytsmod as tsm
import soundfile as sf  # you can use other audio load packages.

x, sr = sf.read('/FILEPATH/AUDIOFILE.wav')
x = x.T
x_length = x.shape[-1]  # length of the audio sequence x.

s_fixed = 1.3  # stretch the audio signal by a factor of 1.3.
s_ap = np.array([[0, x_length / 2, x_length], [0, x_length, x_length * 1.5]])  # double the first half of the audio only and preserve the other half.

x_s_fixed = tsm.wsola(x, s_fixed)
x_s_ap = tsm.wsola(x, s_ap)

Time stretching factor s

The time stretching factor s can be either a constant value (alpha) or a 2 x n array of anchor points, which contains the sample points of the input signal in the first row and the corresponding sample points of the output signal in the second row.
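As a minimal sketch of how such an anchor array can be assembled, the helper below turns per-segment stretch ratios into a 2 x n array. The function name anchors_from_ratios is hypothetical and not part of PyTSMod; it only uses NumPy.

```python
import numpy as np


def anchors_from_ratios(boundaries, ratios):
    """Build a 2 x n anchor-point array from per-segment stretch ratios.

    boundaries: increasing input sample positions (length n).
    ratios: one positive stretch ratio per segment (length n - 1).
    Hypothetical helper, not part of PyTSMod.
    """
    boundaries = np.asarray(boundaries, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    if boundaries.ndim != 1 or ratios.shape != (boundaries.size - 1,):
        raise ValueError("need exactly one ratio per segment")
    if np.any(np.diff(boundaries) <= 0) or np.any(ratios <= 0):
        raise ValueError("boundaries must increase and ratios must be positive")
    # Each output segment is the input segment length times its ratio;
    # the output anchors are the running sum of those lengths.
    out_lengths = np.diff(boundaries) * ratios
    out_positions = boundaries[0] + np.concatenate(([0.0], np.cumsum(out_lengths)))
    return np.vstack((boundaries, out_positions))


x_length = 1000
# Double the first half, keep the second half unchanged.
s_ap = anchors_from_ratios([0, x_length / 2, x_length], [2.0, 1.0])
print(s_ap)
```

For x_length = 1000 this reproduces the s_ap array from the minimal example above, which could then be passed as the second argument of tsm.wsola.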

Using TD-PSOLA

When using TD-PSOLA, the estimated pitch information of the source you want to modify is needed. Also, you should know the hop size and frame length of the pitch tracking algorithm you used. Here's a minimal example:

import numpy as np
import pytsmod as tsm
import crepe  # you can use other pitch tracking algorithms.
import soundfile as sf  # you can use other audio load packages.

x, sr = sf.read('/FILEPATH/AUDIOFILE.wav')

_, f0_crepe, _, _ = crepe.predict(x, sr, viterbi=True, step_size=10)

x_double_stretched = tsm.tdpsola(x, sr, f0_crepe, alpha=2, p_hop_size=441, p_win_size=1470)  # hop_size and frame_length for CREPE step_size=10 with sr=44100
x_3keyup = tsm.tdpsola(x, sr, f0_crepe, beta=pow(2, 3/12), p_hop_size=441, p_win_size=1470)
x_3keydown = tsm.tdpsola(x, sr, f0_crepe, tgt_f0=f0_crepe * pow(2, -3/12), p_hop_size=441, p_win_size=1470)
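The value 441 passed as p_hop_size above corresponds to a 10 ms step at sr=44100. A small sketch of that conversion follows; the helper name step_ms_to_hop_samples is an assumption for illustration, and the p_win_size value is not derived here.

```python
def step_ms_to_hop_samples(step_size_ms, sr):
    """Convert a pitch tracker's frame step in milliseconds to samples.

    Hypothetical helper, not part of PyTSMod or CREPE.
    """
    hop = step_size_ms * sr / 1000
    if hop != int(hop):
        # A fractional hop cannot be expressed as a whole number of samples.
        raise ValueError(f"{step_size_ms} ms is not a whole number of samples at {sr} Hz")
    return int(hop)


p_hop_size = step_ms_to_hop_samples(10, 44100)
print(p_hop_size)  # 441
```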

Time stretching factor alpha

In this version, TD-PSOLA only supports the fixed time stretching factor alpha.

Pitch shifting factor beta and tgt_f0

You can modify the pitch of the audio sequence in two ways. The first is beta, a fixed pitch shifting factor. The other is tgt_f0, the target pitch sequence you want to convert to. You cannot use both parameters at the same time.
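The two options can be prepared as sketched below. The helper semitones_to_ratio and the placeholder pitch track are assumptions for illustration; treating 0 as an unvoiced frame is also an assumption about the pitch tracker's output.

```python
import numpy as np


def semitones_to_ratio(n_semitones):
    """Frequency ratio for a shift of n_semitones in equal temperament."""
    return 2.0 ** (n_semitones / 12.0)


# Placeholder pitch track in Hz (assumption: 0 marks an unvoiced frame).
f0 = np.array([220.0, 0.0, 440.0])

# Option 1: a single fixed factor, usable as beta (three semitones up).
beta = semitones_to_ratio(3)

# Option 2: a per-frame target track, usable as tgt_f0 (three semitones down).
# Scaling keeps unvoiced frames at 0.
tgt_f0 = f0 * semitones_to_ratio(-3)

print(beta)
print(tgt_f0)
```

Only one of the two would be passed to tsm.tdpsola in a given call.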

Using PyTSMod from the command line

From version 0.3.0, this package includes a command-line tool named tsmod, which can create the result file easily from a shell. To generate the WSOLA result of input.wav with stretching factor 1.3 and save to output.wav, please run:

$ tsmod wsola input.wav output.wav 1.3  # ola, wsola, pv, pv_int are available.

Currently, OLA, WSOLA, and Phase Vocoder (PV) are supported. TD-PSOLA is excluded because it is difficult to pass extracted pitch data to it from the command line. Non-linear TSM is also not supported from the command line.

For more information, run tsmod with -h or --help to see the detailed usage.
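If you want to drive the tool from a script, one option is to build the argument list in Python. This is only a sketch: build_tsmod_command is a hypothetical helper, and it assumes nothing about tsmod beyond the argument order and algorithm names shown in the example above.

```python
import shlex

# Algorithm names listed in the example command above.
SUPPORTED_ALGORITHMS = ("ola", "wsola", "pv", "pv_int")


def build_tsmod_command(algorithm, in_path, out_path, factor):
    """Return the argument list for one tsmod call (hypothetical helper)."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    if factor <= 0:
        raise ValueError("stretching factor must be positive")
    return ["tsmod", algorithm, str(in_path), str(out_path), str(factor)]


cmd = build_tsmod_command("wsola", "input.wav", "output.wav", 1.3)
print(shlex.join(cmd))  # tsmod wsola input.wav output.wav 1.3
```

The resulting list can be handed to subprocess.run(cmd, check=True) once tsmod is installed.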

Audio examples

The original audio files are from the TSM Toolbox.

Stretching factor α=0.5

Name	Method	Original	OLA	WSOLA	Phase Vocoder	Phase Vocoder (phase locking)	TSM based on HPSS
CastanetsViolin	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
DrumSolo	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
Pop	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
SingingVoice	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav

Stretching factor α=1.2

Name	Method	Original	OLA	WSOLA	Phase Vocoder	Phase Vocoder (phase locking)	TSM based on HPSS
CastanetsViolin	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
DrumSolo	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
Pop	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
SingingVoice	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav

Stretching factor α=1.8

Name	Method	Original	OLA	WSOLA	Phase Vocoder	Phase Vocoder (phase locking)	TSM based on HPSS
CastanetsViolin	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
DrumSolo	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
Pop	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav
SingingVoice	TSM Toolbox	wav	wav	wav	wav	wav	wav
-	PyTSMod	-	wav	wav	wav	wav	wav

pytsmod's Issues

[BUG] Please avoid upper limits on dependency versions

Describe the bug

Not really a bug, more of a (big) annoyance: it is a well-known issue with Poetry that it insists on adding upper limits to dependency versions. While this is fine for an application, which sits at the end of the dependency tree, it is a much bigger problem for a library such as pytsmod that will be installed together with many other dependencies.

pytsmod 0.3.7 has the following dependencies:

numpy = "^1.20"
scipy = "^1.8"
soundfile = "^0.10"

This means that installing it will force a downgrade of numpy to 1.20 (current version = 1.26), of scipy to 1.8 (instead of 1.11), and soundfile to 0.10 (instead of 0.12).

Unless you are absolutely sure that any version newer than those will break your package, upper bounds should really be avoided for a library. The exception is a library that is updated often enough to keep these bounds at the latest working versions of all its dependencies (using e.g. dependabot). But evidently, this is not the case here (which is not a critique: it doesn't really make sense to release a new version every week just because the requirements have to be updated).

Note: what is a bit strange is that in the current main branch (not released), the soundfile dependency was changed to soundfile = ">=0.10", which is better, but the issue remains for numpy and scipy.

To Reproduce
Steps to reproduce the behavior:

pip install soundfile numpy
pip install pytsmod

Expected behavior

soundfile and numpy should not be downgraded when installing pytsmod.

Desktop (please complete the following information):

OS: any
Python version: any/3.10
Version 0.3.7

Additional context

See this blog post: https://iscinumpy.dev/post/bound-version-constraints/

[BUG] padding is not working with numpy 1.16

Describe the bug

x_padded = np.pad(x, ((0, 0), (left_pad, right_pad)))
TypeError: pad() missing 1 required positional argument: 'mode'

To Reproduce
Steps to reproduce the behavior:

Use TSM functions with numpy 1.16
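The traceback says that np.pad requires a mode argument on this NumPy version; to my knowledge the argument only became optional (defaulting to "constant") in NumPy 1.17. A minimal sketch of a call that works on both older and newer versions, using a small placeholder array rather than PyTSMod's internal data:

```python
import numpy as np

# Placeholder signal with shape (channels, samples).
x = np.arange(6, dtype=float).reshape(2, 3)
left_pad, right_pad = 2, 1

# Passing mode explicitly avoids relying on the default added in later versions.
x_padded = np.pad(x, ((0, 0), (left_pad, right_pad)), mode="constant")
print(x_padded.shape)  # (2, 6)
```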

Latest Version of PyTSMod Unavailable for Python 3.11

PyTSMod versions 0.3.6, 0.3.5, and 0.3.4 all require Python versions 3.8, 3.9, or 3.10. Since version 0.3.3 is less restricted, it's what gets installed when running "pip install pytsmod" on Python 3.11. In particular, this means installing on Python 3.11 still gives librosa (and by extension numba) as a dependency, which is a bit of bloat and causes issues with things like nuitka compilation.

Great package by the way, love your work!

[Feature Request] Transposed output when using PyTSMod with SoundFile

When using read() from the soundfile library, the channels are placed on the second axis.
But the output of the implemented TSM algorithms has the transposed shape, with the channels on the first axis.
This might be confusing for some people.
I would be grateful if you would consider this.
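Until the library handles this itself, a thin wrapper can transpose on the way in and out. This is a sketch under assumptions: apply_channels_last is a hypothetical helper, and fake_stretch is a stand-in used only to make the example runnable, not a real TSM algorithm.

```python
import numpy as np


def apply_channels_last(tsm_func, x, *args, **kwargs):
    """Run a (channels, samples) function on (samples, channels) audio.

    Hypothetical helper: transposes the input, calls tsm_func, and
    transposes the result back to the soundfile layout.
    """
    x = np.asarray(x)
    y = np.asarray(tsm_func(x.T, *args, **kwargs))
    return y.T


def fake_stretch(x, s):
    # Stand-in for a TSM function, NOT a real algorithm: repeats each sample.
    return np.repeat(x, int(s), axis=-1)


stereo = np.zeros((1000, 2))  # layout returned by soundfile.read
out = apply_channels_last(fake_stretch, stereo, 2)
print(out.shape)  # (2000, 2)
```

With PyTSMod installed, a function such as tsm.wsola could be passed in place of fake_stretch.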

[BUG]

Describe the bug
TypeError: slice indices must be integers or None or have an index method

This error occurred when I was running the code from the tutorial.

Targeted TD-PSOLA fails

Hello, I am trying to follow the tutorial step by step. My input is a single-channel audio file.

Small edit to the tutorial:
I assume it should be
x_3keydown = tsm.tdpsola(x, sr, src_f0=f0_crepe, **tgt_f0**=f0_crepe * pow(2, -3/12), p_hop_size=441, p_win_size=1470)
and not target_f0.

The issue is that
tsm.tdpsola(x, sr, src_f0=f0_crepe, tgt_f0=f0_crepe * pow(2, -3/12), p_hop_size=441, p_win_size=1470)
fails with this error.
Is it because the input is single-channel?

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in
1 target = f0_crepe * pow(2, -3/12)
----> 2 x_3keydown = tsm.tdpsola(x.T, sr, src_f0=f0_crepe, tgt_f0=target, p_hop_size=441, p_win_size=1470)

c:\users\stefano\appdata\local\programs\python\python37\lib\site-packages\pytsmod\tdpsolatsm.py in tdpsola(x, sr, src_f0, tgt_f0, alpha, beta, win_type, p_hop_size, p_win_size)
64 tgt_f0_chan = tgt_f0[c]
65 beta_seq = _target_f0_to_beta(x_chan, pm_chan,
---> 66 src_f0_chan, tgt_f0_chan)
67 else:
68 beta_seq = np.ones(pitch_period.size) * beta

c:\users\stefano\appdata\local\programs\python\python37\lib\site-packages\pytsmod\tdpsolatsm.py in _target_f0_to_beta(x, pitch_mark, source_f0, target_f0)
148 idx = source_f0.size - 1
149
--> 150 if (not target_f0[idx] == 0) and (not source_f0[idx] == 0):
151 beta[i] = target_f0[idx] / source_f0[idx]
152 else:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

Getting Error while Implementing wsola for time stretching using 2 x n array of anchor points

While implementing WSOLA time stretching with a 2 x n array of anchor points, I got the following error:
ValueError: index can't contain negative values
It happens only for certain combinations of anchor points.
For example, if I want to change the audio length in the following ratios: [1, 1.1, 1.2, 0.9], I give the 2 x n array as:
np.array([[0, x_length/4, 2*x_length/4, 3*x_length/4, x_length], [0, x_length/4 * 1, 2*x_length/4 * 1.1, 3*x_length/4 * 1.2, x_length * 0.9]])
It gives the error: ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size. or sometimes
ValueError: index can't contain negative values.
But it works fine for the anchor list [1, 1.1, 1.2, 0.8] or [1, 1.1, 1.2, 1]. Is there anything I am missing?
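One thing worth checking before calling wsola is whether both rows of the anchor array strictly increase. With the ratios above, 3*x_length/4 * 1.2 and x_length * 0.9 both come out at 0.9 * x_length, so the last output segment has zero length. Whether this is what triggers the reported errors is an assumption; the sketch below only reports such anchors, and check_anchor_points is a hypothetical helper, not part of PyTSMod.

```python
import numpy as np


def check_anchor_points(s_ap):
    """Return a list of problems found in a 2 x n anchor-point array.

    Hypothetical helper: flags neighbouring anchors that do not increase.
    """
    s_ap = np.asarray(s_ap, dtype=float)
    if s_ap.ndim != 2 or s_ap.shape[0] != 2 or s_ap.shape[1] < 2:
        raise ValueError("expected a 2 x n array with n >= 2")
    problems = []
    for name, row in (("input", s_ap[0]), ("output", s_ap[1])):
        # A non-positive step means two anchors coincide or go backwards.
        for i in np.flatnonzero(np.diff(row) <= 0):
            problems.append(
                f"{name} anchors {i} and {i + 1} do not increase: "
                f"{row[i]} -> {row[i + 1]}"
            )
    return problems


# Anchor values from the report above, written out for x_length = 1000.
s_ap = np.array([[0, 250, 500, 750, 1000], [0, 250, 550, 900, 900]])
for problem in check_anchor_points(s_ap):
    print(problem)
```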