
mir-aidj / all-in-one


All-In-One Music Structure Analyzer

Home Page: http://arxiv.org/abs/2307.16425

License: MIT License

Python 100.00%
beat-tracking pytorch music-structure-analysis allin1

all-in-one's Introduction

All-In-One Music Structure Analyzer

Links: Visual Demo · arXiv · Hugging Face Space · PyPI

This package provides models for music structure analysis, predicting:

  1. Tempo (BPM)
  2. Beats
  3. Downbeats
  4. Functional segment boundaries
  5. Functional segment labels (e.g., intro, verse, chorus, bridge, outro)


Installation

1. Install PyTorch

Visit PyTorch and install the appropriate version for your system.

2. Install NATTEN (Required for Linux and Windows; macOS will auto-install)

  • Linux: Download from NATTEN website
  • macOS: Auto-installs with allin1.
  • Windows: Build from source:
pip install ninja # Recommended, not required
git clone https://github.com/SHI-Labs/NATTEN
cd NATTEN
make

3. Install the package

pip install git+https://github.com/CPJKU/madmom  # install the latest madmom directly from GitHub
pip install allin1  # install this package

4. (Optional) Install FFmpeg for MP3 support

For Ubuntu:

sudo apt install ffmpeg

For macOS:

brew install ffmpeg
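
After installation, a quick sanity check is to import the package and confirm whether a GPU is visible (a minimal sketch; the import fails here if NATTEN or madmom did not install correctly):

import torch
import allin1

print('allin1 imported OK')
print('CUDA available:', torch.cuda.is_available())  # analysis defaults to CUDA when available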

Usage for CLI

To analyze audio files:

allin1 your_audio_file1.wav your_audio_file2.mp3

Results will be saved in the ./struct directory by default:

./struct
├── your_audio_file1.json
└── your_audio_file2.json

The analysis results will be saved in JSON format:

{
  "path": "/path/to/your_audio_file.wav",
  "bpm": 100,
  "beats": [ 0.33, 0.75, 1.14, ... ],
  "downbeats": [ 0.33, 1.94, 3.53, ... ],
  "beat_positions": [ 1, 2, 3, 4, 1, 2, 3, 4, 1, ... ],
  "segments": [
    {
      "start": 0.0,
      "end": 0.33,
      "label": "start"
    },
    {
      "start": 0.33,
      "end": 13.13,
      "label": "intro"
    },
    {
      "start": 13.13,
      "end": 37.53,
      "label": "chorus"
    },
    {
      "start": 37.53,
      "end": 51.53,
      "label": "verse"
    },
    ...
  ]
}
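
If you want to consume these files from a script that does not depend on this package, they can be read with Python's standard library (a minimal sketch; the file name is just the example from above):

import json

with open('./struct/your_audio_file1.json') as f:
    result = json.load(f)

print(result['bpm'])                        # e.g. 100
print(len(result['beats']), 'beats found')  # beat times in seconds
for seg in result['segments']:
    print(f"{seg['start']:7.2f} - {seg['end']:7.2f}  {seg['label']}")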

All available options are as follows:

$ allin1 -h

usage: allin1 [-h] [-o OUT_DIR] [-v] [--viz-dir VIZ_DIR] [-s] [--sonif-dir SONIF_DIR] [-a] [-e] [-m MODEL] [-d DEVICE] [-k]
              [--demix-dir DEMIX_DIR] [--spec-dir SPEC_DIR]
              paths [paths ...]

positional arguments:
  paths                 Path to tracks

options:
  -h, --help            show this help message and exit
  -o OUT_DIR, --out-dir OUT_DIR
                        Path to a directory to store analysis results (default: ./struct)
  -v, --visualize       Save visualizations (default: False)
  --viz-dir VIZ_DIR     Directory to save visualizations if -v is provided (default: ./viz)
  -s, --sonify          Save sonifications (default: False)
  --sonif-dir SONIF_DIR
                        Directory to save sonifications if -s is provided (default: ./sonif)
  -a, --activ           Save frame-level raw activations from sigmoid and softmax (default: False)
  -e, --embed           Save frame-level embeddings (default: False)
  -m MODEL, --model MODEL
                        Name of the pretrained model to use (default: harmonix-all)
  -d DEVICE, --device DEVICE
                        Device to use (default: cuda if available else cpu)
  -k, --keep-byproducts
                        Keep demixed audio files and spectrograms (default: False)
  --demix-dir DEMIX_DIR
                        Path to a directory to store demixed tracks (default: ./demix)
  --spec-dir SPEC_DIR   Path to a directory to store spectrograms (default: ./spec)

Usage for Python

Available functions:

analyze()

Analyzes the provided audio files and returns the analysis results.

import allin1

# You can analyze a single file:
result = allin1.analyze('your_audio_file.wav')

# Or multiple files:
results = allin1.analyze(['your_audio_file1.wav', 'your_audio_file2.mp3'])

A result is a dataclass instance containing:

AnalysisResult(
  path='/path/to/your_audio_file.wav', 
  bpm=100,
  beats=[0.33, 0.75, 1.14, ...],
  beat_positions=[1, 2, 3, 4, 1, 2, 3, 4, 1, ...],
  downbeats=[0.33, 1.94, 3.53, ...], 
  segments=[
    Segment(start=0.0, end=0.33, label='start'), 
    Segment(start=0.33, end=13.13, label='intro'), 
    Segment(start=13.13, end=37.53, label='chorus'), 
    Segment(start=37.53, end=51.53, label='verse'), 
    Segment(start=51.53, end=64.34, label='verse'), 
    Segment(start=64.34, end=89.93, label='chorus'), 
    Segment(start=89.93, end=105.93, label='bridge'), 
    Segment(start=105.93, end=134.74, label='chorus'), 
    Segment(start=134.74, end=153.95, label='chorus'), 
    Segment(start=153.95, end=154.67, label='end'),
  ])

Unlike the CLI, the Python API does not save the results to disk by default. You can save them as follows:

result = allin1.analyze(
  'your_audio_file.wav',
  out_dir='./struct',
)

Parameters:

  • paths : Union[PathLike, List[PathLike]]
    List of paths or a single path to the audio files to be analyzed.

  • out_dir : PathLike (optional)
    Path to the directory where the analysis results will be saved. By default, the results will not be saved.

  • visualize : Union[bool, PathLike] (optional)
    Whether to visualize the analysis results or not. If a path is provided, the visualizations will be saved in that directory. Default is False. If True, the visualizations will be saved in './viz'.

  • sonify : Union[bool, PathLike] (optional)
    Whether to sonify the analysis results or not. If a path is provided, the sonifications will be saved in that directory. Default is False. If True, the sonifications will be saved in './sonif'.

  • model : str (optional)
    Name of the pre-trained model to be used for the analysis. Default is 'harmonix-all'. Please refer to the documentation for the available models.

  • device : str (optional)
    Device to be used for computation. Default is 'cuda' if available, otherwise 'cpu'.

  • include_activations : bool (optional)
    Whether to include activations in the analysis results or not.

  • include_embeddings : bool (optional)
    Whether to include embeddings in the analysis results or not.

  • demix_dir : PathLike (optional)
    Path to the directory where the source-separated audio will be saved. Default is './demix'.

  • spec_dir : PathLike (optional)
    Path to the directory where the spectrograms will be saved. Default is './spec'.

  • keep_byproducts : bool (optional)
    Whether to keep the source-separated audio and spectrograms or not. Default is False.

  • multiprocess : bool (optional)
    Whether to use multiprocessing for extracting spectrograms. Default is True.

Returns:

  • Union[AnalysisResult, List[AnalysisResult]]
    Analysis results for the provided audio files.
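
For example, several of these parameters can be combined in a single call (a minimal sketch using only the documented parameters; file names are placeholders):

import allin1

results = allin1.analyze(
  ['your_audio_file1.wav', 'your_audio_file2.mp3'],
  out_dir='./struct',           # save JSON results to disk
  visualize='./viz',            # save visualizations to a custom directory
  sonify=True,                  # save sonifications to the default './sonif'
  model='harmonix-all',         # the default ensemble model
  device='cuda',                # or 'cpu'
  include_activations=True,
  include_embeddings=True,
)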

load_result()

Loads the analysis results from the disk.

result = allin1.load_result('./struct/24k_Magic.json')

visualize()

Visualizes the analysis results.

fig = allin1.visualize(result)
fig.show()

Parameters:

  • result : Union[AnalysisResult, List[AnalysisResult]]
    List of analysis results or a single analysis result to be visualized.

  • out_dir : PathLike (optional)
    Path to the directory where the visualizations will be saved. By default, the visualizations will not be saved.

Returns:

  • Union[Figure, List[Figure]] List of figures or a single figure containing the visualizations. Figure is a class from matplotlib.pyplot.
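
To save the figures as well as display them, pass out_dir; and since each figure is a matplotlib Figure, it can also be saved manually (a minimal sketch):

fig = allin1.visualize(result, out_dir='./viz')  # saves the visualization and returns the figure
fig.savefig('your_audio_file.png', dpi=150)      # optional: save it yourself via matplotlib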

sonify()

Sonifies the analysis results. It mixes metronome clicks for beats and downbeats, and event sounds for segment boundaries, into the original audio.

y, sr = allin1.sonify(result)
# y: sonified audio with shape (channels=2, samples)
# sr: sampling rate (=44100)

Parameters:

  • result : Union[AnalysisResult, List[AnalysisResult]]
    List of analysis results or a single analysis result to be sonified.
  • out_dir : PathLike (optional)
    Path to the directory where the sonifications will be saved. By default, the sonifications will not be saved.

Returns:

  • Union[Tuple[NDArray, float], List[Tuple[NDArray, float]]]
    List of tuples or a single tuple containing the sonified audio and the sampling rate.
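
The returned audio can be written to disk with any audio I/O library, for example soundfile (a minimal sketch; soundfile is an assumption here, not a dependency of this package):

import soundfile as sf

y, sr = allin1.sonify(result)
# y has shape (channels=2, samples); soundfile expects (samples, channels), so transpose.
sf.write('your_audio_file.sonif.wav', y.T, int(sr))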

Visualization & Sonification

This package provides simple visualization (-v or --visualize) and sonification (-s or --sonify) functions for the analysis results.

allin1 -v -s your_audio_file.wav

The visualizations will be saved in the ./viz directory by default:

./viz
└── your_audio_file.pdf

The sonifications will be saved in the ./sonif directory by default:

./sonif
└── your_audio_file.sonif.wav

For example, a visualization looks like this: [figure: example visualization]

You can try it at the Hugging Face Space.

Available Models

The models are trained on the Harmonix Set with 8-fold cross-validation. For more details, please refer to the paper.

  • harmonix-all: (Default) An ensemble model averaging the predictions of 8 models trained on each fold.
  • harmonix-foldN: A model trained on fold N (0~7). For example, harmonix-fold0 is trained on fold 0.

By default, the harmonix-all model is used. To use a different model, use the --model option:

allin1 --model harmonix-fold0 your_audio_file.wav
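
From Python, the equivalent is the model parameter of analyze() (a minimal sketch):

result = allin1.analyze('your_audio_file.wav', model='harmonix-fold0')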

Speed

With an RTX 4090 GPU and Intel i9-10940X CPU (14 cores, 28 threads, 3.30 GHz), the harmonix-all model processed 10 songs (33 minutes) in 73 seconds.

Advanced Usage for Research

This package provides researchers with advanced options to extract frame-level raw activations and embeddings without post-processing. These have a resolution of 100 FPS, equivalent to 0.01 seconds per frame.

CLI

Activations

The --activ option also saves frame-level raw activations from sigmoid and softmax:

$ allin1 --activ your_audio_file.wav

You can find the activations in the .npz file:

./struct
├── your_audio_file1.json
└── your_audio_file1.activ.npz

To load the activations in Python:

>>> import numpy as np
>>> activ = np.load('./struct/your_audio_file1.activ.npz')
>>> activ.files
['beat', 'downbeat', 'segment', 'label']
>>> beat_activations = activ['beat']
>>> downbeat_activations = activ['downbeat']
>>> segment_boundary_activations = activ['segment']
>>> segment_label_activations = activ['label']

Details of the activations are as follows:

  • beat: Raw activations from the sigmoid layer for beat tracking (shape: [time_steps])
  • downbeat: Raw activations from the sigmoid layer for downbeat tracking (shape: [time_steps])
  • segment: Raw activations from the sigmoid layer for segment boundary detection (shape: [time_steps])
  • label: Raw activations from the softmax layer for segment labeling (shape: [label_class=10, time_steps])

You can access the label names as follows:

>>> allin1.HARMONIX_LABELS
['start',
 'end',
 'intro',
 'outro',
 'break',
 'bridge',
 'inst',
 'solo',
 'verse',
 'chorus']
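
Because the activations are frame-level at 100 FPS, converting frame indices to seconds and reading off the most likely label per frame is straightforward (a minimal sketch based on the arrays and label list shown above):

import numpy as np
import allin1

activ = np.load('./struct/your_audio_file1.activ.npz')

fps = 100                                    # 100 frames per second, i.e. 0.01 s per frame
beat = activ['beat']                         # shape: [time_steps]
label = activ['label']                       # shape: [label_class=10, time_steps]

times = np.arange(len(beat)) / fps           # frame index -> seconds
top_label_idx = label.argmax(axis=0)         # most likely label per frame
top_labels = [allin1.HARMONIX_LABELS[i] for i in top_label_idx]

print(times[:3], top_labels[:3])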

Embeddings

This package also provides an option to extract raw embeddings from the model.

$ allin1 --embed your_audio_file.wav

You can find the embeddings in the .npy file:

./struct
├── your_audio_file1.json
└── your_audio_file1.embed.npy

To load the embeddings in Python:

>>> import numpy as np
>>> embed = np.load('your_audio_file1.embed.npy')

Each model produces an embedding for every source-separated stem at each time step, resulting in embeddings of shape [stems=4, time_steps, embedding_size=24]:

  1. The number of source-separated stems (the order is bass, drums, other, vocals).
  2. The number of time steps (frames). The time step is 0.01 seconds (100 FPS).
  3. The embedding size of 24.

Using the --embed option with the harmonix-all ensemble model will stack the embeddings, saving them with the shape [stems=4, time_steps, embedding_size=24, models=8].
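
A minimal sketch of loading and pooling the saved embeddings, using the shapes described above (the mean-pooling is only an example, not something the package prescribes):

import numpy as np

embed = np.load('your_audio_file1.embed.npy')
print(embed.shape)                  # (4, time_steps, 24) for a single model,
                                    # (4, time_steps, 24, 8) for the harmonix-all ensemble

if embed.ndim == 4:
  embed = embed.mean(axis=-1)       # average the 8 ensemble members
frame_embed = embed.mean(axis=0)    # average the 4 stems -> (time_steps, 24)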

Python

The Python API allin1.analyze() offers the same options as the CLI:

>>> allin1.analyze(
      paths='your_audio_file.wav',
      include_activations=True,
      include_embeddings=True,
    )

AnalysisResult(
  path='/path/to/your_audio_file.wav', 
  bpm=100, 
  beats=[...],
  downbeats=[...],
  segments=[...],
  activations={
    'beat': array(...), 
    'downbeat': array(...), 
    'segment': array(...), 
    'label': array(...)
  }, 
  embeddings=array(...),
)
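
The returned arrays can then be used directly, without touching the .npz/.npy files (a minimal sketch; the attribute names follow the AnalysisResult shown above):

result = allin1.analyze(
  'your_audio_file.wav',
  include_activations=True,
  include_embeddings=True,
)

beat_activ = result.activations['beat']     # shape: [time_steps]
label_activ = result.activations['label']   # shape: [label_class=10, time_steps]
embeddings = result.embeddings              # shape: [stems=4, time_steps, 24(, models=8)]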

Concerning MP3 Files

Due to variations in decoders, MP3 files can have slight offset differences. I recommend first converting your audio files to WAV format using FFmpeg (as shown below), and using the WAV files throughout your data processing pipelines.

ffmpeg -i your_audio_file.mp3 your_audio_file.wav

In this package, audio files are read using Demucs. To my understanding, Demucs converts MP3 files to WAV using FFmpeg before reading them. However, using a different MP3 decoder can yield different offsets. I've observed variations of about 20~40ms, which is problematic for tasks requiring precise timing like beat tracking, where the conventional tolerance is just 70ms. Hence, I advise standardizing inputs to the WAV format for all data processing, ensuring straightforward decoding.
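
If your pipeline is in Python, the same conversion can be scripted with the standard library by calling the FFmpeg command shown above (a minimal sketch; it assumes ffmpeg is on your PATH):

import subprocess
from pathlib import Path

def mp3_to_wav(mp3_path):
  """Convert an MP3 file to a WAV file next to it using FFmpeg."""
  wav_path = Path(mp3_path).with_suffix('.wav')
  subprocess.run(['ffmpeg', '-y', '-i', str(mp3_path), str(wav_path)], check=True)
  return wav_path

wav_path = mp3_to_wav('your_audio_file.mp3')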

Training

Please refer to TRAINING.md.

Citation

If you use this package for your research, please cite the following paper:

@inproceedings{taejun2023allinone,
  title={All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio},
  author={Kim, Taejun and Nam, Juhan},
  booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year={2023}
}

all-in-one's People

Contributors

tae-jun, vpavlenko


all-in-one's Issues

huggingface_hub.utils._errors.LocalEntryNotFoundError

Hi, I keep having the following issue, can you help me please? Thank you!

=> Found 0 tracks already analyzed and 1 tracks to analyze.
=> Found 1 tracks already demixed, 0 to demix.
=> Found 1 spectrograms already extracted, 0 to extract.
Traceback (most recent call last):
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connectionpool.py", line 775, in urlopen
self._prepare_proxy(conn)
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1044, in _prepare_proxy
conn.connect()
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connection.py", line 652, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connection.py", line 805, in ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/util/ssl
.py", line 465, in ssl_wrap_socket
ssl_sock = ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/util/ssl
.py", line 509, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/ssl.py", line 1104, in _create
self.do_handshake()
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/ssl.py", line 1382, in do_handshake
self._sslobj.do_handshake()
TimeoutError: _ssl.c:989: The handshake operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connectionpool.py", line 777, in urlopen
self._raise_timeout(
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/urllib3/connectionpool.py", line 369, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 66, in send
return super().send(request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: a23d297c-fee0-4956-ae58-15e0ed71a05e)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/bin/allin1", line 8, in
sys.exit(main())
^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/allin1/cli.py", line 53, in main
analyze(
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/allin1/analyze.py", line 124, in analyze
model = load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/allin1/models/loaders.py", line 41, in load_pretrained_model
return load_ensemble_model(model_name, cache_dir, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/allin1/models/loaders.py", line 72, in load_ensemble_model
model = load_pretrained_model(model_name, cache_dir, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/allin1/models/loaders.py", line 53, in load_pretrained_model
checkpoint_path = hf_hub_download(repo_id='taejunkim/allinone', filename=filename, cache_dir=cache_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/Users/tingqiwang/anaconda3/envs/AI_DJ/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1826, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

IndexError: index 0 is out of bounds for axis 0 with size 0

Hi, I'm currently running all-in-one on Ubuntu using WSL on Windows (because I wasn't able to compile NATTEN from source to run it natively). I was analyzing some songs and it worked fine until I hit the following error:

Analyzing 02 - Genesis.mp3:  30%|████████████████▏                                     | 72/241 [05:07<12:02,  4.28s/it]
Traceback (most recent call last):
  File "bin/allin1", line 8, in <module>
    sys.exit(main())
  File "lib/python3.9/site-packages/allin1/cli.py", line 49, in main
    analyze(
  File "lib/python3.9/site-packages/allin1/analyze.py", line 114, in analyze
    bpm = estimate_tempo_from_beats(metrical_structure['beats'])
  File "lib/python3.9/site-packages/allin1/postprocessing/tempo.py", line 19, in estimate_tempo_from_beats
    bpm_est = bpm_cand[0, 0]
IndexError: index 0 is out of bounds for axis 0 with size 0

Label Preprocessing Problem

Hello! I would like to know how you deal with the silence label, because I see that there is no silence label in your label list.

MPS Alternatives to Natten

For Mac users, analysing audio files takes a really long time because it is all done on the CPU without utilising Metal acceleration. Is there a way to provide alternative kernels for the sliding-window self-attention (or an equivalent algorithm) with a backend compiled for MPS?

[macOS] [natten] failed to install natten

hi @tae-jun

I found that on macOS, the NATTEN installation fails.

But if you use the following version of NATTEN, it installs successfully:

pip uninstall natten
pip install --no-cache-dir git+https://github.com/alihassanijr/NATTEN-Torch.git@78b8681a14bad3c3cb365b0ecfc18b6151c7d354

tested on
platform : osx-arm64
macOS: Sonoma (14.0)
python version : 3.10.9

TypeError: natten1dqkrpb() takes 4 positional arguments but 5 were given

Dear Taejun, dear Vitaly,

We really love your work and would like to point out that it's brilliant!
We'd love to try it out but are currently stuck with the latest error:

TypeError: natten1dqkrpb() takes 4 positional arguments but 5 were given

We're using Python 3.11.7 on an Apple M1 (Apple Silicon) with manually installed NATTEN 0.14.4 - might there be a problem with NATTEN 0.14.4 being older than the latest version 0.17.1?

We'd love to hear from you!

Thank you so much and best regards!


TypeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 result = allin1.analyze('/Users/justus/Downloads/Happy.wav')

File /opt/anaconda3/lib/python3.11/site-packages/allin1/analyze.py:134, in analyze(paths, out_dir, visualize, sonify, model, device, include_activations, include_embeddings, demix_dir, spec_dir, keep_byproducts, overwrite, multiprocess)
131 for path, spec_path in pbar:
132 pbar.set_description(f'Analyzing {path.name}')
--> 134 result = run_inference(
135 path=path,
136 spec_path=spec_path,
137 model=model,
138 device=device,
139 include_activations=include_activations,
140 include_embeddings=include_embeddings,
141 )
143 # Save the result right after the inference.
144 # Checkpointing is always important for this kind of long-running tasks...
145 # for my mental health...
146 if out_dir is not None:

File /opt/anaconda3/lib/python3.11/site-packages/allin1/helpers.py:29, in run_inference(path, spec_path, model, device, include_activations, include_embeddings)
26 spec = np.load(spec_path)
27 spec = torch.from_numpy(spec).unsqueeze(0).to(device)
---> 29 logits = model(spec)
31 metrical_structure = postprocess_metrical_structure(logits, model.cfg)
32 functional_structure = postprocess_functional_structure(logits, model.cfg)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/ensemble.py:21, in Ensemble.forward(self, x)
20 def forward(self, x):
---> 21 outputs: List[AllInOneOutput] = [model(x) for model in self.models]
22 avg = AllInOneOutput(
23 logits_beat=torch.stack([output.logits_beat for output in outputs], dim=0).mean(dim=0),
24 logits_downbeat=torch.stack([output.logits_downbeat for output in outputs], dim=0).mean(dim=0),
(...)
27 embeddings=torch.stack([output.embeddings for output in outputs], dim=-1),
28 )
30 return avg

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/ensemble.py:21, in <listcomp>(.0)
20 def forward(self, x):
---> 21 outputs: List[AllInOneOutput] = [model(x) for model in self.models]
22 avg = AllInOneOutput(
23 logits_beat=torch.stack([output.logits_beat for output in outputs], dim=0).mean(dim=0),
24 logits_downbeat=torch.stack([output.logits_downbeat for output in outputs], dim=0).mean(dim=0),
(...)
27 embeddings=torch.stack([output.embeddings for output in outputs], dim=-1),
28 )
30 return avg

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/allinone.py:51, in AllInOne.forward(self, inputs, output_attentions)
48 inputs = inputs.reshape(-1, 1, T, F) # N x K, C=1, T, F=81
49 frame_embed = self.embeddings(inputs) # NK, T, C=16
---> 51 encoder_outputs = self.encoder(
52 frame_embed,
53 output_attentions=output_attentions,
54 )
55 hidden_state_levels = encoder_outputs[0]
57 hidden_states = hidden_state_levels[-1].reshape(N, K, T, -1) # N, K, T, C=16

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/allinone.py:110, in AllInOneEncoder.forward(self, frame_embed, output_attentions)
108 hidden_states = frame_embed
109 for i, layer in enumerate(self.layers):
--> 110 layer_outputs = layer(hidden_states, output_attentions)
111 hidden_states = layer_outputs[0]
112 hidden_state_levels.append(hidden_states)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/allinone.py:170, in AllInOneBlock.forward(self, hidden_states, output_attentions)
167 NK, T, C = hidden_states.shape
168 N, K = NK // self.cfg.data.num_instruments, self.cfg.data.num_instruments
--> 170 timelayer_outputs = self.timelayer(hidden_states, output_attentions)
171 hidden_states = timelayer_outputs[0]
172 if self.cfg.instrument_attention:

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/dinat.py:298, in _DinatLayerNd.forward(self, hidden_states, output_attentions)
295 if attention is None:
296 continue
--> 298 attention_output = attention(attention_inputs, output_attentions=output_attentions)
299 attention_output = attention_output[0]
301 if is_2d:

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/dinat.py:193, in _NeighborhoodAttentionModuleNd.forward(self, hidden_states, output_attentions)
188 def forward(
189 self,
190 hidden_states: torch.Tensor,
191 output_attentions: Optional[bool] = False,
192 ) -> Tuple[torch.Tensor]:
--> 193 self_outputs = self.self(hidden_states, output_attentions)
194 attention_output = self.output(self_outputs[0])
195 outputs = (attention_output,) + self_outputs[1:] # add attentions if we output them

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /opt/anaconda3/lib/python3.11/site-packages/allin1/models/dinat.py:99, in _NeighborhoodAttentionNd.forward(self, hidden_states, output_attentions)
95 query_layer = query_layer / math.sqrt(self.attention_head_size)
97 # Compute NA between "query" and "key" to get the raw attention scores, and add relative positional biases.
98 # attention_scores = natten2dqkrpb(query_layer, key_layer, self.rpb, self.dilation)
---> 99 attention_scores = self.nattendqkrpb(query_layer, key_layer, self.rpb, self.kernel_size, self.dilation)
101 # Normalize the attention scores to probabilities.
102 attention_probs = nn.functional.softmax(attention_scores, dim=-1)

TypeError: natten1dqkrpb() takes 4 positional arguments but 5 were given

Differences between models on HF

Hey @tae-jun
Thank you for open sourcing such exciting work!
I noticed that there are many models on Hugging Face, among which harmonix-all is an ensemble of all the models prefixed with "harmonix". What is the difference between the models prefixed with "all" and those, and can I use the models prefixed with "all" in my code?

Thanks, looking forward to your reply!

Specifying device for inference gives error

First of all, great work!

I can't understand why but when I try to specify a device like:

result = allin1.analyze(
    'Antonio_Vivaldi_Concerto.wav',
    device="cuda:2",
    overwrite = True,
    out_dir = output_dir,
)

the predictions are all weird and I get errors like "ValueError: zero-size array to reduction operation minimum which has no identity" because the model predictions look like this:

[AnalysisResult(path=PosixPath('/home/ubaid/Music_Image_CM/MuIm_model/Music2Image/music2visual_story/output/Antonio_Vivaldi_Concerto.wav'), bpm=None, beats=[], downbeats=[], beat_positions=[], segments=[Segment(start=0.0, end=18.1, label='chorus'), Segment(start=18.1, end=53.94, label='chorus'), Segment(start=53.94, end=69.3, label='chorus'), Segment(start=69.3, end=84.66, label='chorus'), Segment(start=84.66, end=105.14, label='chorus'), Segment(start=105.14, end=135.86, label='chorus'), Segment(start=135.86, end=151.22, label='chorus'), Segment(start=151.22, end=181.94, label='chorus'), Segment(start=181.94, end=197.3, label='chorus'), Segment(start=197.3, end=217.78, label='chorus'), Segment(start=217.78, end=233.14, label='chorus'), Segment(start=233.14, end=248.5, label='chorus'), Segment(start=248.5, end=274.1, label='chorus'), Segment(start=274.1, end=304.82, label='chorus'), Segment(start=304.82, end=325.3, label='chorus'), Segment(start=325.3, end=373.3, label='chorus'), Segment(start=373.3, end=396.98, label='chorus'), Segment(start=396.98, end=417.46, label='chorus'), Segment(start=417.46, end=448.38, label='chorus')], activations=None, embeddings=None)]

However, when no device is given, it works normally. Can you explain why this is happening?

Training code

Great job with the paper! Have been testing and it works really well.
Any chance of sharing a training code example/guide?

Thanks in advance

HarmonixSet segment pre-processing?

Hey @tae-jun,

Thanks for your research and this amazing model! I'm trying to re-create your training process, and I ran into an issue while trying to run the allin1-train command. I'm getting this kind of error:

File "/workspace/src/allin1/training/data/datasets/harmonix/dataset.py", line 74, in __getitem__
    data = super().__getitem__(idx)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/src/allin1/training/data/datasets/datasetbase.py", line 78, in __getitem__
    true_function = st.section.of_frames(encode=True, return_labels=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/src/allin1/training/data/eventconverters/eventconverters.py", line 153, in of_frames
    labels = np.array([self.label_map[l] for l in labels])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/src/allin1/training/data/eventconverters/eventconverters.py", line 153, in <listcomp>
    labels = np.array([self.label_map[l] for l in labels])
                       ~~~~~~~~~~~~~~^^^
KeyError: 'prechorus'

As far as I understand, that error is caused by the fact that HARMONIX_LABELS doesn't contain the value prechorus, but this value exists as one of the segment labels in the Harmonix dataset, for example here: https://github.com/urinieto/harmonixset/blob/master/dataset/segments/0006_aint2proud2beg.txt

There are more labels that exist in the Harmonix dataset but are not listed in HARMONIX_LABELS, so I'm wondering if you did some kind of manual preprocessing that converted all segment labels from the Harmonix dataset into your list?

I would appreciate your guidance, as I would love to re-create your training process, to see if this model could be trained not only on full songs, but also on parts of songs for beat/downbeat detection.

Thanks in advance for your help!

dinat.py - line 109 - natten1dav

=> Found 1 tracks to analyze.
=> Found 1 tracks already demixed, 0 to demix.
=> Found 1 spectrograms already extracted, 0 to extract.
Analyzing 000RDkxd2pP2Td.mp3: 0%| | 0/1 [00:06<?, ?it/s]

Exception has occurred: TypeError
natten1dav() takes 3 positional arguments but 4 were given
File "/ai-music/all-in-one/src/allin1/models/dinat.py", line 109, in forward
context_layer = self.nattendav(attention_probs, value_layer, self.kernel_size, self.dilation)
File "/ai-music/all-in-one/src/allin1/models/dinat.py", line 193, in forward
self_outputs = self.self(hidden_states, output_attentions)
File "/ai-music/all-in-one/src/allin1/models/dinat.py", line 298, in forward
attention_output = attention(attention_inputs, output_attentions=output_attentions)
File "/ai-music/all-in-one/src/allin1/models/allinone.py", line 170, in forward
timelayer_outputs = self.timelayer(hidden_states, output_attentions)
File "/ai-music/all-in-one/src/allin1/models/allinone.py", line 110, in forward
layer_outputs = layer(hidden_states, output_attentions)
File "/ai-music/all-in-one/src/allin1/models/allinone.py", line 51, in forward
encoder_outputs = self.encoder(
File "/ai-music/all-in-one/src/allin1/models/ensemble.py", line 21, in
outputs: List[AllInOneOutput] = [model(x) for model in self.models]
File "/ai-music/all-in-one/src/allin1/models/ensemble.py", line 21, in forward
outputs: List[AllInOneOutput] = [model(x) for model in self.models]
File "/ai-music/all-in-one/src/allin1/analyze.py", line 110, in analyze
logits = model(spec)
File "/ai-music/all-in-one/src/main.py", line 3, in
result = allin1.analyze("/ai-music/MIA/input/000RDkxd2pP2Td.mp3")
TypeError: natten1dav() takes 3 positional arguments but 4 were given

When I run the Python script, I get the above issue. The model loaded is https://huggingface.co/taejunkim/allinone/resolve/main/harmonix-fold0-0vra4ys2.pth. Am I loading the right model?
