
voidful / codec-superb


Audio Codec Speech processing Universal PERformance Benchmark

Home Page: https://codecsuperb.com

Topics: audio, audio-codec, codec, speech, superb

codec-superb's Introduction

Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark

Overview

Codec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. Our goal is to facilitate community collaboration and accelerate advancements in the field of speech processing by preserving and enhancing speech information quality.

Table of Contents

Introduction
Key Features
Installation
Usage
Citation
Contribution
License

Introduction

Codec-SUPERB sets a new benchmark in evaluating sound codec models, providing a rigorous and transparent framework for assessing performance across a range of speech processing tasks. Our goal is to foster innovation and set new standards in audio quality and processing efficiency.

Key Features

Out-of-the-Box Codec Interface

Codec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various codec models, facilitating quick iterations and experiments.

Multi-Perspective Leaderboard

Codec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in sound codec research by providing a comprehensive assessment and fostering competitive transparency among developers.

Standardized Environment

We ensure a standardized testing environment to guarantee fair and consistent comparison across all models. This uniformity brings reliability to benchmark results, making them universally interpretable.

Unified Datasets

We provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures that models are evaluated under diverse conditions, reflecting real-world applications.

Installation

git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB
pip install -r requirements.txt

Usage

Out-of-the-Box Codec Interface

from SoundCodec import codec
import torchaudio

# list all available codecs
print(codec.list_codec())
# load a codec by name, using EnCodec (24 kHz, 6 kbps) as an example
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')

# load audio ('sample audio' is a placeholder for your audio file path)
waveform, sample_rate = torchaudio.load('sample audio')
# take the last channel as a 1-D NumPy array (mono)
mono_waveform = waveform.numpy()[-1]
data_item = {'audio': {'array': mono_waveform,
                       'sampling_rate': sample_rate}}

# extract discrete units
sound_unit = encodec_24k_6bps.extract_unit(data_item).unit

# resynthesize audio from the extracted units
decoded_waveform = encodec_24k_6bps.synth(sound_unit, local_save=False)['audio']['array']
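
Continuing the snippet above, a minimal sketch for writing the reconstruction to disk for listening checks. It assumes decoded_waveform is a 1-D NumPy float array and that this codec's output rate is 24 kHz; the output filename is illustrative:

import torch
import torchaudio

# torchaudio.save expects a [channels, frames] float tensor
decoded_tensor = torch.from_numpy(decoded_waveform).unsqueeze(0).float()
torchaudio.save('reconstructed.wav', decoded_tensor, 24000)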

Citation

If you use this code or these results in your paper, please cite our work as:

@misc{wu2024codecsuperb,
      title={Codec-SUPERB: An In-Depth Analysis of Sound Codec Models}, 
      author={Haibin Wu and Ho-Lam Chung and Yi-Cheng Lin and Yuan-Kuei Wu and Xuanjun Chen and Yu-Chi Pai and Hsiu-Hsuan Wang and Kai-Wei Chang and Alexander H. Liu and Hung-yi Lee},
      year={2024},
      eprint={2402.13071},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
@article{wu2024towards,
  title={Towards audio language modeling-an overview},
  author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13236},
  year={2024}
}

Contribution

Contributions are highly encouraged, whether it's through adding new codec models, expanding the dataset collection, or enhancing the benchmarking framework. Please see CONTRIBUTING.md for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Reference Sound Codec Repositories:

codec-superb's People

Contributors

hbwu-ntu, kuan2jiu99, stanwang1210, voidful, ywk991112


codec-superb's Issues

Results of LaDiffCodec (1.5 kbps)

Scores updated:


Acc_ground_truth: 93.85%
Acc_resync_audio: 16.10%
Cos_similarity: 36.48%
ACC: 16.10%


Log results

File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation.
SDR: mean score is: -0.6618466287421877

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 4.7431045

Stage 3: Run STOI.
stoi:

Stage 4: Run PESQ.
pesq: mean score is: 1.1791039681434632

File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation.
SDR: mean score is: -7.735703443297681

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 3.6174948

File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation.
SDR: mean score is: 4.330545305329152

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.7490408

Stage 3: Run STOI.
stoi: mean score is: 0.7800622448245815

Stage 4: Run PESQ.
pesq: mean score is: 1.6228661406040192

File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation.
SDR: mean score is: -5.688258628657724

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 4.0113335

File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation.
SDR: mean score is: -2.769766115983086

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.239529

File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation.
SDR: mean score is: 1.2123890992883006

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.7746849

Stage 3: Run STOI.
stoi: mean score is: 0.7529617185269315

Stage 4: Run PESQ.
pesq: mean score is: 1.3319110035896302

File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation.
SDR: mean score is: 4.48363052891714

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.1447082

Stage 3: Run STOI.
stoi: mean score is: 0.8117344206829971

Stage 4: Run PESQ.
pesq: mean score is: 1.7257570731639862

File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation.
SDR: mean score is: 3.0613881509402994

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 3.3179162

Stage 3: Run STOI.
stoi: mean score is: 0.7105730301462775

Stage 4: Run PESQ.
pesq: mean score is: 1.4366185867786407

File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation.
SDR: mean score is: 6.483090668408405

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9094324

Stage 3: Run STOI.
stoi: mean score is: 0.8549385395393462

Stage 4: Run PESQ.
pesq: mean score is: 1.8450518810749055

File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation.
SDR: mean score is: 2.299034565789743

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 3.4319177

Stage 3: Run STOI.
stoi:

Stage 4: Run PESQ.
pesq: mean score is: 1.3151621878147126

File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation.
SDR: mean score is: 2.138264888912873

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.984266

Stage 3: Run STOI.
stoi: mean score is: 0.7347235382930105

Stage 4: Run PESQ.
pesq: mean score is: 1.54294668674469

Average SDR for speech datasets: 3.618218106966028
Average Mel_Loss for speech datasets: 2.313341416666667
Average STOI for speech datasets: 0.7741655820021908
Average PESQ for speech datasets: 1.5841918953259786
Average SDR for audio datasets: -5.397909395979497
Average Mel_Loss for audio datasets: 3.289452433333333

[Questions] Overall score for signal metrics and scripts for downstream tasks

Hi developers,

I find your paper quite useful for evaluating generated codecs.
Here are the two questions I have so far:
First, is there a script for computing the overall score? I found metrics.py, which implements all the required signal-level metrics; I've tested it and it works well. But I couldn't find how to combine the metrics to get the overall score mentioned in the paper.

Second, for the downstream tasks, are there any plans to share the evaluation scripts?

Thanks!

AudioDec model downloaded from voidful/AudioDec has an import error in dataset_creator

I followed the exception message and ran
pip install git+https://github.com/voidful/AudioDec.git
but there seem to be import errors.
Error message:

Synthesizing dataset with audiodec_24k_320d
Traceback (most recent call last):
  File "~/Codec-SUPERB/codec/audiodec_24k_320d.py", line 9, in config
    from AudioDec.utils.audiodec import AudioDec as AudioDecModel, assign_model
  File "~/.local/lib/python3.10/site-packages/AudioDec/utils/audiodec.py", line 15, in <module>
    from AudioDec.models.vocoder.HiFiGAN import StreamGenerator as generator_hifigan
  File "~/.local/lib/python3.10/site-packages/AudioDec/models/vocoder/HiFiGAN.py", line 24, in <module>
    from AudioDec.models.vocoder.modules.discriminator import HiFiGANMultiScaleDiscriminator
  File "~/.local/lib/python3.10/site-packages/AudioDec/models/vocoder/modules/discriminator.py", line 24, in <module>
    from layers.conv_layer import NonCausalConv1d, NonCausalConv2d
ModuleNotFoundError: No module named 'layers'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/Codec-SUPERB/dataset_creator.py", line 73, in <module>
    run_experiment(args.dataset)
  File "~/Codec-SUPERB/dataset_creator.py", line 33, in run_experiment
    codec = load_codec(codec_name)
  File "~/Codec-SUPERB/codec/__init__.py", line 6, in load_codec
    return module.Codec()
  File "~/Codec-SUPERB/base_codec/audiodec.py", line 9, in __init__
    self.config()
  File "~/Codec-SUPERB/codec/audiodec_24k_320d.py", line 11, in config
    raise Exception("Please install AudioDec first. pip install git+https://github.com/voidful/AudioDec.git")
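
For triage: the failing line is a bare intra-package import (from layers.conv_layer import ...), which suggests the installed AudioDec package expects its own directory on sys.path. A possible workaround sketch, purely an assumption on my side and untested here:

import os
import sys

import AudioDec  # importing the bare package works; only the intra-package imports fail

# let Python resolve the bare `layers.*` imports by adding the installed
# package directory itself to sys.path
sys.path.insert(0, os.path.dirname(AudioDec.__file__))

from AudioDec.utils.audiodec import AudioDec as AudioDecModel, assign_model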

Query on Resampling and Audio Format Compliance in Competition Rules

Hello, in the released development set, different test sets have varying sampling rates, such as 8 kHz, 16 kHz, 44.1 kHz, and 48 kHz, as well as different audio formats, such as WAV and FLAC. My model was trained on 16 kHz speech data. During inference, if the input audio is not 16 kHz, it is automatically resampled to 16 kHz before encoding and reconstruction. Does this comply with the competition rules?
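
For concreteness, a minimal sketch of the resampling step described above (the file name and rate variable are illustrative, not taken from the competition scripts):

import torchaudio

# development-set audio may arrive at 8/16/44.1/48 kHz, in WAV or FLAC
waveform, sr = torchaudio.load('input.flac')
if sr != 16000:
    # bring everything to the model's 16 kHz training rate before encoding
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)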

Results for SpeechTokenizer

Here are the results for SpeechTokenizer.

The bit rate is 2 kbps; the results follow:

Results in exps/results.txt

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition.
Acc: 72.15%

Stage 2: Run speaker related evaluation.
EER: 4.03%

Stage 3: Run automatic speech recognition.
WER: 4.55%

Stage 4: Run audio event classification.
ACC: 25.50%


Results in src/codec_metrics/exps/results.txt

Log results

File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation.
SDR: mean score is: -29.90983049070145

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 5.345735

Stage 3: Run STOI.
stoi: mean score is: 0.06024890838574476

Stage 4: Run PESQ.
pesq: mean score is: 1.586073912382126

File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation.
SDR: mean score is: -22.282276880645814

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 3.4074209

File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation.
SDR: mean score is: 1.5112133717223253

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8877456

Stage 3: Run STOI.
stoi: mean score is: 0.8648300690857609

Stage 4: Run PESQ.
pesq: mean score is: 2.170962030887604

File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation.
SDR: mean score is: -21.45771079855064

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 3.1137948

File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation.
SDR: mean score is: -22.950851389668035

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 4.621136

File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation.
SDR: mean score is: -3.846337947640395

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.9027287

Stage 3: Run STOI.
stoi: mean score is: 0.8309377170272262

Stage 4: Run PESQ.
pesq: mean score is: 1.5058157062530517

File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation.
SDR: mean score is: 1.0211239468849096

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8223095

Stage 3: Run STOI.
stoi: mean score is: 0.8872668136911973

Stage 4: Run PESQ.
pesq: mean score is: 2.2581932806968688

File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation.
SDR: mean score is: -1.774289102870904

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.153448

Stage 3: Run STOI.
stoi: mean score is: 0.7758606059083771

Stage 4: Run PESQ.
pesq: mean score is: 1.8245106658550223

File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation.
SDR: mean score is: 3.7615257663215895

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8986037

Stage 3: Run STOI.
stoi: mean score is: 0.9141771654461831

Stage 4: Run PESQ.
pesq: mean score is: 2.2321277034282683

File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation.
SDR: mean score is: -27.182861328199774

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 5.430982

Stage 3: Run STOI.
stoi: mean score is: 0.14532493265232807

Stage 4: Run PESQ.
pesq: mean score is: 1.6926373445987701

File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation.
SDR: mean score is: -1.9323934995843512

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.823112

Stage 3: Run STOI.
stoi: mean score is: 0.8241731080501418

Stage 4: Run PESQ.
pesq: mean score is: 1.9483790636062621

Average SDR for speech datasets: -4.06314554192561
Average Mel_Loss for speech datasets: 1.5598471285714286
Average STOI for speech datasets: 0.7489386302658877
Average PESQ for speech datasets: 1.9475179707608352
Average SDR for audio datasets: -22.23027968958767
Average Mel_Loss for audio datasets: 3.714117233333333

Codec SUPERB Challenge: How to use codec_superb_data for evaluation?

Hi, I found that codec_superb_data contains many datasets but does not include the data-preprocessing code. Does this mean I need to resynthesize each dataset separately myself, according to the two dataset categories SPEECH and AUDIO, and run run.sh separately to evaluate the resynthesized audio for each dataset? Or should I put the resynthesized files for all datasets of the same category together under SPEECH or AUDIO in advance, and run run.sh once to get a single score for all datasets in that category? I'm a bit confused about the evaluation rules and would appreciate an answer.

Minimal example

Thanks for your meaningful work. Could you share a minimal example, like SUPERB, to help us test our own models?
Including: the dataset, how to evaluate with a specific metric, and so on :)

How to run the evaluations on GPU?

I want to use the metrics in this project to evaluate my own dataset. I wrote my testing script following benchmarking.py, and all the metrics run well on CPU by default. But when I try to move the generated audio and original audio onto CUDA, tqdm.contrib.concurrent.process_map raises an exception: "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the spawn start method".
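
For reference, a minimal sketch of the standard fix for that exception, assuming the script guards its entry point: select the 'spawn' start method before anything touches CUDA.

import multiprocessing as mp

if __name__ == '__main__':
    # 'spawn' starts fresh worker processes instead of forking one
    # that has already initialized CUDA
    mp.set_start_method('spawn', force=True)
    # ... build the worker pool / call process_map afterwards

Alternatively, keeping the CUDA work in the main process and leaving only metric bookkeeping to the workers avoids the issue entirely.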

The evaluation procedure and the required results

Hi, I have some questions about the submission.
For the evaluation of Codec-SUPERB at SLT 2024, the repository instructions mention running both the bash run.sh and bash run_wrapper.sh scripts, followed by submitting the exps/results.txt and src/codec_metrics/exps/results.txt files. However, I have a few questions regarding the process:
1. The run.sh script includes four different stages, with the default set to stage 4. Should we configure and run all the stages sequentially, or is it sufficient to execute only the default stage?
2. For the codec_metrics/run.sh script, there are several categories and datasets available for selection. Do we need to test each category and dataset, or should we only use the default settings provided in the script?
3. For the submission, is it correct that we only need to submit the two results.txt files by opening a new issue on the repository?

Thank you for your assistance in clarifying these points. I look forward to your response.

Results for APCodec

16 kHz, 2 kbps

Parameter size:

encoder (including quantizer): 29 MB; decoder: 40 MB

exps/results.txt

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition.
Acc: 74.93%

Stage 2: Run speaker related evaluation.
Parsing the resyn_trial.txt for resyn wavs

Run speaker verification.
EER: 3.02%

Stage 3: Run automatic speech recognition.
WER: 4.74%

Stage 4: Run audio event classification.
ACC: 55.25%

src/codec_metrics/exps/results.txt

Log results

File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation.
SDR: mean score is: -2.618520825954788

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.64869004

Stage 3: Run STOI.
stoi: mean score is: 0.717766808809779

Stage 4: Run PESQ.
pesq: mean score is: 1.5509950947761535

File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation.
SDR: mean score is: -9.309950038168095

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.002597

File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation.
SDR: mean score is: 2.68255129531442

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.87451327

Stage 3: Run STOI.
stoi: mean score is: 0.8740794643709145

Stage 4: Run PESQ.
pesq: mean score is: 2.1911674320697783

File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation.
SDR: mean score is: -6.6539098549604345

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.8435475

File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation.
SDR: mean score is: -3.0264018525811536

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.4431057

File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation.
SDR: mean score is: -1.3850498167169416

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.85650134

Stage 3: Run STOI.
stoi: mean score is: 0.8534544293908012

Stage 4: Run PESQ.
pesq: mean score is: 1.5768725705146789

File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation.
SDR: mean score is: 2.5759249020219706

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8179612

Stage 3: Run STOI.
stoi: mean score is: 0.8975456227011622

Stage 4: Run PESQ.
pesq: mean score is: 2.2901515591144563

File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation.
SDR: mean score is: -1.3464429268284184

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.9656775

Stage 3: Run STOI.
stoi: mean score is: 0.7968180258305204

Stage 4: Run PESQ.
pesq: mean score is: 1.7317036986351013

File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation.
SDR: mean score is: 4.364046016689939

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8910932

Stage 3: Run STOI.
stoi: mean score is: 0.9133034388476792

Stage 4: Run PESQ.
pesq: mean score is: 2.245469583272934

File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation.
SDR: mean score is: 1.5015711204024194

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.78175646

Stage 3: Run STOI.
stoi: mean score is: 0.8577775240334691

Stage 4: Run PESQ.
pesq: mean score is: 2.120602227449417

File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation.
SDR: mean score is: -0.22438148479388495

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.6584927

Stage 3: Run STOI.
stoi: mean score is: 0.8339094697226409

Stage 4: Run PESQ.
pesq: mean score is: 1.8127213382720948

Average SDR for speech datasets: 0.6937122850168396
Average Mel_Loss for speech datasets: 0.8118357137500001
Average STOI for speech datasets: 0.8430818479633708
Average PESQ for speech datasets: 1.9399604380130768
Average SDR for audio datasets: -6.330087248569893
Average Mel_Loss for audio datasets: 1.7630834000000002

Results of FunCodec

Bit rate = 8 kbps

Downstream tasks (only the 16 kHz model used)

Stage 1: Run speech emotion recognition.
Acc: 75.21%

Stage 2: Run speaker related evaluation.
Parsing the resyn_trial.txt for resyn wavs

Run speaker verification.
EER: 1.56%

Stage 3: Run automatic speech recognition.
WER: 3.13%

Stage 4: Run audio event classification.
ACC: 83.30%

For reference, DAC at 44.1 kHz got ACC: 90.55% on audio_event_classification.

Objective results (16 kHz model for 16 kHz samples, 48 kHz model for 48 kHz samples)

Log results
--------------------------------------------------
File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation.
SDR: mean score is: 7.664355354532293

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9301372

Stage 3: Run STOI.
stoi: mean score is: 0.8652290511677259

Stage 4: Run PESQ.
pesq: mean score is: 1.9714515495300293
--------------------------------------------------
File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation.
SDR: mean score is: 0.28843353322945814

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5668296
--------------------------------------------------
File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation.
SDR: mean score is: 8.47528477173951

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.4804714

Stage 3: Run STOI.
stoi: mean score is: 0.9478413458556251

Stage 4: Run PESQ.
pesq: mean score is: 3.0518312084674837
--------------------------------------------------
File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation.
SDR: mean score is: 1.651041018826226

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.9033759
--------------------------------------------------
File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation.
SDR: mean score is: 6.275478100428441

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.23099
--------------------------------------------------
File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation.
SDR: mean score is: 3.6701485211578273

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5391313

Stage 3: Run STOI.
stoi: mean score is: 0.9362651811605514

Stage 4: Run PESQ.
pesq: mean score is: 2.1895537614822387
--------------------------------------------------
File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation.
SDR: mean score is: 8.627505998814492

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.5454265

Stage 3: Run STOI.
stoi: mean score is: 0.9568509707064634

Stage 4: Run PESQ.
pesq: mean score is: 3.316485096216202
--------------------------------------------------
File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation.
SDR: mean score is: 6.899273166546299

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 2.237886

Stage 3: Run STOI.
stoi: mean score is: 0.9110949624359219

Stage 4: Run PESQ.
pesq: mean score is: 2.5656625175476075
--------------------------------------------------
File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation.
SDR: mean score is: 11.001265123350482

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.7819229

Stage 3: Run STOI.
stoi: mean score is: 0.9753332596498754

Stage 4: Run PESQ.
pesq: mean score is: 3.383010833263397
--------------------------------------------------
File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation.
SDR: mean score is: 8.071351215845228

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.1897244

Stage 3: Run STOI.
stoi: mean score is: 0.9018324319464593

Stage 4: Run PESQ.
pesq: mean score is: 1.928473423719406
--------------------------------------------------
File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation.
SDR: mean score is: 7.051308404176289

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 1.8565342

Stage 3: Run STOI.
stoi: mean score is: 0.9340248268933423

Stage 4: Run PESQ.
pesq: mean score is: 3.0424613475799562
--------------------------------------------------
Average SDR for speech datasets: 7.682561569520302
Average Mel_Loss for speech datasets: 1.6951542375
Average STOI for speech datasets: 0.9285590037269955
Average PESQ for speech datasets: 2.68111621722579
Average SDR for audio datasets: 2.7383175508280417
Average Mel_Loss for audio datasets: 1.5670651666666666

Results

For the 16 kHz codec model, the bitrate is 2 kbps;
for the 44.1 kHz codec model, the bitrate is 6.89 kbps;
for the 48 kHz codec model, the bitrate is 7.5 kbps.

#1. Here is exps/results.txt
Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition.
Acc: 75.97%

Stage 2: Run speaker related evaluation.
Parsing the resyn_trial.txt for resyn wavs

Run speaker verification.
EER: 2.57%

Stage 3: Run automatic speech recognition.
WER: 3.67%

Stage 4: Run audio event classification.
ACC: 86.80%

#2. Here is src/codec_metrics/exps/results.txt
Log results

File Name: crema_d.log
Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation.
SDR: mean score is: 12.264864005831004

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.46461612

Stage 3: Run STOI.
stoi: mean score is: 0.9201546369667847

Stage 4: Run PESQ.
pesq: mean score is: 2.9032970213890077

File Name: esc50.log
Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation.
SDR: mean score is: 6.726699210213638

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.89280885

File Name: fluent_speech_commands.log
Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation.
SDR: mean score is: 8.476522537066758

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.75807977

Stage 3: Run STOI.
stoi: mean score is: 0.9238519743607232

Stage 4: Run PESQ.
pesq: mean score is: 2.8522612583637237

File Name: fsd50k.log
Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation.
SDR: mean score is: 6.95385805941422

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8306656

File Name: gunshot_triangulation.log
Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation.
SDR: mean score is: 8.291245593533532

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.95218104

File Name: libri2Mix_test.log
Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation.
SDR: mean score is: 4.233350120341239

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.7518116

Stage 3: Run STOI.
stoi: mean score is: 0.9050623419177468

Stage 4: Run PESQ.
pesq: mean score is: 2.0071350967884065

File Name: librispeech.log
Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation.
SDR: mean score is: 7.751003745240329

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.72347593

Stage 3: Run STOI.
stoi: mean score is: 0.9340773701364049

Stage 4: Run PESQ.
pesq: mean score is: 2.903846046924591

File Name: quesst.log
Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation.
SDR: mean score is: 8.4340708735918

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.8294336

Stage 3: Run STOI.
stoi: mean score is: 0.8863192140533341

Stage 4: Run PESQ.
pesq: mean score is: 2.6509935235977173

File Name: snips_test_valid_subset.log
Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation.
SDR: mean score is: 9.542545404819807

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.7959907

Stage 3: Run STOI.
stoi: mean score is: 0.9531058100873113

Stage 4: Run PESQ.
pesq: mean score is: 2.7776152551174165

File Name: voxceleb1.log
Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation.
SDR: mean score is: 6.524681732109078

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.71494424

Stage 3: Run STOI.
stoi: mean score is: 0.8977601804462474

Stage 4: Run PESQ.
pesq: mean score is: 2.5823002088069917

File Name: vox_lingua_top10.log
Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation.
SDR: mean score is: 13.074802660696786

Stage 2: Run Mel Spectrogram Loss.
mel_loss: mean score is: 0.49565125

Stage 3: Run STOI.
stoi: mean score is: 0.9516724002511663

Stage 4: Run PESQ.
pesq: mean score is: 2.9390562558174134

Average SDR for speech datasets: 8.7877301349621
Average Mel_Loss for speech datasets: 0.69175040125
Average STOI for speech datasets: 0.9215004910274648
Average PESQ for speech datasets: 2.7020630833506587
Average SDR for audio datasets: 7.323934287720463
Average Mel_Loss for audio datasets: 0.8918851633333333

Results for SemantiCodec

Here are the results for SemantiCodec.
This is a 16 kHz codec evaluated at six different bit rates (three token rates × two codebook sizes; see the arithmetic sketch after the list):

  1. For token rate 100 with book size 16384 the bit rate is 1.35 kbps
  2. For token rate 100 with book size 32768 the bit rate is 1.40 kbps
  3. For token rate 50 with book size 16384 the bit rate is 0.68 kbps
  4. For token rate 50 with book size 32768 the bit rate is 0.70 kbps
  5. For token rate 25 with book size 16384 the bit rate is 0.34 kbps
  6. For token rate 25 with book size 32768 the bit rate is 0.35 kbps
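
A hedged aside on where these rates appear to come from, inferred purely from the arithmetic rather than stated anywhere in this issue: every listed value matches half the tokens being drawn from a fixed 8192-entry (13-bit) book and half from the stated book size, i.e. bitrate = (token_rate / 2) × (13 + log2(book_size)) bits per second:

import math

# reproduce the six listed bitrates under the assumption above
for token_rate in (100, 50, 25):
    for book_size in (16384, 32768):
        bps = (token_rate / 2) * (13 + math.log2(book_size))
        print(f"token rate {token_rate:>3}, book size {book_size}: {bps:.1f} bps")

This prints 1350.0, 1400.0, 675.0, 700.0, 337.5, and 350.0 bps, which round to the listed 1.35, 1.40, 0.68, 0.70, 0.34, and 0.35 kbps.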

The inference code and checkpoint model can be found here.

The results of the system under the six configurations are posted below (one comment per system):

Paper release?

Thanks for your work. I would like to know whether this repository has an accompanying paper.
There are many audio codec benchmarks; I would also like to know what the important improvements of your repo are compared to previous work.
Looking forward to your reply. [BTW: we can communicate in Chinese]
