audioku / cross-accent-maml-asr

Model-agnostic meta-learning (MAML) implementation for cross-accented ASR

License: MIT

Languages: Python 81.57%, Jupyter Notebook 18.27%, Shell 0.15%
Topics: maml, meta-learning, asr, cross-accent, accent, pytorch, speech


Learning Fast Adaptation on Cross-Accented Speech Recognition

Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung

License: MIT

This is the implementation of our paper accepted at Interspeech 2020, which can be downloaded here.

This code has been written using PyTorch. If you use any of the source code or datasets included in this toolkit in your work, please cite the following paper.

@inproceedings{winata2020crossaccent,
  doi = {10.21437/interspeech.2020-0045},
  url = {https://doi.org/10.21437/interspeech.2020-0045},
  year = {2020},
  month = oct,
  publisher = {{ISCA}},
  author = {Genta Indra Winata and Samuel Cahyawijaya and Zihan Liu and Zhaojiang Lin and Andrea Madotto and Peng Xu and Pascale Fung},
  title = {Learning Fast Adaptation on Cross-Accented Speech Recognition},
  booktitle = {Interspeech 2020}
}

Abstract

Local dialects influence people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the model to adapt to unseen accents using the existing CommonVoice corpus. We also propose an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents. Our approach significantly outperforms joint training in the zero-shot, few-shot, and all-shot evaluations, in both the mixed-region and cross-region settings, in terms of word error rate.

Download data

Execute the following command from the base folder:

cd data && bash download_cv2.sh

Setup Requirements

  • Install PyTorch (tested with PyTorch 1.0 and Python 3.6)
  • Install the library dependencies listed in requirement.txt

Model

Run the code

Configuration

  • train-manifest-list: a list of training manifest CSV files
  • valid-manifest-list: a list of validation manifest CSV files
  • test-manifest-list: a list of test manifest CSV files
  • labels-path: path to the vocabulary list
  • k-train: number of training samples per batch (or per task in the MAML meta-train inner loop)
  • k-valid: (MAML only) number of meta-validation samples per batch
  • save-folder: the directory where trained models are saved
  • feat_extractor: the module used to generate audio input features (e.g., vgg_cnn)
  • train-partition-list: the fraction of each training manifest to use (see the sketch below)
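
The effect of train-partition-list can be illustrated with a small standalone sketch. This is not the repository's loader in utils/data_loader.py; the assumption that a manifest is a plain CSV read row by row is only for illustration:

```python
# Hypothetical illustration of --train-partition-list (not the repository's
# code): a partition value of 0.1 means keeping only the first 10% of the
# entries in a training manifest.
import csv

def take_partition(manifest_path, fraction):
    """Return the first `fraction` of the rows in a manifest CSV."""
    with open(manifest_path, newline="") as f:
        rows = list(csv.reader(f))
    keep = max(1, int(len(rows) * fraction))
    return rows[:keep]

# Example: take_partition("./data/manifests/cv_20190612_wales_train.csv", 0.1)
```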

Train from scratch

You can train a model from scratch using the following commands:

Train a model with the joint-training objective

python joint_train.py \
--train-manifest-list ./data/manifests/cv_20190612_wales_train.csv \
--valid-manifest-list ./data/manifests/cv_20190612_wales_test.csv \
--test-manifest-list ./data/manifests/cv_20190612_wales_test.csv \
--cuda --k-train 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name wales_enc2_dec4_512_b6 --save-folder save/ --save-every 10000 --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,20 --src-max-len 5000 --tgt-max-len 2500 --evaluate-every 1000 --epochs 500000 --sample-rate 16000 --train-partition-list 1
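
Conceptually, joint training pools batches from all accents and updates a single model with an ordinary supervised step. The sketch below is not the repository's joint_train.py; the model interface, batch layout, and padding index are assumptions made only for illustration:

```python
# Minimal sketch of a joint-training step (not the repository's code).
# Assumptions: `model` maps audio features to per-token logits, `batch`
# holds "inputs" and "targets" tensors, and index 0 is the padding token.
import torch
import torch.nn.functional as F

def joint_train_step(model, optimizer, batch):
    model.train()
    logits = model(batch["inputs"])                    # (B, T, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        batch["targets"].reshape(-1),
        ignore_index=0,                                # assumed padding id
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```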

Train a model with the first-order MAML objective

python meta_train.py \
--train-manifest-list ./data/manifests/cv_20190612_us.csv ./data/manifests/cv_20190612_england.csv ./data/manifests/cv_20190612_indian.csv ./data/manifests/cv_20190612_australia.csv ./data/manifests/cv_20190612_newzealand.csv ./data/manifests/cv_20190612_african.csv ./data/manifests/cv_20190612_ireland.csv ./data/manifests/cv_20190612_hongkong.csv ./data/manifests/cv_20190612_malaysia.csv ./data/manifests/cv_20190612_singapore.csv \
--valid-manifest-list ./data/manifests/cv_20190612_canada.csv ./data/manifests/cv_20190612_scotland.csv ./data/manifests/cv_20190612_southatlandtic.csv \
--test-manifest-list ./data/manifests/cv_20190612_philippines.csv ./data/manifests/cv_20190612_wales.csv ./data/manifests/cv_20190612_bermuda.csv \
--cuda --k-train 6 --k-valid 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name maml_10_3_3_enc2_dec4_512_b6_copy_grad --save-folder save/ --save-every 10000 --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,50 --src-max-len 5000 --tgt-max-len 2500 --evaluate-every 100 --epochs 500000 --sample-rate 16000 --copy-grad --num-meta-test 10
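
For intuition, first-order MAML treats every accent as a task: the model is adapted on a small support batch (k-train samples), and the gradients of the adapted copy on a query batch are applied to the original parameters. The sketch below is a generic FOMAML step, not the repository's meta_train.py; how the --copy-grad option is implemented there may differ:

```python
# Generic first-order MAML (FOMAML) sketch, not the repository's meta_train.py.
import copy
import torch

def fomaml_step(model, meta_optimizer, tasks, loss_fn, inner_lr=1e-3, inner_steps=1):
    """`tasks` yields (support_batch, query_batch) pairs, one per accent."""
    meta_optimizer.zero_grad()
    for support_batch, query_batch in tasks:
        learner = copy.deepcopy(model)                 # fast weights
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # inner-loop adaptation
            inner_opt.zero_grad()
            loss_fn(learner, support_batch).backward()
            inner_opt.step()
        learner.zero_grad()
        loss_fn(learner, query_batch).backward()       # grads of adapted copy
        # First-order update: copy the adapted model's gradients back onto
        # the meta-model's parameters (summed across tasks).
        for p, q in zip(model.parameters(), learner.parameters()):
            if q.grad is None:
                continue
            p.grad = q.grad.clone() if p.grad is None else p.grad + q.grad
    meta_optimizer.step()
```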

Fine-tune a trained model

You can pre-train the model on other datasets and then fine-tune the trained model on a target accent.

Fine-tune a trained model with the MAML objective

python finetune.py \
--train-manifest-list ./data/manifests/cv_20190612_philippines_train.csv \
--valid-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--test-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--train-partition-list 0.1 \
--cuda --k-train 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name multi_accent_finetune_10shot_5updates_philippines_maml_10_3_3_enc2_dec4_512_b6_copy_grad_early10000 --save-folder save/ --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,50 --src-max-len 5000 --tgt-max-len 2500 --epochs 5 --sample-rate 16000 --continue-from save/maml_10_3_3_enc2_dec4_512_b6_copy_grad_early10000/epoch_220000.th --beam-search --beam-width 5 --save-every 5 --opt_name sgd --evaluate-every 5 &

Fine-tune a trained model with the joint-training objective

python finetune.py \
--train-manifest-list ./data/manifests/cv_20190612_philippines_train.csv \
--valid-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--test-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--train-partition-list 0.1 \
--cuda --k-train 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name multi_accent_finetune_10shot_5updates_philippines_joint_10_3_3_enc2_dec4_512_b6_22050hz --save-folder save/ --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,50 --src-max-len 5000 --tgt-max-len 2500 --epochs 5 --sample-rate 16000 --continue-from save/joint_10_3_3_enc2_dec4_512_b6/epoch_220000.th --beam-search --beam-width 5 --save-every 5 --opt_name sgd --evaluate-every 5 --training-mode joint &
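
Both fine-tuning commands resume from a saved .th checkpoint (--continue-from) and run a handful of SGD epochs on a small target-accent subset. The sketch below shows that idea only; the checkpoint layout and loading code are assumptions, not the repository's finetune.py:

```python
# Sketch of checkpoint-resume + few-shot fine-tuning (not the repository's
# finetune.py). Assumption: the .th file is a state_dict, or a dict wrapping
# one under the "state_dict" key; the real format may differ.
import torch

def finetune(model, train_loader, loss_fn, ckpt_path, epochs=5, lr=1e-4):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
    model.load_state_dict(state_dict)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # --opt_name sgd
    model.train()
    for _ in range(epochs):                                  # e.g. --epochs 5
        for batch in train_loader:
            optimizer.zero_grad()
            loss_fn(model, batch).backward()
            optimizer.step()
    return model
```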

Test

python test.py \
--test-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--cuda --labels-path data/labels/cv_labels.json --lr 1e-4 --training-mode meta --continue-from save/maml_10_3_3_enc2_dec4_512_b6_copy_grad_early10000/epoch_220000.th --tgt-max-len 150 --k-test 1 --beam-search --beam-width 5
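
The test script reports error rates on the decoded hypotheses. As a reference point, word error rate is the word-level edit distance between hypothesis and reference divided by the reference length; the standalone function below is only an illustration of that metric, not the repository's evaluation code:

```python
# Standalone word error rate (WER) illustration, not the repository's test.py.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(1, len(ref))

# word_error_rate("the cat sat", "the cat sit") -> 0.333...
```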

Bug Report

Feel free to create an issue or send an email to [email protected] or [email protected].

cross-accent-maml-asr's People

Contributors

gentaiscool, samuelcahyawijaya

cross-accent-maml-asr's Issues

Error

Hi,
Thanks for the support.
I am trying to execute the command "python meta_train.py" but I am getting the following error. Please help!

Error: pop from empty list, fetching new data...
Exception in thread Thread-58:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/trainer/asr/meta_trainer.py", line 128, in fetch_train_batch
    batch_data = train_data_list[manifest_id].sample(k_train, k_valid, manifest_id)
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/utils/data_loader.py", line 274, in sample
    spect = self.parse_audio(audio_path)[:,:self.args.src_max_len]
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/utils/data_loader.py", line 69, in parse_audio
    y = load_audio(audio_path)
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/utils/audio.py", line 8, in load_audio
    sound, _ = torchaudio.load(path, normalization=True)
TypeError: load() got an unexpected keyword argument 'normalization'
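
This error comes from newer torchaudio releases, where the normalization keyword was removed; the argument is now called normalize (and already defaults to True). A possible patch for utils/audio.py, assuming a recent torchaudio, is shown below; the rest of the original function body is simplified here:

```python
# Possible fix for utils/audio.py (assumes torchaudio >= 0.7, where the
# keyword is `normalize`; passing it explicitly is optional since it
# defaults to True).
import torchaudio

def load_audio(path):
    sound, _ = torchaudio.load(path, normalize=True)
    return sound
```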

Unable to run meta training the model

@gentaiscool @SamuelCahyawijaya can you explain how to fix this? I tried to run meta_train.py, but the clips are named like ffffd0eda81a6e48d9d3a5cf9f2b0aa5f17db75603fa7e32c5b978f568528e5129f77af9ffca8eb01dd4512c3a93a569df29678ca3e810a1bbfa2a65359704a7.mp3, which does not match the entries in the CSV files in the manifests folder,
which use names like ./data/CommonVoice2_dataset/clips/common_voice_en_1100186.mp3, so the code fails when training the model. Here is the output when I tried to run the meta-training command:

RuntimeError: Error loading audio file: failed to open file ./data/CommonVoice2_dataset/clips/f91b898dcfaf8655bdbbed448068c1faae5ebe598b045de2a04a6335c967846990ae8f24070b64b8a3e33ba11aaaec51b7be927ab680eeb652945f974b77e8f3

Error: pop from empty list, fetching new data...
formats: can't open input file `./data/CommonVoice2_dataset/clips/1f68b4dfb437b760d5b2f0c86ebd19a0c425ecb5474778c50adbaee40cd50800b97411ece34a531250857868c048a3da53f4a087464319f1b88c97b16ea6e40d': No such file or directory
Exception in thread Thread-361:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/vud/thesis/cross-accent-maml-asr/trainer/asr/meta_trainer.py", line 125, in fetch_train_batch
    batch_data = train_data_list[manifest_id].sample(k_train, k_valid, manifest_id)
  File "/opt/vud/thesis/cross-accent-maml-asr/utils/data_loader.py", line 274, in sample
    spect = self.parse_audio(audio_path)[:,:self.args.src_max_len]
  File "/opt/vud/thesis/cross-accent-maml-asr/utils/data_loader.py", line 69, in parse_audio
    y = load_audio(audio_path)
  File "/opt/vud/thesis/cross-accent-maml-asr/utils/audio.py", line 8, in load_audio
    sound, _ = torchaudio.load(path, format="mp3") #remove normalization=True
  File "/opt/vud/thesis/cross-accent-maml-asr/venv/lib/python3.6/site-packages/torchaudio/backend/sox_io_backend.py", line 153, in load
    filepath, frame_offset, num_frames, normalize, channels_first, format)
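
A quick way to diagnose this kind of mismatch (not part of the repository) is to count how many audio paths listed in a manifest actually exist on disk before starting training; the assumption that the audio path is the first comma-separated field of each row is only for illustration:

```python
# Hypothetical diagnostic, not part of the repository: report audio paths in
# a manifest CSV that do not exist on disk (assumes the path is field 0).
import csv
import os

def check_manifest(manifest_path):
    missing = []
    with open(manifest_path, newline="") as f:
        for row in csv.reader(f):
            if row and not os.path.exists(row[0]):
                missing.append(row[0])
    print(f"{manifest_path}: {len(missing)} missing audio files")
    return missing

# Example: check_manifest("./data/manifests/cv_20190612_us.csv")
```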

Audio file names appear garbled

Hello.

Thanks for sharing the code.
I tried to download the data with the command
cd data && bash download_cv2.sh
but the file names in the clips directory are all garbled, like the one below:
ffffd0eda81a6e48d9d3a5cf9f2b0aa5f17db75603fa7e32c5b978f568528e5129f77af9ffca8eb01dd4512c3a93a569df29678ca3e810a1bbfa2a65359704a7.mp3

Is that normal?

I use Ubuntu 18.04.4

file missing in folder modules

Hi, thanks for your great work. When I try to run it, the following error appears:
ModuleNotFoundError: No module named 'modules.discriminator'

It seems this file is missing.
