audioku / cross-accent-maml-asr

Model-agnostic meta-learning (MAML) implementation for cross-accented ASR

License: MIT

Languages: Python 81.57%, Jupyter Notebook 18.27%, Shell 0.15%
Topics: maml, meta-learning, asr, cross-accent, accent, pytorch, speech


Learning Fast Adaptation on Cross-Accented Speech Recognition

Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung

License: MIT

This is the implementation of our paper accepted at Interspeech 2020, which can be downloaded here.

This code has been written using PyTorch. If you use any of the source code or datasets included in this toolkit in your work, please cite the following paper.

@inproceedings{winata2020crossaccent,
  doi = {10.21437/interspeech.2020-0045},
  url = {https://doi.org/10.21437/interspeech.2020-0045},
  year = {2020},
  month = oct,
  publisher = {{ISCA}},
  author = {Genta Indra Winata and Samuel Cahyawijaya and Zihan Liu and Zhaojiang Lin and Andrea Madotto and Peng Xu and Pascale Fung},
  title = {Learning Fast Adaptation on Cross-Accented Speech Recognition},
  booktitle = {Interspeech 2020}
}

Abstract

Local dialects influence people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the model to adapt to unseen accents using the existing CommonVoice corpus. We also propose an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents. Our approach significantly outperforms joint training in the zero-shot, few-shot, and all-shot evaluations, in both the mixed-region and cross-region settings, in terms of word error rate.

Download data

Execute the following command from the base folder:

cd data && bash download_cv2.sh

Setup Requirements

  • Install PyTorch (tested with PyTorch 1.0 and Python 3.6)
  • Install the library dependencies listed in requirement.txt

Model

Run the code

Configuration

  • train-manifest-list: a list of training manifest CSV files
  • valid-manifest-list: a list of validation manifest CSV files
  • test-manifest-list: a list of test manifest CSV files
  • labels-path: path to the vocabulary list
  • k-train: number of training samples per batch (or per task in the MAML meta-train inner loop)
  • k-valid: (MAML only) number of meta-validation samples per batch
  • save-folder: the directory where trained models are saved
  • feat_extractor: the module used to generate audio input features (e.g., vgg_cnn)
  • train-partition-list: the fraction of each training manifest to use (see the sketch below)
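
The effect of train-partition-list can be illustrated with a small standalone sketch. This is not the repository's loader in utils/data_loader.py; the assumption that a manifest is a plain CSV read row by row is only for illustration:

```python
# Hypothetical illustration of --train-partition-list (not the repository's
# code): a partition value of 0.1 means keeping only the first 10% of the
# entries in a training manifest.
import csv

def take_partition(manifest_path, fraction):
    """Return the first `fraction` of the rows in a manifest CSV."""
    with open(manifest_path, newline="") as f:
        rows = list(csv.reader(f))
    keep = max(1, int(len(rows) * fraction))
    return rows[:keep]

# Example: take_partition("./data/manifests/cv_20190612_wales_train.csv", 0.1)
```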

Train from scratch

You can train a model from scratch using the following commands:

Train a model with the joint-training objective

python joint_train.py \
--train-manifest-list ./data/manifests/cv_20190612_wales_train.csv \
--valid-manifest-list ./data/manifests/cv_20190612_wales_test.csv \
--test-manifest-list ./data/manifests/cv_20190612_wales_test.csv \
--cuda --k-train 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name wales_enc2_dec4_512_b6 --save-folder save/ --save-every 10000 --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,20 --src-max-len 5000 --tgt-max-len 2500 --evaluate-every 1000 --epochs 500000 --sample-rate 16000 --train-partition-list 1
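
Conceptually, joint training pools batches from all accents and updates a single model with an ordinary supervised step. The sketch below is not the repository's joint_train.py; the model interface, batch layout, and padding index are assumptions made only for illustration:

```python
# Minimal sketch of a joint-training step (not the repository's code).
# Assumptions: `model` maps audio features to per-token logits, `batch`
# holds "inputs" and "targets" tensors, and index 0 is the padding token.
import torch
import torch.nn.functional as F

def joint_train_step(model, optimizer, batch):
    model.train()
    logits = model(batch["inputs"])                    # (B, T, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        batch["targets"].reshape(-1),
        ignore_index=0,                                # assumed padding id
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```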

Train a model with the first-order MAML objective

python meta_train.py \
--train-manifest-list ./data/manifests/cv_20190612_us.csv ./data/manifests/cv_20190612_england.csv ./data/manifests/cv_20190612_indian.csv ./data/manifests/cv_20190612_australia.csv ./data/manifests/cv_20190612_newzealand.csv ./data/manifests/cv_20190612_african.csv ./data/manifests/cv_20190612_ireland.csv ./data/manifests/cv_20190612_hongkong.csv ./data/manifests/cv_20190612_malaysia.csv ./data/manifests/cv_20190612_singapore.csv \
--valid-manifest-list ./data/manifests/cv_20190612_canada.csv ./data/manifests/cv_20190612_scotland.csv ./data/manifests/cv_20190612_southatlandtic.csv \
--test-manifest-list ./data/manifests/cv_20190612_philippines.csv ./data/manifests/cv_20190612_wales.csv ./data/manifests/cv_20190612_bermuda.csv \
--cuda --k-train 6 --k-valid 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name maml_10_3_3_enc2_dec4_512_b6_copy_grad --save-folder save/ --save-every 10000 --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,50 --src-max-len 5000 --tgt-max-len 2500 --evaluate-every 100 --epochs 500000 --sample-rate 16000 --copy-grad --num-meta-test 10
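
For intuition, first-order MAML treats every accent as a task: the model is adapted on a small support batch (k-train samples), and the gradients of the adapted copy on a query batch are applied to the original parameters. The sketch below is a generic FOMAML step, not the repository's meta_train.py; how the --copy-grad option is implemented there may differ:

```python
# Generic first-order MAML (FOMAML) sketch, not the repository's meta_train.py.
import copy
import torch

def fomaml_step(model, meta_optimizer, tasks, loss_fn, inner_lr=1e-3, inner_steps=1):
    """`tasks` yields (support_batch, query_batch) pairs, one per accent."""
    meta_optimizer.zero_grad()
    for support_batch, query_batch in tasks:
        learner = copy.deepcopy(model)                 # fast weights
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # inner-loop adaptation
            inner_opt.zero_grad()
            loss_fn(learner, support_batch).backward()
            inner_opt.step()
        learner.zero_grad()
        loss_fn(learner, query_batch).backward()       # grads of adapted copy
        # First-order update: copy the adapted model's gradients back onto
        # the meta-model's parameters (summed across tasks).
        for p, q in zip(model.parameters(), learner.parameters()):
            if q.grad is None:
                continue
            p.grad = q.grad.clone() if p.grad is None else p.grad + q.grad
    meta_optimizer.step()
```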

Fine-tune a trained model

You can pre-train the model on other datasets and then fine-tune the trained model on a target accent.

Fine-tune a trained model with the MAML objective

python finetune.py \
--train-manifest-list ./data/manifests/cv_20190612_philippines_train.csv \
--valid-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--test-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--train-partition-list 0.1 \
--cuda --k-train 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name multi_accent_finetune_10shot_5updates_philippines_maml_10_3_3_enc2_dec4_512_b6_copy_grad_early10000 --save-folder save/ --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,50 --src-max-len 5000 --tgt-max-len 2500 --epochs 5 --sample-rate 16000 --continue-from save/maml_10_3_3_enc2_dec4_512_b6_copy_grad_early10000/epoch_220000.th --beam-search --beam-width 5 --save-every 5 --opt_name sgd --evaluate-every 5 &

Fine-tune a trained model with the joint-training objective

python finetune.py \
--train-manifest-list ./data/manifests/cv_20190612_philippines_train.csv \
--valid-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--test-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--train-partition-list 0.1 \
--cuda --k-train 6 --labels-path data/labels/cv_labels.json --lr 1e-4 --name multi_accent_finetune_10shot_5updates_philippines_joint_10_3_3_enc2_dec4_512_b6_22050hz --save-folder save/ --feat_extractor vgg_cnn --dropout 0.1 --num-enc-layers 2 --num-dec-layers 4 --num-heads 8 --dim-model 512 --dim-key 64 --dim-value 64 --dim-input 5120 --dim-inner 512 --dim-emb 512 --early-stop cer,50 --src-max-len 5000 --tgt-max-len 2500 --epochs 5 --sample-rate 16000 --continue-from save/joint_10_3_3_enc2_dec4_512_b6/epoch_220000.th --beam-search --beam-width 5 --save-every 5 --opt_name sgd --evaluate-every 5 --training-mode joint &
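
Both fine-tuning commands resume from a saved .th checkpoint (--continue-from) and run a handful of SGD epochs on a small target-accent subset. The sketch below shows that idea only; the checkpoint layout and loading code are assumptions, not the repository's finetune.py:

```python
# Sketch of checkpoint-resume + few-shot fine-tuning (not the repository's
# finetune.py). Assumption: the .th file is a state_dict, or a dict wrapping
# one under the "state_dict" key; the real format may differ.
import torch

def finetune(model, train_loader, loss_fn, ckpt_path, epochs=5, lr=1e-4):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
    model.load_state_dict(state_dict)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # --opt_name sgd
    model.train()
    for _ in range(epochs):                                  # e.g. --epochs 5
        for batch in train_loader:
            optimizer.zero_grad()
            loss_fn(model, batch).backward()
            optimizer.step()
    return model
```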

Test

python test.py \
--test-manifest-list ./data/manifests/cv_20190612_philippines_test.csv \
--cuda --labels-path data/labels/cv_labels.json --lr 1e-4 --training-mode meta --continue-from save/maml_10_3_3_enc2_dec4_512_b6_copy_grad_early10000/epoch_220000.th --tgt-max-len 150 --k-test 1 --beam-search --beam-width 5
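
The test script reports error rates on the decoded hypotheses. As a reference point, word error rate is the word-level edit distance between hypothesis and reference divided by the reference length; the standalone function below is only an illustration of that metric, not the repository's evaluation code:

```python
# Standalone word error rate (WER) illustration, not the repository's test.py.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(1, len(ref))

# word_error_rate("the cat sat", "the cat sit") -> 0.333...
```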

Bug Report

Feel free to create an issue or send an email to [email protected] or [email protected].

cross-accent-maml-asr's People

Contributors

gentaiscool, samuelcahyawijaya

cross-accent-maml-asr's Issues

Error

Hi,
Thanks for the support.
I am trying to execute the command "python meta_train.py" but I am getting the following error. Please help!

Error: pop from empty list, fetching new data...
Exception in thread Thread-58:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/trainer/asr/meta_trainer.py", line 128, in fetch_train_batch
    batch_data = train_data_list[manifest_id].sample(k_train, k_valid, manifest_id)
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/utils/data_loader.py", line 274, in sample
    spect = self.parse_audio(audio_path)[:,:self.args.src_max_len]
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/utils/data_loader.py", line 69, in parse_audio
    y = load_audio(audio_path)
  File "/home/ubuntu/ca_speaker/cross-accent-maml-asr/utils/audio.py", line 8, in load_audio
    sound, _ = torchaudio.load(path, normalization=True)
TypeError: load() got an unexpected keyword argument 'normalization'
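
This error comes from newer torchaudio releases, where the normalization keyword was removed; the argument is now called normalize (and already defaults to True). A possible patch for utils/audio.py, assuming a recent torchaudio, is shown below; the rest of the original function body is simplified here:

```python
# Possible fix for utils/audio.py (assumes torchaudio >= 0.7, where the
# keyword is `normalize`; passing it explicitly is optional since it
# defaults to True).
import torchaudio

def load_audio(path):
    sound, _ = torchaudio.load(path, normalize=True)
    return sound
```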

Unable to run meta training the model

@gentaiscool @SamuelCahyawijaya can you explain how to fix this? I tried to run meta_train.py, but the clips are named like ffffd0eda81a6e48d9d3a5cf9f2b0aa5f17db75603fa7e32c5b978f568528e5129f77af9ffca8eb01dd4512c3a93a569df29678ca3e810a1bbfa2a65359704a7.mp3, which does not match the entries in the CSV files in the manifests folder,
which use names like ./data/CommonVoice2_dataset/clips/common_voice_en_1100186.mp3, so the code fails when training the model. Here is the output when I tried to run the meta-training command:

RuntimeError: Error loading audio file: failed to open file ./data/CommonVoice2_dataset/clips/f91b898dcfaf8655bdbbed448068c1faae5ebe598b045de2a04a6335c967846990ae8f24070b64b8a3e33ba11aaaec51b7be927ab680eeb652945f974b77e8f3

Error: pop from empty list, fetching new data...
formats: can't open input file `./data/CommonVoice2_dataset/clips/1f68b4dfb437b760d5b2f0c86ebd19a0c425ecb5474778c50adbaee40cd50800b97411ece34a531250857868c048a3da53f4a087464319f1b88c97b16ea6e40d': No such file or directory
Exception in thread Thread-361:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/vud/thesis/cross-accent-maml-asr/trainer/asr/meta_trainer.py", line 125, in fetch_train_batch
    batch_data = train_data_list[manifest_id].sample(k_train, k_valid, manifest_id)
  File "/opt/vud/thesis/cross-accent-maml-asr/utils/data_loader.py", line 274, in sample
    spect = self.parse_audio(audio_path)[:,:self.args.src_max_len]
  File "/opt/vud/thesis/cross-accent-maml-asr/utils/data_loader.py", line 69, in parse_audio
    y = load_audio(audio_path)
  File "/opt/vud/thesis/cross-accent-maml-asr/utils/audio.py", line 8, in load_audio
    sound, _ = torchaudio.load(path, format="mp3") #remove normalization=True
  File "/opt/vud/thesis/cross-accent-maml-asr/venv/lib/python3.6/site-packages/torchaudio/backend/sox_io_backend.py", line 153, in load
    filepath, frame_offset, num_frames, normalize, channels_first, format)
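
A quick way to diagnose this kind of mismatch (not part of the repository) is to count how many audio paths listed in a manifest actually exist on disk before starting training; the assumption that the audio path is the first comma-separated field of each row is only for illustration:

```python
# Hypothetical diagnostic, not part of the repository: report audio paths in
# a manifest CSV that do not exist on disk (assumes the path is field 0).
import csv
import os

def check_manifest(manifest_path):
    missing = []
    with open(manifest_path, newline="") as f:
        for row in csv.reader(f):
            if row and not os.path.exists(row[0]):
                missing.append(row[0])
    print(f"{manifest_path}: {len(missing)} missing audio files")
    return missing

# Example: check_manifest("./data/manifests/cv_20190612_us.csv")
```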

Audio file names appear garbled

Hello.

Thanks for sharing the code.
I tried to download the data with the command
cd data && bash download_cv2.sh
but the file names in the clips directory are all garbled, like the one below:
ffffd0eda81a6e48d9d3a5cf9f2b0aa5f17db75603fa7e32c5b978f568528e5129f77af9ffca8eb01dd4512c3a93a569df29678ca3e810a1bbfa2a65359704a7.mp3

Is that normal?

I use Ubuntu 18.04.4

file missing in folder modules

Hi, thanks for your great work. When I try to run it, the following error appears:
ModuleNotFoundError: No module named 'modules.discriminator'

It seems this file is missing.
