espnet-semi-supervised's Introduction

ESPnet extensions for semi-supervised end-to-end speech recognition

This repository contains the evaluation scripts used in our paper:

@inproceedings{Karita2018,
  author={Shigeki Karita and Shinji Watanabe and Tomoharu Iwata and Atsunori Ogawa and Marc Delcroix},
  title={Semi-Supervised End-to-End Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2--6},
  doi={10.21437/Interspeech.2018-1746},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1746}
}

The full PDF is available at https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1746.html.

how to set up

$ git clone https://github.com/nttcslab-sp/espnet-semisupervised --recursive
$ cd espnet-semisupervised/espnet/tools; make PYTHON_VERSION=3 -f conda.mk
$ cd ../..
$ ./run.sh --gpu 0 --wsj0 <your-wsj0-path> --wsj1 <your-wsj1-path>

NOTE: you need to install PyTorch 0.3.1.
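A quick way to confirm that the environment matches the note above is to compare the installed version string against 0.3.1 before launching the recipe. The following is a minimal sketch, not part of this repository: the helper name `check_torch_version` is hypothetical, and it assumes that a build suffix such as `.postN` or `+local` still counts as the 0.3.1 release.

```python
def check_torch_version(installed, required="0.3.1"):
    """Return True when the installed version string matches the required release.

    Hypothetical helper (not part of this repository).
    """
    # Assumption: a local build tag after "+" does not change the release.
    base = installed.split("+")[0]
    # Assumption: ".postN" rebuilds of the same release are acceptable.
    return base == required or base.startswith(required + ".post")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = check_torch_version(torch.__version__)
        print("PyTorch", torch.__version__, "OK" if ok else "does not match 0.3.1")
```

Running this inside the conda environment created by `conda.mk` should show whether the recipe will see the expected PyTorch release.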

scripts

in root dir

  • run.sh : end-to-end recipe for this experiment (do not forget to set --gpu 0 if you have a GPU)
  • sbatch.sh : slurm job script for several paired/unpaired data ratios and hyperparameter search (requires a finished run_retrain_wsj.sh expdir for the pretrained model params)

in shell/ dir

  • show_results.sh : summarize CER/WER/SER from decoded results of dev93/test92 sets (usage: `show_results.sh exp/train_si84_xxx`)
  • decode.sh : decodes and evaluates a trained model (usage: `decode.sh --expdir exp/train_si84_xxx`)
  • debug.sh : we recommend sourcing debug.sh before using ipython, so that all required paths are set

in python/ dir

  • asr_train_loop_th.py : Python script for initial training with the paired dataset (train_si84)
  • retrain_loop_th.py : Python script for re-training with the unpaired dataset (train_si284)
  • unsupervised_recog_th.py : Python script for decoding with the re-trained model
  • unsupervised.py : implements the PyTorch model for paired/unpaired learning
  • results.py : implements a Chainer-like reporter (without the Chainer iterator) used in the training loop
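To illustrate what a "Chainer-like reporter" can look like when it is driven by a plain training loop rather than a Chainer iterator, here is a minimal sketch. It is not the code in results.py: the class name `AverageReporter` and its methods are hypothetical, and it assumes the reporter only needs to average named scalar observations such as loss and accuracy.

```python
from collections import defaultdict


class AverageReporter:
    """Hypothetical stand-in: accumulates scalar observations and reports their means."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def report(self, observation):
        # observation maps a metric name (e.g. "loss", "acc") to a scalar value
        for name, value in observation.items():
            self._sums[name] += float(value)
            self._counts[name] += 1

    def summary(self):
        # mean of every metric reported since the last reset
        return {name: self._sums[name] / self._counts[name] for name in self._sums}

    def reset(self):
        self._sums.clear()
        self._counts.clear()


reporter = AverageReporter()
for loss, acc in [(2.0, 0.5), (1.0, 0.7)]:  # made-up minibatch values
    reporter.report({"loss": loss, "acc": acc})
print(reporter.summary())
```

A training loop would call `report` once per minibatch and `summary` followed by `reset` at the end of each epoch.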

results

| train_set | dev93 Acc | dev93 CER | eval92 CER | dev93 WER | eval92 WER | dev93 SER | eval92 SER | path |
|---|---|---|---|---|---|---|---|---|
| train_si84 (7138 utt, 15 hours) | 77.6 | 25.4 | 15.8 | 61.9 | 44.2 | 99.8 | 98.5 | exp/train_si84_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150 |
| + train_si284 RNNLM | | 19.3 | 16.6 | 51.3 | 47.7 | 99.8 | 99.7 | exp/rnnlm_train_si84_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_epochs15 |
| + unpaired train_si284 retrain | 83.8 | 28.2 | 15.6 | 61.2 | 40.5 | 99.6 | 97.6 | ./exp/train_si84_retrain_None_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9 |
| + RNNLM | | 22.1 | 17.2 | 51.6 | 44.2 | 99.0 | 99.4 | ./exp/train_si84_retrain_None_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9/rnnlm0.1 |
| + unpaired train_si284 retrain w/ GAN-si84 | 83.5 | 26.3 | 15.0 | 59.9 | 40.0 | 99.4 | 97.3 | exp/train_si84_paired_hidden_gan_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.5_train_si84_epochs15 |
| + unpaired train_si284 retrain w/ KL-si84 | 83.6 | 28.5 | 15.6 | 60.5 | 40.4 | 99.6 | 97.3 | exp/train_si84_paired_hidden_gausslogdet_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_epochs15 |
| + unpaired train_si284 retrain w/ GAN | 84.2 | 22.1 | 17.9 | 50.9 | 44.2 | 99.2 | 99.4 | ./exp/train_si84_retrain84_gan_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_iter5 |
| + RNNLM | | 22.1 | 17.9 | 50.9 | 44.2 | 99.2 | 99.4 | ./exp/train_si84_retrain84_gan_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_iter5/rnnlm0.2 |
| + unpaired train_si284 retrain w/ KL | 84.0 | 24.8 | 14.4 | 58.1 | 39.5 | 99.6 | 96.4 | ./exp/train_si84_ret3_gausslogdet_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.5_train_si84_epochs30 |
| + RNNLM | | 20.0 | 16.9 | 48.9 | 42.7 | 99.0 | 99.1 | ./exp/train_si84_retrain84_gausslogdet_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.99_st0.99_train_si84/rnnlm0.2 |
| + unpaired train_si284 retrain w/ MMD | 82.9 | 25.9 | 13.9 | 59.7 | 38.4 | 99.2 | 96.7 | ./exp/train_si84_ret3_mmd_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.5_st0.99_train_si84_epochs30 |
| train_si284 (37416 utt, 81 hours) | 93.9 | 8.1 | 6.3 | 23.8 | 18.9 | 92.4 | 87.4 | exp/train_si284_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150 |
| + train_si284 RNNLM | | 7.9 | 6.1 | 22.7 | 18.3 | 89.7 | 84.1 | ./exp/rnnlm_train_si284_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_epochs15 |

  • Acc: character accuracy during training with forced decoding
  • CER: character error rate (edit distance based error)
  • WER: word error rate (edit distance based error)
  • SER: sentence error rate (exact match error)
  • all exp paths starting with exp/... are located under /nfs/kswork/kishin/karita/experiments/espnet-unspervised/egs/wsj/unsupervised on NTT ks-servers
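The CER, WER and SER definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring code used by show_results.sh: the function names `edit_distance` and `error_rates` are hypothetical, and it assumes that spaces count as characters for CER and that words are split on whitespace for WER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs, hyps):
    """Return (CER, WER, SER) in percent over parallel lists of reference/hypothesis strings."""
    char_err = char_len = word_err = word_len = sent_err = 0
    for ref, hyp in zip(refs, hyps):
        # Assumption: spaces are counted as characters
        char_err += edit_distance(list(ref), list(hyp))
        char_len += len(ref)
        # Assumption: words are whitespace-separated tokens
        word_err += edit_distance(ref.split(), hyp.split())
        word_len += len(ref.split())
        # SER counts a sentence as wrong unless it matches exactly
        sent_err += ref != hyp
    return (100.0 * char_err / char_len,
            100.0 * word_err / word_len,
            100.0 * sent_err / len(refs))


cer, wer, ser = error_rates(["the cat sat", "hello world"],
                            ["the cat sit", "hello world"])
print("CER %.1f  WER %.1f  SER %.1f" % (cer, wer, ser))
```

The toy example shows why SER stays near 100% in the table even when CER is moderate: a single wrong character anywhere in an utterance makes the whole sentence count as an error.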

smaller paired train data results

plot.png

contact

email: [email protected]
