imlhf / aesrc2020

This project forked from r1ckshi/aesrc2020

Data preparation scripts, training pipeline and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).

License: Apache License 2.0

Languages: Shell 43.54%, Python 56.46%

AESRC2020

Introduction

Data preparation scripts, training pipeline code, and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).

Dependencies

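  1. Install Kaldi (data preparation utility scripts; Track2 conventional hybrid model training)
  2. Install ESPnet (Track1 E2E AR model training; Track2 E2E ASR Transformer training)
  3. (Optional) Install Google SentencePiece (Track2 E2E ASR vocabulary reduction and modeling-unit construction)
  4. (Optional) Install KenLM (N-gram language model training)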

Usage

Data Preparation

  1. Download the challenge data.
  2. Prepare the data, split off a dev set, extract features, and train the BPE model: ./local/prepare_data.sh
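The README does not say how ./local/prepare_data.sh chooses the dev (cv) utterances. Purely as an illustration, a speaker-disjoint split keeps every speaker entirely in train or entirely in dev, which matters here because accent accuracy on the cv set turned out to correlate strongly with individual speakers. The input format (utterance ID, speaker ID, accent) and the function name `split_by_speaker` are hypothetical and not taken from the script.

```python
import random
from collections import defaultdict


def split_by_speaker(utts, dev_speakers_per_accent=1, seed=0):
    """Hypothetical speaker-disjoint train/dev split.

    utts: list of (utt_id, speaker_id, accent) tuples.
    Returns (train_ids, dev_ids); no speaker appears in both lists.
    """
    # Collect the speakers of each accent.
    speakers = defaultdict(set)
    for _, spk, accent in utts:
        speakers[accent].add(spk)

    rng = random.Random(seed)
    dev_spk = set()
    for accent in sorted(speakers):
        # Sort before sampling so the split depends only on the seed.
        pool = sorted(speakers[accent])
        # Always leave at least one speaker of each accent in train.
        k = min(dev_speakers_per_accent, max(len(pool) - 1, 0))
        dev_spk.update(rng.sample(pool, k))

    train_ids = [u for u, spk, _ in utts if spk not in dev_spk]
    dev_ids = [u for u, spk, _ in utts if spk in dev_spk]
    return train_ids, dev_ids
```

Holding out whole speakers makes the dev numbers an estimate of performance on unseen speakers, at the cost of a noisier estimate when only a few speakers per accent are available.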

Accent Recognition (AR) Track

Train the Track1 ESPnet AR model: ./local/track1_espnet_transformer_train.sh

Speech Recognition (ASR) Track

  1. Train the Track2 Kaldi GMM alignment model: ./local/track2_kaldi_gmm_train.sh
  2. Generate lattices and the decision tree, then train the Track2 Kaldi chain model: ./local/track2_kaldi_chain_train.sh
  3. Train the Track2 ESPnet Transformer model (and the Track2 ESPnet RNN language model): ./local/track2_espnet_transformer_train.sh

Notes

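  1. The organizers do not provide the English pronunciation lexicon required by the Kaldi models.
  2. The training scripts do not include data augmentation, adding Librispeech data, and so on; participants may add these as needed.
  3. The scripts can only be run after the Kaldi and ESPnet environments have been correctly installed and activated.
  4. For the ASR Track, the baseline reports results for several data combinations and for pretraining on the full Librispeech data.
  5. Participants must strictly follow the challenge rules on data usage when training models, so that results remain fair and comparable.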
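Because no lexicon is provided, participants have to bring their own; the CMU Pronouncing Dictionary is a common choice, and the baseline section mentions that CMU-dictionary results will follow. The sketch below converts CMUdict-style entries into Kaldi's lexicon.txt layout (a word followed by its phones, one pronunciation per line). The input conventions (`;;;` comment lines, `WORD(2)` markers for alternate pronunciations) are assumptions based on the 0.7b release of CMUdict, and `cmudict_to_kaldi_lexicon` is a hypothetical helper, not part of this repository.

```python
import re


def cmudict_to_kaldi_lexicon(lines, strip_stress=False):
    """Convert CMUdict-0.7b-style lines to Kaldi lexicon.txt lines.

    Assumed input: 'WORD  P1 P2 ...', alternates written as 'WORD(2)  ...',
    and comment lines starting with ';;;'.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        # 'WORD(2)' -> 'WORD': Kaldi lists alternates as repeated words.
        word = re.sub(r"\(\d+\)$", "", word)
        if strip_stress:
            # Optionally drop lexical stress digits, e.g. 'AH0' -> 'AH'.
            phones = [re.sub(r"\d$", "", p) for p in phones]
        out.append(f"{word} {' '.join(phones)}")
    return out
```

Whether to keep stress markers is a design choice: keeping them gives more phone classes, stripping them gives fewer, better-trained ones. The word casing must also match the casing of the challenge transcripts.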

Baseline Results

Track1 Baseline Results

Accent recognition accuracy (%) on the cv set

Model             RU    KR    US    PT    JPN   UK    CHN   IND   AVE
Transformer-3L    30.0  45.0  45.7  57.2  48.5  70.0  56.2  83.5  54.1
Transformer-6L    34.0  43.7  30.6  65.7  44.0  74.5  50.9  75.2  52.2
Transformer-12L   49.6  26.0  21.2  51.8  42.7  85.0  38.2  66.1  47.8
+ ASR-init        75.7  55.6  60.2  85.5  73.2  93.9  67.0  97.0  76.1

Transformer-3L, Transformer-6L, and Transformer-12L were all trained with ./local/track1_espnet_transformer_train.sh, with elayers set to 3, 6, and 12 respectively. The ASR-init experiment initializes the model from the Track2 joint CTC/attention model.
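The README does not describe exactly which weights ASR-init transfers. Since the Track2 ESPnet model also has 12 encoder layers, one plausible reading is that the encoder is copied and the accent-classification layers start from scratch. The sketch below shows that idea under stated assumptions: plain dicts with flat tuples stand in for PyTorch state dicts, the `encoder.` name prefix is assumed, and `init_encoder_from_asr` is a hypothetical helper.

```python
def init_encoder_from_asr(ar_params, asr_params, prefix="encoder."):
    """Copy encoder weights from an ASR model into an AR model (sketch).

    Both arguments map parameter names to flat tuples of floats, used here
    as stand-ins for tensors. Only names starting with `prefix` that exist
    in both models with the same size are copied; everything else (for
    example an accent classifier head) keeps its own initialization.
    Returns (new_params, copied_names).
    """
    new_params = dict(ar_params)
    copied = []
    for name, value in asr_params.items():
        if (name.startswith(prefix) and name in ar_params
                and len(ar_params[name]) == len(value)):
            new_params[name] = value
            copied.append(name)
    return new_params, copied
```

Decoder and CTC parameters of the ASR model are simply ignored, and any encoder parameter whose size disagrees is left untouched rather than raising an error.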

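*On the cv set we observed that the accuracy for some accents is strongly correlated with the speaker. Because the cv set contains few speakers, the absolute values above are not statistically meaningful; the test set will include more speakers.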
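Per-accent accuracy is the share of utterances of that accent whose predicted accent is correct. The README does not say how the AVE column is aggregated. For Transformer-3L, the unweighted mean of the eight per-accent values is about 54.5, not 54.1, so AVE may be weighted, for example pooled over all utterances. The hedged sketch computes both variants; `accent_accuracies` is a hypothetical helper.

```python
from collections import Counter


def accent_accuracies(refs, hyps):
    """Per-accent accuracy (%), unweighted mean, and pooled accuracy.

    refs, hyps: equal-length lists of reference and predicted accent labels.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = Counter(refs)
    correct = Counter(r for r, h in zip(refs, hyps) if r == h)
    per_accent = {a: 100.0 * correct[a] / total[a] for a in total}
    # Unweighted mean: every accent counts equally.
    macro = sum(per_accent.values()) / len(per_accent)
    # Pooled: every utterance counts equally.
    pooled = 100.0 * sum(correct.values()) / len(refs)
    return per_accent, macro, pooled
```

The two averages diverge whenever accents have different numbers of utterances, and with few speakers per accent a single speaker can shift a per-accent value substantially.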

Track2 Baseline Results

Kaldi Hybrid Chain Model: CNN + 18 TDNN. *Uses an internal, non-open-source English pronunciation lexicon. *Results based on the CMU dictionary will be published later.

ESPnet Transformer Model: 12 encoder layers + 6 decoder layers (plain self-attention, joint CTC training, 1k BPE sub-word units)
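Joint CTC training is commonly formulated as the hybrid CTC/attention objective, which interpolates a CTC loss on the encoder outputs with the cross-entropy loss of the attention decoder using a weight λ. In ESPnet configs this weight is typically called `mtlalpha`; the value this baseline uses is set in its config files and is not restated in the README.

```latex
\mathcal{L}_{\text{joint}} = \lambda \, \mathcal{L}_{\text{CTC}} + (1 - \lambda) \, \mathcal{L}_{\text{att}}, \qquad 0 \le \lambda \le 1
```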

For detailed hyperparameters, see the model configurations in ./local/files/conf/ and the settings in the corresponding scripts.
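The Decode column of the ESPnet rows in the WER table uses labels such as +0.3RNNLM and +0.3CTC. A plausible reading, assumed here rather than stated in the README, is beam search with shallow fusion: each hypothesis is scored as (1 - ctc_weight) * log p_att + ctc_weight * log p_ctc + lm_weight * log p_lm, with the weights set to 0.3. The sketch ranks finished hypotheses under that score; the function names and log-probabilities are illustrative only.

```python
def fused_score(att_logp, ctc_logp, lm_logp, ctc_weight=0.3, lm_weight=0.3):
    """Joint CTC/attention score with shallow LM fusion (assumed weighting)."""
    return (1.0 - ctc_weight) * att_logp + ctc_weight * ctc_logp + lm_weight * lm_logp


def rank_hypotheses(hyps, ctc_weight=0.3, lm_weight=0.3):
    """Sort (text, att_logp, ctc_logp, lm_logp) tuples, best first."""
    return sorted(
        hyps,
        key=lambda h: fused_score(h[1], h[2], h[3], ctc_weight, lm_weight),
        reverse=True,
    )
```

With both weights at 0 the ranking falls back to the attention decoder alone; raising them lets the CTC and LM scores overrule hypotheses that the decoder alone would prefer.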

WER (%) on the cv set

Data                  Decode            RU     KR     US     PT     JPN    UK     CHN    IND    AVE
Kaldi
Accent160             -                 6.67   11.46  15.95  10.27  9.78   16.88  20.97  17.48  13.68
Libri960 ~ Accent160                    6.61   10.95  15.33  9.79   9.75   16.03  19.68  16.93  13.13
Accent160 + Libri160                    6.95   11.76  13.05  9.96   10.15  14.21  20.76  18.26  13.14
ESPnet
Accent160             +0.3RNNLM         5.26   7.69   9.96   7.45   6.79   10.06  11.77  10.05  8.63
Libri960 ~ Accent160  +0.3RNNLM         4.6    6.4    7.42   5.9    5.71   7.64   9.87   7.85   6.92
Accent160 + Libri160  -                 5.35   9.07   8.52   7.13   7.29   8.6    12.03  9.05   8.38
Accent160 + Libri160  +0.3RNNLM         4.68   7.59   7.7    6.42   6.37   7.76   10.88  8.41   7.48
Accent160 + Libri160  +0.3RNNLM+0.3CTC  4.76   7.81   7.71   6.36   6.4    7.23   10.77  8.01   7.38

* "Data A ~ Data B" denotes a model trained on Data A and then fine-tuned on Data B.
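WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. In the rows checked (for example Kaldi Accent160, where the eight per-accent values average to 13.68), the AVE column matches the unweighted mean of the per-accent WERs, so the sketch reports that mean as well. It assumes whitespace tokenization and no text normalization, which may differ from the official scoring; the helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(pairs):
    """WER (%) pooled over (ref_text, hyp_text) pairs of one accent."""
    errors = words = 0
    for ref, hyp in pairs:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return 100.0 * errors / words


def accent_wers(pairs_by_accent):
    """Per-accent WER plus the unweighted mean across accents."""
    per_accent = {a: corpus_wer(p) for a, p in pairs_by_accent.items()}
    return per_accent, sum(per_accent.values()) / len(per_accent)
```

Within an accent, errors and reference words are summed over all utterances before dividing, so long utterances weigh more than short ones.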

Contributors

luyizhou4, r1ckshi
