License: Apache License 2.0


3M-ASR for End-to-End Speech Recognition

This project builds an end-to-end speech recognition system based on a Mixture-of-Experts (MoE) model. MoE is an efficient way to train large-scale models, and we have demonstrated its effectiveness on public datasets. More details about the algorithm can be found in "3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition".
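To illustrate the core idea, here is a minimal sketch of a top-1 routed MoE layer in plain PyTorch. The real project relies on FastMoE for efficient expert parallelism; the class and variable names below are illustrative only.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-1 routed Mixture-of-Experts layer (illustrative sketch)."""

    def __init__(self, dim, num_experts):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_experts)
        )

    def forward(self, x):                         # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)     # routing probabilities
        top_p, top_idx = scores.max(dim=-1)       # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                   # tokens routed to expert i
            if mask.any():
                # scale each expert output by its routing probability
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out

x = torch.randn(8, 16)
y = TinyMoE(16, 4)(x)
print(y.shape)  # torch.Size([8, 16])
```

Because each token activates only one expert, the parameter count grows with the number of experts while the per-token compute stays roughly constant, which is what makes MoE attractive for scaling.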

Installation

  • Clone this repo and set up the environment:

git clone https://github.com/tencent-ailab/3m-asr.git
conda create -n moe python=3.8
conda activate moe
pip install -r requirements.txt
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge

  • Follow the instructions under the fastmoe directory to install FastMoE

Performance Benchmark

We evaluate our system on the public WenetSpeech dataset and provide the recipe for Conformer-MoE. CER results are listed below; the first three rows are results reported by WenetSpeech.

| Toolkit            | Dev  | Test_Net | Test_Meeting | AIShell-1 |
|--------------------|------|----------|--------------|-----------|
| Kaldi              | 9.07 | 12.83    | 24.72        | 5.41      |
| Espnet             | 9.70 | 8.90     | 15.90        | 3.90      |
| WeNet              | 8.88 | 9.70     | 15.59        | 4.61      |
| Conformer-MoE(32e) | 7.49 | 7.99     | 13.69        | 4.03      |
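The character error rate (CER) reported above is the Levenshtein edit distance between reference and hypothesis transcripts, divided by the reference length. A minimal stdlib-only sketch (not the project's scoring code):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length."""
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(round(100 * cer("speech", "spech"), 2))  # 16.67
```

Real ASR scoring additionally normalizes text (punctuation, casing, segmentation) before computing the distance, so published numbers depend on the exact scoring pipeline.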

Acknowledgements

  • We use FastMoE to support Mixture-of-Experts model training in PyTorch
  • We borrowed a lot of code from WeNet for the Conformer implementation and data processing

Reference

[1] SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts (InterSpeech 2021)

[2] 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition (Submitted to InterSpeech 2022)

Citation

@inproceedings{you21_interspeech,
  author={Zhao You and Shulin Feng and Dan Su and Dong Yu},
  title={{SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2077--2081},
  doi={10.21437/Interspeech.2021-478}
}

@article{you20223m,
  title={3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition},
  author={You, Zhao and Feng, Shulin and Su, Dan and Yu, Dong},
  journal={arXiv preprint arXiv:2204.03178},
  year={2022}
}

Contact

If you have any questions about this project, please feel free to contact [email protected] or [email protected]

Disclaimer

This is not an officially supported Tencent product.
