This project forked from hirofumi0810/neural_sp

End-to-end ASR/LM implementation with PyTorch

License: Apache License 2.0

Python 98.31% Shell 1.18% Makefile 0.51%

neural_sp's Introduction

NeuralSP: Neural network based Speech Processing

How to install

# Set path to CUDA, NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl

export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, python libraries, and other tools
cd tools
make KALDI=/path/to/kaldi
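
Before running make, it can help to confirm that the variables exported above are actually visible to child processes. The following is a minimal sketch; the helper name `missing_vars` and the `REQUIRED_VARS` tuple are illustrative and not part of neural_sp.

```python
import os

# Variables exported in the install snippet above.
REQUIRED_VARS = ("CUDA_HOME", "CUDA_PATH", "CPATH", "LD_LIBRARY_PATH", "LIBRARY_PATH")


def missing_vars(env=None):
    """Return the names in REQUIRED_VARS that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only tests that each variable is set; it does not verify that the directories exist or contain the expected libraries.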

Key features

Corpus

  • ASR

    • AISHELL-1
    • CSJ
    • Librispeech
    • Switchboard (+ Fisher)
    • TEDLIUM2/TEDLIUM3
    • TIMIT
    • WSJ
  • LM

    • Penn Tree Bank
    • WikiText2

Front-end

  • Frame stacking
  • Sequence summary network [link]
  • SpecAugment [link]
  • Adaptive SpecAugment [link]
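
Frame stacking concatenates several consecutive feature frames into one longer vector and then skips ahead, which shortens the input sequence. The sketch below is written in plain Python so it runs without PyTorch. The function name, the parameter names `n_stacks` and `n_skips`, and the choice to pad the last window by repeating the final frame are assumptions for illustration; neural_sp's own implementation may differ.

```python
def stack_frames(frames, n_stacks=3, n_skips=3):
    """Stack `n_stacks` consecutive frames, advancing `n_skips` frames per step.

    frames: list of T feature vectors (each a list of floats).
    Returns a list of ceil(T / n_skips) vectors of size n_stacks * feature_dim.
    """
    stacked = []
    for start in range(0, len(frames), n_skips):
        window = frames[start:start + n_stacks]
        # Assumption: pad a short final window by repeating the last frame.
        while len(window) < n_stacks:
            window = window + [frames[-1]]
        stacked.append([x for frame in window for x in frame])
    return stacked


features = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(stack_frames(features, n_stacks=2, n_skips=2))
```

With `n_stacks == n_skips` the windows do not overlap, so the sequence length drops by that factor while the feature dimension grows by the same factor.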

Encoder

  • RNN encoder
    • (CNN-)BLSTM, (CNN-)LSTM, (CNN-)BGRU, (CNN-)GRU
    • Latency-controlled BLSTM [link]
  • Transformer encoder [link]
    • (CNN-)Transformer
    • Chunk hopping mechanism [link]
    • Relative positional encoding [link]
  • Time-depth separable (TDS) convolution encoder [link] [link]
  • Gated CNN encoder (GLU) [link]
  • Conformer encoder [link]
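
Chunk-based streaming encoders process the input in fixed-size chunks, each with some left and right context. The sketch below only computes the frame ranges for such chunks; it is a generic illustration, not neural_sp's code, and the function name and the parameters `n_left`, `n_current`, and `n_right` are hypothetical.

```python
def chunk_ranges(n_frames, n_left, n_current, n_right):
    """Return (context_start, chunk_start, chunk_end, context_end) per chunk.

    Each chunk covers frames [chunk_start, chunk_end) and may look at
    frames [context_start, context_end). Ranges are clipped to the input.
    """
    ranges = []
    for start in range(0, n_frames, n_current):
        end = min(start + n_current, n_frames)
        ranges.append((max(0, start - n_left), start, end, min(n_frames, end + n_right)))
    return ranges


for r in chunk_ranges(n_frames=10, n_left=2, n_current=4, n_right=1):
    print(r)
```

A larger `n_right` gives each chunk more future context at the cost of higher latency.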

Connectionist Temporal Classification (CTC) decoder

  • Forced alignment
  • Beam search
  • Shallow fusion
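
As an illustration of CTC beam search, here is a minimal sketch of the standard CTC prefix beam search in plain Python, without an LM. It is not neural_sp's implementation; the function names and the convention that index 0 is the blank symbol are assumptions.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size=4, blank=0):
    """log_probs: list of T frames, each a list of per-symbol log-probabilities.

    Returns (best label sequence, its total log-probability over all alignments
    that survived pruning).
    """
    # Each prefix keeps two scores: paths ending in blank, paths ending in non-blank.
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    # Blank keeps the prefix unchanged.
                    nb, nnb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(nb, p_b + lp, p_nb + lp), nnb)
                    continue
                new_prefix = prefix + (s,)
                nb, nnb = next_beams[new_prefix]
                if prefix and prefix[-1] == s:
                    # A repeated symbol extends the prefix only after a blank ...
                    next_beams[new_prefix] = (nb, logsumexp(nnb, p_b + lp))
                    # ... otherwise it collapses into the same prefix.
                    nb2, nnb2 = next_beams[prefix]
                    next_beams[prefix] = (nb2, logsumexp(nnb2, p_nb + lp))
                else:
                    next_beams[new_prefix] = (nb, logsumexp(nnb, p_b + lp, p_nb + lp))
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*scores)


# Two frames, symbols {0: blank, 1: "a"}, with p(blank)=0.6 and p(a)=0.4 per frame.
frames = [[math.log(0.6), math.log(0.4)]] * 2
labels, score = ctc_prefix_beam_search(frames)
print(labels, math.exp(score))
```

In this toy input the greedy path (blank, blank) has probability 0.36, but the label sequence [1] sums three alignments (0.24 + 0.24 + 0.16 = 0.64), which is why merging alignments per prefix matters.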

Attention-based decoder

  • RNN decoder
    • Shallow fusion
    • Cold fusion [link]
    • Deep fusion [link]
    • Forward-backward attention decoding [link]
    • Ensemble decoding
  • Streaming RNN decoder
    • Hard monotonic attention [link]
    • Monotonic chunkwise attention (MoChA) [link]
    • CTC-synchronous training (CTC-ST) [link]
  • RNN transducer [link]
  • Transformer decoder [link]
  • Streaming Transformer decoder
    • Monotonic Multihead Attention [link] [link]
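
Shallow fusion combines the decoder's token scores with an external LM at inference time by adding a weighted LM log-probability to the ASR log-probability. The sketch below shows one beam-search expansion step under that rule. The function name and the `lm_weight` parameter are illustrative, not taken from neural_sp.

```python
import math


def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.3, beam_size=2):
    """Rank next-token candidates by log p_asr + lm_weight * log p_lm.

    Both inputs are lists of log-probabilities over the same vocabulary.
    Returns the top `beam_size` (token_id, fused_score) pairs.
    """
    fused = [a + lm_weight * l for a, l in zip(asr_log_probs, lm_log_probs)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return [(i, fused[i]) for i in ranked[:beam_size]]


asr = [math.log(p) for p in (0.5, 0.4, 0.1)]
lm = [math.log(p) for p in (0.1, 0.8, 0.1)]
print(shallow_fusion_step(asr, lm, lm_weight=0.0)[0][0])  # ASR alone prefers token 0
print(shallow_fusion_step(asr, lm, lm_weight=0.5)[0][0])  # the LM shifts the choice to token 1
```

Cold fusion and deep fusion differ in that the LM is integrated into the decoder network during training rather than only at scoring time.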

Language model (LM)

  • RNNLM (recurrent neural network language model)
  • Gated convolutional LM [link]
  • Transformer LM
  • Transformer-XL LM [link]
  • Adaptive softmax [link]

Output units

  • Phoneme
  • Grapheme
  • Wordpiece (BPE, sentencepiece)
  • Word
  • Word-char mix
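
The difference between some of these units can be sketched with a toy tokenizer. This example assumes that "word-char mix" means in-vocabulary words are kept whole while out-of-vocabulary words are split into characters; the function names and the `<space>` symbol are hypothetical and may not match neural_sp's conventions.

```python
def to_graphemes(text, space="<space>"):
    """Split text into characters, marking word boundaries explicitly."""
    return [space if c == " " else c for c in text]


def to_word_char_mix(text, vocab):
    """Keep in-vocabulary words whole; spell out-of-vocabulary words as characters."""
    units = []
    for word in text.split():
        if word in vocab:
            units.append(word)
        else:
            units.extend(word)
    return units


print(to_graphemes("a cat"))
print(to_word_char_mix("the zyx cat", vocab={"the", "cat"}))
```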

Multi-task learning (MTL)

Multi-task learning (MTL) with different output units is supported to alleviate data sparseness.

  • Hybrid CTC/attention [link]
  • Hierarchical Attention (e.g., word attention + character attention) [link]
  • Hierarchical CTC (e.g., word CTC + character CTC) [link]
  • Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
  • Forward-backward attention [link]
  • LM objective
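
Hybrid CTC/attention training is commonly formulated as a weighted sum of the two losses. The sketch below shows that interpolation on plain floats; the function name and the `ctc_weight` parameter are illustrative, and the default value is arbitrary rather than a neural_sp setting.

```python
def hybrid_ctc_attention_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate the two objectives: w * L_ctc + (1 - w) * L_att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


print(hybrid_ctc_attention_loss(ctc_loss=2.0, att_loss=1.0, ctc_weight=0.3))
```

The same pattern extends to the hierarchical variants by adding one weighted term per auxiliary output unit.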

ASR Performance

AISHELL-1 (CER)

| model | dev | test |
| --- | --- | --- |
| Transformer | 5.0 | 5.4 |
| Conformer | 4.7 | 5.2 |
| Streaming MMA | 5.5 | 6.1 |

CSJ (WER)

| model | eval1 | eval2 | eval3 |
| --- | --- | --- | --- |
| LAS | 6.5 | 5.1 | 5.6 |

Switchboard 300h (WER)

| model | SWB | CH |
| --- | --- | --- |
| LAS | 9.1 | 18.8 |

Switchboard+Fisher 2000h (WER)

| model | SWB | CH |
| --- | --- | --- |
| LAS | 7.8 | 13.8 |

Librispeech (WER)

| model | dev-clean | dev-other | test-clean | test-other |
| --- | --- | --- | --- | --- |
| Transformer | 2.1 | 5.3 | 2.4 | 5.7 |
| Streaming MMA | 2.5 | 6.9 | 2.7 | 7.1 |

TEDLIUM2 (WER)

| model | dev | test |
| --- | --- | --- |
| LAS | 10.9 | 11.2 |

WSJ (WER)

| model | test_dev93 | test_eval92 |
| --- | --- | --- |
| LAS | 8.8 | 6.2 |
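
For reference, WER and CER are both edit-distance-based error rates, computed over words and characters respectively. The sketch below shows the usual definition for a single utterance; it is not the scoring tool used to produce the numbers above, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent: edit distance divided by reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


print(error_rate("the cat sat".split(), "the cat sat on".split()))  # word level (WER)
print(error_rate(list("kitten"), list("sitting")))                  # character level (CER)
```

Corpus-level scores are normally computed by summing edit distances and reference lengths over all utterances before dividing, not by averaging per-utterance rates.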

LM Performance

Penn Tree Bank (PPL)

| model | valid | test |
| --- | --- | --- |
| RNNLM | 87.99 | 86.06 |
| + cache=100 | 79.58 | 79.12 |
| + cache=500 | 77.36 | 76.94 |

WikiText2 (PPL)

| model | valid | test |
| --- | --- | --- |
| RNNLM | 104.53 | 98.73 |
| + cache=100 | 90.86 | 85.87 |
| + cache=2000 | 76.10 | 72.77 |
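
Perplexity (PPL) is the exponential of the average negative log-likelihood per token, so lower is better. A minimal sketch of the definition, assuming natural-log token probabilities; the function name is illustrative and this is not neural_sp's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-probability over all scored tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 4))
```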

Reference

Dependency
