
ainisa20 / automatic-speech-recognition-from-scratch


This project forked from xiabingquan/automatic-speech-recognition-from-scratch


A minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer

License: MIT License

Python 100.00%

automatic-speech-recognition-from-scratch's Introduction

Automatic-Speech-Recognition-from-Scratch

A minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer

Before launching training, download the train and test subsets of LRS2 and prepare ./data/LRS2/train.paths, ./data/LRS2/train.text, and ./data/LRS2/train.lengths in the format that train.py requires.

Each line in train.paths represents the local path of an audio file.

Each line in train.text represents a text sentence.

Each line in train.lengths represents a number indicating the length of the audio in seconds.

The following table shows a minimal example of these three files.

train.paths    train.text          train.lengths
1.wav          good morning        1.6
2.wav          good afternoon      2
3.wav          nice to meet you    3.1
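As a rough illustration of the layout described above, the sketch below writes the three line-aligned files from a list of samples and reads them back. The helper names `write_manifest` and `read_manifest` are hypothetical and not part of this repository, and the exact format that train.py expects should be checked against its loading code. The sketch only assumes one entry per line, with line i of each file describing the same utterance.

```python
from pathlib import Path


def write_manifest(samples, out_dir, split="train"):
    """Write <split>.paths, <split>.text and <split>.lengths, one line per sample.

    `samples` is an iterable of (audio_path, sentence, seconds) tuples.
    Hypothetical helper: the precise format train.py needs is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    samples = list(samples)
    columns = {
        "paths": [str(path) for path, _, _ in samples],
        "text": [text.strip() for _, text, _ in samples],
        # ":g" drops trailing zeros, so 2 is written as "2" and 1.6 as "1.6".
        "lengths": [f"{seconds:g}" for _, _, seconds in samples],
    }
    for suffix, lines in columns.items():
        (out_dir / f"{split}.{suffix}").write_text(
            "".join(line + "\n" for line in lines), encoding="utf-8"
        )


def read_manifest(data_dir, split="train"):
    """Read the three files back and check that they are line-aligned."""
    data_dir = Path(data_dir)
    paths, texts, lengths = (
        (data_dir / f"{split}.{suffix}").read_text(encoding="utf-8").splitlines()
        for suffix in ("paths", "text", "lengths")
    )
    if not (len(paths) == len(texts) == len(lengths)):
        raise ValueError("manifest files must have the same number of lines")
    return [(p, t, float(n)) for p, t, n in zip(paths, texts, lengths)]
```

For example, `write_manifest([("1.wav", "good morning", 1.6)], "./data/LRS2")` would produce train.paths, train.text and train.lengths under ./data/LRS2, each with a single line.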

💡 If you have difficulty accessing the LRS2 dataset, you may use other ASR datasets, such as LibriSpeech or TEDLIUM-v3.

Use torchaudio, ffmpeg, or any other tool to get the length of each audio file.

If training fails to converge, try subword-based tokenizers (ref) or more sophisticated feature extractors (e.g. a 1D ResNet).
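For the audio lengths mentioned above, one dependency-free option for uncompressed PCM .wav files is Python's standard-library `wave` module, used here instead of torchaudio or ffmpeg. This is a minimal sketch under that assumption. The function name `wav_duration_seconds` is hypothetical, and other audio formats would need a different tool.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds.

    Duration = number of audio frames / frames per second (sample rate).
    Assumes an uncompressed PCM WAV file, which is what `wave` can read.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Whether train.py expects whole or fractional seconds is not confirmed here, so round the result if an integer is required.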

Training: python3 ./train.py

Inference: python3 ./test.py <ckpt_path>

Contact

bingquanxia AT qq.com

automatic-speech-recognition-from-scratch's People

Contributors

xiabingquan
