
symblai / speech-recognition-evaluation

Evaluate results from ASR/Speech-to-Text quickly

License: Apache License 2.0

JavaScript 100.00%

speech-recognition-evaluation's Introduction

Automatic Speech Recognition (ASR) Evaluation

If you're using any Speech-to-Text or Speech Recognition system to generate transcriptions from your audio/video content, you can use this tool to measure how well it is doing against a human-generated transcription. If you're not sure how to generate a transcription, you can take a look here for a list of tutorials to help you get started.

What can this utility do?

This is a simple utility for quickly evaluating the results generated by any Speech-to-Text (STT) or Automatic Speech Recognition (ASR) system.

This utility can calculate the following metrics:

  • Word Error Rate (WER), which is the most common metric for measuring the performance of a Speech Recognition or Machine Translation system.
  • Word Information Loss (WIL), which is a simple approximation to the proportion of word information lost. Refer to this paper for more info.
  • Levenshtein Distance calculated at word level.
  • Number of Word level insertions, deletions and mismatches between the original file and the generated file.
  • Number of Phrase level insertions, deletions and mismatches between the original file and the generated file.
  • Color Highlighted text Comparison to visualize the differences.
  • General Statistics about the original and generated files (bytes, characters, words, new lines etc.)
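As a rough illustration of how the first three metrics relate, here is a minimal sketch that aligns two word sequences with a word-level Levenshtein table and derives WER and WIL from the resulting counts. It is not the utility's own code: the function names (`tokenize`, `alignWords`, `evaluateTranscripts`) are hypothetical, and it assumes the common definitions WER = (S + D + I) / N and WIL = 1 - H² / (N · P), where N and P are the word counts of the original and generated text.

```javascript
// Split text into words on whitespace (assumes the text is already normalized).
function tokenize(text) {
  return text.trim().split(/\s+/).filter(Boolean);
}

// Build a word-level Levenshtein table, then backtrack through it to count
// hits, substitutions, deletions and insertions.
function alignWords(ref, hyp) {
  const n = ref.length;
  const m = hyp.length;
  const d = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 0; i <= n; i++) d[i][0] = i;
  for (let j = 0; j <= m; j++) d[0][j] = j;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j - 1] + cost, // match or substitution
        d[i - 1][j] + 1,        // deletion (word missing from generated text)
        d[i][j - 1] + 1         // insertion (extra word in generated text)
      );
    }
  }
  const counts = { hits: 0, substitutions: 0, deletions: 0, insertions: 0 };
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && ref[i - 1] === hyp[j - 1] && d[i][j] === d[i - 1][j - 1]) {
      counts.hits++; i--; j--;
    } else if (i > 0 && j > 0 && d[i][j] === d[i - 1][j - 1] + 1) {
      counts.substitutions++; i--; j--;
    } else if (i > 0 && d[i][j] === d[i - 1][j] + 1) {
      counts.deletions++; i--;
    } else {
      counts.insertions++; j--;
    }
  }
  return Object.assign({ distance: d[n][m] }, counts);
}

// Compute WER and WIL as fractions (multiply by 100 for a percentage).
function evaluateTranscripts(originalText, generatedText) {
  const ref = tokenize(originalText);
  const hyp = tokenize(generatedText);
  if (ref.length === 0) throw new Error("original transcript has no words");
  const a = alignWords(ref, hyp);
  const wer = a.distance / ref.length;
  // WIL is taken as 1 (everything lost) when the generated text is empty.
  const wil = hyp.length === 0 ? 1 : 1 - (a.hits * a.hits) / (ref.length * hyp.length);
  return Object.assign({ wer, wil }, a);
}

console.log(evaluateTranscripts("the quick brown fox", "the quick brown dog jumps"));
```

Note that WER can exceed 1 (100%) when the generated text contains many insertions, whereas WIL under this definition stays between 0 and 1.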

The utility also pre-processes (normalizes) the text in the provided files using the following operations:

  • Remove Speaker Name: Remove the Speaker name at the beginning of the line.
  • Remove Annotations: Remove any custom annotations added during transcriptions.
  • Remove Whitespaces: Remove any extra white spaces.
  • Remove Quotes: Remove any double quotes.
  • Remove Dashes: Remove any dashes.
  • Remove Punctuations: Remove any punctuation marks (.,?!).
  • Lowercase: Convert contents to lower case.
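The operations above can be sketched as a chain of regular-expression replacements. This is an illustrative approximation, not the utility's actual implementation: `normalizeTranscript` and its option names are hypothetical, the order of the steps is an assumption, and dashes are assumed to be deleted rather than replaced with spaces.

```javascript
// Hypothetical normalizer mirroring the pre-processing steps listed above.
function normalizeTranscript(text, options) {
  const opts = Object.assign({
    removeSpeakers: true,
    removeAnnotations: true,
    removeQuotes: true,
    removeDashes: true,
    removePunctuations: true,
    removeWhitespaces: true,
    lowercase: true
  }, options);

  let out = text;
  // Annotations go first so that a colon inside "[inaudible 00:12]" is not
  // mistaken for a speaker separator by the next step.
  if (opts.removeAnnotations) out = out.replace(/\[[^\]]*\]/g, "");
  // Heuristic: everything up to the first colon on a line is a speaker name.
  // This also strips ordinary text that happens to precede a colon.
  if (opts.removeSpeakers) out = out.replace(/^[^:\n]+:[ \t]*/gm, "");
  if (opts.removeQuotes) out = out.replace(/"/g, "");
  if (opts.removeDashes) out = out.replace(/-/g, "");
  if (opts.removePunctuations) out = out.replace(/[.,?!]/g, "");
  // Collapse runs of whitespace (including new lines) into single spaces.
  if (opts.removeWhitespaces) out = out.replace(/\s+/g, " ").trim();
  if (opts.lowercase) out = out.toLowerCase();
  return out;
}

console.log(normalizeTranscript("John Doe: Hello, I am [inaudible 00:12] John."));
```

Applying the same normalization to both files before comparison keeps differences in casing, punctuation or transcriber annotations from being counted as recognition errors.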

Pre-requisites

Make sure that you have Node.js v8 or later installed on your system.

Installation

npm install -g speech-recognition-evaluation

Verify the installation by running:

asr-eval

Usage

The simplest way to run your first evaluation is to pass the original and generated options to the asr-eval command. Here, original is a plain text file containing the reference transcript, usually produced by a human, and generated is a plain text file containing the transcript produced by the STT/ASR system.

asr-eval --original ./original-file.txt --generated ./generated-file.txt

This simply prints the Word Error Rate (WER) between the provided files. The output should look like this:

Word Error Rate (WER): 13.61350109561817%
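If you need this value inside a Node.js script, one possible workaround is to spawn the command and parse the line shown above. The sketch below is not part of the utility: `parseWer` and `runAsrEval` are hypothetical helpers, and it assumes that asr-eval is on your PATH and writes its results to standard output in the format shown.

```javascript
const { execFile } = require("child_process");

// Extract the numeric WER from a line such as
// "Word Error Rate (WER): 13.61350109561817%". Returns null if no such line exists.
function parseWer(output) {
  const match = /Word Error Rate \(WER\):\s*([0-9.]+)%/.exec(output);
  return match ? parseFloat(match[1]) : null;
}

// Run asr-eval on two files and resolve with the WER as a percentage.
function runAsrEval(originalPath, generatedPath) {
  return new Promise((resolve, reject) => {
    const args = ["--original", originalPath, "--generated", generatedPath];
    execFile("asr-eval", args, (err, stdout) => {
      if (err) return reject(err);
      resolve(parseWer(stdout));
    });
  });
}

// Example usage (requires asr-eval to be installed and both files to exist):
// runAsrEval("./original-file.txt", "./generated-file.txt")
//   .then((wer) => console.log("WER:", wer))
//   .catch((err) => console.error(err.message));
```

Passing the arguments as an array to `execFile` avoids shell quoting problems with file paths that contain spaces.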

To see all the available options, run:

asr-eval --help

This prints all the available usage options:

Synopsis

  $ asr-eval --original file --generated file           
  $ asr-eval [options] --original file --generated file 
  $ asr-eval --help                                     

Options

  -o, --original file                 Original File to be used as reference. Usually, this should be the            
                                      transcribed file by a Human being.                                            
  -g, --generated file                File with the output generated by Speech Recognition System.                  
  -e, --wer [true|false]              Default: true. Print Word Error Rate (WER).                                   
  -i, --wil [true|false]              Default: true. Print Word Information Loss (WIL).                             
  --distance [true|false]             Default: false. Print total word distance after comparison.                   
  --stats [true|false]                Default: false. Print statistics about original and generate files, before    
                                      and after pre-processing. Also prints statistics about word level and phrase  
                                      level differences.                                                            
  --pairs [true|false]                Default: false. Print all the difference pairs with type of difference.       
  --textcomparison [true|false]       Default: false. Print the text comparison between two files with              
                                      highlighting.                                                                 
  --removespeakers [true|false]       Default: true. Remove the speaker at the start of each line in files before   
                                      calculations. The speaker should be separated by colon ":" i.e. speaker_name: 
                                      text For e.g. "John Doe: Hello, I am John." would get converted to simply     
                                      "Hello, I am John."                                                           
  --removeannotations [true|false]    Default: true. Remove any custom annotations in the transcript before         
                                      calculations. This is useful when removing custom annotations done by human   
                                      transcribers.  Anything in square brackets [] are detected as annotations.    
                                      For e.g. "Hello, I am [inaudible 00:12] because of few reasons." would get    
                                      converted to "Hello, I am because of few reasons."                            
  --removewhitespaces [true|false]    Default: true. Remove any extra white spaces before calculations.             
  --removequotes [true|false]         Default: true. Remove any double quotes '"' from the files before             
                                      calculations.                                                                 
  --removedashes [true|false]         Default: true. Remove any dashes (hyphens) "-" from the files before          
                                      calculations.                                                                 
  --removepunctuations [true|false]   Default: true. Remove any punctuations ".,?!" from the files before           
                                      calculations.                                                                 
  --lowercase [true|false]            Default: true. Convert both files to lower case before calculations. This is  
                                      useful if evaluation needs to be done in case-insensitive way.                
  --help [true|false]                 Print this usage guide.                                                                                   

Getting help

If you need help installing or using the utility, please reach out in our Slack channel.

If you've instead found a bug or would like new features added, go ahead and open issues or pull requests against this repo!

speech-recognition-evaluation's People

Contributors

mjabali, shoshka-gajdosh, toshish


speech-recognition-evaluation's Issues

Make library callable within JavaScript

Right now, it seems that this tool is only accessible via the CLI. I think it would be very helpful to be able to call this library programmatically from Node.js code, instead of having to open a shell to run the command.

asr-eval: command not found

After installing speech-recognition-evaluation, when I run the command asr-eval, my Ubuntu system tells me: asr-eval: command not found
