kvishnivetsky / kaldi-serve

This project forked from skit-ai/kaldi-serve

gRPC server component for Kaldi based ASR

License: Apache License 2.0

Kaldi-Serve

gRPC server component for Kaldi based ASR.

Key Features:

  • Multithreaded gRPC server.
  • Supports bi-directional streaming recognition.
  • Thread-safe concurrent queue to process each audio stream separately.
  • N-best alternatives with LM and AM costs.
  • Word level timing and confidence scores.
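
The queue-per-stream idea from the feature list can be illustrated with a minimal Python sketch. This is not the server's C++ implementation: `process_stream` and `decode_chunk` are hypothetical names, and the sketch only shows the general pattern of a producer pushing audio chunks onto a thread-safe queue while a dedicated worker consumes them in order.

```python
import queue
import threading


def process_stream(chunks, decode_chunk):
    """Push chunks of one stream through a thread-safe queue to one worker.

    `decode_chunk` is a hypothetical callable standing in for the decoder.
    """
    pending = queue.Queue()  # queue.Queue is safe to share between threads
    results = []

    def worker():
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: the stream has ended
                break
            results.append(decode_chunk(chunk))

    consumer = threading.Thread(target=worker)
    consumer.start()
    for chunk in chunks:  # producer side, e.g. the request reader
        pending.put(chunk)
    pending.put(None)
    consumer.join()
    return results
```

Because each stream gets its own queue and worker, chunks of one stream are processed in arrival order without blocking other streams.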

Getting Started

Setup

Make sure the gRPC, protobuf, and Boost C++ libraries are installed on your system. Kaldi must also be present and built. Then build the server:

make KALDI_ROOT=/path/to/local/repo/for/kaldi/ -j8

Run make clean to clear old build files.

Running the server

To run the server, first describe your models in a TOML file, which tells the program which models to load and where to find them. A sample in the resources directory shows the structure of the model_spec_toml file.

# Make sure the Kaldi and OpenFst libraries can be found at runtime, for example via LD_LIBRARY_PATH:
# env LD_LIBRARY_PATH=../../asr/kaldi/tools/openfst/lib/:../../asr/kaldi/src/lib/ ./build/kaldi_serve_app

# Alternatively, put all the required .so files in the ./lib/ directory, since
# that is added to the binary's rpath.

./build/kaldi_serve_app --help

Kaldi gRPC server
Usage: ./build/kaldi_serve_app [OPTIONS] model_spec_toml

Positionals:
  model_spec_toml TEXT:FILE REQUIRED
                              Path to toml specifying models to load

Options:
  -h,--help                   Print this help message and exit
  -v,--version                Show program version and exit

Clients

For simple microphone testing, you can do something like the following (needs evans installed):

audio_bytes=$(arecord -f S16_LE -d 5 -r 8000 -c 1 | base64 -w0) # Recording 5 seconds of audio
echo "{\"audio\": {\"content\": \"$audio_bytes\"}, \"config\": {\"max_alternatives\": 2, \"model\": \"general\", \"language_code\": \"hi\"} }" | evans --package kaldi_serve --service KaldiServe ./protos/kaldi_serve.proto  --call Recognize --port 5016 | jq
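
The JSON passed to evans above can also be built programmatically, which avoids shell quoting problems. This is a sketch, not the bundled Python client: `build_recognize_request` is a hypothetical helper, and its defaults simply mirror the values in the command above.

```python
import base64
import json


def build_recognize_request(audio_bytes, model="general",
                            language_code="hi", max_alternatives=2):
    """Build the JSON body for a Recognize call from raw audio bytes."""
    return json.dumps({
        # Same encoding as `base64 -w0`: base64 on a single line.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
        "config": {
            "max_alternatives": max_alternatives,
            "model": model,
            "language_code": language_code,
        },
    })
```

Piping this string into the evans call in place of the `echo` sends the same request.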

The output structure looks like the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "हेलो दुनिया",
          "confidence": 0.95897794,
          "amScore": -374.5963,
          "lmScore": 131.33058
        },
        {
          "transcript": "हैलो दुनिया",
          "confidence": 0.95882875,
          "amScore": -372.76187,
          "lmScore": 131.84035
        }
      ]
    }
  ]
}
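
To consume this output in a script, you can pick the highest-confidence alternative from each result. The sketch below assumes the JSON field names shown above (`results`, `alternatives`, `transcript`, `confidence`); `top_transcripts` is a hypothetical helper name.

```python
import json


def top_transcripts(response_json):
    """Return the highest-confidence transcript of each result."""
    response = json.loads(response_json)
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # a result without alternatives has nothing to report
        best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        transcripts.append(best["transcript"])
    return transcripts
```

For the sample output above, this selects the first alternative, whose confidence (0.95897794) is slightly higher than the second (0.95882875).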

A Python client is also available in the python directory, along with a few example scripts.

Load testing

We perform load testing using ghz, a gRPC benchmarking and load testing tool. You can use the following command template:

ghz \
--insecure \
--proto ./protos/kaldi_serve.proto \
--call kaldi_serve.KaldiServe.StreamingRecognize \
-n [NUM REQUESTS] -c [CONCURRENT REQUESTS] \
--cpus [NUM CORES] \
-d "[{\"audio\": {\"content\": \"$chunk1\"}, \"config\": {\"max_alternatives\": [N_BEST], \"language_code\": \"[LANGUAGE]\", \"model\": \"[MODEL]\"}}, ...more chunks]" \
0.0.0.0:5016
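
Writing the `-d` array by hand gets tedious for more than a couple of chunks. The sketch below generates it from raw audio bytes. It assumes, following the template above, that every streamed message carries both an audio chunk and the same config; `build_streaming_payload` and the chunk size are hypothetical choices.

```python
import base64
import json


def build_streaming_payload(audio_bytes, chunk_size, model,
                            language_code, max_alternatives=1):
    """Split audio into fixed-size chunks and return the JSON array for -d."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    messages = []
    for start in range(0, len(audio_bytes), chunk_size):
        chunk = audio_bytes[start:start + chunk_size]
        messages.append({
            "audio": {"content": base64.b64encode(chunk).decode("ascii")},
            "config": {
                "max_alternatives": max_alternatives,
                "language_code": language_code,
                "model": model,
            },
        })
    return json.dumps(messages)
```

The resulting string can be passed as the value of `-d`. The last chunk may be shorter than `chunk_size`.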

Contributors

lepisma, pskrunner14, deep110, mithunarunan
