technicianted / msspeech-gbridge

Bridge service to enable using Google Cloud Speech client SDKs with Microsoft Cognitive Services Speech APIs

Home Page: https://hashifdef.wordpress.com

License: MIT License

msspeech-gbridge

A thin service that lets unmodified Google Speech client SDKs access Microsoft Cognitive Services Speech APIs.

Why is it needed

  • Richer client support

Microsoft Cognitive Services speech APIs only provide a JavaScript SDK for their latest open API, and C#/iOS/Android SDKs for the legacy, closed API. For other platforms, you must rely on community-built libraries or build your own. For example, if you are building a C/C++ based application, you can use the libmsspeech open source client library.

On the other hand, Google provides an extensive set of client libraries for multiple platforms. At the time of this writing, they support C#, Go, Java, Node.js, PHP, Python and Ruby.

With msspeech-gbridge service, you can directly use unmodified Google client libraries to access Microsoft APIs.

  • Get union of features

Use the APIs interchangeably to get the features you need, for example language support or text normalization.

  • Speech provider agnostic

You can build your applications using Google client SDKs but still be able to choose between Google Speech APIs and Microsoft Cognitive Services Speech APIs depending on your scenario without any code changes.
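
One way to keep application code unchanged is to choose the endpoint from configuration at startup. The following is a minimal sketch, not part of msspeech-gbridge: the `SPEECH_PROVIDER` and `GBRIDGE_ENDPOINT` variable names and the `speech_endpoint` helper are hypothetical, and the default bridge address assumes the service listens on port 8080 on the local machine.

```python
import os

# Public endpoint of the Google Cloud Speech API.
GOOGLE_SPEECH_ENDPOINT = "speech.googleapis.com:443"


def speech_endpoint(env=None):
    """Return the host:port a Google Speech client should connect to.

    SPEECH_PROVIDER and GBRIDGE_ENDPOINT are hypothetical variable names
    chosen for this sketch; they are not read by msspeech-gbridge itself.
    """
    env = os.environ if env is None else env
    provider = env.get("SPEECH_PROVIDER", "google")
    if provider == "google":
        return GOOGLE_SPEECH_ENDPOINT
    if provider == "microsoft":
        # Assumed default: a bridge running locally on port 8080.
        return env.get("GBRIDGE_ENDPOINT", "localhost:8080")
    raise ValueError(f"unknown speech provider: {provider!r}")
```

The returned value would then be passed to whatever endpoint override the chosen Google client library offers. Since TLS support is still a TODO for the bridge, the client presumably also has to be configured for a plaintext connection when talking to it.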

Status

This is the very first version of the service. It works, but only at a basic level.

Unsupported features and TODOs

  • Microsoft Speech APIs do not support word level timing. msspeech-gbridge will not return an error when word timing is requested, but it will not return any timing information.
  • The Microsoft subscription key has to be supplied at the service, not by the client. Supplying it from clients is a work in progress.
  • Microsoft Speech APIs support conversational mode, which is not available in Google APIs.
  • Differences in underlying speech recognition parameters should be evaluated. For example, timeouts and segmentation.
  • Support text to speech APIs.
  • Implement the remaining Google Speech APIs, including LongRunningRecognize.
  • Support Google speech audio codecs that are not supported by Microsoft Speech APIs: FLAC, MULAW, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE.
  • Better command line arguments.
  • Support TLS.
  • Lots of documentation!
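
Until these items are addressed, a client can check a request against the limitations listed above before sending it. This is a minimal sketch under stated assumptions: the `bridge_warnings` helper is hypothetical and not part of any SDK, the encoding names are the ones listed above, and `enable_word_time_offsets` mirrors the Google recognition config option of that name.

```python
# Encodings listed above as not supported by Microsoft Speech APIs.
# Absence from this set is not a guarantee that an encoding works.
UNSUPPORTED_ENCODINGS = frozenset(
    {"FLAC", "MULAW", "AMR_WB", "OGG_OPUS", "SPEEX_WITH_HEADER_BYTE"}
)


def bridge_warnings(encoding, enable_word_time_offsets=False):
    """Return a list of known limitations that a request would run into."""
    warnings = []
    if encoding in UNSUPPORTED_ENCODINGS:
        warnings.append(
            f"encoding {encoding} is not supported by Microsoft Speech APIs"
        )
    if enable_word_time_offsets:
        # The bridge does not reject the request; timings are just omitted.
        warnings.append("word level timing will not be returned")
    return warnings
```

An empty list only means that none of the documented limitations apply. It does not confirm that the request will succeed.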

Using

Example usage using various Google SDKs can be found here:

The remaining platform SDKs should work but have not been tested.

As a container

Using docker locally

docker run --rm -t -p 8080:8080 technicianted/msspeech-gbridge:experimental /run.sh <your microsoft speech subscription key>

Using Azure Container Instance

If you have an Azure subscription, you can quickly bring up a container running msspeech-gbridge as a service:

az group create --name gbridge --location eastus
az container create --resource-group gbridge --name msspeech-gbridge --image technicianted/msspeech-gbridge:experimental --ip-address public --ports 8080 --command-line '/run.sh <your subscription key>'

Then you can use the assigned public IP address as an endpoint when using the clients.

As a standalone service

To use, you will need to build from source and run:

./msspeech-gbridge <your microsoft speech API subscription key>

Implications

There are some implications when using msspeech-gbridge for your speech applications.

Latency

Depending on where and how you run msspeech-gbridge, some latency might be incurred, especially for shorter audio. With each request, msspeech-gbridge attempts to reuse an existing connection. If there isn't one, it connects to Microsoft Speech Service and caches the new connection. In the latter case, the time it takes to establish the connection is added to the leading latency.

However, in most cases, Microsoft Speech Service would catch up and make up for the leading latency. In such cases, no trailing latency is added.

  1. Running locally with client (same local network or same box):

No noticeable latency is added, because establishing the connection from the client to msspeech-gbridge is very fast.

  2. Running in same region as Microsoft Speech Service:

When running in Azure as a container, in most cases msspeech-gbridge would hit a Microsoft Speech Service instance that is located within the same datacenter. In that case, minimal or no latency is added.

  3. Running remotely:

Latency might be incurred due to connection establishment time if audio is very short. Otherwise, Microsoft Speech Service would catch up.
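
The connection reuse described in this section can be pictured with a toy model. This is only a sketch of the idea, not the bridge's actual implementation (which is written in C++): the `ConnectionCache` class and the `leading_latency` function are hypothetical names, and the connection is represented by whatever the supplied `connect` callable returns.

```python
class ConnectionCache:
    """Toy model of connection reuse: connect once, then reuse."""

    def __init__(self, connect):
        self._connect = connect  # callable that opens a new upstream connection
        self._conn = None

    def acquire(self):
        """Return (connection, reused)."""
        if self._conn is not None:
            return self._conn, True
        # No cached connection: establish one and keep it for later requests.
        self._conn = self._connect()
        return self._conn, False


def leading_latency(reused, connect_seconds):
    """Extra delay before audio can start flowing to the upstream service."""
    return 0.0 if reused else connect_seconds
```

In this model only the first request pays the connection establishment time. Later requests reuse the cached connection and add no leading latency.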

Cost

The application runs as a service, which means you run it either as a container or on a dedicated VM. Both incur a continuous, per-second cost.

Building from source

Currently you have to build from source.

Dependencies

Building from source

  • Update Google Cloud APIs:

    git submodule update

  • Generate Google Cloud APIs:

    cd googleapis
    make LANGUAGE=c++ OUTPUT=output/

  • Build msspeech-gbridge:

    mkdir build
    cd build
    cmake ../
    make
