mainro / deepspeech-server

A testing server for a speech-to-text service based on coqui.ai

License: Mozilla Public License 2.0

Topics: speech-to-text, speech-recognition, reactive-extensions, rxpy, reactivex, deepspeech, coqui-ai

deepspeech-server's Introduction

DeepSpeech Server


Key Features

This is an HTTP server that can be used to test the Coqui STT project (the successor to the Mozilla DeepSpeech project). You need an environment with DeepSpeech or Coqui installed to run this server.

This code uses the Coqui STT 1.0 APIs.

Installation

The server is available on PyPI, so you can install it with pip:

pip3 install deepspeech-server

You can also install deepspeech-server from source:

python3 setup.py install

Note that Python 3.6 is the minimum version required to run the server.
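If you want an isolated environment, which also sidesteps the dependency conflicts reported in the issues below, a standard virtualenv flow works (a generic sketch, not from the original instructions):

python3 -m venv venv
. venv/bin/activate
pip install deepspeech-server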

Starting the server

deepspeech-server --config config.yaml

What is an STT model?

The quality of the speech-to-text engine depends heavily on the model it loads at runtime. Think of a model as a set of learned parameters that controls how the engine turns audio into text.

How to use a specific STT model

You can use Coqui STT without training a model yourself. Pre-trained models are available at the Coqui Model Zoo (make sure the STT Models tab is selected):

https://coqui.ai/models

Once you've downloaded a pre-trained model, make a copy of the sample configuration file, then edit the "model" and "scorer" fields in your new file so that they match the downloaded files:

cp config.sample.yaml config.yaml
$EDITOR config.yaml

Lastly, start the server:

deepspeech-server --config config.yaml

Server configuration

The configuration is done with a YAML file, provided via the "--config" argument. The file contains several sections and sub-sections, described below.
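For orientation, here is a sketch of what a configuration file can look like, assembled from the fields documented below (the exact nesting and the log entry field names should be checked against config.sample.yaml; all paths are placeholders):

coqui:
  model: models/model.tflite
  scorer: models/kenlm.scorer
  beam_width: 500
http:
  host: "0.0.0.0"
  port: 8080
  request_max_size: 1048576
log:
  - logger: deepspeech_server
    level: DEBUG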

coqui section configuration

Section "coqui" contains configuration of the coqui-stt engine:

model: The model that was trained by coqui. Must be a tflite (TensorFlow Lite) file.

scorer: [Optional] The scorer file. Use this to tune the STT to understand certain phrases better.

lm_alpha: [Optional] alpha hyperparameter for the scorer.

lm_beta: [Optional] beta hyperparameter for the scorer.

beam_width: [Optional] The width of the beam search. Larger values can improve accuracy but directly increase decoding time.

http section configuration

request_max_size: The maximum payload size accepted by the server (default value: 1048576, i.e. 1 MiB). A request whose payload exceeds this threshold is rejected with a "413: Request Entity Too Large" error; see the sizing note after this list.

host: The listen address of the HTTP server.

port: The listening port of the HTTP server.
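As a rough sizing guide: a 16 kHz, 16-bit mono WAV holds about 32,000 bytes per second of audio, so the default 1 MiB limit corresponds to roughly 32 seconds of recording.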

log section configuration

The log section can be used to set the log levels of the server. This section contains a list of log entries. Each log entry contains the name of a logger and its level. Both follow the convention of the python logging module.

Using the server

Inference on the model is done via HTTP POST requests, for example with the following curl command:

curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
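The same request can be made from Python; here is a minimal sketch, assuming the third-party requests package is installed and the server listens on the default port 8080:

import requests

# POST the raw wav bytes to the /stt endpoint.
with open("testfile.wav", "rb") as f:
    response = requests.post("http://localhost:8080/stt", data=f.read())

print(response.status_code)
print(response.text)  # the transcription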

deepspeech-server's People

Contributors

cliabhach, mainro, mlaradji, mrgnr


deepspeech-server's Issues

ImportError: cannot import name 'AsyncIOScheduler'

When I run the following command to start deepspeech-server:
deepspeech-server --config config.json

An ImportError appears (I put the downloaded deepspeech model ("model") just under the "deepspeech-server" folder). Here is the error output:

Traceback (most recent call last):
  File "/usr/local/bin/deepspeech-server", line 4, in <module>
    __import__('pkg_resources').run_script('deepspeech-server==1.0.0', 'deepspeech-server')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_server-1.0.0-py3.6.egg/EGG-INFO/scripts/deepspeech-server", line 3, in <module>
    requires = 'deepspeech-server==1.0.0'
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_server-1.0.0-py3.6.egg/deepspeech_server/server.py", line 4, in <module>
ImportError: cannot import name 'AsyncIOScheduler'

Concurrent Requests

Hi,
Will this API allow concurrent requests? If yes, how many simultaneous requests can I make through it? Does that depend on the hosting server's configuration (number of CPUs and GPUs) and bandwidth?

Regards,
Phaneendra.

Requested audio file name in logs

Hey Romain Picard,
First of all, thanks for the deepspeech server. I want to use it as a production server, and I would like the audio file name to appear in the logs along with the STT result. I am fairly new to server-side programming and API integration, so could you guide me on how to log the requested audio file name along with the STT result?

I have tried to log item.data in deepspeech.py along with the STT result, but I think that is the raw stream data.

One more quick question: how can I reduce the response time? Right now it takes 4-5 seconds (on CPU). Can deepspeech-gpu make any difference?

Thanks

[Feature Request] Add support for S3 paths

I'm posting this as an issue to get some feedback. I'm happy to work on this myself.

In my pipeline, many of the audio files I want to transcribe are stored in S3. Currently, I have a service sitting in front of deepspeech-server that downloads the entire audio file from S3 and forwards it to deepspeech-server. The services live on the same server so there's no issue with bandwidth, but it does seem to be a waste of time and RAM.

A solution would be to allow deepspeech-server to access S3 files directly. This is easy to do using the smart_open package, which allows you to open an S3 path (also Hadoop, etc.) as a file-like object that can be used directly for inference. This would save even more time if we were to use the streaming inference API as well (#16).
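For illustration, a minimal sketch of the smart_open idea (the bucket and key are placeholders):

from smart_open import open as s3_open

# Stream the S3 object as a file-like source instead of
# downloading it to disk first.
with s3_open("s3://my-bucket/audio/sample.wav", "rb") as f:
    audio = f.read()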

Not working with deepspeech 0.2.0

Mozilla just released 0.2.0. I've downloaded it, along with the new models, and verified that it works as expected with the deepspeech binaries.

However, I'm getting an error when trying to start deepspeech-server (0.6.0):

deepspeech-server  --config ../deepspeech/config.json 
Traceback (most recent call last):
  File "/usr/local/bin/deepspeech-server", line 3, in <module>
    from deepspeech_server.server import main;
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_server/server.py", line 16, in <module>
    import deepspeech_server.deepspeech as deepspeech
  File "/usr/local/lib/python3.6/dist-packages/deepspeech_server/deepspeech.py", line 9, in <module>
    from deepspeech.model import Model
ModuleNotFoundError: No module named 'deepspeech.model'

deepspeech.model no longer appears to be valid under 0.2.0:

python3
>>> from deepspeech.model import Model
ModuleNotFoundError: No module named 'deepspeech.model'

I've removed and reinstalled both deepspeech and deepspeech-server to no avail.

ds 0.7.0 update

In the config, the deepspeech section will only need the model. The lm, trie, and features parts can be removed from the deepspeech.py/server.py files to get a 0.7 model working. A sketch of the trimmed config follows.
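For illustration, the trimmed section could look like this (a hypothetical sketch; the filename is a placeholder for a 0.7-era .pbmm model):

deepspeech:
  model: models/deepspeech-0.7.0-models.pbmm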

TypeError: let_bind() got an unexpected keyword argument 'error_map'

I just installed deepspeech-server in a fresh virtualenv on Debian testing.

I get this error:

$ deepspeech-server --config config.json
Traceback (most recent call last):
  File "/home/test/deepspeech/venv/bin/deepspeech-server", line 5, in <module>
    main()
  File "/home/test/deepspeech/venv/lib/python3.7/site-packages/deepspeech_server/server.py", line 157, in main
    file=file.make_driver()
  File "/home/test/deepspeech/venv/lib/python3.7/site-packages/cyclotron_aio/runner.py", line 8, in run
    program = setup(entry_point, drivers)
  File "/home/test/deepspeech/venv/lib/python3.7/site-packages/cyclotron/rx.py", line 67, in setup
    sinks = entry_point.call(sources)
  File "/home/test/deepspeech/venv/lib/python3.7/site-packages/deepspeech_server/server.py", line 128, in deepspeech_server
    error_map=lambda e: httpd.Response(
TypeError: let_bind() got an unexpected keyword argument 'error_map'

Package versions are from #23 (comment)

$ pip list
Package           Version
----------------- -------
aiohttp           3.6.2  
async-timeout     3.0.1  
attrs             19.3.0 
chardet           3.0.4  
cyclotron         0.6.1  
cyclotron-aio     0.7.0  
cyclotron-std     0.5.0  
deepspeech        0.5.1  
deepspeech-server 1.1.0  
idna              2.8    
multidict         4.6.1  
numpy             1.17.4 
pip               19.3.1 
pkg-resources     0.0.0  
Rx                1.6.0  
scipy             1.3.3  
setuptools        42.0.2 
wheel             0.33.6 
yarl              1.4.2  

Document python version required (support for async def etc.)

I get this error, presumably because async def is not supported in Python 3.4:

..../deepspeech/lib/python3.4/site-packages/deepspeech_server/driver/http_driver.py", line 20
    async def on_post_data(request, path):
            ^
SyntaxError: invalid syntax

The README.md and setup.py files should indicate which version of Python is required.

move model parameters to the configuration file

The following parameters are currently hard coded in the deepspeech driver:

  • N_FEATURES = 26
  • N_CONTEXT = 9
  • BEAM_WIDTH = 500
  • LM_WEIGHT = 1.75
  • WORD_COUNT_WEIGHT = 1.00
  • VALID_WORD_COUNT_WEIGHT = 1.00

They should be set in the configuration file instead; a sketch of what that could look like follows.
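For illustration, a hypothetical configuration section mirroring the constants above (the field names are made up for this sketch):

deepspeech:
  n_features: 26
  n_context: 9
  beam_width: 500
  lm_weight: 1.75
  word_count_weight: 1.00
  valid_word_count_weight: 1.00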

Why I still have a CORS problem

Hi,
I am running this server on localhost:8080, and my React code fetching the response runs on localhost:3000. However, I still hit a CORS problem.

The code is as follows:

const getresult = file => {
  file.preventDefault();
  const uploadfile = file.target.files[0];
  const formdata = new FormData();
  formdata.append("uploadfile", uploadfile);
  for (const value of formdata.values()) {
    console.log(value);
  }
  const url = "http://localhost:8080/stt";
  fetch(url, {
    mode: "no-cors",
    method: "POST",
    headers: { "Content-Type": "application/octet-stream" },
    body: formdata
  })
    .then(response => response.json())
    .then(success => {
      return successful;
    })
    .catch(error => console.log(error));
};
return (...

and the error is:
[screenshot]

However, if I add mode: "no-cors", I still get an error:
[screenshot]

Thanks.

Update for deepspeech version 0.6.0

Hey everyone, I am trying to use the new deepspeech version 0.6.0 and I have updated the server's deepspeech dependency to 0.6.0.

But I am getting this incompatibility error: TypeError: CreateModel() takes at most 2 arguments (5 given)

Do we have to change the server code to use the new deepspeech version, or is this due to some other issue?

Thanks in advance

run error

deepspeech-server --config config.yaml
http sink error: 'Observable' object has no attribute 'subscribe_', NoneType: None

What's the problem? Thanks.

Make server port configurable

The server port should be configurable.
This can be useful when using the server directly, rather than behind a reverse proxy or Docker.

Add authorization/authentication support

Authentication support is needed to serve deepspeech on a public server without letting anybody use it. It should be based on a plugin system using external modules, which would allow anyone to use their own authentication system/backend.

413: Request Entity Too Large

Thank you for this tool. I am getting a "413: Request Entity Too Large" message when I attempt to upload a file like this:

curl -X POST --data-binary @zb.wav http://0.0.0.0:8000/stt

The server is running and shows this message:
config file: config.deepspeech.json
creating model /home/deepspeech/models/default/output_graph.pb /home/deepspeech/models/default/alphabet.txt
======== Running on http://0.0.0.0:8000 ========
(Press CTRL+C to quit)

The file I am trying to post is 47692844 bytes in size.

Would you consider adding an option to the configuration file to specify the maximum allowed file size?
It would also be convenient to allow the port to be set in the config.

Thank you.

Add an API based on websocket

Objective

Implement an API based on websocket. Compared to the HTTP API, this will improve latency because audio data will be sent on the fly, as it becomes available.

If possible, the API should follow the reactive streams convention, where one stream is a request to do an STT inference (from client to server) and another stream carries the answer (from server to client):

  • Request stream creation is done with one message.
  • Several items are sent on the request stream to carry the audio data (split into chunks).
  • The request stream completes once all audio data is transferred, or fails in case of error.

When the request stream completes, an STT inference is done and the answer is sent to a response stream. So there is one request stream per request and a common response stream. A sketch of a possible client interaction follows.
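For illustration only, a hypothetical client under this convention (the endpoint, message framing, and chunk size are all invented here; uses the Python websockets package):

import asyncio
import websockets

async def transcribe(path):
    async with websockets.connect("ws://localhost:8080/stt") as ws:
        # Create the request stream.
        await ws.send('{"type": "start"}')
        # Send the audio data, split into chunks.
        with open(path, "rb") as f:
            while chunk := f.read(4096):
                await ws.send(chunk)
        # Complete the request stream...
        await ws.send('{"type": "end"}')
        # ...then read the answer from the response stream.
        print(await ws.recv())

asyncio.run(transcribe("testfile.wav"))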

Specification

Alternative solutions

Connect this deepspeech-server by Android

Hi, I am wondering whether we can connect to this HTTP server and use deepspeech-server remotely from an Android device?
For example, by having the phone join the local area network of the deepspeech-server machine and sending the HTTP request from the phone?

That way, we could record audio directly on the phone and send the audio files to deepspeech-server remotely.

Can you give me some hints or simple code so that I can have a try?

Thank you very much!

How to run as a background service?

Hi,

Apologies; I am not a programmer.

How can I get this program to run as a background service (on Ubuntu 18.04) that will run at system startup and always be listening?

It runs from the console just fine but has to be left open; as soon as I press Ctrl+C, the listening server shuts down.

Installation errors

Hello,

When installing with python3 setup.py install, Rx 1.6.1 gets installed, which is not compatible with some other components, it seems:
error: Rx 1.6.1 is installed but rx>=3.0 is required by {'cyclotron', 'cyclotron-std'}

Manually increasing to 3.6 will result in an error when starting server.py, since rx.concurrency was deprecated a long time ago and did not make it into 3.6.

Best Regards

simple fixes to documentation: curl example and config file

They're both trivial fixes, and would save folks a useless hour of digging around.

The server defaults to 8080, not 8000. And putting brackets around the filename will get you an unintelligible error and a server crash. It should be:
curl -X POST --data-binary @testing.wav http://localhost:8080/stt

This config file would be a drop-in-and-use-immediately replacement for anyone who downloaded the supplied model from the DeepSpeech project. There is no 'models.pb' file anywhere, as referenced by your documentation, and the models end up un-tarring themselves into a 'models/' subdirectory:

{
  "deepspeech": {
    "model" :"models/output_graph.pb",
    "alphabet": "models/alphabet.txt",
    "lm": "models/lm.binary",
    "trie": "models/trie"
  },
  "server": {
    "http": {
      "request_max_size": 100048576
    }
  }
}

...save that as server.config in DeepSpeech/, then:

cd DeepSpeech
deepspeech-server --config server.config

and 'it just works'.

Thanks for your work. I was using a hackish FIFO scenario before to accomplish the same thing; this is much improved.

convert readme to rst

PyPI does not support Markdown, so the README is not displayed correctly there.
The README should therefore be converted to reStructuredText.

Response mimetype "application/octet-stream"

Hi,

I'm trying to use deepspeech-server with Rhasspy via its Speech To Text Remote HTTP module, but the problem is that the deepspeech-server response mimetype is "application/octet-stream" instead of "application/json":

[ERROR:2021-06-15 21:37:15,270] rhasspyremote_http_hermes: handle_stop_listening
Traceback (most recent call last):
  File "/usr/lib/rhasspy/rhasspy-remote-http-hermes/rhasspyremote_http_hermes/__init__.py", line 606, in handle_stop_listening
    transcription_dict = await response.json()
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 1103, in json
    headers=self.headers,
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/octet-stream', url=URL('http://192.168.0.6:7123/stt?siteId=kitchen')

I think "application/octet-stream" is wrong mimetype for text response, so maybe that should be changed?

Could not resolve proxy : POST

It was working fine until yesterday; all of a sudden it stopped working today.
I ran the curl command:
curl -x POST --data-binary @C23_597.wav http://localhost:8080/stt
but it gives this error:
curl: (5) Could not resolve proxy: POST

wav files error

Compiled it all, but WAV files are "not understood":

STT error: File format b''... not understood.

Auto publish PyPi

Awesome project. Any chance you could set up automatic pushing of new releases to PyPI? 👍
0.4.1 fixes a bug which makes it more functional again, but PyPI only has 0.4.

DeepSpeech no longer maintained

Hi there! I'd seen this project back when DeepSpeech was actively maintained, and I'm glad to see it's still being useful to others!

In case you didn't know, the old repo doesn't seem to be maintained any longer (for more info, see mozilla/DeepSpeech#3693).

However, we (the old DeepSpeech team) split the project off into a new fork, where it's very actively maintained :)

We also recently released a new English model which is much more accurate than the older versions. Switching to Coqui should be very easy since the API hasn't changed significantly, but under the hood it's better :)

check out the new repo here: https://github.com/coqui-ai/stt

GPU usage?

The documentation mentions deepspeech-gpu in the installation process, but deepspeech-server appears not to use the GPU, which I need for higher-speed inference. I know it isn't a more general problem (misconfiguration of Nvidia drivers, TensorFlow, etc.) because the binary with arch = gpu works; rather, it seems that the Model Python class doesn't use the GPU (this also seems to be the case with the native_client/python/client.py code, which uses that same class).

Package dependency error

With Python 3.7, I'm installing deepspeech-server with pip:

pip install deepspeech-server

During installation it shows these errors:

Successfully built deepspeech-server
ERROR: cyclotron 1.0.0 has requirement rx>=3.0, but you'll have rx 1.6.1 which is incompatible.
ERROR: cyclotron-std 1.0.0 has requirement rx>=3.0, but you'll have rx 1.6.1 which is incompatible.
Installing collected packages: rx, cyclotron, cyclotron-std, deepspeech-server
Successfully installed cyclotron-1.0.0 cyclotron-std-1.0.0 deepspeech-server-1.1.0 rx-1.6.1

After installation, when I run deepspeech-server I get:

Traceback (most recent call last):
  File "/home/parallels/Work/deepspeech-server/deepspeech_venv/bin/deepspeech-server", line 3, in <module>
    from deepspeech_server.server import main;
  File "/home/parallels/Work/deepspeech-server/deepspeech_venv/lib/python3.7/site-packages/deepspeech_server/server.py", line 7, in <module>
    from cyclotron.router import make_error_router
  File "/home/parallels/Work/deepspeech-server/deepspeech_venv/lib/python3.7/site-packages/cyclotron/router.py", line 2, in <module>
    import rx.operators as ops
ModuleNotFoundError: No module named 'rx.operators'

HTTP response not working

I have tried to retrieve the transcription of a test wav file from a Python script. The server console works properly and shows the transcript; however, I couldn't get the transcript in the HTTP response, and the server terminal displayed the following runtime warning:

======== Running on http://127.0.0.1:8080 ========
(Press CTRL+C to quit)
console: experience proves this
/Users/lapwing/anaconda2/envs/ds2/lib/python3.6/site-packages/deepspeech_server/driver/http_driver.py:37: RuntimeWarning: coroutine 'StreamResponse.write' was never awaited
  response.write(bytearray(i["data"], 'utf8'))
console: experience proves this

I am using
Deepspeech version 0.1.1
Deepspeech-server version 0.4.0
and here is the code that I have used to query the server:

import os
import urllib2

class EnhancedFile(file):
    def __init__(self, *args, **keyws):
        file.__init__(self, *args, **keyws)

    def __len__(self):
        return int(os.fstat(self.fileno())[6])

theFile = EnhancedFile('test1.wav', 'rb')  # open the wav in binary mode
theUrl = "http://127.0.0.1:8080/stt"
theHeaders = {'Content-Type': 'text/xml'}
theRequest = urllib2.Request(theUrl, theFile, theHeaders)
response = urllib2.urlopen(theRequest)
theFile.close()
for line in response:
    print line

I see a similar problem with the sample curl command:
curl -X POST --data-binary @test1.wav http://127.0.0.1:8080/stt
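For comparison, a Python 3 equivalent of the query script above (a standard-library sketch):

import urllib.request

# Read the wav file as binary and POST it to the server.
with open('test1.wav', 'rb') as f:
    data = f.read()

req = urllib.request.Request(
    'http://127.0.0.1:8080/stt',
    data=data,
    headers={'Content-Type': 'application/octet-stream'},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode('utf8'))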
