alphacep / vosk-asterisk

Speech Recognition in Asterisk with Vosk Server

License: GNU General Public License v2.0

Makefile 8.82% Shell 1.27% M4 9.01% C 80.89%

vosk asterisk speech-recognition speech-to-text asr

vosk-asterisk's Introduction

Vosk speech recognition modules for Asterisk

This is an Asterisk module for the Vosk API server:

https://github.com/alphacep/vosk-server

It is tested with the latest Asterisk git master, but should work equally well with other branches (13, 16, 17).

Installation

Make sure you have the latest Asterisk sources:

git clone https://github.com/asterisk/asterisk
....

First build the modules

./bootstrap
./configure --with-asterisk=<path_to_asterisk_source> --prefix=<path_to_install>
make
make install

for example:

./bootstrap
./configure --with-asterisk=/usr --prefix=/usr
make
make install

Edit modules.conf to load modules

load = res_speech.so
load = res_http_websocket.so
load = res_speech_vosk.so

Edit dialplan in extensions.conf:

[internal]
exten = 1,1,Answer
same = n,Wait(1)
same = n,SpeechCreate
same = n,SpeechBackground(hello)
same = n,Verbose(0,Result was ${SPEECH_TEXT(0)})

Run the Vosk server with Docker

docker run -d -p 2700:2700 alphacep/kaldi-en:latest
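
Before dialing, it can help to confirm that something is actually listening on the published port. The following is a minimal sketch, not part of vosk-asterisk: the helper name `vosk_port_open` is hypothetical, and it assumes the server is published on 127.0.0.1:2700 as in the docker command above. It only checks TCP reachability, not that the model finished loading.

```python
import socket


def vosk_port_open(host: str = "127.0.0.1", port: int = 2700, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: host and port default to the values used in the
    docker command from the README.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `vosk_port_open()` on the Asterisk host should return True once the container is up; False suggests the container has exited or the port mapping differs.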

Dial extension and check the result

vosk-asterisk's Issues

vosk docker stops after few second

Hello, I've tested the container, but it stops after running for a few seconds. I tried to see what happens, but this is all I get:

root@bfa8f7d0aa1c:/opt/vosk-server/websocket# python3 ./asr_server.py /opt/vosk-model-en/model
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from /opt/vosk-model-en/model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:279) Loading HCLG from /opt/vosk-model-en/model/graph/HCLG.fst
LOG (VoskAPI:ReadDataFiles():model.cc:294) Loading words from /opt/vosk-model-en/model/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:303) Loading winfo /opt/vosk-model-en/model/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:310) Loading subtract G.fst model from /opt/vosk-model-en/model/rescore/G.fst
LOG (VoskAPI:ReadDataFiles():model.cc:312) Loading CARPA model from /opt/vosk-model-en/model/rescore/G.carpa
LOG (VoskAPI:ReadDataFiles():model.cc:318) Loading RNNLM model from /opt/vosk-model-en/model/rnnlm/final.raw
Killed
root@bfa8f7d0aa1c:/opt/vosk-server/websocket#

Only vosk-transcriber works fine.

Best Regards
Pablo
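
A bare "Killed" right after the largest model files start loading is often the Linux out-of-memory killer, although nothing in this report confirms that. One way to check, sketched below under that assumption, is to read MemAvailable from /proc/meminfo inside the container before starting the server and compare it with the size of the model directory. The function names are hypothetical.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Parse MemAvailable (reported in kB) from /proc/meminfo text and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found")


def read_mem_available_gib(path: str = "/proc/meminfo") -> float:
    """Read the live value on a Linux host or inside a container."""
    with open(path) as f:
        return mem_available_gib(f.read())
```

If the reported value is smaller than the model on disk, giving the container more memory or using a smaller model would be the first things to try.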

Error loading module res_http_websocket_fix.so

Error loading module res_http_websocket_fix.so: /usr/lib64/asterisk/modules/res_websocket_fix.so: undefined symbol:ast_websocket_wait_for_input

asterisk 16.6.2

./configure not found, the folder only contains configure.ac

./configure --with-asterisk=/root/asterisk --prefix=/root/soxx
-bash: ./configure: No such file or directory

If I run ./configure.ac instead, I get this output:

./configure.ac --with-asterisk=/root/asterisk --prefix=/root/soxx
./configure.ac: line 1: syntax error near unexpected token `asterisk-vosk,'
./configure.ac: line 1: `AC_INIT(asterisk-vosk, 0.3.7)'
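
For context: configure.ac is autoconf input, not a shell script, and the README runs ./bootstrap before ./configure. The sketch below is a hypothetical helper (not shipped with the project) that looks at which files exist in the checkout and names the next README step. It assumes ./bootstrap is what generates configure, which the README order suggests but does not state.

```python
from pathlib import Path


def next_build_step(src_dir: str) -> str:
    """Suggest the next README build step based on which files already exist."""
    src = Path(src_dir)
    if not (src / "configure.ac").exists():
        return "not a vosk-asterisk checkout (no configure.ac)"
    if not (src / "configure").exists():
        # configure has not been generated yet
        return "./bootstrap"
    if not (src / "Makefile").exists():
        return "./configure --with-asterisk=<path_to_asterisk_source> --prefix=<path_to_install>"
    return "make && make install"
```

In the situation reported here, `next_build_step(".")` would return "./bootstrap".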

Recognition is suspended 15 seconds after the start of work

Recognition is suspended 15 seconds after the start of work.

same => n,SpeechBackground(Speech,10)

The timeout option has no effect.
I've tried setting it to 5, 10, 40, and no value at all.

After the phone conversation ends, the channel in Asterisk remains stuck.

localhost*CLI> core show channels verbose
Channel Context Extension Prio State Application Data CallerID Duration Accountcode PeerAccount BridgeID
PJSIP/102-00000000 internal 1 10 Up SpeechBackgr Speech-2 102 00:12:58
PJSIP/102-00000001 internal 1 10 Up SpeechBackgr Speech-2,10 102 00:10:37
PJSIP/102-00000002 internal 1 10 Up SpeechBackgr Speech-2,10 102 00:04:41
3 active channels
3 active calls
3 calls processed

Setting vosk to timeout if no audio detected

Hello

I've got vosk-asterisk configured and it's working really well, thank you @nshmyrev - this is a fantastically useful tool for Asterisk! My issue is that, if there's no incoming sound, I can't figure out how to get a timeout from the vosk-server (https://github.com/alphacep/vosk-server).

In the dial plan, it's possible to set something like:
exten => 5000,n,SpeechBackground(silence-5.gsm,15)
This makes SpeechBackground end after 15 seconds, but it's a hard limit and will cut off the audio stream to the vosk-server, even mid-sentence.

Looking at the Kaldi documentation here (https://github.com/kaldi-asr/kaldi/blob/e1dd07b13c7f14c4f8f5532a281a044819e838c0/src/online2/online-endpoint.h#L46), there are five rules. Rule 1 reads:

  /// rule1 times out after 5 seconds of silence, even if we decoded nothing.
  OnlineEndpointRule rule1;

This rule should be set by default (I'm using vosk-model-en-us-0.42-gigaspeech), but if I make a call and mute it, vosk keeps returning empty partial results to Asterisk after 5 seconds. I've tried various settings in model.conf, such as:

--endpoint.rule1.must-contain-nonsilence=false
--endpoint.rule1.min-trailing-silence=5.0
--endpoint.rule1.min-utterance-length=5.0

However, these seem to have no effect.

My understanding is Rule 1 by default should end the vosk process after five seconds, even if no incoming sound was detected. Is this actually the case? (Or is there, perhaps, a better way to achieve the desired effect? I can use AMD in the dial plan to wait for audio to begin, but it will cut off the beginning of the utterance.)
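
Independently of how the endpoint rules behave, one possible workaround is to apply the silence timeout on the client side, using the stream of partial results. The sketch below shows only the decision logic, under stated assumptions: the function name and the (timestamp, partial_text) event shape are hypothetical, and where this would hook into vosk-asterisk or the dialplan is left open.

```python
from typing import Iterable, Optional, Tuple


def silence_timeout_at(events: Iterable[Tuple[float, str]], limit: float = 5.0) -> Optional[float]:
    """Return the timestamp at which `limit` seconds passed without a non-empty partial.

    `events` are (timestamp_seconds, partial_text) pairs in time order.
    Returns None if the limit is never reached.
    """
    last_speech = None
    for ts, partial in events:
        if last_speech is None:
            last_speech = ts  # start the clock at the first event
        if partial.strip():
            last_speech = ts  # speech seen, restart the clock
        elif ts - last_speech >= limit:
            return ts
    return None
```

With a muted call that produces an empty partial every second, the function returns 5.0. If the caller says something at second 2 and then goes quiet, it returns 7.0, so an utterance in progress is not cut off the way a hard SpeechBackground limit would cut it.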

Thanks in advance.

Asterisk Version: 19.8.0
FreePBX version: 16.0.40.3
Vosk Server: Ubuntu 22.04.1 LTS

SpeechBackground does not work

Hi
I use Freepbx (Asterisk 18.20.2)
At the SpeechBackground stage, it throws an error "exited non-zero". The call ends, the sound is not played.
What could be the reason?

log

 Goto (stt,s,1)
    -- Executing [s@stt:1] Answer("PJSIP/1001-00000012", "") in new stack
    -- Executing [s@stt:2] Wait("PJSIP/1001-00000012", "1") in new stack
    -- Executing [s@stt:3] SpeechCreate("PJSIP/1001-00000012", "") in new stack
    -- Executing [s@stt:4] SpeechBackground("PJSIP/1001-00000012", "custom/present_1") in new stack
  == Spawn extension (stt, s, 4) exited non-zero on 'PJSIP/1001-00000012'

extension

[stt]
exten => s,1,Answer()
exten => s,n,Wait(1)
exten => s,n,SpeechCreate
exten => s,n,SpeechBackground(custom/present_1)
#exten => s,n,Verbose(0,Result as ${SPEECH_TEXT(0)})
exten => s,n,Hangup()

The sound plays without problems through Playback.
The Vosk server is up and running, and works when called through Python from the dialplan.
Example
exten => s,n,Set(RESULT=${SHELL(python3.6 /home/vosk-server/websocket/test_srt.py .....

Send eof on destroy

The websocket can be closed more elegantly. Sending 'eof' to Vosk seems to be the current polite way to do it (per asr_server.py)
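
To make the proposal concrete, here is a sketch of the message order a client would send: binary audio chunks followed by one final text message. The exact string `{"eof" : 1}` is an assumption based on the sample client in vosk-server, and the 3200-byte chunk size is arbitrary. No network code is included; the generator only produces the messages.

```python
from typing import Iterator, Union

# Assumption: text message used by the vosk-server sample client to end a session.
EOF_MESSAGE = '{"eof" : 1}'


def session_messages(pcm: bytes, chunk_size: int = 3200) -> Iterator[Union[bytes, str]]:
    """Yield audio chunks as bytes, then the eof text message as the last item."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]
    yield EOF_MESSAGE
```

In the module, the equivalent would be sending that text frame from the destroy path before closing the WebSocket, so the server can flush its final result.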

Set Language

Hello, where is the configuration file, or how can I change the language? I'm wondering whether this file

res_speech_vosk.conf

supports such a variable, or whether something like this is possible in the dialplan:

same = n,Set(SPEECH_ENGINE(VOSK_CONFIG)={"config" : { "languageCode" : "es"} })

thanks for your help
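
Nothing on this page shows a languageCode setting in the module. What the page does show is that the language follows the model the server runs: the README starts alphacep/kaldi-en and another report here runs alphacep/kaldi-ru. The sketch below builds the docker command for a language from just those two images. The mapping and function name are hypothetical, and other images (for example for Spanish) are deliberately not guessed.

```python
# Only the images that appear on this page; any other image name would be a guess.
IMAGES = {
    "en": "alphacep/kaldi-en:latest",
    "ru": "alphacep/kaldi-ru:latest",
}


def docker_command(lang: str, port: int = 2700) -> list:
    """Return the argv for running the Vosk server image for `lang`."""
    if lang not in IMAGES:
        raise ValueError(f"no known image for language {lang!r}")
    return ["docker", "run", "-d", "-p", f"{port}:2700", IMAGES[lang]]
```

`" ".join(docker_command("ru"))` gives `docker run -d -p 2700:2700 alphacep/kaldi-ru:latest`.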

__ast_frdup: FRACK!, Failed assertion bad magic number 0x0

Here is the error details

[Jun 17 20:06:11] ERROR[24301][C-00000008]: frame.c:350 __ast_frdup: FRACK!, Failed assertion bad magic number 0x0 for object 0x1494048 (0)
[Jun 17 20:06:11] ERROR[24301][C-00000008]:   Got 12 backtrace records
# 0: /usr/sbin/asterisk(__ao2_ref+0x97) [0x45dca7]
# 1: /usr/sbin/asterisk(__ast_frdup+0x12d) [0x4effad]
# 2: /usr/sbin/asterisk(ast_translate+0x26d) [0x5a1fbd]
# 3: /usr/sbin/asterisk() [0x4ae9ca]
# 4: /usr/lib64/asterisk/modules/app_talkdetect.so(+0x19ac) [0x7f6af391c9ac]
# 5: /usr/sbin/asterisk(pbx_exec+0xb9) [0x52e5a9]
# 6: /usr/sbin/asterisk() [0x521f01]
# 7: /usr/sbin/asterisk() [0x523f64]
# 8: /usr/sbin/asterisk() [0x5254eb]
# 9: /usr/sbin/asterisk() [0x5a4e69]
#10: /lib64/libpthread.so.0(+0x7ea5) [0x7f6b47203ea5]
#11: /lib64/libc.so.6(clone+0x6d) [0x7f6b465a48dd]

[Jun 17 20:06:11] ERROR[24301][C-00000008]: frame.c:162 __frame_free: FRACK!, Failed assertion bad magic number 0x0 for object 0x1494048 (0)
[Jun 17 20:06:11] ERROR[24301][C-00000008]:   Got 10 backtrace records
# 0: /usr/sbin/asterisk(__ao2_ref+0x97) [0x45dca7]
# 1: /usr/sbin/asterisk(ast_frame_free+0xf3) [0x4efb53]
# 2: /usr/lib64/asterisk/modules/app_talkdetect.so(+0x1954) [0x7f6af391c954]
# 3: /usr/sbin/asterisk(pbx_exec+0xb9) [0x52e5a9]
# 4: /usr/sbin/asterisk() [0x521f01]
# 5: /usr/sbin/asterisk() [0x523f64]
# 6: /usr/sbin/asterisk() [0x5254eb]
# 7: /usr/sbin/asterisk() [0x5a4e69]
# 8: /lib64/libpthread.so.0(+0x7ea5) [0x7f6b47203ea5]
# 9: /lib64/libc.so.6(clone+0x6d) [0x7f6b465a48dd]

[Jun 17 20:06:11] ERROR[24301][C-00000008]: frame.c:350 __ast_frdup: FRACK!, Failed assertion bad magic number 0x0 for object 0x1494048 (0)
[Jun 17 20:06:11] ERROR[24301][C-00000008]:   Got 16 backtrace records
# 0: /usr/sbin/asterisk(__ao2_ref+0x97) [0x45dca7]
# 1: /usr/sbin/asterisk(__ast_frdup+0x12d) [0x4effad]
# 2: /usr/sbin/asterisk(ast_translate+0x26d) [0x5a1fbd]
# 3: /usr/sbin/asterisk(ast_audiohook_write_list+0x345) [0x467065]
# 4: /usr/sbin/asterisk(ast_write_stream+0xfcf) [0x4a63df]
# 5: /usr/sbin/asterisk() [0x4e2688]
# 6: /usr/sbin/asterisk() [0x4e2829]
# 7: /usr/sbin/asterisk() [0x4ae3a9]
# 8: /usr/lib64/asterisk/modules/app_talkdetect.so(+0x19ac) [0x7f6af391c9ac]
# 9: /usr/sbin/asterisk(pbx_exec+0xb9) [0x52e5a9]
#10: /usr/sbin/asterisk() [0x521f01]
#11: /usr/sbin/asterisk() [0x523f64]
#12: /usr/sbin/asterisk() [0x5254eb]
#13: /usr/sbin/asterisk() [0x5a4e69]
#14: /lib64/libpthread.so.0(+0x7ea5) [0x7f6b47203ea5]
#15: /lib64/libc.so.6(clone+0x6d) [0x7f6b465a48dd]

[Jun 17 20:06:11] ERROR[24301][C-00000008]: frame.c:140 __frame_free: FRACK!, Failed assertion bad magic number 0x0 for object 0x1494048 (0)
[Jun 17 20:06:11] ERROR[24301][C-00000008]:   Got 15 backtrace records
# 0: /usr/sbin/asterisk(__ao2_ref+0x97) [0x45dca7]
# 1: /usr/sbin/asterisk(ast_frame_free+0x193) [0x4efbf3]
# 2: /usr/sbin/asterisk(ast_audiohook_write_list+0x8b6) [0x4675d6]
# 3: /usr/sbin/asterisk(ast_write_stream+0xfcf) [0x4a63df]
# 4: /usr/sbin/asterisk() [0x4e2688]
# 5: /usr/sbin/asterisk() [0x4e2829]
# 6: /usr/sbin/asterisk() [0x4ae3a9]
# 7: /usr/lib64/asterisk/modules/app_talkdetect.so(+0x19ac) [0x7f6af391c9ac]
# 8: /usr/sbin/asterisk(pbx_exec+0xb9) [0x52e5a9]
# 9: /usr/sbin/asterisk() [0x521f01]
#10: /usr/sbin/asterisk() [0x523f64]
#11: /usr/sbin/asterisk() [0x5254eb]
#12: /usr/sbin/asterisk() [0x5a4e69]
#13: /lib64/libpthread.so.0(+0x7ea5) [0x7f6b47203ea5]
#14: /lib64/libc.so.6(clone+0x6d) [0x7f6b465a48dd]

[Jun 17 20:06:11] ERROR[24301][C-00000008]: frame.c:350 __ast_frdup: FRACK!, Failed assertion bad magic number 0x0 for object 0x1494048 (0)
[Jun 17 20:06:11] ERROR[24301][C-00000008]:   Got 16 backtrace records
# 0: /usr/sbin/asterisk(__ao2_ref+0x97) [0x45dca7]
# 1: /usr/sbin/asterisk(__ast_frdup+0x12d) [0x4effad]
# 2: /usr/sbin/asterisk(ast_translate+0x26d) [0x5a1fbd]
# 3: /usr/sbin/asterisk(ast_audiohook_write_list+0x345) [0x467065]
# 4: /usr/sbin/asterisk(ast_write_stream+0xfcf) [0x4a63df]
# 5: /usr/sbin/asterisk() [0x4e2688]
# 6: /usr/sbin/asterisk() [0x4e2829]
# 7: /usr/sbin/asterisk() [0x4ae3a9]
# 8: /usr/lib64/asterisk/modules/app_talkdetect.so(+0x19ac) [0x7f6af391c9ac]
# 9: /usr/sbin/asterisk(pbx_exec+0xb9) [0x52e5a9]
#10: /usr/sbin/asterisk() [0x521f01]
#11: /usr/sbin/asterisk() [0x523f64]
#12: /usr/sbin/asterisk() [0x5254eb]
#13: /usr/sbin/asterisk() [0x5a4e69]
#14: /lib64/libpthread.so.0(+0x7ea5) [0x7f6b47203ea5]
#15: /lib64/libc.so.6(clone+0x6d) [0x7f6b465a48dd]

FreeBSD Support

Is it possible to use this module on FreeBSD?

Even though I copied the module and libc.so.6 from a Linux machine, it still throws an error when loading:
[Jan 28 05:39:22] ERROR[100106] loader.c: Error loading module 'res_speech_vosk.so': /usr/local/lib/compat/libc.so.6: version GLIBC_2.14 required by /usr/local/lib/asterisk/modules/res_speech_vosk.so not defined

vosk_voice_ua

vosk-asterisk is not compatible with Asterisk 13

res-speech-vosk doesn't load properly on Asterisk 13. It uses a function (ast_websocket_wait_for_input) and, indirectly, a class (iostream) that are only available in Asterisk 16 and Asterisk 17.

Is there any way to use it with Asterisk 13?

Congrats for your excellent work.

Answer about silence.

Hello.

Is there any way to adjust the silence threshold?

Thank you in advance.

installation successful, but no application "SpeechCreate"

Sorry to bother you with what may be a beginner's question.

Everything seems to be correct, but I can't call SpeechCreate. Thank you.

Periodic jerks heard when playing voice files through SpeechBackground

Periodic stuttering is heard when playing voice files through SpeechBackground.
Asterisk and Docker run on the same virtual server (4 cores, 12 GB RAM) on a Linux host.

Bug Report

Here is the crash point (extension 5555 has a predefined IVR):

[rest]
exten => 4567,1,answer()
exten => 4567,n,Dial(SIP/trunk/5555,30,U(postanswer))
exten => 4567,n,Hangup()

[postanswer]
exten => s,1,SpeechCreate
exten => s,n,SpeechBackground(en/silence/1)
exten => s,n,Verbose(0,Result was ${SPEECH_TEXT(0)})
exten => s,n,SpeechDestroy()

[Aug 10 13:06:05] NOTICE[7005][C-00000002]: res_speech_vosk.c:98 vosk_recog_destroy: (vosk) Destroy speech resource
  == WebSocket connection to '51.83.25.28:2700' closed
[Aug 10 13:06:05] NOTICE[7005][C-00000002]: app_stack.c:1081 gosub_run: SIP/testtrunk-00000001 Abnormal 'Gosub(postanswer,s,1)' exit.  Popping routine return locations.
    -- Channel SIP/testtrunk-00000001 joined 'simple_bridge' basic-bridge <73070389-1ed8-4706-9f65-20b643c5dd6c>
    -- Channel SIP/325-00000000 joined 'simple_bridge' basic-bridge <73070389-1ed8-4706-9f65-20b643c5dd6c>
    -- Channel SIP/testtrunk-00000001 left 'native_rtp' basic-bridge <73070389-1ed8-4706-9f65-20b643c5dd6c>
    -- Channel SIP/325-00000000 left 'native_rtp' basic-bridge <73070389-1ed8-4706-9f65-20b643c5dd6c>
  == Spawn extension (asr, 4567, 2) exited non-zero on 'SIP/325-00000000'
[Aug 10 13:06:05] ERROR[7006][C-00000002]: channel.c:3020 ast_waitfor_nandfds: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f374c009a00 (0)
[Aug 10 13:06:05] ERROR[7006][C-00000002]:   Got 7 backtrace records
# 0: /usr/sbin/asterisk(__ao2_lock+0x7b) [0x45da6b]
# 1: /usr/sbin/asterisk(ast_waitfor_nandfds+0xfd) [0x49d4fd]
# 2: /usr/sbin/asterisk(ast_waitfor_n+0x17) [0x49dcd7]
# 3: /usr/sbin/asterisk() [0x468e9d]
# 4: /usr/sbin/asterisk() [0x5a4a39]
# 5: /usr/lib64/libpthread.so.0(+0x7ea5) [0x7f376cd58ea5]
# 6: /usr/lib64/libc.so.6(clone+0x6d) [0x7f376c0f98dd]

[Aug 10 13:06:05] ERROR[7006][C-00000002]: channel.c:3036 ast_waitfor_nandfds: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f374c009a00 (0)
[Aug 10 13:06:05] ERROR[7006][C-00000002]:   Got 7 backtrace records
# 0: /usr/sbin/asterisk(__ao2_unlock+0x75) [0x45dc75]
# 1: /usr/sbin/asterisk(ast_waitfor_nandfds+0xd4) [0x49d4d4]
# 2: /usr/sbin/asterisk(ast_waitfor_n+0x17) [0x49dcd7]
# 3: /usr/sbin/asterisk() [0x468e9d]
# 4: /usr/sbin/asterisk() [0x5a4a39]
# 5: /usr/lib64/libpthread.so.0(+0x7ea5) [0x7f376cd58ea5]
# 6: /usr/lib64/libc.so.6(clone+0x6d) [0x7f376c0f98dd]

I tried both (with the patch and without it).
Filename: app_speech_utils.c

With Patch:
https://issues.asterisk.org/jira/secure/attachment/57683/app_speech_utils.diff

Asterisk Version: 16.12.0

Issues with installing on FreePBX - Could not find asterisk.h

Hi there.

I have a FreePBX server, installed from the ISO image downloaded from the official site, and it works just fine. However, I have issues following the steps outlined in the README: the ./configure command fails with the error "Could not find asterisk.h, make sure Asterisk development package is installed".

I've installed the dev package with yum install asterisk18-devel and can now find asterisk.h in the /usr/include directory, but I still get the same error.

My guess is that I'm using the ./configure command with the wrong parameters, but I'm unsure what those should be.

I'm confused by the provided example, --with-asterisk=<path_to_asterisk_source> --prefix=<path_to_install>. What should those represent? I found a few pages about it, but it's still unclear what the actual parameters should be. I tried a few directories as parameters, such as /usr/src/freepbx and /var/lib/asterisk, but now I'm just pushing buttons randomly.

Does anyone have any idea what I am doing wrong?
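
One reading of the README example (--with-asterisk=/usr) and of a build log elsewhere on this page (which passes -I/usr/src/asterisk-14.5.0/include) is that the script looks for include/asterisk.h below the given path. That is an inference, not something the configure script is quoted as doing here. Under that assumption, the hypothetical helper below picks the first candidate directory that would work.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_asterisk_root(candidates: Iterable[str]) -> Optional[str]:
    """Return the first directory that contains include/asterisk.h, or None.

    Assumption: --with-asterisk expects the directory *above* include/.
    """
    for root in candidates:
        if (Path(root) / "include" / "asterisk.h").is_file():
            return root
    return None
```

With the header at /usr/include/asterisk.h, `find_asterisk_root(["/usr/src/freepbx", "/var/lib/asterisk", "/usr"])` would return "/usr", which matches the README's own example invocation.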

[improvement][suggestion]: Too many NOTICE messages in the logs.

It should at least be an option to quiet this module when there are empty partial results and/or the results have not changed yet since the last analysis.
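
The filtering being asked for can be sketched in a few lines. This is an illustration of the idea in Python, not a patch for res_speech_vosk.c: the function name is hypothetical, and the input is assumed to be the raw JSON strings that the module currently logs as "Got result".

```python
import json
from typing import Iterable, Iterator


def changed_partials(raw_results: Iterable[str]) -> Iterator[str]:
    """Yield a partial only when it is non-empty and differs from the previous message."""
    last = ""
    for raw in raw_results:
        # Messages without a "partial" key (e.g. final results) count as empty here.
        partial = json.loads(raw).get("partial", "")
        if partial and partial != last:
            yield partial
        last = partial
```

Fed with a stream of empty partials, the generator yields nothing, so a log call placed behind it would stay silent until recognition actually produces text.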

Custom Grammar Support

Any chance grammar loading will be implemented?

vosk-asterisk/res-speech-vosk/res_speech_vosk.c

Line 121 in f5f800c

/*! \brief Load a local grammar on the speech structure */

A little documentation for this?

I'm trying to make DISA accept a phone number to call by speaking the digits instead of using the keypad.
Being very new to both asterisk and vosk, I can't figure out how this extension works.
I mailed the author but got no reply. Help?
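
For the spoken-digits case, two pieces are needed: restricting recognition to digit words, and turning the recognized words into a dialable string. The sketch below covers both under loud assumptions: the "phrase_list" key inside a "config" message is how I understand the vosk-server WebSocket protocol, it is not confirmed on this page, and vosk-asterisk may not offer any way to send it. The names DIGITS, digits_config, and words_to_number are hypothetical.

```python
import json

# English digit words mapped to dial digits ("oh" as a spoken zero).
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def digits_config() -> str:
    """Build a config message limiting recognition to digit words.

    Assumption: the server accepts {"config": {"phrase_list": [...]}}.
    """
    return json.dumps({"config": {"phrase_list": [" ".join(DIGITS), "[unk]"]}})


def words_to_number(text: str) -> str:
    """Convert recognized text such as 'five five one' to '551', ignoring other words."""
    return "".join(DIGITS[w] for w in text.lower().split() if w in DIGITS)
```

The second function works on whatever ends up in ${SPEECH_TEXT(0)}, so it is usable even if the grammar restriction turns out not to be available.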

Install Vosk-Asterisk Module in Freepbx

Is there a way to use res_speech_vosk.so in FreePBX? I tried to compile it, but it is not working.

Vosk

Typo in the word bootstrap in the README

There is a typo in the word bootstrap in the README. It may scare off newcomers if they copy-paste it at the start and see an error.

No recognition data back from Kaldi - Asterisk 18

Hi everyone,

Thanks for this beautiful piece of software. It seems to work fine with a Python example script feeding the Kaldi server with chunks of a .wav file. I wanted to do the same in Asterisk by following the official README instructions; however, it doesn't work as intended.

Just FYI this is my setup:

Asterisk v.18.6
Debian 11
Kernel: Linux 5.10.0-21-amd64 x86_64
Python 3.9.2

First of all, there is a Kaldi server running inside a Docker container:

foo@bar:/etc/asterisk# docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED          STATUS          PORTS                                       NAMES
2802ee1c618c   alphacep/kaldi-ru:latest   "python3 ./asr_serve…"   36 minutes ago   Up 36 minutes   0.0.0.0:2700->2700/tcp, :::2700->2700/tcp   suspicious_lamport

Next, I've cloned & built Asterisk's git branch code of v18.6 (no luck with latest v18.19 though):

foo@bar:/etc/asterisk# asterisk -rvvvvvvvvvvvvvvvvvvvvvvv
Asterisk 18.6.0, Copyright (C) 1999 - 2021, Sangoma Technologies Corporation and others.
...

So the Asterisk version seems to be okay at this point. The vosk-asterisk lib was also built and installed into Asterisk's default lib directory as follows:

foo@bar:/etc/asterisk# ls -l /usr/lib/asterisk/modules | grep -i vosk
-rw-r--r-- 1 root root  117350 Jan 31 02:32 res_speech_vosk.a
-rwxr-xr-x 1 root root     990 Jan 31 02:32 res_speech_vosk.la
-rwxr-xr-x 1 root root   73408 Jan 31 02:32 res_speech_vosk.so

These libs were mentioned in modules.conf of the Asterisk, and loaded properly as well:

foo*CLI> module show like vosk
Module                         Description                              Use Count  Status      Support Level
res_speech_vosk.so             Vosk Speech Engine                       0          Running              core
1 modules loaded
foo*CLI> module show like speech
Module                         Description                              Use Count  Status      Support Level
app_speech_utils.so            Dialplan Speech Applications             0          Running              core
res_speech.so                  Generic Speech Recognition API           2          Running              core
res_speech_vosk.so             Vosk Speech Engine                       0          Running              core
3 modules loaded

Here's also a piece of my extensions.conf dialplan code to run speech recognition:

[internal]
exten => 111,1,NoOp()
same => n,Answer()
same => n,SpeechCreate()
same => n,Wait(1)
same => n,SpeechBackground(/var/spool/asterisk/recording/ru-long1, 90)
same => n,Verbose(0,Result was ${SPEECH_TEXT(0)})
same => n,hangup()

So now, when it comes time for all the magic to happen, I dial extension 111 and observe the following:

[Jan 31 02:46:37] NOTICE[169355][C-00000003]: res_speech_vosk.c:204 vosk_recog_start: (vosk) Start recognition
[Jan 31 02:46:38] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
[Jan 31 02:46:38] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
[Jan 31 02:46:38] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
[Jan 31 02:46:38] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
[Jan 31 02:46:38] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
[Jan 31 02:46:39] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
[Jan 31 02:46:39] NOTICE[169355][C-00000003]: res_speech_vosk.c:164 vosk_recog_write: (vosk) Got result: '{
  "partial" : ""
}'
...

... it's just a bunch of empty responses back from the Kaldi server, as I understand it. There are no real media chunks passing there, so no recognition actually happens. I had the idea to trace the VM's internal traffic with tcpdump, and I managed to catch some interesting info:

02:31:40.874029 lo    In  IP localhost.2700 > localhost.57494: Flags [.], ack 138126, win 512, options [nop,nop,TS val 4031340119 ecr 4031340077], length 0
02:31:40.954156 vethb24d8fa P   IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [P.], seq 1123:1145, ack 138126, win 2105, options [nop,nop,TS val 3874166157 ecr 1350685573], length 22
02:31:40.954171 docker0 In  IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [P.], seq 1123:1145, ack 138126, win 2105, options [nop,nop,TS val 3874166157 ecr 1350685573], length 22
02:31:40.954200 docker0 Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [.], ack 1145, win 501, options [nop,nop,TS val 1350685695 ecr 3874166157], length 0
02:31:40.954205 vethb24d8fa Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [.], ack 1145, win 501, options [nop,nop,TS val 1350685695 ecr 3874166157], length 0
02:31:40.954319 lo    In  IP localhost.2700 > localhost.57494: Flags [P.], seq 1123:1145, ack 138126, win 512, options [nop,nop,TS val 4031340200 ecr 4031340077], length 22
02:31:40.954328 lo    In  IP localhost.57494 > localhost.2700: Flags [.], ack 1145, win 512, options [nop,nop,TS val 4031340200 ecr 4031340200], length 0
02:31:41.022153 lo    In  IP localhost.57494 > localhost.2700: Flags [P.], seq 138126:141334, ack 1145, win 512, options [nop,nop,TS val 4031340267 ecr 4031340200], length 3208
02:31:41.022194 lo    In  IP localhost.2700 > localhost.57494: Flags [.], ack 141334, win 495, options [nop,nop,TS val 4031340267 ecr 4031340267], length 0
02:31:41.022281 docker0 Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [P.], seq 138126:141334, ack 1145, win 501, options [nop,nop,TS val 1350685763 ecr 3874166157], length 3208
02:31:41.022286 vethb24d8fa Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [P.], seq 138126:141334, ack 1145, win 501, options [nop,nop,TS val 1350685763 ecr 3874166157], length 3208
02:31:41.022308 vethb24d8fa P   IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [.], ack 141334, win 2155, options [nop,nop,TS val 3874166226 ecr 1350685763], length 0
02:31:41.022314 docker0 In  IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [.], ack 141334, win 2155, options [nop,nop,TS val 3874166226 ecr 1350685763], length 0
02:31:41.124809 vethb24d8fa P   IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [P.], seq 1145:1167, ack 141334, win 2155, options [nop,nop,TS val 3874166328 ecr 1350685763], length 22
02:31:41.124816 docker0 In  IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [P.], seq 1145:1167, ack 141334, win 2155, options [nop,nop,TS val 3874166328 ecr 1350685763], length 22
02:31:41.124837 docker0 Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [.], ack 1167, win 501, options [nop,nop,TS val 1350685866 ecr 3874166328], length 0
02:31:41.124840 vethb24d8fa Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [.], ack 1167, win 501, options [nop,nop,TS val 1350685866 ecr 3874166328], length 0
02:31:41.124906 lo    In  IP localhost.2700 > localhost.57494: Flags [P.], seq 1145:1167, ack 141334, win 512, options [nop,nop,TS val 4031340370 ecr 4031340267], length 22
02:31:41.124915 lo    In  IP localhost.57494 > localhost.2700: Flags [.], ack 1167, win 512, options [nop,nop,TS val 4031340370 ecr 4031340370], length 0
02:31:41.211333 lo    In  IP localhost.57494 > localhost.2700: Flags [P.], seq 141334:144542, ack 1167, win 512, options [nop,nop,TS val 4031340457 ecr 4031340370], length 3208
02:31:41.211389 lo    In  IP localhost.2700 > localhost.57494: Flags [.], ack 144542, win 495, options [nop,nop,TS val 4031340457 ecr 4031340457], length 0
02:31:41.211601 docker0 Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [P.], seq 141334:144542, ack 1167, win 501, options [nop,nop,TS val 1350685953 ecr 3874166328], length 3208
02:31:41.211612 vethb24d8fa Out IP 172.17.0.1.51092 > 172.17.0.2.2700: Flags [P.], seq 141334:144542, ack 1167, win 501, options [nop,nop,TS val 1350685953 ecr 3874166328], length 3208
02:31:41.211645 vethb24d8fa P   IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [.], ack 144542, win 2205, options [nop,nop,TS val 3874166415 ecr 1350685953], length 0
02:31:41.211655 docker0 In  IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [.], ack 144542, win 2205, options [nop,nop,TS val 3874166415 ecr 1350685953], length 0
02:31:41.216790 vethb24d8fa P   IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [P.], seq 1167:1189, ack 144542, win 2205, options [nop,nop,TS val 3874166420 ecr 1350685953], length 22
02:31:41.216803 docker0 In  IP 172.17.0.2.2700 > 172.17.0.1.51092: Flags [P.], seq 1167:1189, ack 144542, win 2205, options [nop,nop,TS val 3874166420 ecr 1350685953], length 22

Please pay attention to the very end of each line of this output: the packet lengths are pretty much the same, implying the absence of any significant media payload inside the IP packets flowing between Asterisk and the Kaldi server. When I tested Kaldi with the Python script, tracing all the IP traffic again, the packet lengths differed all the time, so I assume there was a real media exchange, as there should be.
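
There is another possible reading of the capture, offered as an assumption rather than a diagnosis. Fixed-size segments are what a stream of equal-sized PCM chunks looks like on the wire. If each 3208-byte segment is one masked WebSocket frame, it would carry 3200 bytes of audio, and the 22-byte replies match an unmasked frame carrying the empty-partial JSON from the log. The sketch below does that arithmetic using the standard WebSocket framing rules; the 3200-byte payload and the 8 kHz 16-bit mono format are assumptions.

```python
def ws_frame_size(payload_len: int, masked: bool) -> int:
    """Size in bytes of a single WebSocket frame carrying `payload_len` bytes."""
    header = 2  # FIN/opcode byte plus mask-bit/length byte
    if payload_len > 65535:
        header += 8  # 64-bit extended length
    elif payload_len > 125:
        header += 2  # 16-bit extended length
    if masked:
        header += 4  # masking key, required on client-to-server frames
    return header + payload_len


def pcm_seconds(n_bytes: int, sample_rate: int = 8000, bytes_per_sample: int = 2) -> float:
    """Duration of mono PCM audio of the given byte length."""
    return n_bytes / (sample_rate * bytes_per_sample)
```

Under those assumptions, `ws_frame_size(3200, masked=True)` is 3208 and `pcm_seconds(3200)` is 0.2, which lines up with the roughly 0.19 s spacing between the 3208-byte segments in the capture. That would suggest audio is flowing and the empty partials have another cause, such as the content or format of what is being sent.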

Last thing, this is the file I'm trying to play and recognize:

/var/spool/asterisk/recording/ru-long1.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz

Let me know if I could give you more information on my setup, what was configured and how, show you some logs etc. Thank you!

Is there any way to control how long after the last word is spoken before Vosk closes the session?

I am using the Python implementation and would like to limit how long the system will wait before closing the session.

Are there any parameter files I can create?

python3 ./asr_server.py /opt/vosk-model-en/model

Compilation error

[root@dialer6 vosk-asterisk]# ./configure --with-asterisk=/usr/src/software/asterisk-16.12.0/ --prefix=/usr/lib64/asterisk/
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
./configure: line 3417: LT_INIT: command not found
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: `res-speech-vosk/Makefile.in'

Random segmentation fault

Hi.

I'm here begging for help or at least some directions.
I get sometimes, not always, this segfault:

#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:489
#1  0x00007f3d8dc41fdd in vosk_recog_write (speech=0x7f3d8800f6b0, data=<optimized out>, len=320)
    at res_speech_vosk.c:156
#2  0x00007f3d8d98055a in speech_background (chan=0x7f3d800056e0, data=<optimized out>) at app_speech_utils.c:855

I don't always get it; it's so random!
To make matters worse, I've modified app_speech_utils.c a little. I don't think my modifications touch this part of the code, but I'm not 100% sure.

I attach my Asterisk patch and my vosk-asterisk patch here for the brave who want to try them. I have tried them with 16.18.0 and with the git version of Asterisk. My modifications try to solve two problems I ran into:

SpeechBackground() stops the prompt as soon as it detects speech; I want the prompt to keep playing and to keep all recognized text while it plays.
The SpeechBackground() timeout misbehaves: if the user starts speaking during the prompt and keeps talking, the timeout fires and no speech is returned.

Hope you can at least point me in a good direction to follow.
Thank you in advance.

ast-patch.txt
vosk-ast-patch.txt
core-thread1.txt

Readme never instructs you on how to use this with asterisk

The README jumps from getting Asterisk to building vosk-asterisk, and it never says where to put the files. It is a little confusing.

can not load the res_speech_vosk.so from asterisk

Hello,
I installed it with asterisk-20; it compiled without any errors, and I edited the modules file so that the modules load. But when I start Asterisk, I cannot see the module's name. Is anything wrong?

thanks!

No documentation!

The documentation doesn't explain how to add this plugin to Asterisk sources.

Numerical results

I congratulate you on the excellent work, I was testing and everything works very well.
I only have one query, if I wanted to get the result in the json format that appears in the log, what would I have to do?

I would also like to know if it is possible for you to give me numerical results.

excuse my English. Great job!
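
On the JSON part of the question: the messages shown in the logs on this page are small JSON objects, with a "partial" key for interim results, and the module source reads a "text" key for the final one. A sketch of pulling the text out of either shape follows. The function name is hypothetical, and how to get the raw JSON into the dialplan instead of ${SPEECH_TEXT(0)} is not covered.

```python
import json
from typing import Tuple


def extract_text(raw: str) -> Tuple[str, str]:
    """Return ('final', text) or ('partial', text) for one raw result message."""
    obj = json.loads(raw)
    if "text" in obj:
        return ("final", obj["text"])
    return ("partial", obj.get("partial", ""))
```

For example, `extract_text('{"text": "one two"}')` returns `("final", "one two")`.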

issue with building

Hello,
receiving this error as I attempt to check out your product. System is Centos 7 with Asterisk 14.5.0 compiled from source.

/bin/sh ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I../include -I/usr/src/asterisk-14.5.0/include -DAST_MODULE_SELF_SYM="__internal_res_speech_vosk" -g -O2 -MT res_speech_vosk.lo -MD -MP -MF .deps/res_speech_vosk.Tpo -c -o res_speech_vosk.lo res_speech_vosk.c
libtool: compile: gcc -DHAVE_CONFIG_H -I../include -I/usr/src/asterisk-14.5.0/include -DAST_MODULE_SELF_SYM=__internal_res_speech_vosk -g -O2 -MT res_speech_vosk.lo -MD -MP -MF .deps/res_speech_vosk.Tpo -c res_speech_vosk.c -fPIC -DPIC -o .libs/res_speech_vosk.o
res_speech_vosk.c: In function 'vosk_recog_write':
res_speech_vosk.c:166:24: warning: initialization makes pointer from integer without a cast [enabled by default]
const char *text = ast_json_object_string_get(res_json, "text");
^
libtool: compile: gcc -DHAVE_CONFIG_H -I../include -I/usr/src/asterisk-14.5.0/include -DAST_MODULE_SELF_SYM=__internal_res_speech_vosk -g -O2 -MT res_speech_vosk.lo -MD -MP -MF .deps/res_speech_vosk.Tpo -c res_speech_vosk.c -o res_speech_vosk.o >/dev/null 2>&1
mv -f .deps/res_speech_vosk.Tpo .deps/res_speech_vosk.Plo
/bin/sh ../libtool --tag=CC --mode=link gcc -DAST_MODULE_SELF_SYM="__internal_res_speech_vosk" -g -O2 -avoid-version -no-undefined -module -o res_speech_vosk.la -rpath NONE/lib/asterisk/modules res_speech_vosk.lo
libtool: link: only absolute run-paths are allowed
make[2]: *** [res_speech_vosk.la] Error 1
make[2]: Leaving directory /usr/src/vosk-asterisk/res-speech-vosk'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory /usr/src/vosk-asterisk'
make: *** [all] Error 2

can anyone help me??

Please, I want to install this but I get an error. Can anyone help me?

installation: configure script and target paths

Hi,

The configure script assumes that the Asterisk modules are located in PREFIX/lib. However, on some systems such as Gentoo Linux /lib, /lib64, /usr/lib and /usr/lib64 are all different directories with different content (no symlinks as in the olden days).
Also, the following assumes the PREFIX variable will be used for both the conf file and the modules which is usually not the case:

ASTERISK_MODDIR="${prefix}/lib/asterisk/modules"
ASTERISK_CONF_DIR="${prefix}/etc/asterisk"

To make a long story short I want to be able to install the Asterisk module in /usr/lib64/asterisk/modules and the conf file in /etc/asterisk.
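
To make the mismatch concrete, here is a small Python sketch that mirrors the two assignments quoted above. It is only an illustration of how both paths end up tied to one prefix, not the configure script itself; the function name `derived_paths` is made up for this example.

```python
import posixpath


def derived_paths(prefix):
    """Mirror the two quoted assignments: both paths hang off a single prefix."""
    moddir = posixpath.join(prefix, "lib/asterisk/modules")
    confdir = posixpath.join(prefix, "etc/asterisk")
    return moddir, confdir


# Layout requested in this issue: 64-bit module directory, config outside the prefix.
wanted = ("/usr/lib64/asterisk/modules", "/etc/asterisk")

for prefix in ("/usr", "/"):
    got = derived_paths(prefix)
    print(prefix, got, "matches wanted layout:", got == wanted)
```

Neither prefix yields the requested layout, which is why separate settings for the module directory and the configuration directory would be needed.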

AMD

That's why we are using the Vosk server to transcribe the call audio right after Answer(), while still in early media.
How can answering machine detection (AMD) be done even before the call is answered?

GSM operators play the voicemail message without answering the call, hence the importance of transcribing the call even before it is answered.

I will send the audio of the messages that the operators play before forwarding to voicemail.

how to get variable from extensions.conf

In extensions.conf:
same = n,Set(SPEECH_ENGINE(VOSK_CONFIG)={"config" : {""}})

How can I read VOSK_CONFIG in res_speech_vosk.c?

I have tried to use ast_variable_retrieve, but it has not worked for me.

Recognition runs for only 20 seconds

Right before launch we ran into the following problem: speech recognition always runs for only about 20 seconds.
After that the session is closed and everything completes as expected.
In other words, recognition does not terminate abnormally.

I have read issue #13 and rebuilt [res_http_websocket.c:] separately from the latest Asterisk.

Googling turns up nothing at all, and no information has appeared in issue #13 either :(

 {
      "conf" : 1.000000,
      "end" : 18.060000,
      "start" : 17.520000,
      "word" : "один"
    }, {
      "conf" : 1.000000,
      "end" : 18.720000,
      "start" : 18.180000,
      "word" : "один"
    }, {
      "conf" : 1.000000,
      "end" : 19.350000,
      "start" : 18.870000,
      "word" : "один"
    }, {
      "conf" : 1.000000,
      "end" : 20.160000,
      "start" : 19.710000,
      "word" : "один"
    }],
  "text" : "один один один один один один один один один один один один один один один один один один один один один один один один один один один один"
}'
[2023-08-04 09:29:17] NOTICE[21731][C-00000054]: res_speech_vosk.c:175 vosk_recog_write: (vosk) Recognition result: один один один один один один один один один один один один один один один один один один один один один один один один один один один один
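
A minimal Python sketch for pulling numbers out of a result like the one logged above. It assumes the word entries sit under a top-level "result" key next to "text" (the excerpt is truncated before that key, so treat the name as an assumption); `summarize_result` and the two-word sample are made up for illustration.

```python
import json


def summarize_result(raw):
    """Summarize a Vosk-style JSON result: text, word count, mean confidence, last end time."""
    data = json.loads(raw)
    words = data.get("result", [])  # assumed key for the per-word entries
    mean_conf = sum(w["conf"] for w in words) / len(words) if words else None
    last_end = max(w["end"] for w in words) if words else None
    return {
        "text": data.get("text", ""),
        "word_count": len(words),
        "mean_conf": mean_conf,
        "last_end": last_end,  # seconds from the start of the audio stream
    }


# Shortened sample built from the last two word entries in the log above.
sample = """{
  "result": [
    {"conf": 1.0, "end": 19.35, "start": 18.87, "word": "один"},
    {"conf": 1.0, "end": 20.16, "start": 19.71, "word": "один"}
  ],
  "text": "один один"
}"""

summary = summarize_result(sample)
print(summary)
```

The `last_end` value is a quick way to see how far into the audio the final recognized word falls, which is the figure that hovers around 20 seconds in this report.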

Hi, everything works, but recognition is very poor for Russian: the first letter of a word often disappears.

With the same microphone and hardware, but without Asterisk, using only the vosk-api server's WebSocket microphone client, recognition works better than with Asterisk. I think my Asterisk channel codec, comfort noise generation, DENOISE or VAD is making recognition poor.
Could you help me with the Asterisk settings: which codec, VAD, denoise, CNG, etc. configuration is recommended for the best recognition?

Web socket closed abruptly

Has anyone faced this kind of error?

-- Executing [2000@default:1] NoOp("SIP/9000-00000001", "") in new stack
-- Executing [2000@default:2] Wait("SIP/9000-00000001", "1") in new stack
-- Executing [2000@default:3] SpeechCreate("SIP/9000-00000001", "") in new stack

[Jun 14 05:41:16] NOTICE[2527][C-00000002]: res_speech_vosk.c:83 vosk_recog_create: (vosk) Create speech resource ws://localhost:2700
[Jun 14 05:41:16] NOTICE[2527][C-00000002]: res_speech_vosk.c:91 vosk_recog_create: (vosk) Created speech resource result 0
-- Executing [2000@default:4] SpeechBackground("SIP/9000-00000001", "020") in new stack
[Jun 14 05:41:17] NOTICE[2527][C-00000002]: res_speech_vosk.c:196 vosk_recog_start: (vosk) Start recognition
[Jun 14 05:41:17] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:17] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:18] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:18] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:18] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:18] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:18] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:18] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:19] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:19] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:19] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:19] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:19] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:19] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:20] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:20] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
[Jun 14 05:41:20] WARNING[2527][C-00000002]: res_http_websocket.c:523 ws_safe_read: Web socket closed abruptly
[Jun 14 05:41:20] ERROR[2527][C-00000002]: res_http_websocket.c:1415 __ast_websocket_read_string: Client WebSocket string read - error reading string data
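
When the socket closes this early, it can help to check the server on its own, without Asterisk in the path. Below is a hedged Python sketch of such a client. The message shapes (a JSON "config" message carrying "sample_rate", raw PCM chunks sent as binary frames, then a JSON "eof" message) are assumptions based on the vosk-server sample client rather than something shown in this thread, and the helper names are made up. The network part needs the third-party `websockets` package.

```python
import json


def config_message(sample_rate=8000):
    """Build the JSON config message sent before any audio (assumed format)."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


# Assumed end-of-stream marker understood by the server.
EOF_MESSAGE = json.dumps({"eof": 1})


def pcm_chunks(pcm, chunk_bytes=8000):
    """Split raw PCM bytes into fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


async def transcribe(uri, pcm, sample_rate=8000):
    """Send audio to the server and collect every JSON reply it sends back."""
    import websockets  # third-party; imported here so the helpers work without it

    replies = []
    async with websockets.connect(uri) as ws:
        await ws.send(config_message(sample_rate))
        for chunk in pcm_chunks(pcm):
            await ws.send(chunk)  # bytes are sent as a binary frame
            replies.append(json.loads(await ws.recv()))
        await ws.send(EOF_MESSAGE)
        replies.append(json.loads(await ws.recv()))
    return replies
```

With raw 16-bit mono PCM loaded into `pcm`, it could be run as `asyncio.run(transcribe("ws://localhost:2700", pcm))`. If this works while Asterisk still reports abrupt closes, the problem is more likely on the Asterisk WebSocket side than in the server.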

Broken changes for Asterisk 17.5.1

This concerns commit c551a1d. After the update, the module doesn't work without the fixed WebSocket module.

[Jul 12 12:24:23] NOTICE[558088][C-00000001]: res_speech_vosk.c:83 vosk_recog_create: (vosk) Create speech resource ws://127.0.0.1:2700
[Jul 12 12:24:23] NOTICE[558088][C-00000001]: res_speech_vosk.c:91 vosk_recog_create: (vosk) Created speech resource result 0
    -- Executing [1@internal-r:4] SpeechBackground("PJSIP/6002-00000000", "hello") in new stack
[Jul 12 12:24:23] NOTICE[558088][C-00000001]: res_speech_vosk.c:201 vosk_recog_start: (vosk) Start recognition
[Jul 12 12:24:23] NOTICE[558088][C-00000001]: res_speech_vosk.c:182 vosk_recog_write: (vosk) Got error result -1
[Jul 12 12:24:26] NOTICE[558088][C-00000001]: res_speech_vosk.c:98 vosk_recog_destroy: (vosk) Destroy speech resource

vosk-server

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/websockets/server.py", line 169, in handler
    yield from self.ws_handler(self, path)
  File "./asr_server.py", line 55, in recognize
    message = await websocket.recv()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 434, in recv
    yield from self.ensure_open()
  File "/usr/lib/python3/dist-packages/websockets/protocol.py", line 658, in ensure_open
    ) from self.transfer_data_exc
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

Does not install on CentOS 8

CentOS 8, Asterisk 16
./bootstrap
configure.ac:7 error: required file 'config.h.in' not found

Dangerous Functions blocked when using SpeechCreate

Hi,

First of all I want to thank and congratulate you on the excellent work. Now onto the issue. Using Asterisk 16.16, module compiles and loads fine, but when SpeechCreate is invoked many dialplan functions stop working, reporting they are "dangerous". This can be resolved by configuring live_dangerously=yes in asterisk.conf. I do not think the module should trigger this anyways, as the danger features are for variables being set from external APIs.. not sure if a module would be considered external?

Here is a sample dialplan; res_speech_vosk is loaded and working:

exten => 700,1,Answer
exten => 700,n,SpeechCreate
exten => 700,n,Set(CONFBRIDGE(user,music_on_hold_when_empty)=yes)

And this the error when dialing that extension:

[2021-02-20 15:43:03] ERROR[797][C-00000007]: pbx_functions.c:703 ast_func_write: Dangerous function CONFBRIDGE write blocked

All DB functions are blocked also, this affects the dialplan macros that rely on querying astdb to get data.

Commenting the SpeechCreate resolves the issue, so it is triggered by the speech functions in Asterisk.

I really do not know if it's an Asterisk issue or a Vosk issue. I have never used speech engines with Asterisk and I do not have others to test, so I cannot really say.

Thanks again for your work and for your time.
