
Need help understanding FFmpeg implementation in speech-recognition-from-url for livestream decoding of an audio stream · sherpa-onnx · OPEN · 6 comments

tempops avatar tempops commented on July 19, 2024
Need help understanding FFmpeg implementation in speech-recognition-from-url for livestream decode of audio stream

from sherpa-onnx.

Comments (6)

csukuangfj avatar csukuangfj commented on July 19, 2024

My first question is regarding this: does FFmpeg here convert the streaming data to pcm_s16le format in real time, and does it need the data to be saved in memory or to a file to do it? Or does it work on a real-time livestream without creating a file?

As you can read in the following code

# (1) RTMP
# rtmp://localhost/live/livestream

The code also supports a live stream URL. So my reasoning is that it won't create a file. Otherwise, if the stream never ends, how large would the created file be?

The processed data is pushed to a pipe in the RAM.
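
In Python, that pipe-based setup can be sketched roughly as follows; the flags mirror the ones discussed in this thread, but the helper names are illustrative, not the verbatim code from speech_recognition_from_url.py:

```python
import subprocess

def ffmpeg_pcm_cmd(url: str) -> list:
    # Build the FFmpeg argv: decode `url` (a file, RTMP, or HTTP source)
    # and emit raw 16-bit mono 16 kHz PCM on stdout ("-"), so no output
    # file is ever created, no matter how long the stream runs.
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", url,
        "-f", "s16le",          # raw samples, no WAV/RIFF container
        "-acodec", "pcm_s16le",
        "-ac", "1",             # mono
        "-ar", "16000",         # 16 kHz sample rate
        "-",                    # write to stdout
    ]

def spawn_ffmpeg(url: str) -> subprocess.Popen:
    # stdout=PIPE: the decoded audio goes into an in-memory OS pipe
    # that the recognizer then reads from chunk by chunk.
    return subprocess.Popen(ffmpeg_pcm_cmd(url), stdout=subprocess.PIPE)
```

Reading from process.stdout then blocks until FFmpeg has decoded enough data, which is why no ever-growing file is needed for a live stream.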


Can you explain what impact frames_per_read has?

If you want to read data from the pipe, you must decide how much to read. The more you ask for, the longer you have to wait until the requested amount of data is available.
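
As a rough illustration of that trade-off (the constants match the FFmpeg flags discussed above; the helper names are assumptions, not values from the script):

```python
SAMPLE_RATE = 16000      # Hz, matching "-ar 16000" in the FFmpeg command
BYTES_PER_SAMPLE = 2     # pcm_s16le: one sample = 2 bytes

def chunk_seconds(frames_per_read: int) -> float:
    # How much audio (in seconds) one read from the pipe corresponds to,
    # i.e. the minimum time you wait for that read to complete.
    return frames_per_read / SAMPLE_RATE

def chunk_bytes(frames_per_read: int) -> int:
    # How many bytes to request from process.stdout.read().
    return frames_per_read * BYTES_PER_SAMPLE

# e.g. 1600 frames per read means waiting ~0.1 s per chunk
# and requesting 3200 bytes from the pipe each time.
```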


Regarding samples.astype(np.float32) / 32768: why do we divide by 32768?

The feature extractor expects that the input samples are in the range [-1, 1].
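
A quick standalone example of the scaling:

```python
import numpy as np

# int16 samples span [-32768, 32767]; dividing by 32768 maps them
# into [-1, 1), which is the range the feature extractor expects.
samples = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = samples.astype(np.float32) / 32768
# scaled is [-1.0, 0.0, 0.5, 32767/32768]
```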


How this code block works:

  1. It reads some data from the pipe
  2. Converts the data into the range [-1, 1]
  3. Sends the data to the stream object for processing
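
The three steps can be sketched as a small helper; here process stands for the FFmpeg subprocess and stream for the sherpa-onnx stream object, and frames_per_read = 1600 (0.1 s at 16 kHz) is an assumed value:

```python
import numpy as np

def feed_stream(process, stream, frames_per_read: int = 1600) -> None:
    # Pump PCM audio from FFmpeg's stdout pipe into a recognizer stream.
    while True:
        # 1. read raw bytes from the pipe (2 bytes per int16 sample)
        data = process.stdout.read(frames_per_read * 2)
        if not data:
            break  # FFmpeg exited: the stream or file ended
        # 2. reinterpret the bytes as int16, then scale into [-1, 1]
        samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768
        # 3. hand the chunk to the stream object for processing
        stream.accept_waveform(16000, samples)
```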

const lsProcess = ffmpeg_child_process.spawn('ffmpeg', [
  '-i', 'input.wav',
  '-f', 's16le',
  '-acodec', 'pcm_s16le',
  '-ac', '1',
  '-ar', '16000',
  'new_try.wav',
  '-y',
]);

Please send us the file new_try.wav.


Do I also need to implement frames_per_read and samples.astype(np.float32) / 32768?

Could you explain why you need this? You said you processed the new_try.wav file with online-client-websocket-decode-files.py, but that file does not need you to write any code. Could you tell us what extra steps you have done?

If you use files from us, you don't need to write any code. Everything has been done by us.
All you need to do is to provide the data.

If the data does not have the requested format, you would get an informative error message.


tempops avatar tempops commented on July 19, 2024

Thank you for the response,

As per what you have said, let me explain the process I am following:

Could you explain why you need this? You said you processed the new_try.wav file with online-client-websocket-decode-files.py, but that file does not need you to write any code. Could you tell us what extra steps you have done?

  1. I have a TypeScript server that receives audio data in a format that is not compatible with sherpa-onnx: the data is in PCMU format, while sherpa-onnx expects PCM
  2. For this conversion I run the following code in my TypeScript server to preprocess the data into PCM format:

const lsProcess = ffmpeg_child_process.spawn('ffmpeg', [
  '-i', 'input.wav',
  '-f', 's16le',
  '-acodec', 'pcm_s16le',
  '-ac', '1',
  '-ar', '16000',
  'new_try.wav',
  '-y',
]);

I do not have new_file.wav but I can share its configuration according to FFmpeg logs:

Input #0, wav, from 'e5ee3a4f-d250-4901-b311-a12ea85216cd.wav':
Duration: 00:00:04.52, bitrate: 64 kb/s
Stream #0:0: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s
File 'new_try.wav' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'new_try.wav':
Metadata:
encoder : Lavf60.16.100
Stream #0:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc60.31.102 pcm_s16le
[out#0/s16le @ 0x7291500] video:0kB audio:141kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
size= 141kB time=00:00:04.52 bitrate= 256.1kbits/s speed=1.64e+03x

The raw audio input format is pcm_mulaw according to the FFmpeg logs, and I followed the FFmpeg code you had in the speech_recognition_from_url.py file to convert it to pcm_s16le:

const lsProcess = ffmpeg_child_process.spawn('ffmpeg', [
  '-i', 'input.wav',
  '-f', 's16le',
  '-acodec', 'pcm_s16le',
  '-ac', '1',
  '-ar', '16000',
  'new_try.wav',
  '-y',
]);

Here input.wav is the raw audio coming from the TypeScript server in PCMU format, and new_try.wav is the converted audio in PCM format. I wanted to test whether it is readable by sherpa-onnx, so I used online_client_websocket_decode_files.py to check whether the converted file is compatible, but that is when I get the error:

raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
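
A likely cause, given the command used above: the -f s16le flag forces FFmpeg's raw output muxer, so new_try.wav contains bare samples with no RIFF/WAV header even though its name ends in .wav, which is exactly what Python's wave module complains about. Below is a sketch, with hypothetical helper names, of a command without that flag and of the 4-byte check the wave module performs:

```python
def wav_convert_cmd(src: str, dst: str) -> list:
    # Hypothetical fixed command: note there is no "-f s16le" here, so
    # FFmpeg infers the WAV (RIFF) container from the .wav extension.
    # With "-f s16le", the output is raw headerless samples, and the
    # wave module raises "file does not start with RIFF id".
    return ["ffmpeg", "-i", src,
            "-acodec", "pcm_s16le", "-ac", "1", "-ar", "16000",
            dst, "-y"]

def starts_with_riff(first_bytes: bytes) -> bool:
    # The wave module checks exactly this 4-byte magic at the file start.
    return first_bytes[:4] == b"RIFF"
```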

My end goal is to implement streaming transformation of the audio bytes, similar to what you did in the speech_recognition_from_url.py file, but I am testing on a wav file first before implementing FFmpeg streaming transformation of the data sent to the streaming server.


csukuangfj avatar csukuangfj commented on July 19, 2024

I do not have new_file.wav

I wanted to test whether it is readable by sherpa-onnx, so I used online_client_websocket_decode_files.py to check whether the converted file is compatible

If you don't have new_file.wav, could you tell us what the input file to online_client_websocket_decode_files.py is?

Please post the input file.


tempops avatar tempops commented on July 19, 2024

The input to online_client_websocket_decode_files.py is new_file.wav. How can I post a wav file here? I can only post screenshots.

I can share a link of the files which are in my public repo:

This is the original audio stream file which is in PCMU format:

https://github.com/tempops/audiofile/blob/6b3e412c9198132e94c724e483ea8dc4d8ced313/e5ee3a4f-d250-4901-b311-a12ea85216cd.wav

This is the file I get after converting with the FFmpeg command shown above, which I sent to online_client_websocket_decode_files.py to test whether it works with sherpa-onnx:

https://github.com/tempops/audiofile/blob/3549f746d784cf3317e3e36dff965193ca3f43b8/new_try%20(1).wav

Also, I wanted to know what np.frombuffer(data, dtype=np.int16) does here. I am writing what I understand; correct me if I am wrong:

data = process.stdout.read(frames_per_read * 2)  # reads audio data bytes output by FFmpeg in 0.1-second chunks
if not data:
    break

samples = np.frombuffer(data, dtype=np.int16)  # <- What exactly does this do?
samples = samples.astype(np.float32) / 32768   # converts data to the range [-1, 1]
stream.accept_waveform(16000, samples)         # sends data to the stream object
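
Regarding the np.frombuffer question, a small standalone example (not from the script):

```python
import numpy as np

# Four raw bytes from the pipe: two little-endian int16 samples
data = b"\x00\x00\x00\x40"   # 0x0000 -> 0, 0x4000 -> 16384

# np.frombuffer does not decode or parse anything: it reinterprets the
# byte buffer as an array of int16 values (a zero-copy typed view).
samples = np.frombuffer(data, dtype=np.int16)
scaled = samples.astype(np.float32) / 32768
# samples is [0, 16384], scaled is [0.0, 0.5]
```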

