
Need help understanding FFmpeg implementation in speech-recognition-from-url for livestream decoding of an audio stream · sherpa-onnx · OPEN · 6 comments

tempops avatar tempops commented on July 19, 2024
Need help understanding FFmpeg implementation in speech-recognition-from-url for livestream decode of audio stream

from sherpa-onnx.

Comments (6)

csukuangfj avatar csukuangfj commented on July 19, 2024

My first question is regarding this: does FFmpeg here convert the streaming data to pcm_s16le format in real time, and does it need the data to be saved in memory or to a file to do it? Or does it work on a real-time livestream without creating a file?

As you can read in the following code

# (1) RTMP
# rtmp://localhost/live/livestream

The code also supports a live stream URL. So my reasoning is that it won't create a file. Otherwise, if the stream never ends, how large would the created file be?

The processed data is pushed to a pipe in the RAM.
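
In Python, that pipe-based setup can be sketched roughly as follows; the flags mirror the ones discussed in this thread, but the helper names are illustrative, not the verbatim code from speech_recognition_from_url.py:

```python
import subprocess

def ffmpeg_pcm_cmd(url: str) -> list:
    # Build the FFmpeg argv: decode `url` (a file, RTMP, or HTTP source)
    # and emit raw 16-bit mono 16 kHz PCM on stdout ("-"), so no output
    # file is ever created, no matter how long the stream runs.
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", url,
        "-f", "s16le",          # raw samples, no WAV/RIFF container
        "-acodec", "pcm_s16le",
        "-ac", "1",             # mono
        "-ar", "16000",         # 16 kHz sample rate
        "-",                    # write to stdout
    ]

def spawn_ffmpeg(url: str) -> subprocess.Popen:
    # stdout=PIPE: the decoded audio goes into an in-memory OS pipe
    # that the recognizer then reads from chunk by chunk.
    return subprocess.Popen(ffmpeg_pcm_cmd(url), stdout=subprocess.PIPE)
```

Reading from process.stdout then blocks until FFmpeg has decoded enough data, which is why no ever-growing file is needed for a live stream.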


Can you explain what impact frames_per_read has?

If you want to read data from the pipe, you must decide how much to read. The more you ask for, the longer you have to wait until the requested amount of data is available.
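
As a rough illustration of that trade-off (the constants match the FFmpeg flags discussed above; the helper names are assumptions, not values from the script):

```python
SAMPLE_RATE = 16000      # Hz, matching "-ar 16000" in the FFmpeg command
BYTES_PER_SAMPLE = 2     # pcm_s16le: one sample = 2 bytes

def chunk_seconds(frames_per_read: int) -> float:
    # How much audio (in seconds) one read from the pipe corresponds to,
    # i.e. the minimum time you wait for that read to complete.
    return frames_per_read / SAMPLE_RATE

def chunk_bytes(frames_per_read: int) -> int:
    # How many bytes to request from process.stdout.read().
    return frames_per_read * BYTES_PER_SAMPLE

# e.g. 1600 frames per read means waiting ~0.1 s per chunk
# and requesting 3200 bytes from the pipe each time.
```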


Regarding samples.astype(np.float32) / 32768: why do we divide by 32768?

The feature extractor expects that the input samples are in the range [-1, 1].
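
A quick standalone example of the scaling:

```python
import numpy as np

# int16 samples span [-32768, 32767]; dividing by 32768 maps them
# into [-1, 1), which is the range the feature extractor expects.
samples = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = samples.astype(np.float32) / 32768
# scaled is [-1.0, 0.0, 0.5, 32767/32768]
```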


How this code block works:

  1. It reads some data from the pipe
  2. Converts the data into the range [-1, 1]
  3. Sends the data to the stream object for processing
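
The three steps can be sketched as a small helper; here process stands for the FFmpeg subprocess and stream for the sherpa-onnx stream object, and frames_per_read = 1600 (0.1 s at 16 kHz) is an assumed value:

```python
import numpy as np

def feed_stream(process, stream, frames_per_read: int = 1600) -> None:
    # Pump PCM audio from FFmpeg's stdout pipe into a recognizer stream.
    while True:
        # 1. read raw bytes from the pipe (2 bytes per int16 sample)
        data = process.stdout.read(frames_per_read * 2)
        if not data:
            break  # FFmpeg exited: the stream or file ended
        # 2. reinterpret the bytes as int16, then scale into [-1, 1]
        samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768
        # 3. hand the chunk to the stream object for processing
        stream.accept_waveform(16000, samples)
```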

const lsProcess = ffmpeg_child_process.spawn('ffmpeg', [
  '-i', 'input.wav',
  '-f', 's16le',
  '-acodec', 'pcm_s16le',
  '-ac', '1',
  '-ar', '16000',
  'new_try.wav',
  '-y',
]);

Please send us the file new_try.wav.


Do I also need to implement frames_per_read and samples.astype(np.float32) / 32768?

Could you explain why you need this? You said you processed the new_try.wav file with online-client-websocket-decode-files.py, but that file does not need you to write any code. Could you tell us what extra steps you have done?

If you use files from us, you don't need to write any code. Everything has been done by us.
All you need to do is to provide the data.

If the data does not have the requested format, you would get an informative error message.


tempops avatar tempops commented on July 19, 2024

Thank you for the response,

As per what you have said, let me explain the process I am following:

Could you explain why you need this? You said you processed the new_try.wav file with online-client-websocket-decode-files.py, but that file does not need you to write any code. Could you tell us what extra steps you have done?

  1. I have a TypeScript server that receives audio data in a format that is not compatible with sherpa-onnx: the data is in PCMU format, while sherpa-onnx expects PCM
  2. For this conversion I run the following code in my TypeScript server to preprocess the data into PCM format:

const lsProcess = ffmpeg_child_process.spawn('ffmpeg', [
  '-i', 'input.wav',
  '-f', 's16le',
  '-acodec', 'pcm_s16le',
  '-ac', '1',
  '-ar', '16000',
  'new_try.wav',
  '-y',
]);

I do not have new_file.wav but I can share its configuration according to FFmpeg logs:

Input #0, wav, from 'e5ee3a4f-d250-4901-b311-a12ea85216cd.wav':
Duration: 00:00:04.52, bitrate: 64 kb/s
Stream #0:0: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s
File 'new_try.wav' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'new_try.wav':
Metadata:
encoder : Lavf60.16.100
Stream #0:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc60.31.102 pcm_s16le
[out#0/s16le @ 0x7291500] video:0kB audio:141kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
size= 141kB time=00:00:04.52 bitrate= 256.1kbits/s speed=1.64e+03x

The raw audio input format is pcm_mulaw according to the FFmpeg logs, and I followed the FFmpeg code you had in the speech_recognition_from_url.py file to convert it to pcm_s16le:

const lsProcess = ffmpeg_child_process.spawn('ffmpeg', [
  '-i', 'input.wav',
  '-f', 's16le',
  '-acodec', 'pcm_s16le',
  '-ac', '1',
  '-ar', '16000',
  'new_try.wav',
  '-y',
]);

Here input.wav is the raw audio coming from the TypeScript server in PCMU format, and new_try.wav is the converted audio in PCM format. I wanted to test whether it is readable by sherpa-onnx, so I used online_client_websocket_decode_files.py to check whether the converted file is compatible, but that is when I get the error:

raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
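
A likely cause, given the command used above: the -f s16le flag forces FFmpeg's raw output muxer, so new_try.wav contains bare samples with no RIFF/WAV header even though its name ends in .wav, which is exactly what Python's wave module complains about. Below is a sketch, with hypothetical helper names, of a command without that flag and of the 4-byte check the wave module performs:

```python
def wav_convert_cmd(src: str, dst: str) -> list:
    # Hypothetical fixed command: note there is no "-f s16le" here, so
    # FFmpeg infers the WAV (RIFF) container from the .wav extension.
    # With "-f s16le", the output is raw headerless samples, and the
    # wave module raises "file does not start with RIFF id".
    return ["ffmpeg", "-i", src,
            "-acodec", "pcm_s16le", "-ac", "1", "-ar", "16000",
            dst, "-y"]

def starts_with_riff(first_bytes: bytes) -> bool:
    # The wave module checks exactly this 4-byte magic at the file start.
    return first_bytes[:4] == b"RIFF"
```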

My end goal is to implement streaming transformation of the audio bytes, similar to what you did in the speech_recognition_from_url.py file, but I am testing on a wav file first before implementing FFmpeg streaming transformation of the data sent to the streaming server.


csukuangfj avatar csukuangfj commented on July 19, 2024

I do not have new_file.wav

I wanted to test whether it is readable by sherpa-onnx, so I used online_client_websocket_decode_files.py to check whether the converted file is compatible

If you don't have new_file.wav, could you tell us what the input file to online_client_websocket_decode_files.py is?

Please post the input file.


tempops avatar tempops commented on July 19, 2024

The input to online_client_websocket_decode_files.py is new_file.wav. How can I post a wav file here? I can only post screenshots.

I can share a link of the files which are in my public repo:

This is the original audio stream file which is in PCMU format:

https://github.com/tempops/audiofile/blob/6b3e412c9198132e94c724e483ea8dc4d8ced313/e5ee3a4f-d250-4901-b311-a12ea85216cd.wav

This is the file I get after converting with the FFmpeg command shown above, which I sent to online_client_websocket_decode_files.py to test whether it works with sherpa-onnx:

https://github.com/tempops/audiofile/blob/3549f746d784cf3317e3e36dff965193ca3f43b8/new_try%20(1).wav

Also, I wanted to know what np.frombuffer(data, dtype=np.int16) does here. I am writing what I understand; correct me if I am wrong:

data = process.stdout.read(frames_per_read * 2)  # reads audio data bytes output by FFmpeg in 0.1-second chunks
if not data:
    break

samples = np.frombuffer(data, dtype=np.int16)  # <- What exactly does this do?
samples = samples.astype(np.float32) / 32768   # converts data to the range [-1, 1]
stream.accept_waveform(16000, samples)         # sends data to the stream object
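
Regarding the np.frombuffer question, a small standalone example (not from the script):

```python
import numpy as np

# Four raw bytes from the pipe: two little-endian int16 samples
data = b"\x00\x00\x00\x40"   # 0x0000 -> 0, 0x4000 -> 16384

# np.frombuffer does not decode or parse anything: it reinterprets the
# byte buffer as an array of int16 values (a zero-copy typed view).
samples = np.frombuffer(data, dtype=np.int16)
scaled = samples.astype(np.float32) / 32768
# samples is [0, 16384], scaled is [0.0, 0.5]
```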

