Comments (6)
My first question: does ffmpeg here convert the streaming data to pcm_s16le format in real time, and does it need to save the data to a file to do so? Or does it work on a live stream without creating a file?
As you can read in the following code
The code also supports a live stream URL, so my reasoning is that it won't create a file. Otherwise, if the stream never ends, how large would the created file be?
The processed data is pushed to a pipe in RAM.
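A minimal sketch of this pipe-based setup in Python. It assumes ffmpeg is installed and on PATH; ffmpeg_pcm_command and open_pcm_pipe are hypothetical helper names, not part of sherpa-onnx. Passing - as the output path makes ffmpeg write to its standard output, which the parent process reads through a pipe, so no file is created.

```python
import subprocess


def ffmpeg_pcm_command(source: str) -> list[str]:
    """Build an ffmpeg command that decodes `source` (a file path or stream
    URL) to raw 16 kHz mono 16-bit little-endian PCM on standard output."""
    return [
        "ffmpeg",
        "-i", source,
        "-f", "s16le",           # raw samples, no container header
        "-acodec", "pcm_s16le",
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz
        "-",                     # write to stdout instead of a file
    ]


def open_pcm_pipe(source: str) -> subprocess.Popen:
    """Start ffmpeg; decoded audio is then readable from `.stdout`, a pipe."""
    return subprocess.Popen(
        ffmpeg_pcm_command(source),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
```

The caller would then read from open_pcm_pipe(url).stdout in a loop; for an endless stream the loop simply keeps running while ffmpeg keeps producing data.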
Can you explain what impact frames_per_read has?
If you want to read data from the pipe, you must decide how much to read. The more you read, the more time you have to wait until the requested amount of data is available.
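To make the trade-off concrete, here is a small sketch. It assumes 16 kHz, 16-bit mono audio (2 bytes per sample), matching the ffmpeg options used in this thread. read_chunks and chunk_duration are hypothetical helpers, and an io.BytesIO object stands in for the ffmpeg pipe.

```python
import io

SAMPLE_RATE = 16000      # samples per second
BYTES_PER_SAMPLE = 2     # 16-bit PCM


def chunk_duration(frames_per_read: int) -> float:
    """Seconds of audio covered by one read of `frames_per_read` samples."""
    return frames_per_read / SAMPLE_RATE


def read_chunks(pipe, frames_per_read: int):
    """Yield successive chunks of raw PCM bytes until the pipe is exhausted."""
    while True:
        data = pipe.read(frames_per_read * BYTES_PER_SAMPLE)
        if not data:
            break
        yield data


# One second of silence stands in for the ffmpeg output (illustrative data).
fake_pipe = io.BytesIO(bytes(SAMPLE_RATE * BYTES_PER_SAMPLE))
chunks = list(read_chunks(fake_pipe, frames_per_read=1600))
print(len(chunks), chunk_duration(1600))
```

With a live stream, each read waits until the requested number of bytes has arrived, so 1600 samples means roughly 0.1 s of waiting per chunk and 16000 samples would mean roughly 1 s. Smaller values lower latency at the cost of more loop iterations.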
In samples.astype(np.float32) / 32768, why do we divide by 32768?
The feature extractor expects that the input samples are in the range [-1, 1].
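A short sketch of why 32768 is the divisor: signed 16-bit samples span -32768 to 32767, so dividing by 32768 (2 to the power 15) maps them into the range from -1 up to just under 1. It assumes NumPy is installed; the sample values are made up for illustration.

```python
import numpy as np

# Extreme and mid-range 16-bit values (illustrative, not real audio).
samples = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)

# Convert to float first, then scale by 2**15.
normalized = samples.astype(np.float32) / 32768

print(normalized.min(), normalized.max())
```

Converting to float32 before dividing matters: the division then happens in floating point and the result stays float32, which is the type the original snippet passes on to the stream object.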
How this code block works:
- It reads some data from the pipe
- It converts the data into the range [-1, 1]
- It sends the data to the stream object for processing
const lsProcess = ffmpeg_child_process.spawn('ffmpeg',['-i', 'input.wav', '-f', 's16le', '-acodec', 'pcm_s16le','-ac','1','-ar','16000','new_try.wav','-y'],);
Please send us the file new_try.wav.
Do I also need to implement frames_per_read and samples.astype(np.float32) / 32768?
Could you explain why you need this? You said you processed the new_try.wav file with online-client-websocket-decode-files.py, but that file does not need you to write any code. Could you tell us what extra steps you have done?
If you use files from us, you don't need to write any code. Everything has been done by us.
All you need to do is to provide the data.
If the data does not have the requested format, you would get an informative error message.
from sherpa-onnx.
Thank you for the response.
In reply to what you said here, let me explain the process I am following:
Could you explain why you need this? You said you processed the new_try.wav file with online-client-websocket-decode-files.py, but that file does not need you to write any code. Could you tell us what extra steps you have done?
- I have a TypeScript server that receives audio data in PCMU format, which is not compatible with sherpa-onnx because it expects PCM format
- For this conversion, I use the following code in my TypeScript server to preprocess the data and convert it into PCM format:
const lsProcess = ffmpeg_child_process.spawn('ffmpeg',['-i', 'input.wav', '-f', 's16le', '-acodec', 'pcm_s16le','-ac','1','-ar','16000','new_try.wav','-y'],);
I do not have new_file.wav but I can share its configuration according to FFmpeg logs:
Input #0, wav, from 'e5ee3a4f-d250-4901-b311-a12ea85216cd.wav':
Duration: 00:00:04.52, bitrate: 64 kb/s
Stream #0:0: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s
File 'new_try.wav' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'new_try.wav':
Metadata:
encoder : Lavf60.16.100
Stream #0:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc60.31.102 pcm_s16le
[out#0/s16le @ 0x7291500] video:0kB audio:141kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
size= 141kB time=00:00:04.52 bitrate= 256.1kbits/s speed=1.64e+03x
The raw input audio is in pcm_mulaw format according to the FFmpeg logs, and I followed the ffmpeg command from your speech_recognition_from_url.py file to convert it to PCM s16le:
const lsProcess = ffmpeg_child_process.spawn('ffmpeg',['-i', 'input.wav', '-f', 's16le', '-acodec', 'pcm_s16le','-ac','1','-ar','16000','new_try.wav','-y'],);
Here input.wav is the raw audio coming from the TypeScript server in PCMU format, and new_try.wav is the converted audio in PCM format. I wanted to test whether it is readable by sherpa-onnx, so I used online_client_websocket_decode_files.py to check that the converted file is compatible, but that is when I get this error:
raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
My end goal is to implement a streaming transformation of audio bytes, similar to what you have done in the speech_recognition_from_url.py file, but I am testing it on a wav file first before using ffmpeg to transform streaming data to send to the streaming server.
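One likely cause, judging from the log above: ffmpeg reports Output #0, s16le, to 'new_try.wav', which indicates that -f s16le forced headerless raw PCM even though the file name ends in .wav. Python's wave module, which raised the error, requires a RIFF header. The sketch below reproduces the error on raw bytes and shows one way to wrap raw samples in a WAV container with the standard wave module. raw_pcm_to_wav is a hypothetical helper name, and the silent buffer is illustrative data, not the file from this thread.

```python
import io
import wave


def raw_pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM bytes in a RIFF/WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)            # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


raw = bytes(3200)  # 0.1 s of silence as headerless PCM (illustrative)

# Raw PCM has no RIFF header, so the wave module rejects it.
try:
    wave.open(io.BytesIO(raw), "rb")
    raw_is_wav = True
except wave.Error:
    raw_is_wav = False

wav_bytes = raw_pcm_to_wav(raw)
print(raw_is_wav, wav_bytes[:4])
```

On the ffmpeg side, dropping -f s16le when the output name ends in .wav should let ffmpeg choose the WAV muxer from the extension. Raw s16le output suits consumers that read bare samples from a pipe, as speech_recognition_from_url.py does.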
I do not have new_file.wav
wanted to test if it is readable by Sherpa_ONNX hence I used online_client_websocket_decode_files.py to test if the converted file is compatible
If you don't have new_file.wav, could you tell us what the input file to online_client_websocket_decode_files.py is?
Please post the input file.
The input to online_client_websocket_decode_files.py is new_file.wav. How can I post a wav file here? I can only post screenshots.
I can share a link of the files which are in my public repo:
This is the original audio stream file which is in PCMU format:
This is the file I get after converting using FFmpeg command (const lsProcess = ffmpeg_child_process.spawn('ffmpeg',['-i', 'input.wav', '-f', 's16le', '-acodec', 'pcm_s16le','-ac','1','-ar','16000','new_try.wav','-y'],);)
which I sent to online_client_websocket_decode_files.py to test if it works with Sherpa_ONNX:
https://github.com/tempops/audiofile/blob/3549f746d784cf3317e3e36dff965193ca3f43b8/new_try%20(1).wav
I also wanted to know what np.frombuffer(data, dtype=np.int16) does here. I am writing down what I understand; correct me if I am wrong:
data = process.stdout.read(frames_per_read * 2)  # Reads audio bytes output by FFmpeg in 0.1-second chunks
if not data:
    break
samples = np.frombuffer(data, dtype=np.int16)  # What exactly does this do?
samples = samples.astype(np.float32) / 32768  # Converts data to range [-1, 1]
stream.accept_waveform(16000, samples)  # Sends data to stream object
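On the np.frombuffer question: it reinterprets the raw bytes as an array of 16-bit integers without copying them, so every 2 bytes read from the pipe become one audio sample. A small sketch, assuming NumPy is installed; the byte values are made up, and the dtype is written as "<i2" (little-endian int16, equivalent to np.int16 on little-endian machines) so the result does not depend on the host byte order.

```python
import struct

import numpy as np

# Pack three samples as 16-bit little-endian integers, the layout of s16le.
data = struct.pack("<3h", 1000, -1000, 32767)

# Interpret each pair of bytes as one int16 sample; no data is copied.
samples = np.frombuffer(data, dtype="<i2")

print(len(data), samples.tolist())
```

This is also why the read size is frames_per_read * 2: each sample occupies 2 bytes. The array returned for a bytes object is read-only, which is harmless here because astype(np.float32) creates a new array.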