espnetUser commented on May 27, 2024

Hi @DinnoKoluh, the larger latency you see for the first chunk likely comes from the block processing done by the streaming Conformer encoder: it needs to fill the entire initial block with (downsampled) input frames before it can compute any encoder output.
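
For a rough sense of how large that initial delay is, here is a back-of-the-envelope sketch; the 10 ms frame shift and 4x subsampling factor are typical Conformer frontend values I am assuming here, not values read from your config:

```python
# Back-of-the-envelope estimate of first-chunk latency for a blockwise
# streaming encoder. The frame shift and subsampling factor below are
# common Conformer frontend defaults, assumed here for illustration.
FRAME_SHIFT_MS = 10  # feature frontend hop size in ms (assumed)
SUBSAMPLING = 4      # conv subsampling factor before the encoder (assumed)

def first_output_latency_ms(block_size: int, look_ahead: int) -> int:
    """Milliseconds of audio that must be buffered before the encoder
    can emit its first block. block_size and look_ahead are counted in
    downsampled encoder frames, as in the model config."""
    return (block_size + look_ahead) * SUBSAMPLING * FRAME_SHIFT_MS

# e.g. block_size=40, look_ahead=16 -> (40 + 16) * 4 * 10 = 2240 ms
print(first_output_latency_ms(40, 16))
```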

DinnoKoluh commented on May 27, 2024

I understand; it's unfortunate that the initial latency is fairly large.
Do you have any insight into the other issues I mentioned?

espnetUser commented on May 27, 2024

> My guess for the increase in latency is that for each update of the transcription I get the whole transcription back instead of just the update. I am not sure if that is the desired behaviour, but it seems to me the more natural behaviour would be to return updates for, say, the last 5 words instead of the whole transcript.

AFAIU, the decoding process will always return the best (full-length) hypothesis (or n-best list) for the entire available encoded input sequence. The (label-synchronous) beam search might pick a different path when more encoder input becomes available, thus changing words that appeared in earlier results.
Also, for very long audio the current decoder implementation will slow down noticeably because it keeps the whole "history" of encoded inputs. So for very long audio it is better to use VAD to split it into smaller segments before decoding.
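
So if you only want the update, you would have to diff consecutive hypotheses on the caller's side. A minimal sketch of that post-processing (this is client-side code, not part of ESPnet's API):

```python
# Turn the full-length hypotheses the decoder returns into incremental
# updates: keep the previous best hypothesis and re-emit only from the
# first word that changed.
def transcript_update(prev_words, new_words):
    """Return (stable_prefix_len, changed_suffix)."""
    i = 0
    while i < min(len(prev_words), len(new_words)) and prev_words[i] == new_words[i]:
        i += 1
    return i, new_words[i:]

print(transcript_update("the quick brown".split(),
                        "the quick brown fox".split()))
# -> (3, ['fox'])            pure append

print(transcript_update("the quick brown socks".split(),
                        "the quick brown fox jumps".split()))
# -> (3, ['fox', 'jumps'])   beam search revised an earlier word
```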

> I would also like to add that changing the device from GPU to CPU, or vice versa, doesn't have any effect on the latency, which is odd; I would expect switching to a GPU to decrease the latency by a lot.

I haven't run any inference on GPU, so I can't really comment on that.

DinnoKoluh commented on May 27, 2024

I understand, but it is supposed to be a streaming model, so I am just mimicking streaming by chunking a long audio file. In practice, though, I would expect a stream of audio chunks of some fixed length, and that stream could last for, say, 1 hour. So inference should be done on just the incoming chunk, or the chunks near it, since it may need context to update the transcript.
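
For concreteness, here is a minimal sketch of what I mean by chunking; the model paths, audio file, and chunk size are placeholders, and I am assuming the `Speech2TextStreaming` interface from `espnet2.bin.asr_inference_streaming`:

```python
# Mimic a live stream by feeding fixed-size chunks of a long recording
# to ESPnet's streaming interface. Paths and chunk size are placeholders.
import soundfile
from espnet2.bin.asr_inference_streaming import Speech2TextStreaming

speech2text = Speech2TextStreaming(
    asr_train_config="exp/asr_train/config.yaml",      # placeholder path
    asr_model_file="exp/asr_train/valid.acc.ave.pth",  # placeholder path
)

speech, rate = soundfile.read("long_audio.wav")  # placeholder, 16 kHz assumed
chunk = 2048                                     # samples per simulated chunk

for start in range(0, len(speech), chunk):
    is_final = start + chunk >= len(speech)
    results = speech2text(speech=speech[start:start + chunk], is_final=is_final)
    if results:
        print(results[0][0])  # best hypothesis so far: the *full* transcript
```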

And doesn't the `is_final` parameter reset the history (buffer), as I already mentioned?

Maybe I should ask the main question with an example: is the ESPnet streaming model capable, on a live audio stream (say, listening to a live news channel on YouTube), of producing a live transcript that lags behind the audio by at most 500 ms (or some other fixed amount)?

espnetUser commented on May 27, 2024

> Maybe I should ask the main question with an example: is the ESPnet streaming model capable, on a live audio stream (say, listening to a live news channel on YouTube), of producing a live transcript that lags behind the audio by at most 500 ms (or some other fixed amount)?

As is, it won't be able to work on a live audio stream from YouTube that runs for hours (as the decoder code keeps the entire history). You would need to implement some simple endpointing in combination with the `is_final` parameter to cut the audio at appropriate times (pauses etc.) and reset the internal buffer. You can control the delay via the encoder's block_size, hop_size, and look_ahead parameters in the model config; 500 ms (except for the initial phase) will be challenging but should be possible.
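
A rough sketch of that endpointing idea, where the energy threshold, pause length, and the `speech2text` callable are illustrative assumptions rather than ESPnet defaults:

```python
# Simple energy-based endpointing: when enough consecutive chunks look
# silent, pass is_final=True so the decoder flushes the current segment
# and resets its internal buffer. Thresholds are illustrative only.
import numpy as np

SILENCE_RMS = 1e-3   # below this RMS a chunk counts as silence (assumed)
PAUSE_CHUNKS = 8     # this many silent chunks in a row end a segment (assumed)

def stream_with_endpointing(speech2text, chunks):
    """Yield one final transcript per detected segment."""
    silent = 0
    for chunk in chunks:
        silent = silent + 1 if np.sqrt(np.mean(chunk ** 2)) < SILENCE_RMS else 0
        end_of_segment = silent >= PAUSE_CHUNKS
        results = speech2text(speech=chunk, is_final=end_of_segment)
        if end_of_segment:
            if results:
                yield results[0][0]  # final transcript for this segment
            silent = 0  # is_final=True reset the decoder history
```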

DinnoKoluh commented on May 27, 2024

Okay, thank you for the info.
