Using the JFK wave file from the ggerganov/whisper.cpp repository.

Bug? File Transcription never completes. about whisperlive HOT 11 CLOSED

collabora commented on August 20, 2024

Bug? File Transcription never completes.

from whisperlive.

Comments (11)

justinlevi commented on August 20, 2024

Yeah, I don't think we need it in close_websocket.

makaveli10 commented on August 20, 2024

Reproduced on my end. Let me know if #43 doesn't fix the issue.

justinlevi commented on August 20, 2024

Unfortunately, that didn't seem to fix the problem. See the attached screen recording.

I'm getting a few more tokens back, but not the full transcript:

And so, my fellow America, ask not what your country can do
for you, as...

whisper_live_720p.mov

makaveli10 commented on August 20, 2024

I don't see this on my end when using a GPU, but I do on the CPU.

I think your server is not able to compute fast enough to be real-time. It's something the client needs to take care of, i.e. wait some time before closing the websocket connection so that it receives everything. If I run the server on the CPU, I see the same issue.
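A rough way to reason about how long the client has to wait: assume the audio is streamed in real time and the server needs `rtf` seconds of compute per second of audio. This real-time-factor model is a simplification and is not taken from WhisperLive; the helper name and the numbers are hypothetical.

```python
def backlog_seconds(audio_seconds, rtf):
    """Estimate how long the server keeps working after the audio ends.

    Assumes audio is streamed in real time and the server needs `rtf` seconds
    of compute per second of audio (a simplified, hypothetical model).
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    # With rtf <= 1 the server keeps up; otherwise the surplus work piles up.
    return max(0.0, audio_seconds * (rtf - 1.0))


# A 10-second clip on a server that needs 2.5 s of compute per second of audio.
print(backlog_seconds(10.0, 2.5))  # 15.0
```

Under this model, a slower CPU or a longer file calls for a longer wait before the connection is closed.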

I have added a simple wait of 15 seconds after your audio ends, before closing the websocket, so that the client can consume the server responses. Feel free to change 15 to 20 if it's still not showing the complete response.
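The wait described here can be sketched as a small helper that blocks until no response has arrived for a given number of seconds. This is a minimal sketch, not the code from the commit: `wait_until_quiet`, `max_wait`, `poll`, and the injectable `clock` and `sleep` parameters are assumptions added for illustration, and the overall cap is an extra safeguard not mentioned in the thread.

```python
import time


def wait_until_quiet(get_last_response_time, quiet_for=15.0, max_wait=60.0,
                     poll=0.1, clock=time.time, sleep=time.sleep):
    """Block until no response has arrived for `quiet_for` seconds.

    `get_last_response_time` is a callable returning the timestamp of the most
    recent server response. Returns True once the server has been quiet long
    enough, or False if `max_wait` seconds pass first (hypothetical safeguard).
    """
    deadline = clock() + max_wait
    # Re-read the clock and the last-response timestamp on every iteration.
    while clock() - get_last_response_time() < quiet_for:
        if clock() >= deadline:
            return False
        sleep(poll)  # sleep instead of spinning so the CPU stays free
    return True
```

In the client, this could be called as `wait_until_quiet(lambda: self.LAST_RESPONSE_RECIEVED, quiet_for=self.DISCONNECT_IF_NO_RESPONSE_FOR)` just before the connection is closed.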

justinlevi commented on August 20, 2024

@makaveli10 This makes total sense; I'm trying it out now. I am running on CPU for development because I want to see what I can accomplish with the small and tiny models, which seem to work well for most situations from what I'm seeing. If that holds, I can see some very interesting potential applications I'd like to build.

justinlevi commented on August 20, 2024

Hmm, it seems there are still some cleanup and timeout issues when running on CPU. Quite often I get the following error output from the server:

ERROR:root:[ERROR]: 'WhisperModel' object has no attribute 'model'
^CTraceback (most recent call last):
  File "/Users/justinwinter/projects/whisperlive/./run_server.py", line 5, in <module>
    server.run("0.0.0.0")
  File "/Users/justinwinter/projects/whisperlive/whisper_live/server.py", line 136, in run
    server.serve_forever()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/site-packages/websockets/sync/server.py", line 226, in serve_forever
    poller.select()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/selectors.py", line 562, in select
    kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt

justinlevi commented on August 20, 2024

If I add the following to the client's play_file method before the stream is closed, the transcription works every time:

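            # Recompute the elapsed time on every iteration; computing it once
            # before the loop would leave the loop condition unchanged.
            while time.time() - self.LAST_RESPONSE_RECIEVED < self.DISCONNECT_IF_NO_RESPONSE_FOR:
                time.sleep(0.1)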

makaveli10 commented on August 20, 2024

@justinlevi I did that in this commit. So can we merge this now?

justinlevi commented on August 20, 2024

@makaveli10 I think your commit only has that addition in the close_websocket method:

    def close_websocket(self):
        while time.time() - self.LAST_RESPONSE_RECIEVED < self.DISCONNECT_IF_NO_RESPONSE_FOR:
            continue

I think it also needs to be in play_file:

    def play_file(self, filename):
        # read audio and create pyaudio stream
        self.wf = wave.open(filename, 'rb')
        self.stream = self.p.open(format=self.p.get_format_from_width(self.wf.getsampwidth()),
                channels=self.wf.getnchannels(),
                rate=self.wf.getframerate(),
                input=True,
                output=True,
                frames_per_buffer=self.CHUNK)
        try:
            while Client.RECORDING:
                data = self.wf.readframes(self.CHUNK)
                if data==b'': break

                audio_array = Client.bytes_to_float_array(data)
                self.send_packet_to_server(audio_array.tobytes())
                self.stream.write(data)

            self.wf.close()

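            # Keep the stream open until the server has been silent for the
            # full timeout; the elapsed time is recomputed on every iteration.
            while time.time() - self.LAST_RESPONSE_RECIEVED < self.DISCONNECT_IF_NO_RESPONSE_FOR:
                time.sleep(0.1)

            self.stream.close()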

justinlevi commented on August 20, 2024

Otherwise, the stream is closed as soon as the file finishes playing, which we don't want while we're still waiting for the server to send back transcription text.

d9f315c
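The effect of closing too early can be reproduced without a real server. The following is a toy simulation under stated assumptions: `SlowServer`, `FileClient`, and `run` are hypothetical stand-ins, not WhisperLive classes, and the delays are shortened so the example finishes in under a second.

```python
import queue
import threading
import time


class SlowServer:
    """Hypothetical stand-in for a server that transcribes slower than real time."""

    def __init__(self, delay):
        self.delay = delay          # seconds of simulated compute per chunk
        self.inbox = queue.Queue()
        self.on_response = None     # callback installed by the client
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            chunk = self.inbox.get()
            time.sleep(self.delay)  # simulate slow inference
            self.on_response(f"text for chunk {chunk}")


class FileClient:
    """Hypothetical client: sends every chunk, then optionally waits before closing."""

    def __init__(self, server, quiet_for):
        self.server = server
        self.quiet_for = quiet_for
        self.is_open = True
        self.received = []
        self.last_response = time.time()
        server.on_response = self._on_response

    def _on_response(self, text):
        if self.is_open:            # responses arriving after close are lost
            self.received.append(text)
            self.last_response = time.time()

    def play(self, chunks, wait):
        for chunk in chunks:
            self.server.inbox.put(chunk)
        if wait:
            # Keep the stream open until the server has been quiet long enough.
            while time.time() - self.last_response < self.quiet_for:
                time.sleep(0.01)
        self.is_open = False
        return list(self.received)


def run(wait):
    client = FileClient(SlowServer(delay=0.05), quiet_for=0.3)
    return client.play(range(3), wait)
```

Calling `run(wait=False)` closes the client before the simulated server has answered, so responses are dropped, while `run(wait=True)` collects all three.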

makaveli10 commented on August 20, 2024

@justinlevi In that case, could you test whether checking the elapsed time once in play_file solves the issue? I'm wondering whether we really need to check again in close_websocket.
