Comments (11)

justinlevi avatar justinlevi commented on August 20, 2024

Yeah, I don't think we need it in close_websocket.

from whisperlive.

makaveli10 avatar makaveli10 commented on August 20, 2024

Reproduced on my end. Let me know if #43 doesn't fix the issue.

justinlevi avatar justinlevi commented on August 20, 2024

Unfortunately that didn't seem to fix the problem. See attached screen recording.

I'm getting a few more tokens back, but not the full transcript:

And so, my fellow America, ask not what your country can do
for you, as...
whisper_live_720p.mov

makaveli10 avatar makaveli10 commented on August 20, 2024

I don't see this on my end when using a GPU, but I do when running on the CPU.

I think your server is not able to compute fast enough to be real-time. It's something that the client needs to take care of, i.e. wait some time before closing the websocket connection so that it receives everything. If I run the server on the CPU, I see the same issue.

I have added a simple 15-second wait after your audio ends, before the websocket is closed, so that the client can consume the remaining server responses. Feel free to change 15 to 20 if it's still not showing the complete response.
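For anyone following along, here is a minimal sketch of the "wait before closing" idea, written as an idle timeout rather than a fixed sleep. It is not the WhisperLive client code: `wait_for_quiet`, `last_response_time`, `idle_timeout`, and `hard_limit` are hypothetical names chosen for illustration, and the clock and sleep functions are injectable only so the loop can be exercised without real delays.

```python
import time


def wait_for_quiet(last_response_time, idle_timeout, hard_limit=60.0,
                   poll_interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """Block until no server response has arrived for `idle_timeout` seconds.

    `last_response_time` is a callable returning the clock reading of the
    most recent server message (hypothetical; the client would update it
    in its message handler). Returns True once the connection has gone
    quiet, or False if `hard_limit` seconds pass first.
    """
    start = clock()
    # Re-read the clock and the last-response time on every pass.
    while clock() - last_response_time() < idle_timeout:
        if clock() - start >= hard_limit:
            return False  # give up instead of waiting forever
        sleep(poll_interval)  # yield the CPU instead of busy-waiting
    return True
```

A client would call something like this after the last audio chunk is sent and before the websocket is closed. On a slow CPU server, a larger `idle_timeout` plays the same role as bumping 15 to 20 above.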

justinlevi avatar justinlevi commented on August 20, 2024

@makaveli10 This makes total sense. Trying this out now. I am running on CPU for development as I want to experiment with what I can accomplish with the small and tiny models as they do seem to work well for most situations from what I'm seeing. If this is the case, I can see some very interesting potential applications I'd like to build.

justinlevi avatar justinlevi commented on August 20, 2024

Hmm, it seems there are still some cleanup and timeout issues when running on the CPU. Quite often I get the following error output from the server:

ERROR:root:[ERROR]: 'WhisperModel' object has no attribute 'model'
^CTraceback (most recent call last):
File "/Users/justinwinter/projects/whisperlive/./run_server.py", line 5, in
server.run("0.0.0.0")
File "/Users/justinwinter/projects/whisperlive/whisper_live/server.py", line 136, in run
server.serve_forever()
File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/site-packages/websockets/sync/server.py", line 226, in serve_forever
poller.select()
File "/opt/homebrew/Caskroom/miniconda/base/envs/whisper_live/lib/python3.9/selectors.py", line 562, in select
kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt

justinlevi avatar justinlevi commented on August 20, 2024

If I add the following to the client's play_file method before the stream is closed, I can get the transcription to work every time now:

            # re-evaluate the elapsed time on every pass; computing it once
            # before the loop would never let the loop exit
            while time.time() - self.LAST_RESPONSE_RECIEVED < self.DISCONNECT_IF_NO_RESPONSE_FOR:
                time.sleep(0.1)

makaveli10 avatar makaveli10 commented on August 20, 2024

@justinlevi I did that in this commit. So, we can merge this now?

justinlevi avatar justinlevi commented on August 20, 2024

@makaveli10 I think your commit only has that addition in the close_websocket method:

    def close_websocket(self):
        while time.time() - self.LAST_RESPONSE_RECIEVED < self.DISCONNECT_IF_NO_RESPONSE_FOR:
            continue

I think it also needs to be in play_file:

    def play_file(self, filename):
        # read audio and create pyaudio stream
        self.wf = wave.open(filename, 'rb')
        self.stream = self.p.open(format=self.p.get_format_from_width(self.wf.getsampwidth()),
                channels=self.wf.getnchannels(),
                rate=self.wf.getframerate(),
                input=True,
                output=True,
                frames_per_buffer=self.CHUNK)
        try:
            while Client.RECORDING:
                data = self.wf.readframes(self.CHUNK)
                if data == b'':
                    break

                audio_array = Client.bytes_to_float_array(data)
                self.send_packet_to_server(audio_array.tobytes())
                self.stream.write(data)

            self.wf.close()

            # keep the stream open until the server has been quiet long enough;
            # the elapsed time is re-evaluated on every pass
            while time.time() - self.LAST_RESPONSE_RECIEVED < self.DISCONNECT_IF_NO_RESPONSE_FOR:
                time.sleep(0.1)

            self.stream.close()
            # (the rest of play_file, including the except clause, is not shown here)

justinlevi avatar justinlevi commented on August 20, 2024

Otherwise, the stream is closed as soon as the file is done playing, which we don't want if we're still waiting for the server to send back transcription text.

d9f315c

makaveli10 avatar makaveli10 commented on August 20, 2024

@justinlevi In that case, could you test whether checking the elapsed_time once in play_file solves the issue? I'm wondering if we really need to check again in close_websocket.
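One way to reason about that question is a small sketch of the same idle check called twice in a row. This is not the actual client: `IdleTracker`, `on_message`, and `wait_until_idle` are hypothetical names, and a manually advanced clock stands in for real time. Under these assumptions, if no new response arrives between the two calls, the second wait returns immediately, so a repeated check costs nothing but is also redundant.

```python
class IdleTracker:
    """Hypothetical stand-in for the client's response bookkeeping."""

    def __init__(self, idle_timeout, clock):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_response = clock()
        self.polls = 0  # number of times the wait loop had to sleep

    def on_message(self):
        # Would be called whenever the server sends a transcription segment.
        self.last_response = self.clock()

    def wait_until_idle(self, sleep, poll_interval=0.1):
        while self.clock() - self.last_response < self.idle_timeout:
            self.polls += 1
            sleep(poll_interval)


# Manually advanced clock so the demo runs instantly.
now = [0.0]


def advance(dt):
    now[0] += dt


tracker = IdleTracker(idle_timeout=1.0, clock=lambda: now[0])
tracker.wait_until_idle(sleep=advance, poll_interval=0.25)  # as in play_file
polls_after_first = tracker.polls
tracker.wait_until_idle(sleep=advance, poll_interval=0.25)  # as in close_websocket
```

The second wait only does real work if `on_message` fires after the first wait has returned, which would be the case a check in close_websocket guards against.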
