Comments (7)

yehiaabdelm commented on August 20, 2024 2

Yes, check this repo: https://github.com/yehiaabdelm/transcription-app

from whisperlive.

makaveli10 commented on August 20, 2024 1

It looks like you are sending a Uint8Array. The server expects the data to be Float32, so I would try sending Float32 data first.
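To illustrate why the data type matters: MediaRecorder emits encoded container bytes (typically WebM/Opus) rather than raw samples, so viewing those bytes as Float32 gives arbitrary values instead of audio. The following is a minimal sketch that runs in Node or a browser. `looksLikeFloat32Pcm` is a hypothetical helper, not part of WhisperLive, and the [-1, 1] range is an assumption about normalized PCM.

```javascript
// Hypothetical sanity check: normalized Float32 PCM should be finite and within [-1, 1].
function looksLikeFloat32Pcm(samples) {
  for (const s of samples) {
    if (!Number.isFinite(s) || Math.abs(s) > 1) return false;
  }
  return samples.length > 0;
}

// Genuine PCM: a short 440 Hz sine sampled at 16 kHz.
const sine = Float32Array.from({ length: 160 }, (_, i) =>
  Math.sin((2 * Math.PI * 440 * i) / 16000)
);

// Arbitrary bytes (standing in for encoded container data) reinterpreted as Float32.
const encodedBytes = new Uint8Array([0xff, 0xff, 0x7f, 0x7f, 0xff, 0xff, 0xff, 0xff]);
const reinterpreted = new Float32Array(encodedBytes.buffer);

console.log(looksLikeFloat32Pcm(sine));          // true
console.log(looksLikeFloat32Pcm(reinterpreted)); // false
```

A check like this on the client, before calling `socket.send`, may help distinguish "wrong data type" from other causes of missing transcriptions.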

yehiaabdelm commented on August 20, 2024 1

I switched from MediaRecorder to AudioContext, using the same code as the Chrome extension, and it works perfectly. My only issue is that some of the AudioContext methods are deprecated, so I will have to move away from them eventually. I will close this issue and post an update when I switch from AudioContext.
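A rough sketch of what an AudioContext-based capture path can look like follows. This is not the Chrome extension's code: `resampleLinear` and `startPcmCapture` are hypothetical names, and `socket` is assumed to be a WebSocket that has already completed the handshake and received SERVER_READY. It uses `createScriptProcessor`, which is one of the deprecated methods mentioned above.

```javascript
// Linear-interpolation resampler (pure function, usable outside the browser).
function resampleLinear(input, fromRate, toRate = 16000) {
  if (fromRate === toRate || input.length < 2) return Float32Array.from(input);
  const outLength = Math.max(2, Math.round(input.length * (toRate / fromRate)));
  const out = new Float32Array(outLength);
  const step = (input.length - 1) / (outLength - 1);
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    out[i] = input[left] + (input[right] - input[left]) * (pos - left);
  }
  return out;
}

// Browser-only wiring; returns a function that stops the capture.
function startPcmCapture(stream, socket) {
  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  // Deprecated API, but it delivers raw Float32 PCM instead of encoded blobs.
  const processor = context.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (event) => {
    const pcm = event.inputBuffer.getChannelData(0); // raw Float32 samples
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(resampleLinear(pcm, context.sampleRate));
    }
  };
  source.connect(processor);
  processor.connect(context.destination); // output buffer is left silent
  return () => {
    processor.disconnect();
    source.disconnect();
    context.close();
  };
}
```

The key difference from the MediaRecorder approach is that `getChannelData` returns decoded samples, so the resampler receives real PCM rather than container bytes.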

makaveli10 commented on August 20, 2024

@yehiaabdelm the code on the client side looks alright. This is not something we have tested; we mostly test WhisperLive in a real-time scenario, i.e. sending small chunks.

Can you share the logs from the server side as well?

yehiaabdelm commented on August 20, 2024

I decided to switch to real-time and used your Chrome extension code as a blueprint. However, I'm having some issues with the integration. I'm using Svelte. In the ondataavailable handler for mediaRecorder, I truncate the buffer so its length is divisible by 4 before passing it to resampleTo16kHZ, and then send the output over the websocket. It seems to be sending the bytes in chunks (shown below), but I never get a response, and the connection drops before I stop the mediaRecorder.

<script>
  import { onMount } from 'svelte';
  import { v4 } from 'uuid';

  let websocketUrl = ''
  let audioChunks = [] // Store audio data chunks
  let mediaRecorder = null
  let uid = null
  let audioElement = null
  let socket

  /**
   * Resamples the audio data to a target sample rate of 16kHz.
   * @param {Array|ArrayBuffer|TypedArray} audioData - The input audio data.
   * @param {number} [origSampleRate=44100] - The original sample rate of the audio data.
   * @returns {Float32Array} The resampled audio data at 16kHz.
   */
  function resampleTo16kHZ(audioData, origSampleRate = 44100) {
    // Convert the audio data to a Float32Array
    const data = new Float32Array(audioData);

    // Calculate the desired length of the resampled data
    const targetLength = Math.round(data.length * (16000 / origSampleRate));

    // Create a new Float32Array for the resampled data
    const resampledData = new Float32Array(targetLength);

    // Calculate the spring factor and initialize the first and last values
    const springFactor = (data.length - 1) / (targetLength - 1);
    resampledData[0] = data[0];
    resampledData[targetLength - 1] = data[data.length - 1];

    // Resample the audio data
    for (let i = 1; i < targetLength - 1; i++) {
      const index = i * springFactor;
      const leftIndex = Math.floor(index);
      const rightIndex = Math.ceil(index);
      const fraction = index - leftIndex;
      resampledData[i] = data[leftIndex] + (data[rightIndex] - data[leftIndex]) * fraction;
    }

    // Return the resampled data
    return resampledData;
  }

  async function startRecording() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true }); // audio stream
    const origSampleRate = stream.getAudioTracks()[0].getSettings().sampleRate;
    console.log('Sample rate: ', origSampleRate)

    mediaRecorder = new MediaRecorder(stream);
    mediaRecorder.start(1000) // fire the dataavailable event every 1s

    uid = v4() // generate a unique id for this recording

    if (mediaRecorder.state === 'recording'){
      console.log('Media recorder', mediaRecorder)
      socket = new WebSocket(websocketUrl) // create a websocket connection
      let isServerReady = false

      socket.onopen = (event) => { // when the connection is open send the handshake
        socket.send(
          JSON.stringify({
            uid: uid,
            multilingual: false,
            language: 'en',
            task: 'transcribe'
          })
        )
      }

      socket.onmessage = async (event) => {

        const data = JSON.parse(event.data)

        if (data.uid !== uid) return // ignore messages that are not for this recording
        
        if (data?.message && data?.message === 'SERVER_READY' ) {
          console.log('Server ready')
          isServerReady = true
          return
        }

        if (data.message === 'DISCONNECTED') {
          console.log('Server disconnected')
          socket.close()
          return
        }

        console.log(event.data)
      }
      

      mediaRecorder.ondataavailable = (event) => {

        console.log('Data available', event.data)
        if (!isServerReady) return; // if the server is not ready, don't send any data

        const audioBlob = event.data;
        audioChunks.push(event.data);

        audioBlob.arrayBuffer().then((buffer) => {
          console.log('Original buffer length', buffer.byteLength);

          // Calculate the length for the new ArrayBuffer
          const newBufferLength = Math.floor(buffer.byteLength / 4) * 4;
          console.log('New buffer length', newBufferLength);

          // Create a new ArrayBuffer and a new Uint8Array to work with the bytes
          const newBuffer = new ArrayBuffer(newBufferLength);
          const newUint8Array = new Uint8Array(newBuffer);

          // Create a Uint8Array from the original buffer
          const originalUint8Array = new Uint8Array(buffer);

          // Copy the bytes from the original Uint8Array to the new Uint8Array
          for (let i = 0; i < newBufferLength; i++) {
            newUint8Array[i] = originalUint8Array[i];
          }

          // Create the Float32Array from the new buffer
          const audioData16kHz = resampleTo16kHZ(newBuffer, origSampleRate);
          socket.send(audioData16kHz);
        })
        

      };



      mediaRecorder.onstop = () => {
        const audioBlob = new Blob(audioChunks, { 'type': 'audio/ogg; codecs=opus' });
        audioChunks = [];

        audioElement.src = URL.createObjectURL(audioBlob)
        console.log('Stopped recording')
        console.log('Audio blob: ', audioBlob)
        
        if (socket){
          audioChunks = []
          socket.close()
        }
      };


    }
    else {
      if (socket) {
        socket.close()
        audioChunks = []
      }
      return
    }

  }

  function stopRecording() {
    mediaRecorder.stop()
    mediaRecorder = null
  }

  onMount (() => {
    audioElement = document.querySelector('audio')
  })

</script>

<div>
  {#if mediaRecorder && mediaRecorder.state === 'recording'}
    <p>Recording...</p>
  {/if}
  <button on:click={startRecording}>Start</button>
  <button on:click={stopRecording}>Stop</button>
  <audio controls></audio>
</div>

Server logs

2023-08-19T09:07:43.540707332Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-19T09:12:30.597431339Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-19T09:17:58.574169906Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-19T09:19:51.553874703Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-19T09:49:23.825580464Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-19T09:51:39.765192851Z Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
2023-08-19T09:51:39.765227056Z Please make sure libcudnn_ops_infer.so.8 is in your library path!
2023-08-19T09:51:47.540743536Z 2023-08-19 09:51:47.540581981 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540789684Z 2023-08-19 09:51:47.540605575 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540792569Z 2023-08-19 09:51:47.540610825 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540794843Z 2023-08-19 09:51:47.540615544 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540796958Z 2023-08-19 09:51:47.540620013 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540810042Z 2023-08-19 09:51:47.540678253 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540812377Z 2023-08-19 09:51:47.540691538 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540814661Z 2023-08-19 09:51:47.540698562 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540816765Z 2023-08-19 09:51:47.540706407 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2023-08-19T09:51:47.540818819Z 2023-08-19 09:51:47.540714041 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
2023-08-19T09:55:53.779634510Z Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
2023-08-19T09:55:53.779667623Z Please make sure libcudnn_ops_infer.so.8 is in your library path!
2023-08-19T09:56:09.207461798Z 2023-08-19 09:56:09.207269935 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207502626Z 2023-08-19 09:56:09.207294722 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207507345Z 2023-08-19 09:56:09.207299270 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207509409Z 2023-08-19 09:56:09.207303759 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207511733Z 2023-08-19 09:56:09.207308237 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207513597Z 2023-08-19 09:56:09.207366608 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207515560Z 2023-08-19 09:56:09.207381135 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207517564Z 2023-08-19 09:56:09.207387417 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207519708Z 2023-08-19 09:56:09.207393168 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2023-08-19T09:56:09.207521742Z 2023-08-19 09:56:09.207399180 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.

yehiaabdelm commented on August 20, 2024

Update: I switched to the GPU Dockerfile. I'm now getting responses to the audio chunks I send after the handshake; however, the segments array is empty. I'm not sure whether this is an issue with the type of data I'm sending.

Server logs

Downloading model.bin:  28%|██▊       | 136M/484M [00:01<00:03, 105MB/s]
Downloading model.bin:  33%|███▎      | 157M/484M [00:01<00:03, 106MB/s]
Downloading model.bin:  37%|███▋      | 178M/484M [00:01<00:02, 107MB/s]
Downloading model.bin:  41%|████      | 199M/484M [00:01<00:02, 107MB/s]
Downloading model.bin:  46%|████▌     | 220M/484M [00:02<00:02, 107MB/s]
Downloading model.bin:  50%|████▉     | 241M/484M [00:02<00:02, 108MB/s]
Downloading model.bin:  54%|█████▍    | 262M/484M [00:02<00:02, 108MB/s]
Downloading model.bin:  59%|█████▊    | 283M/484M [00:02<00:01, 108MB/s]
Downloading model.bin:  63%|██████▎   | 304M/484M [00:02<00:01, 108MB/s]
Downloading model.bin:  67%|██████▋   | 325M/484M [00:03<00:01, 108MB/s]
Downloading model.bin:  72%|███████▏  | 346M/484M [00:03<00:01, 107MB/s]
Downloading model.bin:  76%|███████▌  | 367M/484M [00:03<00:01, 108MB/s]
Downloading model.bin:  80%|████████  | 388M/484M [00:03<00:00, 107MB/s]
Downloading model.bin:  85%|████████▍ | 409M/484M [00:03<00:00, 107MB/s]
Downloading model.bin:  89%|████████▉ | 430M/484M [00:04<00:00, 107MB/s]
Downloading model.bin:  93%|█████████▎| 451M/484M [00:04<00:00, 108MB/s]
Downloading model.bin:  98%|█████████▊| 472M/484M [00:04<00:00, 108MB/s]
Downloading model.bin: 100%|██████████| 484M/484M [00:04<00:00, 109MB/s]
Downloading model.bin: 100%|██████████| 484M/484M [00:04<00:00, 106MB/s]
2023-08-20T21:05:13.307155231Z /usr/local/lib/python3.8/dist-packages/faster_whisper/feature_extractor.py:139: RuntimeWarning: invalid value encountered in cast
2023-08-20T21:05:13.307159339Z   np.multiply(frame, window, out=fft_signal[:frame_size])
2023-08-20T21:05:13.307162254Z /usr/local/lib/python3.8/dist-packages/faster_whisper/feature_extractor.py:142: RuntimeWarning: overflow encountered in cast
2023-08-20T21:05:13.307165110Z   data[f] = np.fft.fft(fft_signal, axis=0)[:num_fft_bins]
2023-08-20T21:05:13.307167985Z /usr/local/lib/python3.8/dist-packages/faster_whisper/feature_extractor.py:157: RuntimeWarning: overflow encountered in absolute
2023-08-20T21:05:13.307170680Z   magnitudes = np.abs(stft[:, :-1]) ** 2
2023-08-20T21:05:13.307173335Z /usr/local/lib/python3.8/dist-packages/faster_whisper/feature_extractor.py:157: RuntimeWarning: overflow encountered in square
2023-08-20T21:05:13.307175950Z   magnitudes = np.abs(stft[:, :-1]) ** 2
2023-08-20T21:05:13.307178555Z /usr/local/lib/python3.8/dist-packages/faster_whisper/feature_extractor.py:160: RuntimeWarning: invalid value encountered in matmul
2023-08-20T21:05:13.307181200Z   mel_spec = filters @ magnitudes
2023-08-20T21:05:13.307183765Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-20T21:05:13.439656557Z ERROR:root:[ERROR]: 'WhisperModel' object has no attribute 'model'
2023-08-20T21:07:53.818712943Z ERROR:root:received 1005 (no status code [internal]); then sent 1005 (no status code [internal])
2023-08-20T21:07:54.320688518Z ERROR:root:[ERROR]: received 1005 (no status code [internal]); then sent 1005 (no status code [internal])

Web socket logs (screenshot not preserved)

The responses I'm getting look like this (screenshot not preserved)

renambot commented on August 20, 2024

I switched from MediaRecorder to AudioContext and used the same code in the Chrome extension and it works perfectly. My only issue is that some of the methods from AudioContext are deprecated, so will have to move away from it eventually. Will close this issue and update when I switch from AudioContext.

Could you share a working example?
Thanks.
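No working example was posted in this thread. As one possible non-deprecated route, the Web Audio API offers AudioWorklet as the replacement for ScriptProcessorNode. The sketch below is untested against WhisperLive; `ChunkAccumulator`, `PcmForwarder`, the processor name `pcm-forwarder`, and the 4096-sample chunk size are all hypothetical choices. The worklet part only registers itself where the AudioWorklet globals exist, so the accumulator can also be exercised on its own.

```javascript
// Collects small render blocks (typically 128 frames) into fixed-size chunks.
class ChunkAccumulator {
  constructor(chunkSize, onChunk) {
    this.buffer = new Float32Array(chunkSize);
    this.filled = 0;
    this.onChunk = onChunk;
  }

  push(samples) {
    let offset = 0;
    while (offset < samples.length) {
      const n = Math.min(samples.length - offset, this.buffer.length - this.filled);
      this.buffer.set(samples.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.buffer.length) {
        this.onChunk(this.buffer.slice()); // copy, because the buffer is reused
        this.filled = 0;
      }
    }
  }
}

// Worklet-side code: runs only inside an AudioWorkletGlobalScope.
if (typeof AudioWorkletProcessor !== 'undefined') {
  class PcmForwarder extends AudioWorkletProcessor {
    constructor() {
      super();
      this.acc = new ChunkAccumulator(4096, (chunk) => this.port.postMessage(chunk));
    }

    process(inputs) {
      const channel = inputs[0][0]; // first input, first channel
      if (channel) this.acc.push(channel);
      return true; // keep the processor alive
    }
  }
  registerProcessor('pcm-forwarder', PcmForwarder);
}
```

On the main thread, this module would be loaded with `context.audioWorklet.addModule(...)`, instantiated with `new AudioWorkletNode(context, 'pcm-forwarder')`, and its `port.onmessage` handler would resample each Float32 chunk to 16 kHz and send it over the websocket.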
