arunman1kandan commented on June 1, 2024

Hey @ENDERFUN2 ,
Assume you have a box full of Legos and want to build a spaceship. To be used, all of the Legos should be loose in the box. However, sometimes the Legos come packed in a smaller box inside the larger one. That extra box adds nothing of its own; it just wraps the Legos. The sounddevice.rec function in this code is like getting the large box, and it comes with that additional inner box (the extra dimension).

Using squeeze is like opening the large box, taking the Legos out of the smaller box, and throwing the smaller box away, so all you have to work with is the audio data, the Legos. This matters because the WhisperModel.transcribe method, which you use to build the spaceship, only works with loose Legos, not with boxes inside boxes. Squeezing removes the extra box so everything works as intended.

This is how I understand squeeze works.
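In NumPy terms, the "box inside the box" is an axis of length 1. A minimal sketch, using a zero-filled array as a stand-in for what `sd.rec(..., channels=1)` returns (the array is synthetic, not a real recording, and the 2-second duration is an arbitrary illustrative value):

```python
import numpy as np

sample_rate = 16000
duration = 2  # seconds (illustrative value)

# Stand-in for sd.rec(..., channels=1): one row per frame, one column per channel
recorded = np.zeros((sample_rate * duration, 1), dtype=np.float32)
print(recorded.shape)  # (32000, 1): the "extra box" is the trailing axis of length 1

# squeeze() removes every axis of length 1, leaving a flat 1-D array of samples
flat = recorded.squeeze()
print(flat.shape)  # (32000,)
```

The samples themselves are untouched; only the shape changes.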

from faster-whisper.

trungkienbkhn commented on June 1, 2024

@ENDERFUN2, hello. To handle silence in recorded audio, you can try the vad_filter option.
To avoid saving the audio to a temp file, pass the audio data to the FW model directly as a NumPy ndarray. Below is my example:

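```python
import numpy as np
import sounddevice as sd

from faster_whisper import WhisperModel

print("Recording started")
duration = 10  # seconds
sample_rate = 16000  # raw arrays passed to FW must be 16 kHz
# Record mono float32 audio; sd.rec returns an array of shape (frames, channels)
audio_data = sd.rec(
    int(sample_rate * duration), samplerate=sample_rate, channels=1, dtype=np.float32
)
sd.wait()  # block until the recording has finished
# Drop the length-1 channel axis: (frames, 1) -> (frames,)
audio_data = audio_data.squeeze()
print("Recording stopped")

model = WhisperModel("tiny", device="cpu")
# Pass the NumPy array directly, no temp file needed
# (add vad_filter=True to skip silent stretches)
segments, info = model.transcribe(audio_data, word_timestamps=True)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```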

arunman1kandan commented on June 1, 2024

Also, @ENDERFUN2, make sure you use 16000 as the sampling rate and not any other value: as @trungkienbkhn pointed out, FW expects audio passed in as an array to be sampled at 16000 Hz.
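If your input device can only record at 48000 Hz, one option is to resample to 16000 Hz before calling `transcribe`. The sketch below uses plain linear interpolation via `np.interp`; `resample_linear` is a hypothetical helper, and because it applies no anti-aliasing low-pass filter it is only a rough approximation. A dedicated resampler such as `scipy.signal.resample_poly` is the safer choice for real audio.

```python
import numpy as np


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Hypothetical helper: resample 1-D audio by linear interpolation.

    No anti-aliasing filter is applied, so content above target_sr / 2
    can alias; this is a rough sketch, not a production resampler.
    """
    if audio.ndim != 1:
        raise ValueError("expected a 1-D (mono) array; squeeze it first")
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    # Time stamps (in seconds) of the original and the resampled samples
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


# Example: 1 second of a 440 Hz tone sampled at 48 kHz
sr_in = 48000
t = np.arange(sr_in) / sr_in
tone_48k = np.sin(2 * np.pi * 440 * t).astype(np.float32)
tone_16k = resample_linear(tone_48k, orig_sr=sr_in)
print(tone_48k.shape, tone_16k.shape)  # (48000,) (16000,)
```

Recording at 16000 Hz in the first place, as suggested above, avoids this step entirely.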

ENDERFUN2 commented on June 1, 2024

Okay, so vad_filter works perfectly. I don't know why I hadn't found it before...
Also, that sample code from @trungkienbkhn turned out to be a game changer. But, as a curious man, why is squeeze required? It's my first serious project in Python, since all my previous ones were in Java or C++, so I don't really understand it. And why does the sample rate have to be set to 16000? When I pass an audio file with a 48000 sample rate, it transcribes the audio perfectly. I would be grateful for an explanation.

trungkienbkhn commented on June 1, 2024

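@ENDERFUN2, FYI, you can see this comment to better understand why you should use sample_rate=16000. If I use sr=48000 in my example here, it doesn't work. When you pass a file path, FW decodes the file and resamples it to 16000 Hz for you, which is why your 48000 Hz file transcribes fine. A NumPy array carries no sample-rate information, so FW assumes it is already at 16000 Hz.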
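To see why a 48 kHz array breaks when passed in directly: faster-whisper treats a raw array as 16 kHz audio, so it can only infer time by counting samples. A small arithmetic sketch in plain Python (`perceived_duration` is a hypothetical helper for illustration):

```python
def perceived_duration(num_samples: int, assumed_sr: int = 16000) -> float:
    """Seconds of audio the model 'sees' if it assumes assumed_sr Hz."""
    return num_samples / assumed_sr


duration_s = 10  # a 10-second recording
for actual_sr in (16000, 48000):
    n = actual_sr * duration_s  # samples actually recorded
    print(f"{actual_sr} Hz: {n} samples -> treated as {perceived_duration(n):.0f} s")

# 16000 Hz: 160000 samples -> treated as 10 s
# 48000 Hz: 480000 samples -> treated as 30 s
```

The 48 kHz recording is interpreted as three times longer, i.e. slowed down and pitched down by a factor of 3, which is why the transcription fails.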

As for why squeeze() is needed: sd.rec() returns an array of shape (duration * sample_rate, channels), so even with channels=1 (mono) you get a 2D array whose second dimension has size 1. However, FW requires a 1D array as input, so we use squeeze() to remove that size-1 dimension.
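`squeeze()` only removes axes of length 1, so it flattens a mono recording but leaves a stereo `(frames, 2)` array two-dimensional. A sketch of a more defensive conversion, assuming you would want to average the channels of a stereo recording (`to_mono_1d` is a hypothetical helper, not part of sounddevice or faster-whisper):

```python
import numpy as np


def to_mono_1d(recording: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn a (frames, channels) array into 1-D float32 mono."""
    if recording.ndim == 1:
        mono = recording  # already flat
    elif recording.ndim == 2:
        # Average across the channel axis; for channels=1 this matches squeeze()
        mono = recording.mean(axis=1)
    else:
        raise ValueError(f"unexpected shape {recording.shape}")
    return mono.astype(np.float32)


mono_rec = np.ones((16000, 1), dtype=np.float32)
stereo_rec = np.ones((16000, 2), dtype=np.float32)

print(mono_rec.squeeze().shape)      # (16000,)
print(stereo_rec.squeeze().shape)    # (16000, 2): squeeze() cannot help here
print(to_mono_1d(stereo_rec).shape)  # (16000,)
```

Averaging is one possible downmix; picking a single channel with `recording[:, 0]` would also give a 1-D array.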

ENDERFUN2 commented on June 1, 2024

Well, it makes a lot of sense now. Thank you both, @trungkienbkhn and @arunman1kandan, for your help. Abstracting it to Legos wasn't necessary, though; I just didn't understand ndarrays. Also, after some refactoring I noticed that my code is one big pile of garbage and I should reformat it ASAP. Your answers gave me an important insight.
