
Comments (4)

Deamon12 commented on May 28, 2024

Damn, I think I got it working. I had to reset the frame_generator offset for each incoming buffer to keep it chugging along.
Still not sure it's in a great place, but your lib is detecting speech and chunking off files. I am hearing some distortion in the recordings, however.

    def frame_generator(self, frame_duration_ms, audio, sample_rate):
        """Generates audio frames from PCM audio data.

        Takes the desired frame duration in milliseconds, an iterable of
        PCM buffers, and the sample rate.

        Yields Frames of the requested duration.
        """
        print("len(audio): " + str(len(audio)))
        # webrtcvad expects 16-bit mono PCM, so this factor of 2 is the
        # number of bytes per sample, not a channel count.
        bytes_per_sample = 2
        # Size in bytes of one frame of the requested duration.
        n = int(sample_rate * (frame_duration_ms / 1000.0) * bytes_per_sample)
        print("n : " + str(n))
        timestamp = 0.0
        duration = (float(n) / sample_rate) / bytes_per_sample

        for audioFrame in audio:
            if audioFrame is None:
                continue
            offset = 0  # reset for every incoming buffer
            # Use <= so a buffer that is exactly n bytes (or an exact
            # multiple of n) still yields its last frame.
            while offset + n <= len(audioFrame):
                yield Frame(audioFrame[offset:offset + n], timestamp, duration)
                timestamp += duration
                offset += n
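
One possible cause of the distortion (a guess, not something confirmed in this thread): resetting the offset per buffer throws away any bytes left at the end of a buffer that do not fill a whole frame, which leaves small gaps in the audio. A minimal sketch of a generator that carries those leftover bytes into the next buffer is below. The `iter_frames` name and the `Frame` namedtuple are hypothetical stand-ins, and 16-bit mono PCM is assumed.

```python
from collections import namedtuple

# Hypothetical stand-in for the Frame class used in the snippet above.
Frame = namedtuple("Frame", ["bytes", "timestamp", "duration"])


def iter_frames(buffers, frame_duration_ms, sample_rate, bytes_per_sample=2):
    """Yield fixed-size frames from an iterable of PCM byte buffers.

    Bytes left over at the end of one buffer are kept and prepended to
    the next buffer, so no audio is dropped between buffers.
    Assumes 16-bit mono PCM (bytes_per_sample=2).
    """
    # Frame size in bytes, e.g. 16000 Hz * 30 ms * 2 bytes = 960.
    n = int(sample_rate * frame_duration_ms / 1000.0) * bytes_per_sample
    duration = frame_duration_ms / 1000.0
    timestamp = 0.0
    pending = b""
    for buf in buffers:
        if not buf:
            continue  # skip None or empty buffers
        pending += bytes(buf)
        while len(pending) >= n:
            yield Frame(pending[:n], timestamp, duration)
            pending = pending[n:]
            timestamp += duration
```

With this approach the incoming buffer size no longer has to be a multiple of the frame size; only a final partial frame at the very end of the stream is discarded.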

from py-webrtcvad.

Deamon12 commented on May 28, 2024

Eh, nah. I need help with this.
I now have gstreamer pulling audio frames in from an alsasrc. This works until I pass the audio frames to the VAD detection. It blows up after frame_generator.

Errors when attempting to evaluate the frames:

    File "SoundDetector.py", line 351, in vad_collector
      is_speech = vad.is_speech(frame.bytes, sample_rate)
    File "python3.8/site-packages/webrtcvad.py", line 27, in is_speech
      return _webrtcvad.process(self._vad, sample_rate, buf, length)

I am struggling with creating the 10, 20, 30 ms segments...
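
For reference, the segment sizes come down to simple arithmetic. The py-webrtcvad README states that the VAD accepts 16-bit mono PCM at 8000, 16000, 32000, or 48000 Hz, in frames of 10, 20, or 30 ms. A small sketch of the byte counts follows. The helper names `frame_bytes` and `is_valid_frame` are made up for illustration and are not part of the library.

```python
# Sample rates and frame durations accepted by the VAD, per the README.
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def frame_bytes(sample_rate, frame_ms, bytes_per_sample=2):
    """Return the size in bytes of one 16-bit mono frame."""
    if sample_rate not in VALID_RATES:
        raise ValueError("unsupported sample rate: %r" % sample_rate)
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError("unsupported frame duration: %r ms" % frame_ms)
    samples = sample_rate * frame_ms // 1000
    return samples * bytes_per_sample


def is_valid_frame(buf, sample_rate):
    """True if buf is exactly 10, 20, or 30 ms long at sample_rate."""
    return any(len(buf) == frame_bytes(sample_rate, ms) for ms in VALID_FRAME_MS)
```

At 16000 Hz this gives 320, 640, and 960 bytes for 10, 20, and 30 ms. A buffer of any other length passed to `is_speech` is a likely source of the error above.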


Deamon12 commented on May 28, 2024

I ended up getting it working by setting the gstreamer alsasrc 'blocksize' to match the pyAudio size. Before that, the buffer data was not adequate for the VAD parsing.

works now tho
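
A rough sketch of how such a block size could be worked out, in case it helps someone else. This is not the pipeline used above: the `build_pipeline` helper, the caps string, and the `appsink` element name are illustrative assumptions. It only assumes that `blocksize` is the number of bytes per buffer and that the VAD wants 16-bit mono PCM.

```python
def build_pipeline(sample_rate=16000, frame_ms=30, frames_per_block=1):
    """Build a GStreamer launch string whose buffers hold whole VAD frames.

    Hypothetical helper: the element layout is an assumption, not the
    pipeline from this thread.
    """
    bytes_per_sample = 2  # 16-bit PCM
    frame_size = sample_rate * frame_ms // 1000 * bytes_per_sample
    blocksize = frame_size * frames_per_block
    return (
        "alsasrc blocksize={blocksize} ! "
        "audio/x-raw,format=S16LE,channels=1,rate={rate} ! "
        "appsink name=sink"
    ).format(blocksize=blocksize, rate=sample_rate)


if __name__ == "__main__":
    print(build_pipeline())
```

Keeping the block size at a whole number of frames means every buffer can be split without leftovers.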


alexissinglaire commented on May 28, 2024

> I ended up getting it by setting the gstreamer alsasrc 'blocksize' to match pyAudio size. The buffer data was not adequate for the VAD parsing.
>
> works now tho

Hi,

Can I have a sample of your code that reads the mic input in real time and writes the result out as chunks of files?

Thanks in advance.
