
Comments (3)

jianfch commented on August 20, 2024

By "true timestamps", I'm assuming you mean accurate segment timestamps that match the flow of how a human would time the dialogue. The start and end of each segment are dictated by the model's predictions, so they are not entirely within our control. Let's say we force it to always end at a period or a specific word. Then that decoded ending timestamp is less likely to be accurate than what is produced by the current heuristics (which let the model decide for itself when to end the segment). The "gapless" results are what the suppressing silence (or ignore silence) feature of stable-ts tries to reduce, but it doesn't always work.
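To make the idea of finding silence concrete, here is a minimal sketch of an energy-threshold detector. It is not the stable-ts implementation; the function name `find_silent_regions`, the `threshold` and `min_duration` parameters, and the assumption that audio arrives as a list of floats in [-1, 1] are all hypothetical choices made for illustration.

```python
def find_silent_regions(samples, sample_rate, threshold=0.01, min_duration=0.2):
    """Return (start, end) times in seconds where |amplitude| stays below threshold.

    Hypothetical helper for illustration only; not part of stable-ts.
    """
    regions = []
    run_start = None  # index where the current quiet run began, if any
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A loud sample ends the quiet run; keep the run if it is long enough.
            if (i - run_start) / sample_rate >= min_duration:
                regions.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a quiet run that extends to the end of the audio.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_duration:
        regions.append((run_start / sample_rate, len(samples) / sample_rate))
    return regions
```

Timestamps that fall inside one of the returned regions are candidates for being pushed to the nearest region boundary, which is roughly the effect a silence-suppression step aims for.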

from stable-ts.

p4-k4 commented on August 20, 2024

Yeah, in that case the start/end of each segment is outside of our control, at least for now.

The inverse of this would be to measure everything other than speech, which would then give us the correct start/end times of segments, although it would have to be a post-process, at least for now.

> Let's say we force it to always end at a period or a specific word. Then that decoded ending timestamp is less likely to be accurate than what is produced by the current heuristics (which let the model decide for itself when to end the segment).

Correct, I just checked and it's not accurate or reliable this way. For now, I'll be using SpeechBrain as a post-process to get the start/end of segments, but again, that's inaccurate too.
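A post-process of this kind could look roughly like the sketch below. It assumes you already have a sorted list of `(start, end)` speech regions from some voice activity detector; how those regions are obtained is left out, and `snap_to_speech` is a hypothetical helper, not a function from stable-ts or SpeechBrain.

```python
def snap_to_speech(segments, speech_regions):
    """Clip each (start, end, text) segment to the detected speech it overlaps.

    `speech_regions` is assumed to be a sorted list of (start, end) times in
    seconds. Segments that overlap no speech region are left unchanged.
    """
    snapped = []
    for start, end, text in segments:
        # Portions of each speech region that fall inside this segment.
        overlaps = [
            (max(start, s), min(end, e))
            for s, e in speech_regions
            if s < end and e > start
        ]
        if overlaps:
            # New bounds: start of the first overlap, end of the last one.
            start, end = overlaps[0][0], overlaps[-1][1]
        snapped.append((start, end, text))
    return snapped
```

For example, gapless segments `(0.0, 4.0)` and `(4.0, 8.0)` with speech detected at `(0.5, 3.2)` and `(4.6, 7.1)` would be tightened to those speech bounds. The result is only as accurate as the detector, which matches the caveat above.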

Ah well, we'll see soon how things develop with this. Cheers


jianfch commented on August 20, 2024

Version 2.0.0 enables this now.

