Comments (15)

tadly commented on September 14, 2024

I see. This means you have some samples inside "test" that might not exist in any other form in ww.

This has all been quite helpful. Thanks a lot.
I am writing some form of documentation/howto while working on this and hope to share it with the community once it is in a good enough state.
I'm trying to say, your time has not been wasted (I hope) :)

el-tocino commented on September 14, 2024

Speed is more related to hardware resources. What are you running on, and how loaded is it?

tadly commented on September 14, 2024

For testing I'm actually on my dev machine, so that really shouldn't be a bottleneck (Intel i7-5600U), and the system is basically idle otherwise too.

Does this mean my suspicion that the engine requires the trailing 1 second before considering it an "activation" is wrong?

I should add that I'm new to ML, so I'm still on a steep learning curve :D

Edit:
Does your training data also contain a 1-2 second silence at the end, or does it stop right after the speech stops?
Also, using your own model, would you say it activates as fast as e.g. Google or Alexa?
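
To be concrete about the trailing silence idea, this is roughly what I mean by padding a sample (a minimal, untested stdlib sketch; file names are placeholders, and I'm assuming the usual mono 16-bit PCM WAVs):

    # append ~1 second of silence to a wake-word sample (stdlib only);
    # zero bytes decode as silence for signed PCM
    import wave

    def pad_with_silence(src_path, dst_path, seconds=1.0):
        with wave.open(src_path, "rb") as src:
            params = src.getparams()
            frames = src.readframes(src.getnframes())
        n_silence = int(params.framerate * seconds)
        silence = b"\x00" * (n_silence * params.sampwidth * params.nchannels)
        with wave.open(dst_path, "wb") as dst:
            dst.setparams(params)
            dst.writeframes(frames + silence)

    pad_with_silence("kiana-001.wav", "kiana-001-padded.wav")  # placeholder names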

el-tocino commented on September 14, 2024

Should only be listening to 1.5 seconds, I think, to activate. My data cuts off usually pretty quickly. I haven't used alexa in a while, but activation seems to be near-google-speed in my experience. I run on an i7-4770 with 8gb doing a bunch of things (mycroft/wiki/tts/stt) and it's not noticeably slow.
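
Roughly, the idea (a toy sketch of a rolling window, not precise's actual code) is that the engine only ever scores the most recent ~1.5 seconds of audio, so each new chunk shifts the window forward:

    # toy rolling-window sketch (not precise's actual code): each incoming
    # chunk shifts the buffer left, so only the last ~1.5 s is ever scored
    import numpy as np

    RATE = 16000                      # assumed sample rate
    WINDOW = int(1.5 * RATE)          # ~1.5 s of context
    CHUNK = 2048                      # samples per incoming block

    buffer = np.zeros(WINDOW, dtype=np.int16)

    def feed(chunk, score_fn):
        """Shift a new chunk into the window and score the result."""
        global buffer
        buffer = np.concatenate((buffer[len(chunk):], chunk))
        return score_fn(buffer)       # e.g. features -> network -> probability

    prob = feed(np.zeros(CHUNK, dtype=np.int16), lambda w: 0.0)  # dummy scorer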

tadly commented on September 14, 2024

Hm...
My wake-word is "Kiana".
No hey in front or anything, which makes it quicker to say than "okay google" or "hey mycroft" etc.

So maybe that has an effect on the perceived speed, as "hey mycroft" is much closer to a 1.5 second cycle than "Kiana".

That said, after playing with the batch size a bit I managed to get a model that at least activates (although val_acc could be much better), and it activates much faster about 50% of the time.
The other 50% of the time it feels just like the initial model.

I'll do some more testing tomorrow and probably compare against the official mycroft model in terms of speed.

Do you have a clue why a data set with 1 sec. of silence at the end would perform so much better during training than the ones without?
It seems kind of weird to me, but it would explain why the mycroft-precise wiki says you should have 1-2 sec of silence at the end.
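
For anyone following along: batch size here is just the usual Keras training knob. A toy sketch of the kind of network involved (the layer sizes and MFCC input shape are my assumptions, not necessarily what precise actually uses):

    # toy precise-style model: a small GRU over an MFCC window; the
    # (29, 13) input shape and 20 units are assumptions for illustration
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import GRU, Dense

    model = Sequential([
        GRU(20, input_shape=(29, 13)),    # summarize the feature window
        Dense(1, activation="sigmoid"),   # P(wake word)
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])

    x = np.random.rand(256, 29, 13)       # dummy MFCC features
    y = np.random.randint(0, 2, 256)      # dummy labels
    # batch_size is the knob discussed above; validation_split is where
    # a val_acc number would come from
    model.fit(x, y, batch_size=32, epochs=2, validation_split=0.1)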

el-tocino commented on September 14, 2024

Dunno. A lot depends on your data. Train with more data or more steps to improve val_acc. I have something like 300 wake-word samples now, and about 4x that in not-wake-words (particularly things that triggered false activations). A good chunk of the noises in PCD are from that as well. If you're not using at least 50 wake-word samples and 3x that in not-wake-words, you will probably want to add more. Also use wake-word saving to build more samples, particularly of the not-wake-word variety.
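
A quick way to sanity-check those ratios, assuming the wake-word/not-wake-word folder layout the precise wiki describes (untested sketch; the dataset path is a placeholder):

    # count samples and check the rule of thumb above
    # (>= 50 wake words, >= 3x that in not-wake-words)
    from pathlib import Path

    def count_wavs(folder):
        return len(list(Path(folder).glob("*.wav")))

    root = Path("kiana-data")             # placeholder dataset root
    ww = count_wavs(root / "wake-word")
    nww = count_wavs(root / "not-wake-word")
    print(f"wake words: {ww}, not-wake-words: {nww} ({nww / max(ww, 1):.1f}x)")
    if ww < 50 or nww < 3 * ww:
        print("probably want more samples")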

tadly commented on September 14, 2024

Dammit, I knew I forgot to share important information...

I'm currently at:

  • wake_words=210 (only myself using different microphones, and I'm adding new ones daily)

  • not_wake_words=1448 (the two downloads I shared in the OP)

I'm not yet worried about false activations, as I can fix those later on through the methods you outlined in your write-up.

Given my data set, I would expect val_acc to hit 1 all the time.
The fact that the set without the 1 second tail doesn't worries me (mostly because I don't understand why, and I would really like to).

Changing the sensitivity to a higher value (e.g. 0.8 rather than 0.2) seems to improve activation speed, which seems odd.
That was only crude testing, though, which I'll investigate further tomorrow.
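
To illustrate what I mean (toy numbers, definitely not precise's actual decoder): if an activation requires the network output to stay above 1 - sensitivity for a few consecutive chunks, then a higher sensitivity lowers the bar and fires earlier:

    # toy decoder, NOT precise's real implementation: fire once the output
    # exceeds (1 - sensitivity) for `needed` consecutive chunks
    def first_activation(outputs, sensitivity, needed=3):
        threshold = 1.0 - sensitivity
        run = 0
        for i, p in enumerate(outputs):
            run = run + 1 if p > threshold else 0
            if run >= needed:
                return i              # chunk index of activation
        return None

    rising = [0.1, 0.3, 0.5, 0.7, 0.8, 0.9, 0.9, 0.9]
    print(first_activation(rising, sensitivity=0.2))  # threshold 0.8 -> fires at 7
    print(first_activation(rising, sensitivity=0.8))  # threshold 0.2 -> fires at 3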

el-tocino commented on September 14, 2024

Hmm. Yeah, I was more concerned with accuracy than anything; speed never was an issue.

tadly commented on September 14, 2024

So, I tried the "hey mycroft" model and damn, that thing activates fast.

I really wish I knew how exactly this model was trained.
I read somewhere that it was trained using a sample size of 90k (hope I remember that correctly), but this doesn't clarify whether that's 90k "hey mycroft" samples or 90k of "hey mycroft" and not-wake-words combined. (An impressive number either way.)

I don't know if the activation speed would improve the more data one adds, or if they used different training techniques.

I have a lot more reading/learning to do, it seems.

el-tocino commented on September 14, 2024

50k hey mycrofts was what I heard. There's a lot of other data, including nww's they have, but not all of it is good/usable, I think.

tadly commented on September 14, 2024

Interesting.

I'd have one more question if you'd be so kind.

From what I read, Keras usually splits data into training and test sets itself, while precise doesn't do that.
Instead, I declare test data through the test directory. I assume this is to ensure model creation is repeatable.

The question now: how do you handle test/not-wake-words?
Do you use psounds and/or other downloaded sound packs, or did you populate NWW all yourself?
The reason I'm asking is this:
If I record stuff myself (be it a fan or whatever) which activates the model, I can create multiple recordings of the same source and put some of them in wake-word/not-wake-word and some in test/not-wake-word.
With downloads like psounds, most of the recordings exist only once,
and from my understanding you should not duplicate data between the training and test sets.

el-tocino commented on September 14, 2024

I randomly sample 10% and move it over. I use Google voice commands, psounds, and a few thousand nww's I recorded/saved.

I also end up running precise-test against the full wake-word dataset for fun, to see where it's having issues. (I've run it against my nww's as well, which generally isn't as useful.)
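
The move itself is nothing fancy; something like this untested sketch (paths are placeholders for your own layout):

    # hold out a random 10% of not-wake-words as test data
    import random
    import shutil
    from pathlib import Path

    src = Path("kiana-data/not-wake-word")        # placeholder paths
    dst = Path("kiana-data/test/not-wake-word")
    dst.mkdir(parents=True, exist_ok=True)

    wavs = list(src.glob("*.wav"))
    held_out = random.sample(wavs, k=max(1, len(wavs) // 10))
    for wav in held_out:
        shutil.move(str(wav), dst / wav.name)     # held out, never trained on
    print(f"moved {len(held_out)} of {len(wavs)} files to {dst}")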

el-tocino commented on September 14, 2024

I still model words for others on occasion, so any new or better info is always welcome. But what I've gleaned has also come through a bunch of trial and lots of error, so it's better to share it so others can get where they need to go sooner.

tadly commented on September 14, 2024

Oh damn, that reminded me of one more question I wanted to ask.
In your write-up you say:

I have only recently started recording with noisy backgrounds. Will update if I get better info.

Any news on that?

P.S. With all the additional testing I've done so far, my model is still far from the activation speed of the "hey mycroft" model.
I can only suspect that the more data you feed in, the quicker the model can "decide".
I'll probably run a test including Google's data set again (I left it out because it's so specific in what it provides).
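
In case it's useful to anyone: one way to approximate noisy-background samples offline (a rough numpy sketch, not a precise tool; it assumes mono 16-bit files at the same sample rate, and the file names are placeholders) is to mix a noise bed into a clean recording at a reduced level:

    # mix background noise into a clean wake-word sample
    import wave
    import numpy as np

    def read_wav(path):
        with wave.open(path, "rb") as w:
            assert w.getsampwidth() == 2          # 16-bit PCM assumed
            data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
            return data, w.getparams()

    speech, params = read_wav("kiana-042.wav")    # placeholder names
    noise, _ = read_wav("fan-noise.wav")
    noise = np.resize(noise, speech.shape)        # loop/trim noise to match length

    mixed = speech.astype(np.int32) + 0.3 * noise  # 0.3 = arbitrary noise level
    mixed = mixed.clip(-32768, 32767).astype(np.int16)

    with wave.open("kiana-042-noisy.wav", "wb") as out:
        out.setparams(params)
        out.writeframes(mixed.tobytes())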

el-tocino commented on September 14, 2024

Doesn't hurt as far as I can tell. It's mostly the captured wake words and such; they tend to be fairly noisy, and I haven't noticed a decrease in activations.
