Comments (15)
I see. This means you have some samples inside "test" that might not exist in any other form in ww.
This has all been quite helpful. Thanks a lot.
I am writing some documentation/howto while working on this and hope to share it with the community once it is in a good enough state.
I'm trying to say, your time has not been wasted (I hope) :)
from localcroft.
Speed is more related to hardware resources. What are you running on, and how loaded is it?
I'm testing on my dev machine, so that really shouldn't be a bottleneck (Intel i7-5600U).
And the system is basically idle otherwise too
Does this mean my suspicion that the engine requires the trailing 1 second before considering it an "activation" is wrong?
I should add that I'm new to ML, so I'm still on a steep learning curve :D
Edit:
Does your training data also contain a 1-2 second silence ending or does it stop right after speech stops?
Also, using your own model, would you say it activates as fast as, for example, Google or Alexa?
Should only be listening to 1.5 seconds, I think, to activate. My data cuts off usually pretty quickly. I haven't used alexa in a while, but activation seems to be near-google-speed in my experience. I run on an i7-4770 with 8gb doing a bunch of things (mycroft/wiki/tts/stt) and it's not noticeably slow.
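For intuition (this is not Precise's actual code, and all names and sizes here are my own assumptions), the ~1.5-second window can be pictured as a ring buffer over the incoming audio: each new chunk from the microphone pushes the oldest samples out, so the model always scores only the most recent 1.5 s, regardless of how long you've been listening.

```python
from collections import deque

SAMPLE_RATE = 16000      # Precise works on 16 kHz mono audio
WINDOW_SECONDS = 1.5     # assumed size of the scoring window

# Ring buffer that keeps only the most recent 1.5 s of samples.
window = deque(maxlen=int(SAMPLE_RATE * WINDOW_SECONDS))

def feed(samples):
    """Append a new chunk of audio; the oldest samples fall off automatically."""
    window.extend(samples)
    return len(window)   # capped at 24000 samples (1.5 s)
```

Feeding three full seconds of audio still leaves only 24000 samples in the buffer, which is why recordings much longer than the window don't help the model directly.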
Hm...
My wake-word is "Kiana".
No "hey" in front or anything, which makes it quicker to say than "okay google" or "hey mycroft" etc.
So maybe that has an effect on the perceived speed, as "hey mycroft" is much closer to a 1.5-second cycle than "Kiana".
That said, after playing with the batch size a bit I managed to get a model that at least activates (although val_acc could be much better), and it activates much faster about 50% of the time.
The other 50% it feels just like the other initial model.
I'll do some more testing tomorrow and probably compare against the official mycroft model in terms of speed.
Do you have a clue why a data-set with 1 sec. of silence would perform so much better when it comes to training than the ones without?
Seems kind of weird to me but would explain why the mycroft-precise wiki says you should have 1-2 sec at the end.
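If it helps to compare the two dataset variants, the trailing silence can be appended by script instead of re-recording everything. A rough stdlib-only sketch (`pad_with_silence` is my own helper name, and it assumes plain PCM WAV files, which is what Precise expects):

```python
import wave

def pad_with_silence(src_path, dst_path, seconds=1.0):
    """Append `seconds` of silence to a PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # One frame = sampwidth * nchannels bytes; PCM silence is all zero bytes.
    silence = (b"\x00"
               * int(params.framerate * seconds)
               * params.sampwidth
               * params.nchannels)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames + silence)
```

Running this over a copy of the samples gives a padded set to train against the original, so the effect of the tail can be isolated from everything else.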
Dunno. A lot depends on your data. Train with more data or more steps to improve val_acc. I have something like 300 wake word samples now, and about 4x that in not-wake-words (particularly things that triggered false activations). A good chunk of the noises in PCD are from that as well. If you're not using at least 50 wake word samples and 3x that in not-wake-words, you will probably want to add more. Also use wake word saving to build more samples, particularly of the not-wake-word variety.
Dammit, I knew I forgot to share important information...
I'm currently at:
- wake_words=210 (only myself, using different microphones, and I'm adding new ones daily)
- not_wake_words=1448 (the two downloads I shared in OP)
False activations I'm not yet worried about, as I can fix those later on through the methods you outlined in your write-up.
Given my dataset I would expect val_acc to hit 1 all the time.
The fact that the set without the 1-second tail doesn't is what worries me (mostly because I don't understand why, and I would really like to).
Changing the sensitivity to a higher value (e.g. 0.8 rather than 0.2) seems to improve activation speed, which seems odd.
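For what it's worth, that effect at least matches a simple intuition: the network's confidence rises over successive frames as the word is spoken, and the detector fires on the first frame that clears the decision threshold, so a more permissive setting fires earlier on the same rising curve. A toy illustration (this is only the intuition, not Precise's actual decoding code; the curve values are invented):

```python
def first_activation(confidences, threshold):
    """Return the index of the first frame whose confidence
    meets the threshold, or None if it never does."""
    for i, c in enumerate(confidences):
        if c >= threshold:
            return i
    return None

# A hypothetical confidence curve rising as the wake word is spoken.
curve = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95]
```

With this curve, a threshold of 0.5 fires two frames earlier than one of 0.9, so anything that effectively lowers the decision threshold should look like faster activation (at the cost of more false positives).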
Was only crude testing though which I'll investigate further tomorrow.
Hmm. Yeah, I was more concerned with accuracy than anything; speed never was an issue.
So, I tried the "hey mycroft" model and damn, that thing activates fast.
I really wish I knew how exactly this model was trained.
I read somewhere that it was trained using a sample-size of 90k (hope I remember that correctly) but this doesn't clarify if that's 90k "hey mycroft" or 90k of "hey mycroft" and not wake words. (an impressive number either way)
I don't know if the activation speed would improve the more data one adds, or if they used different training techniques.
I have a lot more reading/learning to do, it seems.
50k hey mycrofts was what I heard. There's a lot of other data, including nww's they have, but not all of it is good/usable?
Interesting.
I'd have one more question if you'd be so kind.
From what I read, Keras usually splits data into training and test sets itself, while Precise doesn't do that.
Instead I declare test data through the test directory. I assume this is to ensure model creation is repeatable.
The question now: how do you handle test/not-wake-words?
Do you use psounds and/or other downloaded sound packs, or did you populate NWW all yourself?
The reason I'm asking is: if I record stuff myself (be it a fan or whatever) which activates the model, I can create multiple recordings of the same source and put some of them in wake-word/not-wake-word and some in test/not-wake-word.
With downloads like psounds, most of the recordings exist only once.
From my understanding you should not duplicate data between wake-word and test.
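One way to enforce that rule mechanically is to compare content hashes across the two directories; any file whose audio appears in both sets is a leak from training into test. A sketch (`find_overlap` and `file_hashes` are made-up helper names, not part of Precise):

```python
import hashlib
from pathlib import Path

def file_hashes(directory):
    """Map content hash -> filename for every .wav under a directory."""
    hashes = {}
    for path in Path(directory).rglob("*.wav"):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        hashes[digest] = path.name
    return hashes

def find_overlap(train_dir, test_dir):
    """Return training filenames whose audio content also appears in the test set."""
    train = file_hashes(train_dir)
    test = file_hashes(test_dir)
    return sorted(train[h] for h in train.keys() & test.keys())
```

Hashing catches exact copies only; two separate recordings of the same fan would still pass, which is arguably fine since they are genuinely different samples.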
I randomly sample 10% and move it over. I use google voice commands, psounds, and a few thousand nww's I recorded/saved.
I end up running precise-test against the full wakeword dataset for fun to see where it's having issues as well. (I've run it against my nww's as well, which generally isn't as useful)
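The random 10% move described above can be scripted in a few lines. A sketch (`hold_out` is my own name, not a Precise command; the seed makes the split reproducible):

```python
import random
import shutil
from pathlib import Path

def hold_out(src_dir, test_dir, fraction=0.1, seed=None):
    """Move a random fraction of the .wav files from src_dir to test_dir."""
    rng = random.Random(seed)
    files = sorted(Path(src_dir).glob("*.wav"))
    k = max(1, int(len(files) * fraction))
    Path(test_dir).mkdir(parents=True, exist_ok=True)
    moved = rng.sample(files, k)
    for f in moved:
        shutil.move(str(f), str(Path(test_dir) / f.name))
    return [f.name for f in moved]
```

Moving (rather than copying) the files is what keeps the two sets disjoint, which matters given the no-duplication rule discussed above.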
I still model words for others on occasion, so any new or better info is always welcome. But what I've gleaned is also through a bunch of trial and lots of error, so better to share that so others can get where they need to go sooner.
Oh damn that reminded me of one more question I wanted to ask.
In your write-up you say:
I have only recently started recording with noisy backgrounds. Will update if I get better info.
Any news on that?
p.s. With all the additional testing I've done so far, my model is still far from the activation speed of the "hey mycroft" model.
I can only suspect that the more data you feed in, the quicker the model can "decide".
I'll probably do a test including Google's data set again (I left that out because it's so specific in what it provides).
Doesn't hurt as far as I can tell. It's mostly the captured wake words and such; they tend to be fairly noisy, and I haven't noticed a decrease in activations.