
dwks / silvius


Kaldi-based speech recognition system + grammar

Home Page: http://voxhub.io/silvius

License: BSD 2-Clause "Simplified" License

Python 96.79% Shell 2.69% Batchfile 0.52%

silvius's People

Contributors

codingbyvoice, dwks, etherealvisage, jasonleekennedy, jvanloov, raabsm


silvius's Issues

How to scratch more than 9 characters?

I can say scratch <any digit> to repeatedly hit backspace (with the exception of the number 2, see #20). However, I can't say scratch three zero or scratch number three number zero to repeat 30 times.
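One way repeat counts above nine could be supported is to fold a run of spoken digit words into a single integer before the repeat is applied. The following is a minimal sketch under that assumption; `DIGITS` and `parse_count` are hypothetical names, not part of the Silvius grammar.

```python
# Hypothetical helper, not part of Silvius: folds spoken digits into one count.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    # Assumption: "two" is often recognized as "to" (see the related issue).
    "to": 2,
}


def parse_count(words):
    """Fold leading digit words into one integer.

    Returns (count, remaining_words); count is None if no digit leads.
    """
    count = None
    consumed = 0
    for word in words:
        if word not in DIGITS:
            break
        count = (count or 0) * 10 + DIGITS[word]
        consumed += 1
    return count, words[consumed:]
```

With this sketch, `parse_count(["three", "zero"])` yields a count of 30, so "scratch three zero" could be expanded to 30 backspaces.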

Adding wake/sleep word

I think the sleep word can simply be "sleep". Maybe the wake word can be "Hey wiretap"
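A wake/sleep word could be handled by a small gate that sits between the recognizer output and the parser. This is only a sketch: the class name `WakeGate` is hypothetical, and it assumes each recognized utterance arrives as one string.

```python
class WakeGate:
    """Hypothetical gate: swallows utterances while asleep."""

    def __init__(self, wake_phrase="hey wiretap", sleep_word="sleep"):
        self.wake_phrase = wake_phrase
        self.sleep_word = sleep_word
        self.awake = True

    def filter(self, utterance):
        """Return the utterance to parse, or None if it was swallowed."""
        text = utterance.strip().lower()
        if self.awake:
            if text == self.sleep_word:
                self.awake = False
                return None
            return utterance
        # Asleep: only the wake phrase has any effect.
        if text == self.wake_phrase:
            self.awake = True
        return None
```

The wake and sleep utterances themselves are consumed, so they never reach the grammar.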

Keywords forbidden inside "sentence" and "phrase" commands

to: Saying phrase or sentence followed by any digit was added with #12, with the exception of two (which is usually recognized as "to").

Also forbidden are number, late, rate, left, right, etc.

Expected functionality: phrase and sentence mode should input the raw words without parsing, and if you want to insert a symbol you wait for the phrase to be parsed and insert it outside of phrase mode.
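The expected behavior could be sketched as a tokenizer that stops treating words as keywords once phrase or sentence has been seen. The keyword set and the function name `tokenize` below are illustrative assumptions, not the actual Silvius scanner.

```python
# Assumption: a small stand-in for the real keyword list.
KEYWORDS = {"phrase", "sentence", "number", "late", "rate",
            "left", "right", "one", "two", "to"}
RAW_MODES = {"phrase", "sentence"}


def tokenize(words):
    """Return (token_type, word) pairs.

    After a raw-mode keyword, every remaining word is emitted as ANY,
    so digits and other keywords are passed through unparsed.
    """
    tokens = []
    raw = False
    for word in words:
        if raw or word not in KEYWORDS:
            tokens.append(("ANY", word))
        else:
            tokens.append((word, word))
            raw = word in RAW_MODES
    tokens.append(("END", ""))
    return tokens
```

Under this scheme, "phrase turn left one" would reach the parser as phrase followed by three ANY tokens rather than failing on left or one.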

How to make macros?

How do you insert a chain of characters? I want to make a few macros. For example, I'd like go word left to input Ctrl+Left, select word left to input Ctrl+Shift+Left, and delete word left to input Ctrl+Shift+Left followed by Delete.
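As a rough illustration of what such macros might look like, the sketch below maps a spoken phrase to a chained xdotool invocation, in the same `key ... key ...` form the client prints in its logs. The `MACROS` table and `xdotool_command` are hypothetical and not part of the Silvius grammar.

```python
# Hypothetical macro table: spoken phrase -> list of xdotool key names.
MACROS = {
    "go word left": ["ctrl+Left"],
    "select word left": ["ctrl+shift+Left"],
    "delete word left": ["ctrl+shift+Left", "Delete"],
}


def xdotool_command(phrase):
    """Build an xdotool argument list for a macro, or None if unknown."""
    keys = MACROS.get(phrase)
    if keys is None:
        return None
    argv = ["xdotool"]
    for key in keys:
        argv += ["key", key]  # chain several key presses in one call
    return argv
```

The resulting list can be passed to `subprocess.run` to send the key presses.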

adding grammar - new characters/commands

Hello,

First of all, a wonderful project and presentation (found it on YouTube). I have a problem with writing due to my neck injury, and I find this project very helpful.

Can you please post an example of how I can extend the grammar?

  • e.g. I can see single/double quotes are not working; can you help me fix that?
  • I noticed one cannot include numbers (number three does not work; it would be nice if it did)
  • is it possible to train the system to my voice? It has trouble differentiating between right and rate in my case; as a non-native English speaker, I find it hard to pronounce them so that they are recognized. Perhaps it would help if you showed how I can change the word for left bracket :)
  • there is no curly braces command/word
  • the letter x cannot be written, as xray is not recognised; x. ray is often written instead (as seen when I run the script). Edit: I just found that the word for x is expert :)

Please add a bitcoin address to the page so I can tip you :)

Option to ignore invalid tokens

The recognition server has a bias to hear the words "the" and "and" at the beginning of an audio snippet, which spoils lots of commands. It would be nice to ignore all invalid tokens, or to manually create a rule that pops "the" and "and" from the buffer if they are at index 0.
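The second suggestion could be sketched as a small pre-processing step applied to the word list before it reaches the parser. `FILLERS` and `drop_leading_fillers` are hypothetical names; this version also drops several fillers in a row, which goes slightly beyond the index-0 rule described above.

```python
# Hypothetical filter: words the recognizer tends to hallucinate at the start.
FILLERS = {"the", "and"}


def drop_leading_fillers(words):
    """Return words with any leading filler words removed."""
    start = 0
    while start < len(words) and words[start] in FILLERS:
        start += 1
    return words[start:]
```

Only leading words are removed, so a "the" spoken in the middle of a phrase is left alone.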

Cross platform client

Linux is great, but Silvius needs a cross-platform client, preferably an architecture-agnostic one. We need to make experimenting as easy as possible, and not just for programmers. Easy experimentation leads to adoption, then innovation. We might therefore start evaluating libraries that work cross-platform, most likely but not necessarily in Python.

I would say we need to discuss how the project is currently structured, and from there our approach to planning and implementation.

Available hardware for testing purposes:
Microsoft Windows 10 64-bit (Ubuntu Linux system installed), Linux (any flavor) amd64, oDroid c2 (64-bit ARMv8), Raspberry Pi 3 (64-bit ARMv7), Pine A64 (ARMv8), oDroid xu4 (ARMv7),
Emulated Mac OS (VirtualBox)

#8 could be merged

New speech model introduces [COUGH], and [UH] but won't parse these

Silvius. What a cool project. And so much work to make. Thank you David and team.

Learning Silvius has been a hobby since August 2018. Three months ago I broke two
metacarpals in a bicycle incident and my hobby became a lifeline to productivity.

One thing though. When I upgraded to the new Silvius speech model
[1], my user experience actually crashed.

What I expect:

I run Silvius. I get really good recognition. For example, if I say
'arch tango bravo' I see this:

: LISTENING TO MICROPHONE
: arch tango bravo
: > arch tango bravo
: [arch, tango, bravo, END]
: chain {
: char ['a']
: char ['t']
: char ['b']
: }
: /usr/bin/xdotool key a key t key b

What happens instead. After I upgraded to the new speech model [1],
now this happens.

I run Silvius. I get a stream of words interspersed with [COUGH] and
[UM] and very poor recognition. For example, this is just an attempt
to get 'tango bravo' recognized:

: [COUGH]
: > [COUGH]
: [ANY, END]
: Error: Unexpected token `ANY' (word number 1)
: tango
: > tango
: [tango, END]
: chain {
: char ['t']
: }
: /usr/bin/xdotool key t
: [UH]
: > [UH]
: [ANY, END]
: Error: Unexpected token `ANY' (word number 1)
: [UM]
: > [UM]
: [ANY, END]
: Error: Unexpected token `ANY' (word number 1)

Has anyone else had this experience? Any clue as to what I'm missing?
It seems obvious to me at least that the python grammar is supposed to
ignore [UH] and [COUGH], but mine is trying to parse it as part of the
BNF. What happened in my obviously failed install?

Thank you,

James

References

[1] dwk post of 29/11/2018 titled "new silvius speech model"
(https://groups.google.com/forum/#!topic/silvius/1NNmRNLVyC0 accessed
2020-01-23)
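Whether the grammar is meant to skip these markers is not established here, but a workaround could be to strip bracketed noise tokens from the transcript before parsing. The sketch below assumes noise markers always look like an upper-case word in square brackets; `strip_noise` is a hypothetical helper and does nothing about the poorer recognition itself.

```python
import re

# Assumption: noise markers are upper-case words in brackets, e.g. [COUGH].
NOISE = re.compile(r"^\[[A-Z]+\]$")


def strip_noise(transcript):
    """Split a transcript into words, dropping bracketed noise markers."""
    return [word for word in transcript.split() if not NOISE.match(word)]
```

An utterance consisting only of noise then becomes an empty word list, which the caller can skip instead of handing to the parser.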

Saying a number in a sentence throws an error

If I try to say "phrase something something something one" I get this error:

[phrase, ANY, ANY, ANY, one, END]
Error: Unexpected token `one' (word number 5)

The same error occurs with every number I tried:

> phrase five
[phrase, five, END]
Error: Unexpected token `five' (word number 2)

Improving recognition accuracy

Hey, thanks for making this amazing tool! I think it could work well for me but I'm running into some issues with the speech recognition and I'm hoping to get some input on the best resolution.

Some of my commands are recognized first time but most require multiple repeats and some are never recognized regardless of how many times I repeat. I've tried both of the public services and the beta is definitely better but still not usable.

I think the issue could be my English accent or my microphone quality. I'm using a hyperx cloud silver gaming headset that I assumed would have a decent enough mic but maybe not. What do you think?

These are the mic specs:

* Element: Electret condenser microphone
* Polar pattern: Uni-directional, noise-cancelling
* Frequency response: 50Hz-18,000 Hz
* Sensitivity: -39dBV (0dB=1V/Pa,1kHz)

Allow modifying buffer

Right now, you have to wait for a period of silence before the buffer is parsed. It would be great to be able to force the buffer to be parsed (suggested word: "slurp") or to discard the buffer entirely (suggested word: "spit"). Popping the last word added to the stack (with "oops" or "scratch") would also help.
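The three proposed control words could be sketched as a buffer that intercepts them before anything is parsed. `WordBuffer` is a hypothetical class, and the sketch assumes words arrive one at a time; it uses "oops" rather than "scratch" to avoid clashing with the existing scratch command.

```python
class WordBuffer:
    """Hypothetical word buffer with slurp/spit/oops control words."""

    def __init__(self):
        self.words = []

    def feed(self, word):
        """Add a word; return the buffered words when flushed, else None."""
        if word == "slurp":          # force the buffer to be parsed now
            ready, self.words = self.words, []
            return ready
        if word == "spit":           # discard the buffer entirely
            self.words = []
            return None
        if word == "oops":           # drop the most recent word
            if self.words:
                self.words.pop()
            return None
        self.words.append(word)
        return None
```

The caller parses whatever `feed` returns on "slurp", and can still flush on silence as it does today.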

removing a stream sound bites

Sorry, where do the pieces of audio streamed from the microphone by mic.py end up? Where are those frames laid out?

Best regards
thanks

Using two for a repeat is broken

For example, scratch two:

> scratch to
[scratch, to, END]
Error: Unexpected token `to' (word number 2)
> down to
[down, to, END]
Error: Unexpected token `to' (word number 2)

Ambient sounds recognized as input

Ambient noise (not human speech, but typing, clicking, rustling, etc.) is being falsely recognized as words.
This happens every few seconds, even in a silent bedroom.

I tried two mics with the same results: the built-in iMac microphone, and a Rode XLR mic connected to an external audio interface. I also tried lowering the line input level.

Is there any way to reduce/stop this?

(on Mac, Sierra)

thanks!
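One generic mitigation, not something the Silvius client is known to offer, is a simple energy gate that drops quiet audio chunks before they are sent to the server. The sketch below assumes little-endian signed 16-bit mono PCM, and the threshold value is an arbitrary placeholder that would need tuning per microphone.

```python
import math
import struct


def rms(chunk):
    """Root-mean-square level of little-endian signed 16-bit PCM bytes."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_loud_enough(chunk, threshold=500.0):
    """True if the chunk's level reaches the (assumed) threshold."""
    return rms(chunk) >= threshold
```

A gate like this cannot tell loud typing from speech, so it only helps with low-level background noise.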

Modifier + direction causes Silvius client to crash

This makes it hard to navigate i3. Example:

control left 
> control left
[control, left, END]
 chain {
     mod_plus_key ['ctrl'] {
         movement [left]
     }
 }
Traceback (most recent call last):
  File "grammar/main.py", line 31, in <module>
    execute(ast, f == sys.stdin)
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 103, in execute
    ExecuteCommands(ast, real)
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 12, in __init__
    self.postorder_flat()
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 26, in postorder_flat
    func(node)
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 32, in n_chain
    self.postorder_flat(n)
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 26, in postorder_flat
    func(node)
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 38, in n_mod_plus_key
    self.automator.mod_plus_key(node.meta, node.children[0].meta[0])
  File "/home/shit/bin/voice2code/silvius-crypdick/grammar/execute.py", line 98, in mod_plus_key
    if(len(k) > 1 and k != 'plus' and k != 'apostrophe' and k != 'period' and k != 'minus'): k = k.capitalize()
AttributeError: Token instance has no attribute '__len__'
super left 
Exception in thread WebSocketClient:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/shit/.local/lib/python2.7/site-packages/ws4py/websocket.py", line 528, in run
    if not self.once():
  File "/home/shit/.local/lib/python2.7/site-packages/ws4py/websocket.py", line 410, in once
    if not self.process(self.buf[:requested]):
  File "/home/shit/.local/lib/python2.7/site-packages/ws4py/websocket.py", line 480, in process
    self.received_message(s.message)
  File "stream/mic.py", line 113, in received_message
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe
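The first traceback shows `len(k)` being called on a Token instance rather than a string. A possible fix is to coerce the key to a string before the length check. The sketch below uses a stand-in `Token` class and assumes the word is stored in a `type` attribute; both are assumptions about the parser, not confirmed details. The broken pipe in the second traceback looks like a consequence of the grammar process having exited, though that is also an inference.

```python
class Token:
    """Stand-in for the parser's token class (attribute name is assumed)."""

    def __init__(self, type):
        self.type = type


# Key names that the original check leaves uncapitalized.
UNCAPITALIZED = {"plus", "apostrophe", "period", "minus"}


def key_name(k):
    """Return a key name as a string, capitalizing multi-character names."""
    if not isinstance(k, str):
        # Assumption: a token carries its word in .type.
        k = str(getattr(k, "type", k))
    if len(k) > 1 and k not in UNCAPITALIZED:
        k = k.capitalize()
    return k
```

With this, a movement token such as left becomes the key name `Left`, and plain strings behave as before.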
