dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx

License: GNU Lesser General Public License v3.0

Python 100.00%

python speech-recognition

dragonfly's Introduction

dictation-toolbox

Meta-repository for tracking and providing information about the organization itself.

Summary

dictation-toolbox is a loose organization intended to organize projects for users of voice recognition software, with a focus on customizability, flexibility, and programming/system administration/etc. We hope to help fill in the gap between commercially available solutions for common workflows and other workflows which require customization.

We also intend to provide a contingency plan for projects to be "adopted" by new maintainers if their current maintainers leave without transferring control of the project. Each repository in dictation-toolbox has one or more owners responsible for it, and admins of dictation-toolbox agree not to use their admin privileges on repositories where they are not an owner, except in the case of all owners dropping out of all contact for four months.

We are open to including other projects under these terms to better serve the needs of the dictation community. Please open an issue in this repository if you are interested. Historically, we have made all owners of all repositories in dictation-toolbox admins of the organization.

Projects

The listed owners of each project, rather than the admins of dictation-toolbox, retain full control and responsibility for all aspects of their project.

aenea

Owners: @calmofthestorm

Client-server library for using voice macros from Dragon NaturallySpeaking and Dragonfly on remote/non-Windows hosts using NatLink and Dragonfly.

aenea-grammars

Owners: @calmofthestorm

A collection of grammars written for use with Aenea, with an eye toward programming and use on Linux.

dragonfly-scripts

Owners: @nirvdrum

This repository contains a collection of Dragonfly Python scripts that can be used with Dragon NaturallySpeaking Professional.

aenea-resources

Owners: @calmofthestorm

Miscellaneous resources (cheatsheets, etc) for working with aenea and aenea-grammars.

dragonfly's Issues

Support Microsoft's UI Automation API on Windows

Microsoft's UI Automation API (UIA) allows interacting with system UI elements (e.g. task bar, menus, etc.) and programs running as administrator. Dragonfly's Key, Text and Mouse actions cannot currently do this.

There is a full end-to-end working proof of concept of the helper app for interacting with apps that are running as administrator in Windows. It doesn't work with the UAC prompt yet. Chilimangoes can be contacted as a resource on Gitter.

Currently only the Key action is implemented from Dragonfly, but Chilimangoes thinks the other actions will be quite a bit easier. He is going to clean up the GitHub repo and document how it works. Dragonfly would then (optionally) call out to the UI helper app. Chilimangoes has some sample Python code for doing this and hacked his Caster install to do so for testing, but it needs to be made robust and configurable.

Chilimangoes' direct involvement with coding will be limited due to work obligations for the next 2 to 3 months.

There are other libraries supporting UIA that could be used instead:

Pocket Sphinx free-dictation mode

The Pocket Sphinx engine has no free-dictation mode at the moment. If no key phrase or grammar rule is recognised, then "Sorry, what was that?" will be printed. Given how Pocket Sphinx grammar searches do not reject out-of-grammar words at the moment, I think it might be better to have a separate mode for when free-dictation using a language model is required.

The new mode could be enabled/disabled by key phrases or by calling a public engine method, similar to the wake/sleep commands and methods.

One problem with using Pocket Sphinx for this is that the speech hypothesis strings it returns are typically in lowercase. I'm not sure if there are projects out there for capitalising words appropriately. Punctuation words like 'period', 'comma' and 'apostrophe' could be replaced with their characters in the output.
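The post-processing described above could be sketched roughly as follows. This is only an illustration in plain Python: the PUNCTUATION_WORDS mapping and the format_hypothesis function are hypothetical names, not part of dragonfly or Pocket Sphinx, and the capitalisation rule is deliberately naive (first letter of the text and of each sentence).

```python
import re

# Hypothetical mapping of spoken punctuation words to characters.
PUNCTUATION_WORDS = {"period": ".", "comma": ",", "apostrophe": "'"}


def format_hypothesis(hypothesis):
    """Replace punctuation words and capitalise sentence starts."""
    pieces = []
    join_next = False  # True right after an apostrophe was attached
    for word in hypothesis.split():
        symbol = PUNCTUATION_WORDS.get(word)
        if symbol is not None and pieces:
            # Attach the punctuation character to the previous word.
            pieces[-1] += symbol
            join_next = symbol == "'"
        elif join_next:
            # Glue the word following an apostrophe onto it ("john" + "'" + "s").
            pieces[-1] += word
            join_next = False
        else:
            pieces.append(word)
    text = " ".join(pieces)
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)


print(format_hypothesis("hello world comma this is a test period another one"))
```

A trailing possessive apostrophe (as in "dogs apostrophe toys") would be glued to the next word by this sketch, so a real implementation would need a smarter rule.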

Discuss moving Dragonfly into dictation-toolbox

Danesprite invited me to discuss this in dictation-toolbox/dictation-toolbox#1 (comment); this issue is to discuss whether it would make sense to move Dragonfly into dictation-toolbox. We're currently working with Caster in https://github.com/synkarius/caster/issues/321 to do the same.

dictation-toolbox is intended to be a lightweight organization for collecting projects related to voice coding and computer use, plus a contingency plan in the event that all maintainers of a project leave. Individual projects would still remain independent in terms of governance, community management, etc. See dictation-toolbox/dictation-toolbox#1 for the work on drafting this up, and please feel free to weigh in.

Backwards incompatible change to Text action is missing from changelog

In 0.10, the Text action seems to have been changed (in pull request #44 by @Versatilus ?) so that it no longer respects modifier keys being held down.

Although this is a backwards incompatible change, it seems worthwhile for easier typing of Unicode characters, especially since the original dragonfly documentation says

It differs from the Key action in that Text is used for typing literal text, while dragonfly.actions.action_key.Key emulates pressing keys on the keyboard.

which implies that Text and Key do not work the same.

However it would be a good idea to document this backwards incompatible change in the change log, especially for the benefit of people moving over from original dragonfly (e.g. if I remember rightly, the original multiedit grammar relies on holding down modifier keys to modify Text actions). It might also be helpful to especially call this out in the documentation for the Text action too.

cc @wolfmanstout - my grammar was originally based on your repeat grammar, so I suspect you will run into this too (your full_key_action_map will contain Text actions which won't work with combo_key_element's pressing of modifier keys).

All rules are activated when initializing Dragonfly, causing BadGrammar error

This was discussed on Gitter, but I'm filing a proper issue here and will fix shortly.

The bug is here:
https://github.com/Danesprite/dragonfly/blob/master/dragonfly/grammar/grammar_base.py#L349

The bug is much more obvious if you look at what this used to look like: if rule.active != False. This looked too silly to be a mistake, and indeed it was not. It turns out that _active is a tri-state and is initialized to None. Here's my best guess as to why this breaks things:

When Grammar.load is called, rule.activate is not called on newly-constructed rules. But importantly, behind the scenes, these are already activated in the grammar (that's my hypothesis based on the results, not confirmed).

Later, when contexts are checked to determine whether to activate/deactivate rules, these rules already appear inactive, so they are not deactivated.

All the inactive rules are actually active, leading to the error.

For now, I'm going to revert this problematic change. We can also consider improving this code so that it does not use a tri-state, or so that it uses explicit tri-state values (enum-style).
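To make the tri-state problem concrete, here is a small standalone sketch. It assumes the problematic change replaced the comparison with a plain truthiness test, which the issue does not state explicitly; the function names and the Activation enum are hypothetical and not taken from grammar_base.py.

```python
from enum import Enum


def treated_as_active_original(active):
    # Original check: an unset rule (None) is NOT equal to False,
    # so it is treated as active and will be deactivated when needed.
    return active != False  # noqa: E712 (deliberate comparison)


def treated_as_active_changed(active):
    # Assumed replacement: None is falsy, so an unset rule looks inactive
    # and is never deactivated.
    return bool(active)


class Activation(Enum):
    """Explicit tri-state values, as suggested in the issue."""
    UNSET = "unset"
    ACTIVE = "active"
    INACTIVE = "inactive"


def should_deactivate(state):
    # With explicit values the intent is visible: only rules known to be
    # inactive can be skipped.
    return state is not Activation.INACTIVE


for value in (None, True, False):
    print(value, treated_as_active_original(value), treated_as_active_changed(value))
```

The two checks only disagree for None, which is exactly the state of a newly constructed rule.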

Add support for different keyboard layouts

Right now different keyboard layouts can cause problems like not being able to write {} with Text().
In Monospark's fork this bug was fixed with ebc094c69ad364b1c217b9e6d4af23aa97711314. This should be fixed here, too.

Add dragonfly commands for loading and recognising from command modules

This would be a new command for dragonfly's command-line interface (see the CLI documentation). I guess the command would be dragonfly load <files>.

It would be similar to dragonfly's test <files> command, but would call the relevant engine method to recognise speech from the microphone. Most of the arguments for test could be re-used, e.g. -e|--engine and --language.

It would be an alternative to the module loaders and should work with each available engine. The observer used by the module loaders could be used with this command to print "Speech started.", "Recognized: <words>" and "Sorry, what was that?". This and other parts of the module loaders could be moved into dragonfly/__main__.py or dragonfly/loader.py to avoid duplicating code.

A -c|--config option could be added for loading a Python module before the others. The config module could be used for engine configuration and setting up other functionality like the wake/sleep modes added by @daanzu's module loaders for Kaldi and WSR.

One benefit of this command over the module loaders is greater control over which modules are loaded. You could specify modules directly or with shell wildcards instead of changing the loader code. For example:

python -m dragonfly load _module1.py _module2.py
python -m dragonfly load *.py
python -m dragonfly load _*.py

This already works with dragonfly test.
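As a rough idea of how the argument parsing for such a command might look, here is a sketch using argparse from the standard library. The make_parser function, the help strings and the default values are assumptions for illustration; only the option names (-e/--engine, --language, -c/--config) come from the proposal above, and this is not dragonfly's actual CLI code.

```python
import argparse


def make_parser():
    """Build a parser for a hypothetical 'dragonfly load' sub-command."""
    parser = argparse.ArgumentParser(prog="dragonfly")
    subparsers = parser.add_subparsers(dest="command")

    load = subparsers.add_parser(
        "load", help="Load command modules and recognise from the microphone.")
    load.add_argument("files", nargs="+", help="Command module files to load.")
    load.add_argument("-e", "--engine", default=None,
                      help="Name of the engine to use (assumed default: auto).")
    load.add_argument("--language", default="en",
                      help="Speaker language (assumed default).")
    load.add_argument("-c", "--config", default=None,
                      help="Python module to load before the others.")
    return parser


args = make_parser().parse_args(
    ["load", "-c", "config.py", "_module1.py", "_module2.py"])
print(args.command, args.config, args.files)
```

Shell wildcards such as _*.py are expanded by the shell before the parser sees them, so the files argument simply receives the resulting list.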

Recognition history through JSON-RPC

Currently we can pull grammars from the supported engine system through JSON-RPC. It would be useful to have the ability to pull recognition history as well.

Commands: All recognized commands
Free dictation: All recognized dictations
A message for dictations that are not recognized by the engine, e.g. DNS's "say that again"
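For illustration, a client request for such a history could look like the sketch below, which builds and parses JSON-RPC 2.0 messages with the standard json module. The method name get_recognition_history, the count parameter and the shape of the result are hypothetical; only the JSON-RPC 2.0 envelope fields (jsonrpc, method, params, id, result, error) follow the specification.

```python
import json


def make_request(method, params=None, request_id=1):
    """Serialise a JSON-RPC 2.0 request."""
    request = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        request["params"] = params
    return json.dumps(request)


def parse_response(payload):
    """Return the result of a JSON-RPC 2.0 response or raise on error."""
    response = json.loads(payload)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "RPC error"))
    return response["result"]


# Hypothetical method and parameter names.
print(make_request("get_recognition_history", {"count": 2}))

# A made-up response, only to show the parsing side.
example = '{"jsonrpc": "2.0", "id": 1, "result": [["hello", "world"], false]}'
print(parse_response(example))
```

In this made-up result, a list of words stands for a successful recognition and false for a recognition failure; the real format would be up to the implementation.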

KeyboardLanguageContext class?

We have noticed some trouble getting commands working with non-English language / English keyboard setups (dictation-toolbox/Caster#512).

Is it possible to create something like AppContext that will test Language + Keyboard like AppContext tests Apps?

Reposting the "carat" behavior from the linked issue:

German language, German keyboard: ^^
German language, English US keyboard: `
English language, German keyboard: &
English language, English US keyboard: ^
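One possible shape for such a context is sketched below. It is a standalone class rather than a dragonfly Context subclass, so that it runs without dragonfly installed; the matches(executable, title, handle) signature is meant to mirror the one I believe dragonfly contexts use. The class name comes from the issue title, while the two getter callables are hypothetical stand-ins for whatever platform calls would report the engine language and keyboard layout.

```python
class KeyboardLanguageContext(object):
    """Match when both the speech language and keyboard layout are as expected."""

    def __init__(self, language, layout, get_language, get_layout):
        self._language = language
        self._layout = layout
        # Hypothetical callables; a real implementation would query the
        # engine and the operating system here.
        self._get_language = get_language
        self._get_layout = get_layout

    def matches(self, executable, title, handle):
        # The window arguments are ignored; only language and layout matter.
        return (self._get_language() == self._language
                and self._get_layout() == self._layout)


context = KeyboardLanguageContext("de", "en-US",
                                  get_language=lambda: "de",
                                  get_layout=lambda: "en-US")
print(context.matches("notepad.exe", "Untitled", 0))
```

The getters are evaluated on every matches call, so a layout switch at runtime would be picked up the next time contexts are checked.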

Automatically load and write tests for other supported dragonfly languages

I noticed there are integer and digit content implementations under dragonfly.language.other for Arabic, Indonesian and Malaysian. I suspect these have been left here because neither Dragon NaturallySpeaking nor Windows Speech Recognition supports them. Now that dragonfly supports using other speech recognition engines, these should be automatically loaded when the proper language code is used.

There are unit tests for the English, German and Dutch integer and digit classes. It should be simple enough to add similar tests for the other languages. The test engine (issue #36) should help with running these tests without a real speech recognition engine.

Dragonfly monitor returns rectangle of wrong dimensions if windows dpi scaling is enabled

This bug was originally detected in Caster development. Windows has a DPI scaling option, which is effectively mandatory at higher resolutions to make text readable. When this is set to anything but 100%, dragonfly's monitor returns a smaller rectangle than the actual screen size. The rectangle's dimensions are exactly divided by the DPI scaling factor. Thus, the function used inside dragonfly is probably returning the actual rendering resolution and not the upscaled end result. This is especially problematic for multiple monitors, which can each have a different DPI scaling value, because you can know neither their real resolution nor their actual positions on the virtual desktop.

Furthermore, I was told (I didn't have much time to look at the code) that dragonfly actually calculates the monitors variable only on startup. This is also problematic because the scaling value can change at any time and the correct dimensions should be returned every time.

Suggestions to fix this

Use the EnumDisplayMonitors function to get information about the monitors, since it returns the proper values.
Make the rectangle attributes into properties or functions which are calculated dynamically at runtime.
I could try my hand at fixing this, but I'm not very familiar with the CPython API.
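The arithmetic behind the symptom can be shown with a tiny sketch. It assumes, as described above, that the reported width and height are the physical values divided by the scaling factor; the function name is hypothetical, and positions are left out on purpose because they cannot be recovered this way on multi-monitor setups.

```python
def physical_size(reported_width, reported_height, scale):
    """Undo DPI virtualisation: scale is 1.5 for a 150% setting."""
    return (int(round(reported_width * scale)),
            int(round(reported_height * scale)))


# A 3840x2160 monitor at 150% scaling would be reported as 2560x1440.
print(physical_size(2560, 1440, 1.5))
```

This only illustrates the relationship between the two sizes; it is not a substitute for querying the real values from the OS.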

Unicode/multi-language support for grammar specs

At the moment dragonfly's parser only accepts ASCII characters. The string.letters string used by the parser is not locale-dependent and will only work with English characters (see the docs). There have been other issues about this, including dragonfly/t4ngo #11 and Aenea #156.

I think it'd be a good idea to change the appropriate parser classes to use Unicode instead to allow writing grammars in supported languages without limitations on the use of accented letters. This could be done for alphanumeric characters using a regex pattern:

re.compile(r'\w',  re.UNICODE)

Strings could then be encoded to windows-1252 for DNS and WSR. If encoding from utf-8 fails, then a SyntaxError could be raised (what happens currently). This translation should work for French, German, Italian, Spanish, Portuguese and Dutch (all supported by DNS and windows-1252).
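A minimal sketch of both steps, matching word characters with a Unicode-aware pattern and then encoding for DNS/WSR, might look like this. The names WORD_CHARS and encode_spec are hypothetical and this is not the parser's actual code; in Python 3 the re.UNICODE flag is already the default for str patterns and is kept here only to match the pattern above.

```python
import re

# Unicode-aware run of word characters (letters, digits, underscore).
WORD_CHARS = re.compile(r"\w+", re.UNICODE)


def encode_spec(spec, encoding="windows-1252"):
    """Encode a grammar spec for the engine, raising SyntaxError on failure."""
    try:
        return spec.encode(encoding)
    except UnicodeEncodeError as error:
        raise SyntaxError("Cannot encode spec %r: %s" % (spec, error))


print(WORD_CHARS.findall(u"ouvrir fenêtre"))
print(encode_spec(u"fenêtre"))
```

A spec containing characters outside windows-1252, for example Chinese text, would raise SyntaxError here, which matches the current behaviour described above.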

The work in progress CMU Pocket Sphinx and Google Cloud Speech engines would allow many other languages to be supported.

Accept "too" and "to" as equivalent to "two" for English Integers

Sometimes "too" is interpreted instead of "two" when using an Integer or Short Integer which causes failure to recognize the spec as matching.
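One way to picture the request is as extra entries in the word-to-value mapping. The sketch below uses plain dictionaries only; NUMBER_WORDS, HOMOPHONES and with_homophones are hypothetical names, and how such a mapping would be wired into dragonfly's Integer classes is left open.

```python
# Word-to-value mapping for the single digits.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Misrecognitions that should be accepted as the canonical word.
HOMOPHONES = {"too": "two", "to": "two"}


def with_homophones(words, homophones):
    """Return a copy of the mapping that also accepts the given aliases."""
    extended = dict(words)
    for alias, canonical in homophones.items():
        extended[alias] = words[canonical]
    return extended


extended = with_homophones(NUMBER_WORDS, HOMOPHONES)
print(extended["too"], extended["to"], extended["two"])
```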

Integrate eye tracking platforms

Eye tracking technology allows bypassing of cumbersome elements of coding by voice. Perhaps the slowest element of coding by voice is navigation. Navigation also requires many dictated elements to reach the desired location on a screen. In turn, this increases the probability of symptoms related to voice strain.

There are two main categories of eye tracking implementation: dedicated hardware, which is specifically designed for eye tracking and tends to be expensive, and webcams, which are ubiquitous and relatively cheap but perform less well than the alternatives.

Tobii dedicated eye tracking hardware.
EyeXMouse - Requires - Windows OS, Tobii EyeX Controller and Software
WebCam based eye tracking
Researching...

After a discussion about Versatilus's EyeXMouse fork, he thought dragonfly would be a better fit for integrating eye tracking. That said, this is still up for discussion.

Clarify dragonfly documentation regarding user selected language of the SR engine.

Currently users can manually edit the language for an SR engine. We should clarify the documentation with an example. This should be clarified for each supported speech recognition engine.

Nuance Documentation for DNS 15 supported languages

Missing documentation

There are a number of parts of dragonfly that are not documented. I'll list some of them below.

A pylintrc file for the project's code style (mostly PEP 8).
List and DictList classes.
Classes in the dragonfly.language sub-package (except ShortIntegerRef).
E.g. IntegerRef, Integer, Digits
Classes in the dragonfly.windows sub-package (except Clipboard).
Windows only modules such as win32gui are mocked for documentation builds, so that shouldn't be an issue.
Information on cross-platform support
- which platforms are supported
- what functionality is supported
Cross-platform pyperclip Clipboard class in dragonfly.util.clipboard.
Separate documentation pages on the Natlink, WSR and text-input engine back-ends.

Allow speaking Dictation with the Sphinx backend again

The CMU Pocket Sphinx engine implementation does not currently allow matching Dictation elements by speaking, although they can be matched with mimic.

The old way of doing this was removed in #62 (release 0.12.0) because it really didn't work well. I'd like to add two ways of using dictation:

Speaking Dictation words after utterance breaks via the default language model search (the old behaviour).
Allow using a single language model search for matching dictation and grammar words with no utterance breaks.

I think I can re-implement option 1 in a cleaner, more efficient way now. Any parts of grammar rules using Dictation elements would not be repeatable. This would be the default behaviour. It would not be difficult to add an engine configuration option to choose either option.

Option 2 would require a custom language model, similar to what Silvius uses. The engine would need to differentiate between dictation and grammar words. This can be done by adjusting the probability score for grammar words in the language model. Grammar reloading and list updates wouldn't work without having some way to reload the model.

@dwks I am guessing this is possible if a custom ngram model file is set in the decoder configuration. It wouldn't be as accurate as Silvius, but I think it would be nice to have this option available! :-)

TODO

Fix a few problems with the pyjsgf DictationGrammar class and release a new version.
Re-implement Dictation element support with utterance breaks using a simpler, timer-based approach.
Implement Dictation element support relying on the default language model search for both dictation and grammar rule recognitions without utterance pauses.
Add engine configuration option(s) for dictation support.
Adjust (and simplify) dictation tests in test_engine_sphinx_dictation.py appropriately.

Test engine implementation and module loader

It would be useful to have a test engine implementation without any additional dependencies. This way there can be better automated test coverage with Travis-CI. It would help catch bugs like #33.

A module loader script using the test engine would also be useful for debugging grammars and rules efficiently without restarting the running engine. Another way this could be done is by adding a script that can be used to run modules from the command line. For example:

python -m dragonfly_test _module1.py _module2.py ...

GUI framework using RPC

Many of the people who use Dragonfly have limitations interacting with traditional interfaces such as the keyboard and mouse. In principle, a GUI displays information and interface elements that can be interacted with by voice, touch, eye tracking, mouse and keyboard. A cross-platform GUI framework could enhance the Dragonfly platform. Some end-user use cases are:

Listing available commands and grammars in the current context
Integrating and displaying documentation
Guiding the user through complex commands
A GUI for building dragonfly/third-party grammars

As it stands at the moment, the proposed framework will consist of two parts: Tkinter and json-rpc.

Evaluated GUI libraries

Kivy is primarily a mobile framework but can run on Linux, Mac and Windows. Unfortunately, since its focus is primarily mobile OSs, it does not have the APIs to integrate well with desktop OSs. One limiting factor is that Kivy renders in OpenGL. There is only one global context that has to be switched to when you want to draw on one window or another. Each window needs to be a separate process and use IPC to communicate. This adds a lot of complexity.
PyQt5 is cross-platform for mobile and desktop applications on Python 2.7 or 3.x. PyQt5 has a very robust API and packaging system. Another very attractive part of the framework is Qt Designer, the Qt tool for designing and building graphical user interfaces (GUIs) with Qt Widgets. It lets you compose and customize windows or dialogs in a what-you-see-is-what-you-get (WYSIWYG) manner, and test them using different styles and resolutions.

The library is heavy, weighing in at around 300 MB. Freezing PyQt5 for distribution with cx_Freeze, py2exe or others can generate a large distribution of 80 MB plus the application. Distribution size can be reduced by using a project such as PyQt5 minimal. Unfortunately, the PyQt5 installer through wheels only supports Python 3.5 and later, which means we're left building from source for Python 2.7. Building from source is not acceptable unless we can distribute precompiled binaries. Once Dragonfly and Natlink transition to Python 3, PyQt5 may be an alternative.
Tkinter is Python's "standard" GUI library and is cross-platform for desktop OSs. It's lightweight and relatively easy to use, but also limited in its widgets. While it can run on Python 2.7 and 3.5.x, the libraries are substantially different between Python versions.

These are all the GUI frameworks I had time to look into; alternatives can be found here. I'm open to suggestions.

The second component consists of RPC (Remote Procedure Call), which will drive the GUI. json-rpc has some significant advantages:

Json-rpc can extend beyond a UI use case. It positions us to add other features, such as a service for integrating other functionality securely over a networked environment.

Training program/mode for the Pocket Sphinx engine

A method for easily training the acoustic model used for the Pocket Sphinx engine would make it much more useful than it currently is. The default US English model is trained for (essentially) dictation/prose use, rather than for sequences of commands. This is (partly) why the accuracy is less than ideal.

The idea I have in mind is to train the active model using data from the engine, rather than by recording your voice manually. So as the engine is used, recorded audio and speech hypotheses could be stored in files in some configurable folder, perhaps a folder under MacroSystem.

The user could at some point say "start training" (or similar) to start a GUI training program that would have:

an ordered list of recognised phrases - phrases to train, loaded from the folder mentioned above
a button for playing the .wav file associated with the selected phrase (using PyAudio)
(crucially) a way to edit the phrases (by double clicking or using an Edit button) if Pocket Sphinx got it wrong
a button for deleting the selected phrase and associated .wav file
a status bar showing whether phrases were successfully saved after editing or deleted after the delete button was pressed
a button to run the .wav files and phrases through the training process using SphinxTrainingHelper
a log window for the training process
keyboard shortcuts for each button

Some notes

Tkinter could be used to make a cross-platform GUI.
SphinxTrainingHelper requires some programs from sphinxtrain which can be checked for when the training program starts.
- The programs required are bw, map_adapt, mllr_solve, mk_s2sendump and mllr_transform. All of them are in /usr/lib/sphinxtrain for me on Debian 9.
- os.walk could be used to find the required programs in common locations on Linux/Unix systems.
- It might be a good idea to eventually redistribute the binaries for Windows at least, as compiling both sphinxtrain and its sphinxbase dependency from source is a bit of a pain on Windows systems.
Sphinxtrain should allow you to specify noise in training phrases by using [NOISE].
The example for using PyAudio to record into a .wav file will be useful in making the engine record spoken phrases into files.
Phrases spoken when the training program is running shouldn't be added to the list of training phrases because it would be confusing.
The training program should run as a separate process from the engine/loader.
Training and audio files present when the training program starts would be moved to another directory for training, deleted if the training process is successful or moved back if unsuccessful.
The engine's Pocket Sphinx decoder would need to be reinitialised with the adapted model after successful training (engine restart).
Additional documentation would need to be added for common issues with training, engine configuration, etc.
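The os.walk idea from the notes above could be sketched like this. The program names come from the list above; the find_programs and missing_programs helpers and the example search root are hypothetical, and a real check would probably also verify that the files are executable.

```python
import os

# Programs SphinxTrainingHelper needs, as listed in the notes above.
REQUIRED_PROGRAMS = ("bw", "map_adapt", "mllr_solve",
                     "mk_s2sendump", "mllr_transform")


def find_programs(names, search_roots):
    """Map each program name to the first matching path found under the roots."""
    wanted = set(names)
    found = {}
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename in wanted and filename not in found:
                    found[filename] = os.path.join(dirpath, filename)
    return found


def missing_programs(names, found):
    """Return the names that were not located, in their original order."""
    return [name for name in names if name not in found]


# Example search root taken from the Debian 9 note above.
located = find_programs(REQUIRED_PROGRAMS, ["/usr/lib/sphinxtrain"])
print(missing_programs(REQUIRED_PROGRAMS, located))
```

os.walk silently yields nothing for a root that does not exist, so on a system without sphinxtrain every program is simply reported as missing.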

This is pretty ambitious. I'm definitely open to ideas, feedback and help on this, especially on the GUI part.

Functionality to remap integer phrasings?

Caster currently re-implements the integer class from dragonfly to allow for changing the phrasing of single integers to "one", "torque", "traio", "fairn", "faif", "six", "seven", "eigen", "nine". It would be nice to allow a user to do something like this directly with the dragonfly classes. Would it be feasible to allow a user to pass a list to replace the phrasings in int_1_9 with ones of their choosing?

dragonfly/dragonfly/language/en/number.py

Lines 37 to 46 in c8ebb79

 int_1_9 = MapIntBuilder({
     "one": 1,
     "two": 2,
     "three": 3,
     "four": 4,
     "five": 5,
     "six": 6,
     "seven": 7,
     "eight": 8,
     "nine": 9,

Perhaps a simple way to do it is to allow the user to put their own "language" file in the .dragonfly2-speech folder, which would replace the en/number.py etc file.
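As a sketch of the "pass replacements" idea, the mapping could be rebuilt from the default one before it is handed to the builder. The code below works on plain dictionaries only; DEFAULT_INT_1_9 copies the words from the excerpt above, remap_phrasings is a hypothetical helper, and passing the result to MapIntBuilder is an untested assumption.

```python
# Default phrasings, copied from the en/number.py excerpt above.
DEFAULT_INT_1_9 = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def remap_phrasings(mapping, replacements):
    """Return a new word-to-value mapping with some words replaced.

    replacements maps a value to the word that should be spoken for it.
    """
    by_value = {value: word for word, value in mapping.items()}
    by_value.update(replacements)
    return {word: value for value, word in by_value.items()}


# The Caster phrasings mentioned above.
caster_words = remap_phrasings(
    DEFAULT_INT_1_9,
    {2: "torque", 3: "traio", 4: "fairn", 5: "faif", 8: "eigen"})
print(sorted(caster_words, key=caster_words.get))
```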

Mouse wheel scrolling fails with loud beep if set to high repeat value

I have some commands which scroll the wheel 10 times in a row to effectively scroll a page at a time. Since the new scrolling logic was added, this sometimes fails with a loud beep. I can reliably reproduce this if I change the repeat value to 50. The reason appears to be the difference between sending multiple events to the OS in rapid succession versus sending a single event whose value is a larger multiple. I'm going to send out a pull request which changes it to do the latter. If the user wants to send multiple separate events in sequence, then they can use a Repeat action to do so, but without this change there's no workaround for this issue.
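The difference between the two approaches can be shown with a small model of the event values. It relies on the Windows convention that one wheel notch corresponds to a delta of 120 (WHEEL_DELTA); the two functions are hypothetical and only build lists of deltas rather than sending real input.

```python
WHEEL_DELTA = 120  # delta for one wheel notch on Windows


def separate_events(notches):
    """Old behaviour: one event per notch, sent in rapid succession."""
    return [WHEEL_DELTA] * notches


def combined_event(notches):
    """Proposed behaviour: a single event carrying the total delta."""
    return [WHEEL_DELTA * notches]


print(len(separate_events(50)), combined_event(50))
```

Both lists sum to the same total scroll amount; the proposed change only reduces the number of events the OS has to process.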

@Versatilus FYI

Build error in backend_sapi5/engine.py

Two problems when I ran wsr_module_loader_plus.py

File ".....env\lib\site-packages\dragonfly\engines\backend_sapi5\engine.py", line 336, in recognize_forever
WinEventProcType = WINFUNCTYPE(None, HANDLE, DWORD, HWND, LONG,
NameError: name 'WINFUNCTYPE' is not defined

which I fixed by moving the import of WINFUNCTYPE in that file from ctypes.wintypes to ctypes. Before:

from ctypes import windll, WinError
from ctypes.wintypes import DWORD, HANDLE, HWND, LONG, WINFUNCTYPE

After:

from ctypes import windll, WinError, WINFUNCTYPE
from ctypes.wintypes import DWORD, HANDLE, HWND, LONG

I also had to correct the file name in main():

    except NameError:
        # The "__file__" name is not always available, for example
        # when this module is run from PythonWin.  In this case we
        # simply use the current working directory.
        path = os.getcwd()
        __file__ = os.path.join(path, "wsr_module_loader_plus.py")

Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)]

Mimic and Playback do not work for "press letter" commands

Example:

"test command": Mimic("press", "alpha"),

I believe the words are interpreted correctly by Dragon, but they are just executed as normal dictation - so you get the text "press alpha" rather than "a". Strangely, Mimic("press", "hash") works fine.

For reference, the reason why I want to do this is so that I can create a command to paste from the clipboard using dragon commands, allowing interaction with admin windows. I suspect that the reason this doesn't work is a security feature within Dragon to prevent exactly this, but would be interested to know if there is another explanation. Thanks.

UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 52

This may be related to porting Dragonfly to Python 3.
It was first discovered when enabling Caster filter rules. It can be reproduced on initialization of Caster by adding in _caster.py and adding from builtins import str and from __future__ import unicode_literals to the top.

`Error loading _caster from C:\NatLink\NatLink\MacroSystem\_caster.py
Traceback (most recent call last):
  File "C:\NatLink\NatLink\MacroSystem\core\natlinkmain.py", line 322, in loadFile
    imp.load_module(modName,fndFile,fndName,fndDesc)
  File "C:\NatLink\NatLink\MacroSystem\_caster.py", line 69, in <module>
    class MainRule(MergeRule):
  File "C:\NatLink\NatLink\MacroSystem\_caster.py", line 131, in MainRule
    "disable": False
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\grammar\elements_compound.py", line 323, in __init__
    child = Compound(spec=k, value=v, extras=extras)
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\grammar\elements_compound.py", line 270, in __init__
    element = self._parser.parse(spec)
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 52, in parse
    for result in generator:
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 413, in parse
    try: path[-1].next()
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 472, in parse
    try: path[-1].next()
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 413, in parse
    try: path[-1].next()
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 547, in parse
    for result in child.parse(state):
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 413, in parse
    try: path[-1].next()
  File "C:\Python27\lib\site-packages\dragonfly-0.6.6b1-py2.7.egg\dragonfly\parser.py", line 708, in parse
    while not state.finished() and state.peek(1) in self._set:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 52: ordinal not in range(128)`

The full issue is posted at https://github.com/synkarius/caster/issues/236

Let me know how I can help!

Window class implementations for other platforms

Re: cross-platform support (issue #8).

Dragonfly's Window class could be abstracted somewhat to have implementations for Windows, X11 and macOS. from dragonfly import Window should import the implementation for the current platform. As with the cross-platform Clipboard class, implementations for other platforms can be in the dragonfly.util sub-package.

If this is done in dragonfly instead of in external projects such as Aenea, engine implementations can get the window context for supported platforms more easily. This should also make it relatively easy to implement dragonfly's WaitWindow, FocusWindow, StartApp and BringApp actions for supported platforms as they mostly just use the Window class in some way.

I think that python-xlib can be used to implement the Window class for X11. xdotool or xdo can be used instead if I'm wrong about that.

@wolfmanstout you may be interested in this for your Google engine.

Mappings get executed when the grammar does not match

t4ngo/dragonfly#71

Text action intermittently ignores hardware_apps override

@Versatilus FYI

I'm about to send a pull request to fix this. The symptom is that when unicode_keyboard is enabled and the active application is in hardware_apps, some Text actions may still attempt to send Unicode-style keyboard events. The problem is that the action checks whether or not to use hardware-style events in _parse_spec, which may be resolved at declaration time instead of execution time (depending on whether the spec contains any "%" symbols).
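The declaration-time versus execution-time difference can be modelled with two toy classes. Nothing here is dragonfly code: EagerText, LazyText, build_events and the is_hardware_app callable are hypothetical names standing in for the Text action, its spec parsing and the hardware_apps check.

```python
def build_events(spec, use_hardware):
    """Turn a spec into (mode, character) pairs."""
    mode = "hardware" if use_hardware else "unicode"
    return [(mode, char) for char in spec]


class EagerText(object):
    """Decides the event style once, when the action is declared."""

    def __init__(self, spec, is_hardware_app):
        self._events = build_events(spec, is_hardware_app())

    def execute(self):
        return self._events


class LazyText(object):
    """Decides the event style every time the action is executed."""

    def __init__(self, spec, is_hardware_app):
        self._spec = spec
        self._is_hardware_app = is_hardware_app

    def execute(self):
        return build_events(self._spec, self._is_hardware_app())


foreground = {"hardware": False}   # state at declaration time
eager = EagerText("ab", lambda: foreground["hardware"])
lazy = LazyText("ab", lambda: foreground["hardware"])
foreground["hardware"] = True      # a hardware app is focused later
print(eager.execute()[0][0], lazy.execute()[0][0])
```

The eager variant keeps using the style chosen while the grammar was being loaded, which corresponds to the symptom described above.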

Python 3 IntegerRef bug

I think this bug is related to the classes and methods that build the rules that IntegerRefs use. The following Python code can be run in both versions, but results in different outputs:

from dragonfly import IntegerRef
print(IntegerRef("n", 1, 20).rule.gstring())

Using Python 2.7, the expected rule that will match any number from "one" to "twenty" (exclusive) is built and printed:

<_IntegerRef_07> = (((four) | (seven) | (five) | (three) | (eight) | (nine) | (six) | (two) | (one)) | ((fourteen) | (eleven) | (seventeen) | (eighteen) | (thirteen) | (ten) | (twelve) | (nineteen) | (sixteen) | (fifteen)));

Using Python 3.5, a very long rule is printed that allows speaking numbers between "one" and some number in the thousands:

<_IntegerRef_07> = (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))) | ((([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | ((([(one)] hundred [(([and] (((one) | (seven) | 
(four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))))])) | (([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | (([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])))]))) | ((([((one) 
| ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))) | ((([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | 
(seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))))])) | (([((one) | (([(one)] hundred [(([and] (one)))])))] thousand [((([and] (one))) | (([(one)] hundred [(([and] (one)))])))]))))] million [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))) | ((([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) 
[((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))))])) | (([(((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])) | ((([(one)] hundred [(([and] (((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | 
(seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | (([((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])))]))))])) | (([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | 
(three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))) | ((([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))]))))])) | (([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | 
(fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (one))) | (([(one)] hundred [(([and] (one)))])))]))))] million [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | (([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([((one) | ((([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | ((((eighty) | (sixty) | (fifty) | (twenty) | (seventy) | (ninety) | (thirty) | (forty)) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])) | (([(one)] hundred [(([and] (one)))]))))] thousand [((([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | (fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))]))))) | (([(one)] hundred [(([and] (((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight)) | ((eighteen) | (fourteen) | 
(fifteen) | (sixteen) | (ten) | (thirteen) | (seventeen) | (twelve) | (nineteen) | (eleven)) | (((twenty) [((one) | (seven) | (four) | (two) | (six) | (three) | (nine) | (five) | (eight))])))))])))])))]))));

The test_language_en_number.py, test_language_de_number.py and test_language_de_number.py unit tests pass for Python 2.7 and fail (don't finish) for Python 3.5.

Windows Speech Recognition context bug

WSR locks contexts in at speech start before dragonfly activates/deactivates rules and grammars based on the window attributes. This causes recognitions to fail until the second or later utterance in a specific context.

Notepad grammar example:
I say "save" in Notepad after switching to it. WSR doesn't recognise what I said. So I repeat myself and say "save" again without switching windows. This time it recognises me correctly and presses Ctrl + S.

To fix this we need to either check the active window periodically or register for window changes using the Windows API, then activate/deactivate rules and grammars if the user is not speaking.
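The polling option could look roughly like the sketch below. It is not dragonfly code: `ForegroundWatcher` and its three callables are hypothetical names. On Windows, `get_foreground` might wrap `win32gui.GetForegroundWindow()`, and `on_change` would be whatever hook activates/deactivates rules and grammars.

```python
import time


class ForegroundWatcher(object):
    """Call on_change(window) whenever the foreground window changes.

    All three callables are supplied by the caller; nothing here is
    part of dragonfly's API.
    """

    def __init__(self, get_foreground, on_change, is_speaking=lambda: False):
        self._get_foreground = get_foreground
        self._on_change = on_change
        self._is_speaking = is_speaking
        self._last = None

    def poll(self):
        """Check once. Return True if a context update was triggered."""
        if self._is_speaking():
            # Leave rule states alone in the middle of an utterance.
            return False
        current = self._get_foreground()
        if current == self._last:
            return False
        self._last = current
        self._on_change(current)
        return True

    def run(self, interval=0.25, iterations=None):
        """Poll every `interval` seconds, forever if iterations is None."""
        count = 0
        while iterations is None or count < iterations:
            self.poll()
            time.sleep(interval)
            count += 1
```

Because a change seen while the user is speaking is not recorded, it is picked up by the first poll after the utterance ends.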

Thanks to @daanzu for pointing this out.

Cross platform support

There have been discussions on making dragonfly more cross platform by using packages that support cross platform user input and window management on Windows and Linux (X11). Aenea already does a lot of this in its server implementations, so it may be a good starting point.

TODO

Implement cross-platform clipboard class using pyperclip.
The implementation is located in dragonfly.util.clipboard.
Implement the Key and Text action for X11 and Mac OS using pynput ~~or PyUserInput~~.
Implement Mouse actions using pynput.
The implementation should allow using the same key names and specs for Mouse as on Windows. Other platforms should be supported by moving code into other modules:
- MouseBase action and related classes: actions/mouse/_base.py
- Windows Mouse action sub-class: actions/mouse/_win32.py
- X11/MacOS Mouse action sub-class: actions/mouse/_pynput.py
Implement any utility functions for input actions such as the get_cursor_position() and set_cursor_position() functions in action_mouse.py.
Implement Monitor classes for X11 and Mac OS.
Implement an X11 equivalent for dragonfly's Window class for window management using libxdo.
python-xlib could also be used, but libxdo handles checking various things like what information the window manager can give us about windows (i.e. window manager hints).
Implement a Mac OS equivalent for the Window class.
As I mentioned in #35, there are some Swift projects for doing this. It would probably be better to use pyobjc though, as it is a project requirement on Mac OS now (for Monitor).
~~Implement what is possible for Wayland.~~
See notes and comments below.

Notes

The Mouse action requires dragonfly's Window class to be at least partially implemented in order for relative mouse movements to work.
Wayland doesn't currently allow emulating keyboard/mouse input or window management. python-evdev could be used to emulate input via the uinput Linux kernel module, although this requires elevated privileges. This is what Aenea's Wayland server implementation does.

Edit: I've re-written this post to mostly be a list of things to do rather than using the GitHub project for that.

Sphinx engine methods for recognising from .wav files / streams

The Pocket Sphinx API allows for this and the built-in wave package could be used to validate files (correct sample rate, no. of channels, valid format) using the engine configuration. This would make the engine usable over a network or with a machine that doesn't have a microphone attached.

Methods could be recognise_from_file and recognise_from_buffer. Both would use the PocketSphinx processing methods and change the engine's _audio_buffers list as necessary.
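As a rough illustration of the validation step, the sketch below checks a file with the built-in wave package. The function name `validate_wav` and the default values (16 kHz, mono, 16-bit samples) are assumptions made for the example; a real implementation would take the expected values from the engine configuration.

```python
import wave


def validate_wav(path, sample_rate=16000, channels=1, sample_width=2):
    """Return a list of problems; an empty list means the file is usable.

    The expected values are assumed defaults, not read from any engine
    configuration.
    """
    try:
        wav = wave.open(path, "rb")
    except (wave.Error, EOFError) as e:
        return ["not a valid WAV file: %s" % e]
    problems = []
    try:
        if wav.getframerate() != sample_rate:
            problems.append("sample rate is %d, expected %d"
                            % (wav.getframerate(), sample_rate))
        if wav.getnchannels() != channels:
            problems.append("file has %d channel(s), expected %d"
                            % (wav.getnchannels(), channels))
        if wav.getsampwidth() != sample_width:
            problems.append("sample width is %d byte(s), expected %d"
                            % (wav.getsampwidth(), sample_width))
    finally:
        wav.close()
    return problems
```

A recognise_from_file method could refuse to process a file if the returned list is not empty.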

Obviously audio data transferred across an open network should be protected in some way (perhaps using TLS), but that isn't really in the scope of this project.

Engine methods for key phrase spotting

There is support in Pocket Sphinx for listening for a single key phrase or a list of key phrases from a file. This is the same as the 'wake up' command that Dragon has.

It might be nice to have an engine method for registering a key phrase to listen for:

engine.register_keyphrase_search(keyphrase, threshold, function)

and also perhaps unregistering key phrases:

engine.unregister_keyphrase_search(keyphrase)

The following section from the CMU Sphinx wiki explains the threshold parameter:

The threshold must be specified for every keyphrase. For shorter keyphrases you can use smaller thresholds like 1e-1, for longer keyphrases the threshold must be bigger, up to 1e-50. If your keyphrase is very long – larger than 10 syllables – it is recommended to split it and spot for parts separately. The threshold must be tuned to balance between false alarms and missed detections.

Tuning just involves some trial and error.

An example command module:

from dragonfly import get_engine

engine = get_engine("sphinx")

# Register "microphone off" as a key phrase to listen for. If the phrase is heard,
# engine.pause_recognition will be called.
# The '1e-15' value is just a guess
engine.register_keyphrase_search("microphone off", 1e-15, engine.pause_recognition)

def unload():
    engine.unregister_keyphrase_search("microphone off")

pynput can't do some OS shortcuts on X11

I switched from the xdotool Linux port to pynput (on master), and discovered that my ctrl-alt-m shortcut for launching a terminal no longer works. ctrl-alt-right to switch desktops still works. I made the keypress delay very large (1s) and it still didn't work. This might be a problem with pynput that we can't address; I used xev to observe the X11 events that are actually generated, and the 'm' event is marked synthetic and has same_screen NO. I'm guessing one of these is the problem.

KeyPress event, serial 49, synthetic NO, window 0x3a00001,
    root 0x1e1, subw 0x0, time 2089007475, (-323,425), root:(548,904),
    state 0x0, keycode 37 (keysym 0xffe3, Control_L), same_screen YES,
    XLookupString gives 0 bytes: 
    XmbLookupString gives 0 bytes: 
    XFilterEvent returns: False

KeyPress event, serial 50, synthetic NO, window 0x3a00001,
    root 0x1e1, subw 0x0, time 2089007477, (-323,425), root:(548,904),
    state 0x4, keycode 64 (keysym 0xffe9, Alt_L), same_screen YES,
    XLookupString gives 0 bytes: 
    XmbLookupString gives 0 bytes: 
    XFilterEvent returns: False

KeyPress event, serial 50, synthetic YES, window 0x3a00001,
    root 0x1e1, subw 0x0, time 0, (0,0), root:(0,0),
    state 0xc, keycode 58 (keysym 0x6d, m), same_screen NO,
    XLookupString gives 1 bytes: (0d) "\r"
    XmbLookupString gives 1 bytes: (0d) "\r"
    XFilterEvent returns: False

I can use xdotool to press ctrl-alt-m, and then 'm' isn't sent to xev because it's captured by my window manager. The Control_L and Alt_L events are otherwise identical. Here is a small script demonstrating the problem.

#!/usr/bin/python
import time
import os
from pynput.keyboard import Controller, KeyCode

c = Controller()

def test(l):
    # Wait two seconds, then send each (key, is_press, delay) event.
    # The delay field is not used.
    time.sleep(2)
    for (k, d, _) in l:
        c.touch(k, d)

# ctrl-alt-m (65507 is the Control_L keysym, 65513 is Alt_L)
l1 = [(KeyCode(65507), True, 0), (KeyCode(65513), True, 0), (KeyCode.from_char('m'), True, 0), (KeyCode.from_char('m'), False, 0.0), (KeyCode(65513), False, 0), (KeyCode(65507), False, 0)]

# ctrl-alt-right (65363 is the Right keysym)
l2 = [(KeyCode(65507), True, 0), (KeyCode(65513), True, 0), (KeyCode(65363), True, 0), (KeyCode(65363), False, 0.0), (KeyCode(65513), False, 0), (KeyCode(65507), False, 0)]

print("pynput ctrl-alt-m fails...")
test(l1)

print("xdotool key ctrl+alt+m works...")
time.sleep(2)
os.system("xdotool key ctrl+alt+m")

print("pynput ctrl-alt-right works, should move to next desktop")
test(l2)

I haven't read any docs on pynput, but perhaps there is a way to affect the types of keystrokes it generates.

Simplify Git branching scheme

This was decided in this project's gitter channel, but I wanted to say it here too. We're switching from using a develop and master branch to just using a master branch for the latest changes, replacing develop. There are a few reasons for this:

There is a pretty decent Keep a Changelog style changelog now, so there's no need to have a branch at the latest release any more.
Release versions will be tagged and can just be checked out with something like git checkout x.y.z if required.
Updates to the changelog and things like the ReadtheDocs and Travis-CI projects will be simpler.

If you use the develop branch, just switch to using the master branch after this issue is closed.

Dragonfly's unit and doc tests are broken

The tests don't run properly at the moment (at least for me) because of relative importing. I'm working on fixing this. I'd like to make the test suites work properly so they can be run through Travis CI.

Various tests in test_engine_text.py are duplicates or should be moved elsewhere so that they apply to all engines. The Pocket Sphinx engine tests need to be adjusted too, but I'll do that with the rework I've been planning for that engine.

Add Gitter channel to project

Greetings, first off I want to thank you for pursuing adding CMU Pocket Sphinx as a backend for dragonfly. It has been a long hope to see alternative speech recognition engines especially open source projects integrated into dragonfly.

I have found Gitter to be an excellent tool for exchanging ideas and building community. Typically, ideas are formed in Gitter chat and, once formalized, opened as a GitHub issue or pull request. It depends on your preference, but I thought I would suggest it as a way of fostering community around Pocket Sphinx.
https://gitter.im/

Language 'de' does not implement 'ShortIntegerContent'.

After updating dragonfly I get the following error when starting Caster with a German DNS profile:

WARNING:timer:Dragonfly's _Timer class has been deprecated. Please use engine.create_timer() instead.
Ignoring ccr rule 'VHDL'. Failed to load with: 
<class 'dragonfly.error.DragonflyError'>
Language 'de' does not implement 'ShortIntegerContent'.
<traceback object at 0x0ED5F350>
  File "C:\dev\Natlink-User\Caster\castervoice\lib\ccr\__init__.py", line 36, in <module>
    [class_name])  # attempts to import the class

  File "C:\dev\Natlink-User\Caster\castervoice\lib\ccr\vhdl\vhdl.py", line 6, in <module>
    from castervoice.lib.imports import *

  File "C:\dev\Natlink-User\Caster\castervoice\lib\imports.py", line 10, in <module>
    from castervoice.lib import control, utilities, text_manipulation_functions

  File "C:\dev\Natlink-User\Caster\castervoice\lib\text_manipulation_functions.py", line 6, in <module>
    from castervoice.lib.ccr.core.punctuation import text_punc_dict,  double_text_punc_dict

  File "C:\dev\Natlink-User\Caster\castervoice\lib\ccr\core\punctuation.py", line 61, in <module>
    class Punctuation(MergeRule):

  File "C:\dev\Natlink-User\Caster\castervoice\lib\ccr\core\punctuation.py", line 85, in Punctuation
    IntegerRefST("npunc", 0, 10),

  File "C:\dev\Natlink-User\Caster\castervoice\lib\dfplus\additions.py", line 39, in __init__
    content = language.ShortIntegerContent

  File "C:\dev\Python27\lib\site-packages\dragonfly\language\loader.py", line 70, in __getattr__
    return self.get_attribute(name)

  File "C:\dev\Python27\lib\site-packages\dragonfly\language\loader.py", line 84, in get_attribute
    % (self._language, name))

Invalid

Clipboard multiplatform overhaul

I had some problems with the dragonfly clipboard implementation, specifically the get_system_text method not working at all. This is because it requests a Unicode clipboard format which doesn't exist in some applications, such as Visual Studio Code.
Instead of fixing the problem on Windows, I added pyperclip, which works great. I thought we could use it to overhaul the entire clipboard class. There are some things to consider first:

1. The current implementation supports storing many different clipboard formats inside its contents dictionary. Pyperclip only supports Unicode text, which would make formats and all related functions obsolete.
2. Pyperclip is a dependency and we may not want it.

With these limitations, is it worth it to get multi-platform support?

Grammars with Sapi5SharedEngine containing rules with <n> do not load.

This was initially discussed on Gitter, summarized here and discovered during testing synkarius/caster#305.

Mappings like '[use] function <n> [<text>]' in rules containing <n> don't load on Windows Speech Recognition. Dragonfly's get_engine method uses Sapi5SharedEngine. The WSR module loader uses Sapi5InProcEngine, with which sample.py loads properly.

t4ngo/dragonfly#3 appears to be the rationale behind having two engine classes for it. Sapi5InProcEngine doesn't seem to need you to start WSR at all, but doesn't have any UI components.

To replicate E:\NatLink\NatLink\MacroSystem> python .\sample.py
sample.zip

 Traceback (most recent call last):
  File ".\sample.py", line 81, in <module>
    grammar.load()
  File "C:\Python27\lib\site-packages\dragonfly\grammar\grammar_base.py", line 338, in load
    self._engine.load_grammar(self)
  File "C:\Python27\lib\site-packages\dragonfly\engines\base\engine.py", line 136, in load_grammar
    wrapper = self._load_grammar(grammar)
  File "C:\Python27\lib\site-packages\dragonfly\engines\backend_sapi5\engine.py", line 124, in _load_grammar
    handle.CmdSetRuleState(rule.name, constants.SGDSActive)
  File "C:\Python27\lib\site-packages\win32com\gen_py\C866CA3A-32F7-11D2-9602-00C04F8EE628x0x5x4\ISpeechRecoGrammar.py", line 67, in CmdSetRuleState
    , State)
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147200940), None)

Add a documentation tag related to changes in source code where Docs need to be updated

As we continue to make changes to the dragonfly source code we need to keep track of updating the documentation. Could we add a documentation tag so we can look back and refer to issues and projects?

`FocusWindow` sometimes fails.

The method used by FocusWindow sometimes fails. I'm not sure why this happens.

When it fails, it produces an error. A fallback I've used in the past is minimising, then restoring the target window. Here's the relevant method:

import win32gui
import win32con

def focus_window_fallback(hwnd):
    """Fallback for focussing a window when other methods fail. 

    This method will minimise then restore the target window. Doing 
    this produces a jarring transition animation, so use only as a 
    fallback. 

    :param hwnd: the handle of the window you want to focus.

    """
    win32gui.ShowWindow(hwnd, win32con.SW_MINIMIZE)
    win32gui.ShowWindow(hwnd, win32con.SW_RESTORE)

Discussing on Gitter, it seems like it would be a good idea to add this as a fallback when the primary method fails. I'm happy to integrate it if everyone approves.

Unforking this repository

There has recently been discussion on Gitter on making GitHub code search work with this repository. At the time of writing, code search doesn't work with forks. While it is not a must-have feature, it would be nice to have. Detaching this repository from t4ngo/dragonfly seems to be the best option for fixing this.

There are also other reasons for unforking. The project would be easier to find via web searches, GitHub references (e.g. #XX) wouldn't inadvertently point to issues in the original project, and pull requests would be easier to make from other branches in dictation-toolbox/dragonfly. I have asked GitHub support if there are any downsides to unforking.

Unless there is opposition to this change, we'll go ahead with it and mention that the project was forked from t4ngo/dragonfly in the readme and docs.

Greedy Timers

Currently, if a Timer falls behind in getting called, it can start monopolizing execution. This is a valid option for the wary user, but seems a dangerous default.

It'd be safer to reset the next call time so that it is measured from now, perhaps with a warning that calls were skipped.

dragonfly/dragonfly/engines/base/timer.py

Line 89 in e50b000

self.next_time += self.interval
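A minimal sketch of the safer behaviour is shown below. `advance_next_time` is a hypothetical helper written for this example, not the code in timer.py. It keeps the regular schedule while the timer is on time, and restarts the schedule from the current time when the timer has fallen behind.

```python
def advance_next_time(next_time, interval, now):
    """Return (new_next_time, skipped_calls) after a timer call at `now`.

    Hypothetical helper: times are in seconds and `interval` must be
    greater than zero.
    """
    next_time += interval
    if next_time >= now:
        # On schedule: keep the regular cadence.
        return next_time, 0
    # Behind schedule: count the calls that were missed and restart
    # the schedule from now instead of firing repeatedly to catch up.
    skipped = int((now - next_time) // interval) + 1
    return now + interval, skipped
```

The caller could log a warning whenever the skipped count is not zero. Users who want the catch-up behaviour could still opt in to the existing `self.next_time += self.interval` logic.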

Change dragonfly.timer._Timer class to use 'engine.create_timer'

Dragonfly's _Timer class predates the engine.create_timer() method that works with all engines. These two ways of using natlink's setTimerCallback function are incompatible because either one will override any already set callback function, meaning that some timer functions will no longer be called.

I mentioned this in dictation-toolbox/Caster#490. I reckon that the class should be changed to use engine.create_timer() internally to get around this problem. This would also remove the need to import and use natlink in that file and mean that the class will automatically work with all engines.

According to commit 1d3ba37, t4ngo meant to remove the dragonfly/timer.py file in late 2009, but seeing as Caster currently relies on it for a few things, I have no problem keeping it around until it is no longer used in their current release. I would like to at least add a short deprecation warning in the __init__() method though.

@LexiconCode This should fix the problems and give me or someone else time to fix Caster's timers separately.

Rules using command chaining and Dictation don't work for the Sphinx engine

Command chaining (repetition) of rules using Dictation elements/extras doesn't work at the moment. This is due to such rules needing to be spoken in sequence with utterance pauses.

I have some work and tests on this using a Timer thread which processes the complete repetitions of rules so far after a timeout period, similar to the timeout period that starts after speaking part of a rule. This work is in the feature/repetition branch.

macron not printing

I get the following error when trying to print the macron "a" in ngā mihi.

Function(maori_words): 'ascii' codec can't decode byte 0xc4 in position 2: ordinal not in range(128)

My function is:

def maori_words(big, word):
    if big:
        word = word.title()
    Text(str(word)).execute()

Is dragonfly able to handle this situation when passing a UTF-8 character through Function and Text? If so, help!
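The error message is consistent with the word arriving as UTF-8 encoded bytes and then being converted implicitly with the ASCII codec, which is an assumption about what happens here rather than something confirmed from dragonfly's code. In UTF-8, "ā" is the two bytes 0xc4 0x81, so the byte at position 2 of "ngā" is 0xc4. The sketch below reproduces the message and shows that decoding explicitly as UTF-8 avoids it:

```python
# -*- coding: utf-8 -*-

# Assumed input: the spoken words as UTF-8 encoded bytes.
raw = u"ng\u0101 mihi".encode("utf-8")

# Decoding with the ASCII codec fails on the first byte of the macron.
try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    message = str(e)

# Decoding explicitly as UTF-8 gives a text string that can be
# title-cased without errors.
word = raw.decode("utf-8")
titled = word.title()
```

Under Python 2, calling str() on text containing non-ASCII characters triggers the same kind of implicit ASCII conversion, so it might be worth trying the function without the str() call.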

Continuous integration with Travis-CI

Dragonfly has a large number of unit tests, some of which are platform/engine specific. It would be nice to use Travis-CI for unit tests which aren't platform/engine specific, like the parser and element tests, so that it's obvious when new commits/PRs break tests or if certain functionality doesn't work with specific Python versions.

At the moment Travis-CI only supports Linux testing environments for Python, so tests for Windows-specific functionality won't work and should be excluded.

The Pocket Sphinx engine tests could also be included, although the sphinxwrapper and pyjsgf dependencies should be installable via pip install first (i.e. distributed on PyPI.org).

	int_1_9 = MapIntBuilder({
	"one": 1,
	"two": 2,
	"three": 3,
	"four": 4,
	"five": 5,
	"six": 6,
	"seven": 7,
	"eight": 8,
	"nine": 9,
