mycroftai / padatious
A neural network intent parser
Home Page: http://padatious.readthedocs.io
License: Apache License 2.0
Right now I have a long, auto-generated list of different units, such as meter, mile, amp, ampere, etc. This list is created using a combination of auto-generation and hand-editing.
Right now I need to create copies of the file, for example unitFrom.entity and unitTo.entity. You can see some sample vocab here:
How many {unitTo} is {unitFrom}
How many {unitFrom} are in a {unitTo}
How many {unitFrom} in a {unitTo}
What is {unitFrom} in {unitTo}
Duplicating the files is a bad solution: if the copies ever get out of sync, it could cause very confusing, hard-to-debug issues. Using symlinks is also confusing and presumes that Mycroft will only ever be deployed on Linux.
I think the best solution would be to allow named capture groups, perhaps something like:
How many {unit:to} is {unit:from}
How many {unit:from} are in a {unit:to}
How many {unit:from} in a {unit:to}
What is {unit:from} in {unit:to}
Hello,
the test in question fails with about 30% probability in our build system. I have extracted the following test case:
from time import monotonic
import random
from padatious.intent_container import IntentContainer

cont = IntentContainer('temp')
cont.add_intent('a',
                [' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
                 for __ in range(300)])
cont.add_intent('b',
                [' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
                 for __ in range(300)])
for x in range(10):
    a = monotonic()
    assert not cont.train_subprocess(timeout=0.1)
    b = monotonic()
    print(b - a)
When I run it, I get, for example:
0.47674093791283667
0.5609202678315341
0.5488572919275612
6.474134984891862
0.4769664751365781
0.45290810498408973
0.470392829971388
0.4690805918071419
0.46847033803351223
0.4608854129910469
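To quantify how often a run exceeds the `assert b - a <= 1` bound used in `test_train_timeout_subprocess`, the loop above can be generalized into a small harness. This is a hypothetical helper, not part of Padatious; `fn` stands in for something like `lambda: cont.train_subprocess(timeout=0.1)`.

```python
from time import monotonic


def measure_overruns(fn, runs=10, limit=1.0):
    """Call fn() `runs` times; return the durations and the share above `limit` seconds."""
    durations = []
    for _ in range(runs):
        start = monotonic()
        fn()
        durations.append(monotonic() - start)
    overruns = sum(1 for d in durations if d > limit)
    return durations, overruns / runs


if __name__ == "__main__":
    durations, rate = measure_overruns(lambda: None, runs=5)
    print(len(durations), rate)  # 5 0.0
```

Running it with the real training call over many iterations would give an overrun rate to compare against the roughly 30% failure rate seen in the build system.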
I have the following setup:
code.intent:
(code|error) (|is) {code}
code.entity:
###
Example Phrase:
How is code 404 named?
In this case, "code.intent" triggers as expected only with 3-digit numbers, but the entity also captures all words that follow it. So in this example, message.data['code'] returns "404 named" instead of only "404".
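As a workaround until the entity boundary is fixed, a skill handler could keep only the leading three-digit code from the captured value. A minimal sketch; `clean_code` is a hypothetical helper and assumes the code always comes first in the capture, as in the example above.

```python
import re

# The entity only matches 3-digit numbers, so keep exactly that prefix.
THREE_DIGITS = re.compile(r"\d{3}\b")


def clean_code(captured):
    """Return the three-digit code at the start of the captured text, or None."""
    match = THREE_DIGITS.match(captured.strip())
    return match.group(0) if match else None


if __name__ == "__main__":
    print(clean_code("404 named"))  # 404
```

In a handler this would be used as `code = clean_code(message.data['code'])`.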
This issue can be resolved by searching the directory '/usr/lib/x86_64-linux-gnu/', which is where libfann-dev installs its libraries. Alternatively, running the command 'sudo ln -s /usr/lib/x86_64-linux-gnu/fann -d /lib/' will symbolically link the libraries into pip's search path.
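The fix amounts to including the multiarch directory in the list of places searched for the FANN shared libraries. The sketch below is hypothetical and does not reproduce fann2's actual `setup.py` logic; `find_fann_libs` and `CANDIDATE_DIRS` are illustrative names.

```python
import glob
import os

# Candidate locations; on Debian/Ubuntu, libfann-dev installs into the
# multiarch directory /usr/lib/x86_64-linux-gnu/.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/local/lib",
    "/usr/lib",
]


def find_fann_libs(dirs=CANDIDATE_DIRS):
    """Return the first directory containing a FANN shared library, or None."""
    for directory in dirs:
        # Matches libfann.so, libdoublefann.so.2, libfloatfann.so, ...
        if glob.glob(os.path.join(directory, "lib*fann.so*")):
            return directory
    return None


if __name__ == "__main__":
    print(find_fann_libs())
```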
Log files
import pathlib, padatious
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/__init__.py", line 15, in <module>
from .intent_container import IntentContainer
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/intent_container.py", line 25, in <module>
from padatious.entity import Entity
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/entity.py", line 17, in <module>
from padatious.simple_intent import SimpleIntent
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/padatious/simple_intent.py", line 15, in <module>
from fann2 import libfann as fann
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/__init__.py", line 4, in <module>
from fann2 import libfann
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/libfann.py", line 13, in <module>
from . import _libfann
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/_libfann.cpython-37m-darwin.so, 2): Library not loaded: libdoublefann.2.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fann2/_libfann.cpython-37m-darwin.so
Reason: image not found
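A quick way to check whether the system's library lookup can see the FANN libraries at all, independently of fann2, is `ctypes.util.find_library` from the standard library. A minimal diagnostic sketch: the variant names are an assumption based on the `libdoublefann.2.dylib` named in the error, and a `None` result for `doublefann` would be consistent with the "image not found" failure.

```python
from ctypes.util import find_library

# FANN ships several precision variants; the error above shows that
# fann2's extension module needs libdoublefann.
FANN_VARIANTS = ["fann", "doublefann", "floatfann", "fixedfann"]


def locate_fann():
    """Map each FANN variant to whatever find_library reports, or None."""
    return {name: find_library(name) for name in FANN_VARIANTS}


if __name__ == "__main__":
    for name, found in locate_fann().items():
        print(f"{name}: {found or 'NOT FOUND'}")
```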
Describe the bug
I'm working on a feature addition to the Spotify skill that uses the following intent:
list (all) (albums|records) by {artist}
For multi-word values of artist, the first word gets chopped off:
(Pdb) message.data
{'artist': 'might be giants', 'utterance': 'list albums by they might be giants', 'utterances': ['list albums by they might be giants']}
Hi,
I contributed the instantiate_from_disk() functionality some time ago (short recap: it lets you reuse trained models by loading them from disk, using cached contents that have been stored externally).
Unfortunately, I recently found a bug. I have already fixed it with a workaround; the details and background are described below.
Background:
self.padaos.add_entity(name, lines) in method add_entity of intent_container.py received empty lines, because I provided an empty list via instantiate_from_disk(): for reloading/instantiating from disk, no training data is (usually) necessary. In this case, unfortunately, it is, because padaos needs to recompile its patterns. … instantiate_from_disk() does. Another code change and pull request for this project might be necessary here, but that can be discussed on the padaos repo website.
Implications: If no contents for entities and intents are provided to the padaos compiler, most entities embedded in intents are not detected, and some intents are completely wrong most of the time. So the bug is rather severe.
Solution: As mentioned, I would like to contribute the fix via another pull request. The fixed functionality is also covered by an improved unit test (test_container.test_instantiate_from_disk()).
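Since padaos still needs the original lines to recompile its patterns, one option is to store them next to the cached model and pass them back on reload instead of empty lists. A sketch under that assumption; `save_training_lines`, `load_training_lines` and the `training_lines.json` filename are hypothetical and not part of Padatious.

```python
import json
import os

LINES_FILE = "training_lines.json"  # hypothetical file kept in the cache dir


def save_training_lines(cache_dir, intents, entities):
    """Write {'intents': {name: lines}, 'entities': {name: lines}} to cache_dir."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, LINES_FILE), "w", encoding="utf-8") as f:
        json.dump({"intents": intents, "entities": entities}, f)


def load_training_lines(cache_dir):
    """Read the saved lines back, refusing empty lists that padaos cannot compile."""
    with open(os.path.join(cache_dir, LINES_FILE), encoding="utf-8") as f:
        data = json.load(f)
    for kind in ("intents", "entities"):
        for name, lines in data[kind].items():
            if not lines:
                raise ValueError(f"{name!r} in {kind} has no lines")
    return data["intents"], data["entities"]
```

The loaded dicts would then be fed to `add_intent(name, lines)` and `add_entity(name, lines)` before the cached models are reused.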
I am working on it and am unable to load the model. Please help me out with this.
Given the following intent:
add {Food} to (| (the | my)) {ShoppingList} (| list) (under {Category} |)
the shopping list is parsed incorrectly.
Using the phrase:
add temperature sensors to steve's projects
the intent parser produces
steve ' s projects
instead of steve's projects.
The utterance itself is parsed correctly:
~~~~50788 | __main__:handle_utterance:72 | Utterance: ["add temperature sensors to steve's projects"]
However, message.data shows that shoppinglist has been parsed incorrectly:
{'food': 'temperature sensors', 'shoppinglist': "steve ' s projects", 'utterance': "add temperature sensors to steve's projects"}
This causes errors or unmatched entities.
The error comes from match_data.py. This statement:
def detokenize(self):
    self.sent = ' '.join(self.sent)
combined with the fact that self.sent is split like so:
'sent': ['add', 'something', 'to', 'steve', "'", 's', 'projects'], 'matches': {}, 'conf': 0.0}
causes the error. One solution, which could be refined, is:
@staticmethod
def handle_apostrophes(old_sentence):
    """Join tokens with spaces, but glue a lone "'" token to its neighbours."""
    new_sentence = ''
    apostrophe_present = False
    for word in old_sentence:
        if word == "'":
            apostrophe_present = True
            new_sentence += word
        elif apostrophe_present:
            # Token right after an apostrophe, e.g. the "s" in "steve's"
            new_sentence += word
            apostrophe_present = False
        elif new_sentence:
            new_sentence += " " + word
        else:
            new_sentence = word
    return new_sentence

# Converts parameters from lists of tokens to one combined string
def detokenize(self):
    self.sent = self.handle_apostrophes(self.sent)
    new_matches = {}
    for token, sent in self.matches.items():
        new_token = token.replace('{', '').replace('}', '')
        new_matches[new_token] = self.handle_apostrophes(sent)
    self.matches = new_matches
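A shorter variant of the same fix, assuming the tokenizer always emits the apostrophe as its own `'` token between two word tokens (as in the `self.sent` dump above), is to join normally and then collapse the spaces around it. `detokenize_tokens` is a hypothetical standalone helper, not Padatious code:

```python
def detokenize_tokens(tokens):
    """Join tokens with spaces, then glue a standalone "'" to both neighbours."""
    # "steve ' s" -> "steve's"; a leading or trailing "'" is left untouched.
    return " ".join(tokens).replace(" ' ", "'")


if __name__ == "__main__":
    print(detokenize_tokens(["add", "something", "to", "steve", "'", "s", "projects"]))
    # add something to steve's projects
```

Being a pure function, it can be unit-tested without constructing a `MatchData` object.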
For an intent file like this:
(Start|Set) (a|) 5 minute timer (called|for) {name}.
The phrase:
"start a 5 minute timer called lasagna"
returns the entity "name" as "called lasagna".
However, it correctly parses:
"set a 5 minute timer called lasagna"
returning the entity "name" as "lasagna".
This is my sample code. After training the container, is there any way to save the model?
from padatious import IntentContainer
container = IntentContainer('intent_cache')
# add intents and entities here, just like regex
# documentation at: https://mycroft.ai/documentation/padatious/
container.add_intent('greeting', ['(Hi | Goodbye | Good | Hello) (| there!) {greeting}',
'Hello.'])
container.add_intent('fired',['fired','cancel','{person} (are | is) (terminated | fired).',
'(I | We) do not (need | want) your services (now | anymore).',
'{person} (are | services) (are not | not | now) (needed | canceled) (now | anymore |).'])
container.train()
I have tried using pickle and dill:
import pickle
import dill
with open('padatious_model', 'wb') as fp:  # fails: can't pickle SwigPyObject objects
    pickle.dump(container, fp)
with open("dillable_config.pkl", "wb") as f:  # fails: can't pickle SwigPyObject objects
    dill.dump(container, f)
I was wondering whether there are other ways to save this Padatious model after training, then load and use it later to avoid retraining every time.
I had a dependency issue on Ubuntu 18.04 and Raspberry Pi OS that required me to also install the FANN bindings. I recommend updating the README to include this dependency:
apt-get install python3-fann2
After installing the packages listed as needed in the documentation:
sudo apt-get install libfann-dev python3-dev python3-pip swig
and then running:
pip3 install padatious
I get this error:
Collecting padatious
Downloading https://files.pythonhosted.org/packages/33/c1/a54ac3f8fe5fac7fc9537beb90576673a660f3da9147e1317adf6e4c3cfb/padatious-0.4.7.tar.gz
Collecting fann2 (from padatious)
Downloading https://files.pythonhosted.org/packages/80/a1/fed455d25c34a62d4625254880f052502a49461a5dd1b80854387ae2b25f/fann2-1.1.2.tar.gz (66kB)
100% |████████████████████████████████| 71kB 4.6MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-mui8t_t6/fann2/setup.py", line 92, in <module>
build_swig()
File "/tmp/pip-install-mui8t_t6/fann2/setup.py", line 85, in build_swig
find_fann()
File "/tmp/pip-install-mui8t_t6/fann2/setup.py", line 73, in find_fann
raise Exception("Couldn't find FANN source libs!")
Exception: Couldn't find FANN source libs!
Looking for FANN libs...
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-mui8t_t6/fann2/
This happens both on WSL (Windows Subsystem for Linux) and on a DigitalOcean droplet running the latest version of Ubuntu. Did the documentation leave something out, or did something break?
From https://mycroft.ai/documentation/padatious/:
The register_intent_file(intent_file, handler) method's arguments are:
intent_file: the filename of the above-mentioned intent file, without the .intent extension.
Yet, in my skills:
@intent_file_handler("stop.info")
doesn't work; I have to add the .intent extension.
Hi
I forked a skill to https://github.com/aussieW/nature-sound-skill, which I am trying to improve, but I am having an issue with the .intent file under one specific condition.
The .intent file contains:
play (|{sound}) relaxation music
listen to {sound} relaxation music
relax (with|to) {sound}
relax to the sound of (|the) {sound}
play (|some) relaxing (music|sounds|{sound})
listen to (|some) relaxing (music|sounds|{sound})
I understand from a recent conversation on Mattermost that the last two lines probably don't work yet.
{sound} represents one of a number of available mp3 files.
The .entity file is dynamically constructed from the available mp3 files. In this case it contains:
dawn chorus
rainy river
ocean waves
rainforest
hot spring
urban thunderstorm
tropical storm
My problem relates to line 4, 'relax to the sound of (|the) {sound}'. If 'the' is used in a request, it always ends up as part of {sound}.
e.g. 'relax to the sound of the rainforest' results in {sound} = 'the rainforest'
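Until optional words next to an entity are handled by the parser, the handler could strip a leading "the" whenever the remainder is a known sound. A sketch; `normalize_sound` is a hypothetical helper and `KNOWN_SOUNDS` simply mirrors the generated entity file listed above.

```python
# Mirrors the dynamically generated sound.entity file shown above.
KNOWN_SOUNDS = {
    "dawn chorus", "rainy river", "ocean waves", "rainforest",
    "hot spring", "urban thunderstorm", "tropical storm",
}


def normalize_sound(captured, known=KNOWN_SOUNDS):
    """Drop a leading 'the ' if the rest of the capture is a known sound."""
    value = captured.strip().lower()
    if value not in known and value.startswith("the "):
        rest = value[len("the "):]
        if rest in known:
            return rest
    return value


if __name__ == "__main__":
    print(normalize_sound("the rainforest"))  # rainforest
```

Since the entity file is built from the available mp3 files, the same list could be used to populate `known` at runtime.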
Hello,
Thanks for the wonderful software. Is there a way to declare something like a datetime entity, whose output is a UTC timestamp or similar?
For example, if I type “What is the weather in London on Friday?”, it should give me the weather intent and also convert “Friday” to a datetime string.
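In the examples in this thread, entities arrive as plain strings (e.g. in `message.data`), so one option is to convert them after matching. A minimal sketch that only understands weekday names; `next_weekday_utc` is hypothetical, and a real skill would need a proper date parser for phrases like "next week" or "tomorrow".

```python
from datetime import date, datetime, timedelta, timezone

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday_utc(name, today):
    """Return midnight UTC of the next `name` weekday on or after `today`."""
    target = WEEKDAYS.index(name.strip().lower())  # ValueError if unknown
    days_ahead = (target - today.weekday()) % 7     # 0 means today
    day = today + timedelta(days=days_ahead)
    return datetime(day.year, day.month, day.day, tzinfo=timezone.utc)


if __name__ == "__main__":
    print(next_weekday_utc("Friday", date(2024, 1, 1)).isoformat())
    # 2024-01-05T00:00:00+00:00
```

Whether "Friday" on a Friday should mean today or next week is a design choice; this sketch returns today.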
Thanks
Could you please consider adding LICENSE and tests/ to the pythonhosted.org tarball?
On Alpine Linux, some architectures (not all of them; e.g. it works fine on x86_64) fail on test_train_timeout_subprocess:
============================= test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /builds/PureTryOut/aports/testing/py3-padatious/src/padatious-0.4.8
collected 36 items
tests/test_all.py ...... [ 16%]
tests/test_container.py ....sF......... [ 58%]
tests/test_entity_edge.py . [ 61%]
tests/test_id_manager.py .... [ 72%]
tests/test_intent.py .. [ 77%]
tests/test_match_data.py .. [ 83%]
tests/test_train_data.py . [ 86%]
tests/test_util.py ..... [100%]
=================================== FAILURES ===================================
______________ TestIntentContainer.test_train_timeout_subprocess _______________
self = <tests.test_container.TestIntentContainer object at 0xffff8f5d54c0>
    def test_train_timeout_subprocess(self):
        self.cont.add_intent('a', [
            ' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
            for __ in range(300)
        ])
        self.cont.add_intent('b', [
            ' '.join(random.choice('abcdefghijklmnopqrstuvwxyz') for _ in range(5))
            for __ in range(300)
        ])
        a = monotonic()
        assert not self.cont.train_subprocess(timeout=0.1)
        b = monotonic()
>       assert b - a <= 1
E       assert (3178089.64288705 - 3178088.391084973) <= 1
tests/test_container.py:149: AssertionError
=========================== short test summary info ============================
FAILED tests/test_container.py::TestIntentContainer::test_train_timeout_subprocess
=================== 1 failed, 34 passed, 1 skipped in 5.89s ====================
I'd think the difference between those values would be less than 1, so the assertion should pass, but maybe it doesn't like the "." for some reason?
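The two numbers in the `E` line are the raw `monotonic()` readings `b` and `a`, so the assertion compares their difference, not the readings themselves, with 1 second. Checking the arithmetic with the printed values:

```python
# The values pytest printed for b and a in the failing assertion.
b = 3178089.64288705
a = 3178088.391084973

elapsed = b - a
print(round(elapsed, 4))  # 1.2518
print(elapsed <= 1)       # False
```

So that run took about 1.25 s, which exceeds the 1-second bound; the "." is just the decimal point.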