Duckling is a Haskell library that parses text into structured data.
"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}
A Haskell environment is required. We recommend using stack.
On Linux and macOS you'll need to install the PCRE development headers. On Linux, use your package manager to install them. On macOS, the easiest way to install them is with Homebrew:
brew install pcre
If that doesn't help, try running brew doctor and fix the issues it finds.
To compile and run the binary:
stack build
stack exec duckling-example-exe
The first time you run it, it will download all required packages.
This runs a basic HTTP server. Example request:
curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'
In the example application, all dimensions are enabled by default. Provide the parameter dims to specify which ones you want. Examples:
Identify credit card numbers only:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="["credit-card-number"]"'
If you want multiple dimensions, comma-separate them in the array:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="["quantity","numeral"]"'
See exe/ExampleMain.hs for an example of how to integrate Duckling in your project.
If your backend doesn't run Haskell or if you don't want to spin up your own Duckling server, you can directly use wit.ai's built-in entities.
Duckling supports many languages, but most don't support all dimensions yet (we need your help!). Please look into this directory for language-specific support.
| Dimension | Example input | Example value output |
|---|---|---|
| AmountOfMoney | "42€" | {"value":42,"type":"value","unit":"EUR"} |
| CreditCardNumber | "4111-1111-1111-1111" | {"value":"4111111111111111","issuer":"visa"} |
| Distance | "6 miles" | {"value":6,"type":"value","unit":"mile"} |
| Duration | "3 mins" | {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}} |
| Email | "[email protected]" | {"value":"[email protected]"} |
| Numeral | "eighty eight" | {"value":88,"type":"value"} |
| Ordinal | "33rd" | {"value":33,"type":"value"} |
| PhoneNumber | "+1 (650) 123-4567" | {"value":"(+1) 6501234567"} |
| Quantity | "3 cups of sugar" | {"value":3,"type":"value","product":"sugar","unit":"cup"} |
| Temperature | "80F" | {"value":80,"type":"value","unit":"fahrenheit"} |
| Time | "today at 9am" | {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"} |
| Url | "https://api.wit.ai/message?q=hi" | {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"} |
| Volume | "4 gallons" | {"value":4,"type":"value","unit":"gallon"} |
Custom dimensions are also supported.
To regenerate the classifiers and run the test suite:
stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test
It's important to regenerate the classifiers after updating the code and before running the test suite.
To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:
- Duckling/<Dimension>/<Lang>/Rules.hs
- Duckling/<Dimension>/<Lang>/Corpus.hs
- Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
- Duckling/Rules/<Lang>.hs
To add a new language:
- Make sure that the language code used follows the ISO 639-1 standard.
- The first dimension to implement is Numeral.
- Follow this example.
To add a new locale:
- There should be a need for diverging rules between the locale and the language.
- Make sure that the locale code is a valid ISO 3166-1 alpha-2 country code.
- Follow this example.
Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.
The corpus (resp. negative corpus) is a list of examples that should (resp. shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.
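To make the terminology concrete, here is a minimal, self-contained sketch of the rule idea. It is a toy model, not Duckling's actual types: Token, ToyRule, applyRule, tokenize and corpusPasses are hypothetical names invented for this illustration, and the character-level step is reduced to splitting on whitespace instead of regex matching.

```haskell
import Data.Char (isDigit)
import Data.Maybe (isJust)

-- Toy token type (hypothetical); Duckling's real tokens are richer.
data Token = Word String | Number Int
  deriving (Eq, Show)

-- Simplified rule: a name, one predicate per matched token, and a
-- production that turns the matched tokens into a new token.
data ToyRule = ToyRule
  { ruleName    :: String
  , rulePattern :: [Token -> Bool]
  , ruleProd    :: [Token] -> Maybe Token
  }

isNumber :: Token -> Bool
isNumber (Number _) = True
isNumber _          = False

isWord :: String -> Token -> Bool
isWord w (Word v) = w == v
isWord _ _        = False

-- "<number> plus <number>" produces the sum as a new Number token.
rulePlus :: ToyRule
rulePlus = ToyRule
  { ruleName    = "<number> plus <number>"
  , rulePattern = [isNumber, isWord "plus", isNumber]
  , ruleProd    = \tokens -> case tokens of
      [Number a, _, Number b] -> Just (Number (a + b))
      _                       -> Nothing
  }

-- Character-level step, reduced to whitespace splitting for this sketch.
tokenize :: String -> [Token]
tokenize = map toToken . words
  where
    toToken w
      | not (null w) && all isDigit w = Number (read w)
      | otherwise                     = Word w

-- Run the production only if every predicate accepts its token.
applyRule :: ToyRule -> [Token] -> Maybe Token
applyRule rule tokens
  | length tokens == length (rulePattern rule)
      && and (zipWith ($) (rulePattern rule) tokens) = ruleProd rule tokens
  | otherwise = Nothing

-- Corpus: inputs that should parse. Negative corpus: inputs that should not.
corpus, negativeCorpus :: [String]
corpus         = ["2 plus 3", "10 plus 20"]
negativeCorpus = ["two plus", "plus 3 4"]

corpusPasses :: Bool
corpusPasses =
  all (isJust . applyRule rulePlus . tokenize) corpus
    && not (any (isJust . applyRule rulePlus . tokenize) negativeCorpus)
```

With these definitions, applyRule rulePlus (tokenize "2 plus 3") evaluates to Just (Number 5), and corpusPasses checks both corpora in one go.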
Duckling.Debug provides a few debugging tools:
$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]
Duckling is BSD-licensed.
duckling's Issues
"what was the weather on Friday" should extract Friday the 16th if today is Wednesday the 21st
Instead, it seems that we choose the closest Friday (the 23rd).
I realize that detecting this case is a little more complicated than detecting last Friday, since we have to rely on detecting the tense (past) of the sentence, but it would be cool if it is possible.
Parsing Estonian symbols
I am trying to help out with the Estonian language. I'm new to Haskell, so I'm asking for help. It seems that some of my language's special characters won't match, because they are turned into escape codes: kümnes becomes k\252mnes, etc. An easy solution is to encode all the matches the same way, but that makes them harder to read. Any other ideas? The letters that might be an issue in Estonian are üäöõšž. They seem to work in wit.ai though, so I think it might be connected to the Docker container.
Overriding assumptions + Custom dimensions
Our development team is interested in customizing behavior of the currently supported dimensions. My current understanding is that there is not a way to "extend" this when creating your own project and pulling in Duckling as a dependency; one would need to fork the project and make changes to the source code.
Is this understanding accurate? If not, how could a user override dimensions outside of editing the dimension source code?
CC @pcgreat
the meaning of latent time
Hi all,
What is the meaning of latent time in Duckling? Since the library filters out latent times, Duckling fails to recognise "2017年5月" in Chinese (i.e. 2017-05). More specifically, the rule ruleMonthNumericWithMonthSymbol at line 971 in Duckling/Times/ZH/Rule.hs marks "5月" as a latent time.
To fix the bug, we can remove the mkLatent call at line 981.
However, I am still confused about latent time. Can anyone help me?
Thanks in advance!
Changing the default timezone in Duckling
Hello,
I'm trying to change the default timezone from UTC-7 to UTC. I'm totally new to Haskell, so could you help me understand how and where to change the code to do something like this?
Unexpected behavior : numbers parsed as time (FR, PT)
Hi,
I just found out that when I try to parse un deux trois, it returns the following time entities:
[
{
"dim": "time",
"body": "un deux",
"value": {
"values": [
{
"value": "2017-09-18T13:02:00.000-07:00",
"grain": "minute",
"type": "value"
},
{
"value": "2017-09-19T01:02:00.000-07:00",
"grain": "minute",
"type": "value"
},
{
"value": "2017-09-19T13:02:00.000-07:00",
"grain": "minute",
"type": "value"
}
],
"value": "2017-09-18T13:02:00.000-07:00",
"grain": "minute",
"type": "value"
},
"start": 0,
"end": 7
},
{
"dim": "time",
"body": "deux trois",
"value": {
"values": [
{
"value": "2017-09-18T02:03:00.000-07:00",
"grain": "minute",
"type": "value"
},
{
"value": "2017-09-18T14:03:00.000-07:00",
"grain": "minute",
"type": "value"
},
{
"value": "2017-09-19T02:03:00.000-07:00",
"grain": "minute",
"type": "value"
}
],
"value": "2017-09-18T02:03:00.000-07:00",
"grain": "minute",
"type": "value"
},
"start": 3,
"end": 13
}
]
The result is the same even if I pass the dimension number in my request.
I also tried to parse one two three in English and it works perfectly, so I assume that a specific French rule alters the expected result.
Any hints on this?
"next <day-of-week>" should be the day-of-week that comes next week
If the refTime is Wednesday (today), "next Thursday" currently refers to tomorrow's Thursday, which doesn't seem right. In my mind, "next Thursday" should always refer to the Thursday of next week. If you agree, I can send a PR to fix it.
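The two readings can be compared with a little weekday arithmetic. This is only a sketch of the proposed semantics, not Duckling's implementation; it assumes ISO weekday numbers (Monday = 1 .. Sunday = 7) and weeks that start on Monday, and the function names are made up for the example.

```haskell
-- Days from the reference weekday to the closest upcoming target weekday
-- (always 1..7, so "Thursday" said on a Thursday means a week later).
daysToUpcoming :: Int -> Int -> Int
daysToUpcoming ref target = ((target - ref - 1) `mod` 7) + 1

-- Days from the reference weekday to the target weekday of the following
-- week: first jump to next Monday, then move forward to the target.
daysToNextWeek :: Int -> Int -> Int
daysToNextWeek ref target = (8 - ref) + (target - 1)
```

For a Wednesday reference (3) and a Thursday target (4), daysToUpcoming gives 1 (tomorrow) while daysToNextWeek gives 8 (the Thursday of next week).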
Identifying where Duckling found the entity (in the text)
Hi,
Duckling is really awesome - thank you for building this tool!
Would it be possible to add the substring in the original text where an entity was detected? (or is this already available?)
For example, a query like "I'm going to visit the US on the first Tuesday of October" provides the response:
{ "values": [ { "value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value" } ], "value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value" }
Would it be possible to include "first Tuesday of October" as part of the JSON response? So the response could be:
{ "values": [ { "value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value" } ], "value": "2017-10-03T00:00:00.000-07:00", "text": "first Tuesday of October", "grain": "day", "type": "value" }
where "text": "first Tuesday of October", shows which part of the string contained the Time entity.
Return latent entities
Sometimes, we might want to "force" a parse of Time from a text.
Typically, when responding to a bot that expects a Time alternative. There might be a use case for other dimensions as well, e.g. Temperature.
To cover these use cases, we'd like to ask Duckling to return latent entities.
Ordinals not recognized when preceded by "the"
The ordinal 4 is successfully extracted in the examples:
debug EN "4th step of the recipe" [This Time, This Ordinal]
debug EN "the 4th step of the recipe" [This Ordinal]
But an incorrect time entity (and no ordinal) is extracted in the example:
debug EN "the 4th step of the recipe" [This Ordinal, This Time]
Is Location extraction on the cards?
Amazing work with the update!
Any plans to extend it to support location extraction? I would love to contribute!
Parse a relative time from a specific referenceTime
Hi everyone,
Would it be possible to parse a time relative to a specific referenceTime?
As I understand it (let me know if I'm wrong), the referenceTime is currently based only on the given timezone.
What I would like to do is give a specific datetime as input, and get a result relative to that datetime.
For example if my inputs are
input: {
referenceTime: '2017-06-08T15:00:00+02:00',
text: 'next monday at 8am',
}
I would like to get next Monday at 8am relative to the 8th of June 2017, that is 2017-06-12T08:00:00+02:00.
Would it be possible?
Thanks !
How to handle Lunar holidays
Holidays like Eid al-Fitr and Chinese New Year are lunar holidays: their dates change every year, and there is no simple pattern to calculate them. So I wonder if we can explicitly list the holiday dates for the past and upcoming 10 years, but I'm not sure how to write the rules.
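One way to picture the "explicit list" idea is a plain lookup table keyed by year. The sketch below is independent of Duckling's rule machinery; chineseNewYearTable and chineseNewYear are hypothetical names, and only a few Chinese New Year dates are listed for illustration.

```haskell
-- (year, (month, day)) of Chinese New Year in the Gregorian calendar.
chineseNewYearTable :: [(Int, (Int, Int))]
chineseNewYearTable =
  [ (2017, (1, 28))
  , (2018, (2, 16))
  , (2019, (2, 5))
  , (2020, (1, 25))
  ]

-- Nothing means the year is outside the listed range.
chineseNewYear :: Int -> Maybe (Int, Int)
chineseNewYear year = lookup year chineseNewYearTable
```

A rule built on such a table would have to decide what to do for years that are not listed; here the lookup simply returns Nothing.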
incorrect year
For the phrase "list all movies released from 23 may to 2 aug", Duckling gives the date range "2018-5-23 to 2018-8-2". The year is parsed incorrectly.
Amount of money has lossy value type
Now the value is a Double, which is subject to floating point imprecision.
Just an idea - maybe (also) store as Integer + exponent?
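A rough sketch of the Integer + exponent idea follows. Decimal, rescale, addDecimal and toExactRational are hypothetical names for this illustration and are not part of Duckling; the point is only that addition and comparison stay exact, unlike with Double.

```haskell
import Data.Ratio ((%))

-- Exact decimal: the value is coefficient * 10 ^^ exponent10.
data Decimal = Decimal
  { coefficient :: Integer
  , exponent10  :: Int
  } deriving (Show)

-- Re-express a value with a smaller (or equal) exponent; no precision is lost.
rescale :: Int -> Decimal -> Decimal
rescale e (Decimal c e0) = Decimal (c * 10 ^ (e0 - e)) e

-- Add two values after bringing them to the smaller of the two exponents.
addDecimal :: Decimal -> Decimal -> Decimal
addDecimal a b = Decimal (coefficient a' + coefficient b') e
  where
    e  = min (exponent10 a) (exponent10 b)
    a' = rescale e a
    b' = rescale e b

-- Exact conversion, used so that 4.20 and 4.2 compare equal.
toExactRational :: Decimal -> Rational
toExactRational (Decimal c e)
  | e >= 0    = fromInteger (c * 10 ^ e)
  | otherwise = c % (10 ^ negate e)

instance Eq Decimal where
  a == b = toExactRational a == toExactRational b
```

Here Decimal 1 (-1) stands for 0.1. Adding Decimal 1 (-1) and Decimal 2 (-1) compares equal to Decimal 3 (-1), whereas 0.1 + 0.2 == (0.3 :: Double) is False.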
"trip for 10 days starting 18th Dec"
The phrase above should perhaps return an interval: [December 18, December 28]. I added a custom rule to implement and test this, shown below. However, the value returned by this rule adds an extra hour to the end date for some reason. Does anyone have an idea why the extra hour is added, and a potential fix?
[Entity {dim = "time", body = "for 10 days from 18th Dec", value = "{\"values\":[{\"to\":{\"value\":\"2013-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2013-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"},{\"to\":{\"value\":\"2014-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2014-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"},{\"to\":{\"value\":\"2015-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2015-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"}],\"to\":{\"value\":\"2013-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2013-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"}", start = 0, end = 25}]
Rule:
ruleIntervalForDurationFrom :: Rule
ruleIntervalForDurationFrom = Rule
{ name = "for <duration> from <time>"
, pattern =
[ regex "for"
, dimension Duration
, regex "(from|starting|beginning|after|starting from)"
, dimension Time
]
, prod = \tokens -> case tokens of
(_:Token Duration dd:_:Token Time td1:_) ->
Token Time <$> interval TTime.Open td1 (durationAfter dd td1)
_ -> Nothing
}
Support for parser combinators
Many of the regexes have reached the point where a parser combinator library would be a much better option. A prime example is the URL matcher, which could easily be defined precisely using parser combinators, while at the moment it is fairly ad hoc and loses a lot of information (the path doesn't work for URLs that contain usernames and passwords, something users might want to match on in order to forbid or warn about URLs that shouldn't be posted):
ruleURL :: Rule
ruleURL = Rule
{ name = "url"
, pattern =
[ regex "((([a-zA-Z]+)://)?(w{2,3}[0-9]*\\.)?(([\\w_-]+\\.)+[a-z]{2,4})(:(\\d+))?(/[^?\\s#]*)?(\\?[^\\s#]+)?)"
]
, prod = \tokens -> case tokens of
(Token RegexMatch (GroupMatch (m:_:_protocol:_:domain:_:_:_port:_path:_query:_)):
_) -> Just . Token Url $ url m domain
_ -> Nothing
}
(For this specific example, the Network.URI module from the network-uri package already provides parseURI :: String -> Maybe URI.)
I don't have an implementation for this yet (nor a preference for combinator library) because I don't fully understand how duckling all fits together, and wanted to open this to start discussion about it.
Undocumented build dependency: PCRE
While building on OS X 10.12 I received the following build error:
-- While building package regex-pcre-0.94.4 using:
/Users/drew/.stack/setup-exe-cache/x86_64-osx/Cabal-simple_mPHDZzAJ_1.24.2.0_ghc-8.0.2 --builddir=.stack-work/dist/x86_64-osx/Cabal-1.24.2.0 build --ghc-options " -ddump-hi -ddump-to-file"
Process exited with code: ExitFailure 1
Logs have been written to: /Users/drew/oss/duckling/.stack-work/logs/regex-pcre-0.94.4.log
Configuring regex-pcre-0.94.4...
Building regex-pcre-0.94.4...
Preprocessing library regex-pcre-0.94.4...
Wrap.hsc:148:10: fatal error: 'pcre.h' file not found
#include <pcre.h>
^
1 error generated.
compiling .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/Text/Regex/PCRE/Wrap_hsc_make.c failed (exit code 1)
command was: /usr/bin/gcc -c .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/Text/Regex/PCRE/Wrap_hsc_make.c -o .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/Text/Regex/PCRE/Wrap_hsc_make.o -m64 -fno-stack-protector -m64 -fno-stack-protector -m64 -D__GLASGOW_HASKELL__=800 -Ddarwin_BUILD_OS=1 -Dx86_64_BUILD_ARCH=1 -Ddarwin_HOST_OS=1 -Dx86_64_HOST_ARCH=1 -DHAVE_PCRE_H -DSPLIT_BASE=1 -I.stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/autogen -include .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/autogen/cabal_macros.h -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/bytestring-0.10.8.1/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/base-4.9.1.0/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/integer-gmp-1.0.0.1/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/include/
The error indicates the PCRE library is absent (or unable to be found). My resolution was to install PCRE with my preferred package manager: brew install pcre. The solution will vary depending on OS, and in some cases PCRE may already be installed.
Feature request - Weight dimensions
Can you please add support for weight dimensions?
For example:
5 oz, 1.5 lbs, 300 mg, 6.2 kg, etc.
Thanks.
Money Parsing Error
Hi Guys,
I'm using Duckling in Python using this library. For me, 1 million dollar and 0.1 million dollar return the same response. Could someone please confirm whether this is an issue with Duckling or with the Python library?
Python Library Details
Name: duckling
Version: 1.7.2
Home-page: https://github.com/FraBle/python-duckling
Author: Frank Blechschmidt
Requires: JPype1, six, python-dateutil
Great job on Duckling guys 👍
Feature Request: Support for series of consecutive dates
If the query contains a series of days or dates, an interval may be a more reasonable result. For example, "for next Thursday and Friday night" should return interval(June 29, June 30). Currently, the return value is one of the dates, which is a bit ambiguous. Another example is the query "18th, 19th and 20th Dec", which currently returns December 20th, while "18th Dec, 19th Dec and 20th Dec" currently returns December 19th. A better result would be interval(December 18, December 20) in both cases.
[feature] change json response
I feel like encoding json in a string inside of json makes it really hard to use tools such as jq or even parsers like jackson to read responses
$ curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the first Tuesday of October" | jq '.[0].value'
"{\"values\":[{\"value\":\"2017-10-03T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}],\"value\":\"2017-10-03T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}"
It would be much better to just have value be a JSON blob instead of a string, which requires double parsing:
curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the first Tuesday of October" | jq -r '.[0].value' | jq .
{
"values": [
{
"value": "2017-10-03T00:00:00.000-07:00",
"grain": "day",
"type": "value"
}
],
"value": "2017-10-03T00:00:00.000-07:00",
"grain": "day",
"type": "value"
}
Support for Hijri Dates (Islamic Calendar)
I was wondering if there is any plan to support conversion from Hijri dates to Duckling's default dates (Gregorian calendar). If so, when will it be available?
ex:
7th Safar, 1439h -> 27th October, 2017
Phone numbers parsed as time when prefixed with at
To replicate, replace xxx-yyy-zzzz with an actual phone number.
Use cases:
- please call me at xxx-yyy-zzzz returns {"dim":"time"
- please call me xxx-yyy-zzzz returns {"dim":"phone-number"
A huge fan of the library but don't know Haskell well enough to fix on my own.
Support early/mid/late <month>
e.g.
early June -> 06/01-06/10
mid June -> 06/11-06/20
late June -> 06/21-06/30
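The proposed split could be expressed as a small function. This is a sketch of the request only; MonthPart and monthPartDays are hypothetical names, and the caller is assumed to supply the length of the month.

```haskell
data MonthPart = Early | Mid | Late
  deriving (Eq, Show)

-- Inclusive (first day, last day) for a part of a month, following the
-- 1-10 / 11-20 / 21-end split suggested above.
monthPartDays :: Int -> MonthPart -> (Int, Int)
monthPartDays _           Early = (1, 10)
monthPartDays _           Mid   = (11, 20)
monthPartDays monthLength Late  = (21, monthLength)
```

For June, monthPartDays 30 Late yields (21, 30), matching the last example above.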
How to provide port number
The program uses the default port number (8000), but I need to use a custom port number. Where can I specify that?
Thanks in advance.
Thanksgiving day returns incorrect date
I believe thanksgiving day 2017 is November 23rd. API currently returns November 30.
Noon parses to 12 AM
Any reason “Tuesday at Noon” gives me the time of 12:00 AM? I’m using Wit.ai integrated with Messenger
Support single word rule composition
Help out a dummie?
Hello guys,
I was wondering if you could provide a screencast/guide on how to set up Duckling as a server, add new entities, etc., à la wit.ai?
I've recorded screencasts before and am more than happy to record one for everyone's benefit, as long as someone with a better grasp of the project is willing to help me out 😄
API Documentation
Hi everyone !
I failed to find documentation of Duckling's API, and I am wondering where one can find some, similar to what can be found at https://duckling.wit.ai/. Any ideas?
Thanks in advance, and have a great day :)
Locales support
Currently, Duckling has two different granularity levels for rules: language-wise, and common rules across languages. As an example, AmountOfMoney has a couple of common rules, and more specific rules for English, French, etc.
Today, if we want to handle country-specific forms, we'd include them under the language umbrella. Although this has worked well so far, it is not ideal. We had to include Cantonese variations for Time within the Chinese rules (#21).
An improvement for Duckling would be to add a finer granularity for rules: scoping by locale.
This is something we've had in mind for a while, though we are not able to prioritize it today. The goal here is to open the discussion and maybe come up with an actual implementation.
Wrong entity boundaries for 'AmountOfMoney' entity in French
On the sentence 'Je veux emprunter 40 000 euros.', using the 'fr' language, duckling returns the following:
[
{
"body": "40 000",
"dim": "phone-number",
"end": 23,
"start": 17,
"value": {
"value": "40000"
}
},
{
"body": "000 euros",
"dim": "amount-of-money",
"end": 29,
"start": 20,
"value": {
"type": "value",
"unit": "EUR",
"value": 0
}
}
]
In French, spaces can be used within a number, so the number in the body must be '40 000'. Besides, "40 000" is not a valid phone number.
"last weekend of october"
This returns: a weekend of october from last year (2016). May be more plausible to have it return: fourth weekend of october 2017 i.e. October 27- October 30.
Quickstart with lang parameter and language detection
Hi,
Thanks a lot for this amazing software !
At first, I was really disappointed by the results I got from the API (with Docker), then I discovered the lang parameter... You should probably add it to the quickstart example ;)
Are you planning to add a language detection mechanism to the software?
If not, do you have a preferred tool for language detection? I might be able to create a new Dockerfile that exposes the current API but adds the lang parameter when it is not provided.
Regards,
Thomas Pocreau.
Clean up Time rules for "ZH" lang
In "ZH", the Time rules can be updated to sync up with the EN ones. For example, the EN rules handle day-of-week and named-month elegantly in one rule, and we can update ZH to use the same approach.
Besides, instead of using encoded characters like "\x4e0b", shall we use Chinese characters like "份" in the code? It makes things easier to read, and it works perfectly on my local machine.
[Duration][RU] Durations should round to the highest grain
See discussion in #105.
Add Cantonese support to ZH
Situation
In Hong Kong, Cantonese is an official spoken language.
Its wording differs somewhat from formal written Chinese, yet many Hong Kong people like typing text in the spoken form.
What to do
Adding Cantonese keywords to ZH would help Duckling handle Cantonese wording.
Question
Which language code should the handling be added under?
For example in Time: (1) Duckling/Time/ZH or (2) Duckling/Time/ZH-YUE?
Wikipedia uses zh-yue to represent Cantonese content.
Incorrect 'to' value while parsing duration/time interval
Hi Duckling team,
I'm having some difficulty understanding Duckling's output for input that contains a duration/time interval - the end time is incorrect, and is consistently offset from the actual end time by 1 'grain of time', where 'grain' is either hour or minute.
For instance, when I run the command curl -XPOST http://0.0.0.0:8000/parse --data 'text=i am available from 6 to 8 pm today', I receive the following response:
{
"to":{"value":"2017-11-14T21:00:00.000-08:00","grain":"hour"},
"from":{"value":"2017-11-14T18:00:00.000-08:00","grain":"hour"},
"type":"interval"
}
For some reason, the to value is set to 1 hour (21:00) after the end time I specify in the input (20:00).
If I change the request to curl -XPOST http://0.0.0.0:8000/parse --data 'text=i am available from 6 to 8:30 pm today', the response is:
{
"to":{"value":"2017-11-14T20:31:00.000-08:00","grain":"minute"},
"from":{"value":"2017-11-14T18:00:00.000-08:00","grain":"minute"},
"type":"interval"
}
Here to is set to 1 minute after the end time (20:31), instead of 20:30. I'm guessing this is because of the grain at which the text is interpreted.
How do I change the code so that (in the server example) the to value is set to the exact end time?
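One possible reading, stated here as an assumption rather than confirmed behaviour, is that to marks the exclusive end of the interval, i.e. the start of the grain after the last one covered. Under that assumption a client can recover the stated end time by subtracting one grain. The sketch works on minutes since midnight to stay self-contained; Grain, grainMinutes and statedEnd are hypothetical names.

```haskell
data Grain = Minute | Hour
  deriving (Eq, Show)

-- Length of one grain, in minutes.
grainMinutes :: Grain -> Int
grainMinutes Minute = 1
grainMinutes Hour   = 60

-- Given the grain and the reported "to" (minutes since midnight),
-- step back one grain to get the end time as written in the input.
statedEnd :: Grain -> Int -> Int
statedEnd grain to = to - grainMinutes grain
```

With the responses above, statedEnd Hour (21 * 60) is 20 * 60 (20:00) and statedEnd Minute (20 * 60 + 31) is 20 * 60 + 30 (20:30).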
Thank you for being patient and helping out always!
Extending time dimension
Hi,
I wanted to add a new rule for the time dimension in English.
It is for parsing dates in the format 28-July-1999.
I added the following new rule and it's working fine.
ruleDOMMonthYear :: Rule
ruleDOMMonthYear = Rule
{ name = "<day-of-month> (ordinal or number) <named-month> year"
, pattern =
[ Predicate isDOMValue
, Predicate isAMonth
, regex "(\\d{2,4})"
]
, prod = \tokens -> case tokens of
(token:Token Time td:Token RegexMatch (GroupMatch (match:_)):_) -> do
intVal <- parseInt match
dom <- intersectDOM td token
Token Time <$> intersect dom (year intVal)
_ -> Nothing
}
However, when I run tests, I see that Aug 8 - Aug 16 fails. Here is the debug log.
*Duckling.Debug> debug EN "Aug 8 - Aug 12" [This Time]
intersect (Aug 8 - Aug 12)
-- August (Aug)
-- -- regex (Aug)
-- <day-of-month> (ordinal or number) <named-month> year (8 - Aug 12)
-- -- integer (numeric) (8)
-- -- -- regex (8)
-- -- August (Aug)
-- -- -- regex (Aug)
-- -- regex (12)
[Entity {dim = "time", body = "Aug 8 - Aug 12", value = "{\"values\":[],\"value\":\"2012-08-08T00:00:00.000-02:00\",\"grain\":\"day\",\"type\":\"value\"}", start = 0, end = 14}]
Is there any way to fix it?
2nd of this month is incorrectly extracted
Duckling is extracting phrases like "2nd of this month" incorrectly. It extracts "2nd" as one phrase and "this month" as another phrase. It should be extracted together. It should be 2 nov 2017 (reference date: 9 nov 2017)
Update:
Few more cases which are not working on duckling.
- last 5 days of may 2017: It gives the last 5 days from the current day, not of the specified month.
- on 23 of this month / on 23 month of last month: It extracts only this month/last month instead of the 23rd of this month/last month.
- list of all movies released on two days backs: It extracts two days as a period instead of the date two days back.
X k€ and X M€ are not recognized as `money` in French
In French, it is common to use X k€ (or Xk€) for X thousand euros, and X M€ for X million euros.
Currently, Duckling does not recognize those as entities of type money.
wit/datetime, am/pm vs 24h notation collides at 12 and midnight
Hi,
When writing "12" it translates to midnight, i.e. 12:05 becomes 5 minutes past midnight. Since I'm using 24h notation (13, 14, etc. all work well), I'd like 12 to translate to noon, so that 12:05 becomes 5 minutes past noon. How do I go about doing that?
Should have more logical am/pm defaults when am/pm not specified
Not sure if the following type of logic should live within duckling (but would be nice):
- User says tomorrow at 1 and does not specify am/pm
- Duckling currently assumes am
I would argue that there are more cases where the user means pm rather than am, and that pm should be the default for values 1 - 5.
Thoughts?
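The suggested default can be written down as a tiny function. It sketches the proposal, not Duckling's current behaviour; defaultTo24h is a hypothetical name, and hours outside 1 to 5 are left unchanged here.

```haskell
-- Map an hour given without am/pm to a 24-hour value.
defaultTo24h :: Int -> Int
defaultTo24h h
  | h >= 1 && h <= 5 = h + 12  -- "at 1" .. "at 5" are read as pm
  | otherwise        = h       -- everything else keeps the am reading
```

Under this sketch "tomorrow at 1" would resolve to 13:00, while "tomorrow at 9" would stay at 09:00.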
inconsistent parsing for time with numbers
this one is just wrong
$ curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the second sunday of October last year" | jq .
[
{
"dim": "time",
"body": "the second sunday of October last year",
"value": "{\"values\":[],\"value\":\"2016-10-02T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}",
"start": 0,
"end": 38
}
]
this one gets both parts right, but why did just changing the one word change so much of the parsing?
$ curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the second thursday of October last year" | jq .
[
{
"dim": "time",
"body": "the second thursday of October",
"value": "{\"values\":[{\"value\":\"2017-10-12T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}],\"value\":\"2017-10-12T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}",
"start": 0,
"end": 30
},
{
"dim": "time",
"body": "thursday of October last year",
"value": "{\"values\":[],\"value\":\"2016-10-06T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}",
"start": 11,
"end": 40
}
]
Extending Duckling for Country specific changes to Rules in Standard Languages
I need to add rules for example Indian Holiday rules in time. Where can this be done ??
"may 1-3" versus "from may 1-3"
For "may 1-3", duckling returns result as expected
dd-dd (interval) (may 1-3)
-- May (may)
-- -- regex (may)
-- regex (1)
-- regex (-)
-- regex (3)
For "from may 1-3", not so much:
from - (interval) (from may 1-3)
-- regex (from)
-- (non ordinal) (may 1)
-- -- May (may)
-- -- -- regex (may)
-- -- integer (numeric) (1)
-- -- -- regex (1)
-- regex (-)
-- time-of-day (latent) (3)
-- -- integer (numeric) (3)
-- -- -- regex (3)
What's the reason to choose Monday as the 1st day of a week
I notice you intentionally pick Monday instead of Sunday to be the first day of a week. May I ask what is the reason behind this design?
Add to hackage
If people want to use Duckling as a library in their Haskell apps rather than an HTTP API, it would be helpful to register it on Hackage.
Currently this is missing: https://hackage.haskell.org/package/duckling
When I run "stack test" in the terminal, it reports "Failed to load interface for ‘Duckling.PhoneNumber.Corpus’". Is the file missing, or is the project still under development?