facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

License: Other

Haskell 99.99% Dockerfile 0.01%

duckling's Introduction

Duckling is a Haskell library that parses text into structured data.

"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}

Requirements

A Haskell environment is required. We recommend using stack.

On Linux and macOS you'll need to install the PCRE development headers. On Linux, use your package manager to install them. On macOS, the easiest way to install them is with Homebrew:

brew install pcre

If that doesn't help, try running brew doctor and fixing the issues it finds.

Quickstart

To compile and run the binary:

stack build
stack exec duckling-example-exe

The first time you run it, it will download all required packages.

This runs a basic HTTP server. Example request:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'

In the example application, all dimensions are enabled by default. Provide the parameter dims to specify which ones you want. Examples:

Identify credit card numbers only:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="["credit-card-number"]"'
If you want multiple dimensions, comma-separate them in the array:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="["quantity","numeral"]"'
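
The nested quotes in the dims examples are easy to get wrong by hand. Below is a minimal Python sketch of building the request body programmatically. It assumes the example server accepts a form-encoded body in which dims is a JSON array of dimension names; that is inferred from the curl commands above, not from a documented contract, and build_parse_body is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_parse_body(text, locale="en_US", dims=None):
    """Build a form-encoded body for POST /parse (hypothetical helper)."""
    fields = {"locale": locale, "text": text}
    if dims is not None:
        # Assumption: dims travels as a JSON array of dimension names.
        fields["dims"] = json.dumps(dims)
    # urlencode percent-escapes quotes, brackets and spaces.
    return urlencode(fields)


body = build_parse_body("3 cups of sugar", dims=["quantity", "numeral"])
print(body)

# Round-trip check: decoding the body gives back the original fields.
decoded = parse_qs(body)
print(json.loads(decoded["dims"][0]))
```

The resulting string can be passed to curl with --data, or sent as the request body by any HTTP client.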

See exe/ExampleMain.hs for an example of how to integrate Duckling into your project. If your backend doesn't run Haskell, or if you don't want to spin up your own Duckling server, you can directly use wit.ai's built-in entities.

Supported dimensions

Duckling supports many languages, but most don't support all dimensions yet (we need your help!). Please look into this directory for language-specific support.

Dimension | Example input | Example value output
AmountOfMoney | "42€" | {"value":42,"type":"value","unit":"EUR"}
CreditCardNumber | "4111-1111-1111-1111" | {"value":"4111111111111111","issuer":"visa"}
Distance | "6 miles" | {"value":6,"type":"value","unit":"mile"}
Duration | "3 mins" | {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}
Email | "[email protected]" | {"value":"[email protected]"}
Numeral | "eighty eight" | {"value":88,"type":"value"}
Ordinal | "33rd" | {"value":33,"type":"value"}
PhoneNumber | "+1 (650) 123-4567" | {"value":"(+1) 6501234567"}
Quantity | "3 cups of sugar" | {"value":3,"type":"value","product":"sugar","unit":"cup"}
Temperature | "80F" | {"value":80,"type":"value","unit":"fahrenheit"}
Time | "today at 9am" | {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}
Url | "https://api.wit.ai/message?q=hi" | {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}
Volume | "4 gallons" | {"value":4,"type":"value","unit":"gallon"}

Custom dimensions are also supported.

Extending Duckling

To regenerate the classifiers and run the test suite:

stack build :duckling-regen-exe && stack exec duckling-regen-exe && stack test

It's important to regenerate the classifiers after updating the code and before running the test suite.

To extend Duckling's support for a dimension in a given language, typically 4 files need to be updated:

  • Duckling/<Dimension>/<Lang>/Rules.hs

  • Duckling/<Dimension>/<Lang>/Corpus.hs

  • Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)

  • Duckling/Rules/<Lang>.hs

To add a new language:

To add a new locale:

Rules have a name, a pattern and a production. Patterns are used to perform character-level matching (regexes on input) and concept-level matching (predicates on tokens). Productions are arbitrary functions that take a list of tokens and return a new token.

The corpus (resp. negative corpus) is a list of examples that should (resp. shouldn't) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at 4:30am.

Duckling.Debug provides a few debugging tools:

$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]
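
The resolved time in this debug output follows directly from the corpus reference time. As a small worked check in Python (a sketch of the arithmetic only, not of Duckling's resolution code; the -0200 offset shown in the output is ignored here):

```python
from datetime import datetime, timedelta

# Corpus reference time stated above: Tuesday Feb 12, 2013 at 4:30am.
reference = datetime(2013, 2, 12, 4, 30)

# "in two minutes" is resolved relative to the reference time.
resolved = reference + timedelta(minutes=2)

# weekday() counts from Monday=0, so Tuesday is 1.
print(reference.weekday())
print(resolved.strftime("%Y-%m-%d %H:%M:%S"))
```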

License

Duckling is BSD-licensed.

duckling's People

Contributors

a2tm7a, abdallatif, agiantwhale, alpmusti, anshuman23, chessai, desmart18, evjava, fil090302, franz-fb, haoxuany, igor-drozdov, jfulse, joncoens, kckcng, kjweng, leandropgc, mauricedoepke, nathanhausman, panagosg7, patapizza, pheaktra21, potomak, rybalkinsd, serefayar, stroxler, woprzech, xhavokx, yuanbing, zliu41

duckling's Issues

Parsing Estonian symbols

I am trying to help out with the Estonian language. I'm new to Haskell, so I'm asking for help. It seems that some of my language's special characters won't match, because they are turned into escape codes, e.g. kümnes becomes k\252mnes. One easy solution is to encode all the match patterns the same way, but that makes them harder to read. Any other ideas? The letters that might be an issue in Estonian are üäöõšž.
They seem to work in wit.io though, so I think it might be connected to the Docker container.

Overriding assumptions + Custom dimensions

Our development team is interested in customizing behavior of the currently supported dimensions. My current understanding is that there is not a way to "extend" this when creating your own project and pulling in Duckling as a dependency; one would need to fork the project and make changes to the source code.

Is this understanding accurate? If not, how could a user override dimensions outside of editing the dimension source code?

CC @pcgreat

the meaning of latent time

Hi all,
What is the meaning of latent time in Duckling? Since the library filters out latent time, Duckling fails to recognise "2017年5月" in Chinese (i.e. 2017-05). More specifically, the rule ruleMonthNumericWithMonthSymbol at line 971 in Duckling/Times/ZH/Rule.hs marks "5月" as latent time.
To fix the bug, we can remove the mkLatent call at line 981.

However, I am still confused about what latent time means. Can anyone help me?

Thanks in advance!

Changing the default timezone in Duckling

Hello,

I'm trying to change the default timezone from UTC-7 to UTC. I'm totally new to Haskell, so could you help me understand how and where to change the code to do something like this?

Unexpected behavior : numbers parsed as time (FR, PT)

Hi,

I just found out that when I try to parse un deux trois, it returns the following time entities:

[
    {
        "dim": "time",
        "body": "un deux",
        "value": {
            "values": [
                {
                    "value": "2017-09-18T13:02:00.000-07:00",
                    "grain": "minute",
                    "type": "value"
                },
                {
                    "value": "2017-09-19T01:02:00.000-07:00",
                    "grain": "minute",
                    "type": "value"
                },
                {
                    "value": "2017-09-19T13:02:00.000-07:00",
                    "grain": "minute",
                    "type": "value"
                }
            ],
            "value": "2017-09-18T13:02:00.000-07:00",
            "grain": "minute",
            "type": "value"
        },
        "start": 0,
        "end": 7
    },
    {
        "dim": "time",
        "body": "deux trois",
        "value": {
            "values": [
                {
                    "value": "2017-09-18T02:03:00.000-07:00",
                    "grain": "minute",
                    "type": "value"
                },
                {
                    "value": "2017-09-18T14:03:00.000-07:00",
                    "grain": "minute",
                    "type": "value"
                },
                {
                    "value": "2017-09-19T02:03:00.000-07:00",
                    "grain": "minute",
                    "type": "value"
                }
            ],
            "value": "2017-09-18T02:03:00.000-07:00",
            "grain": "minute",
            "type": "value"
        },
        "start": 3,
        "end": 13
    }
]

The result is the same even if I try to pass the dimension number in my request.

I also tried to parse one two three in English and it works perfectly, so I assume that a specific French rule alters the expected result.

Any hints on this ?

Identifying where Duckling found the entity (in the text)

Hi,

Duckling is really awesome - thank you for building this tool!

Would it be possible to add the substring in the original text where an entity was detected? (or is this already available?)

For example, a query like I'm going to visit the US on the first Tuesday of October provides the response:

{ "values": [ { "value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value" } ], "value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value" }

Would it be possible to include first Tuesday of October as part of the JSON response? So the response could be:

{ "values": [ { "value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value" } ], "value": "2017-10-03T00:00:00.000-07:00", "text": "first Tuesday of October", "grain": "day", "type": "value" }

where "text": "first Tuesday of October", shows which part of the string contained the Time entity.
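
For what it's worth, the JSON quoted in the "numbers parsed as time" issue in this list already carries body, start and end fields for each entity. Below is a small Python sketch of recovering the matched substring from those offsets. It assumes start is inclusive and end is exclusive, which matches that sample but is not stated anywhere in this document; matched_text is a hypothetical helper name.

```python
def matched_text(text, entity):
    """Return the substring an entity covers, given its start/end offsets."""
    # Assumption: `start` is inclusive and `end` is exclusive.
    return text[entity["start"]:entity["end"]]


text = "un deux trois"
# Offsets and bodies copied from the sample response in that issue.
entities = [
    {"body": "un deux", "start": 0, "end": 7},
    {"body": "deux trois", "start": 3, "end": 13},
]
for entity in entities:
    print(matched_text(text, entity) == entity["body"])
```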

Return latent entities

Sometimes we might want to "force" a parse of Time from a text, typically when responding to a bot that expects a Time alternative. There might be a use case for other dimensions as well, e.g. Temperature.

To cover these use cases, we'd like to ask Duckling to return latent entities.

Ordinals not recognized when preceded by "the"

The ordinal 4 is successfully extracted in the examples:

  • debug EN "4th step of the recipe" [This Time, This Ordinal]
  • debug EN "the 4th step of the recipe" [This Ordinal]

But an incorrect time entity (and no ordinal) is extracted in the example:

  • debug EN "the 4th step of the recipe" [This Ordinal, This Time]

Parse a relative time from a specific referenceTime

Hi everyone,

Would it be possible to parse a time from a specific referenceTime ?
As I understand it (let me know if I'm wrong), for now referenceTime is only based on the given timezone.

What I would like to do is to give as input a specific datetime, and get a result from that specific datetime.

For example if my inputs are

input: {
  referenceTime: '2017-06-08T15:00:00+02:00',
  text: 'next monday at 8am',
}

I would like to get next Monday at 8am relative to the 8th of June 2017, that is 2017-06-12T08:00:00+02:00

Would it be possible ?

Thanks !
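
To make the expected result concrete, here is a minimal Python sketch of the arithmetic being asked for. It only reproduces the value in the question and is not how Duckling resolves times; next_weekday_at is a hypothetical helper, and Duckling's own reading of "next monday" may differ.

```python
from datetime import datetime, timedelta


def next_weekday_at(reference, weekday, hour):
    """Next occurrence of `weekday` (Monday=0) strictly after the reference day."""
    # `or 7` moves a same-weekday match to the following week.
    days_ahead = (weekday - reference.weekday()) % 7 or 7
    target = reference + timedelta(days=days_ahead)
    return target.replace(hour=hour, minute=0, second=0, microsecond=0)


reference = datetime.fromisoformat("2017-06-08T15:00:00+02:00")
result = next_weekday_at(reference, weekday=0, hour=8)
print(result.isoformat())
```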

How to handle Lunar holidays

Holidays like Eid al Fitr and Chinese New Year are lunar holidays: their dates change every year, and there is no simple pattern for calculating them. So I wonder if we can explicitly list the holiday dates for the past and upcoming 10 years, but I'm not sure how to write the rules.
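
The table-driven idea can be sketched outside Duckling as a plain lookup. The Python below is only an illustration of that approach, not a Duckling rule; it lists a few Chinese New Year dates explicitly, and next_occurrence is a hypothetical helper name.

```python
from datetime import date

# Chinese New Year dates (Gregorian), listed explicitly rather than computed.
CHINESE_NEW_YEAR = {
    2017: date(2017, 1, 28),
    2018: date(2018, 2, 16),
    2019: date(2019, 2, 5),
    2020: date(2020, 1, 25),
}


def next_occurrence(table, reference):
    """First listed date on or after `reference`, or None if the table runs out."""
    upcoming = [day for day in table.values() if day >= reference]
    return min(upcoming) if upcoming else None


print(next_occurrence(CHINESE_NEW_YEAR, date(2017, 6, 1)))
```

The obvious cost of this approach is that the table has to be extended before it runs out.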

incorrect year

For the phrase "list all movies released from 23 may to 2 aug", Duckling gives the date range "2018-5-23 to 2018-8-2". The year is being resolved incorrectly.

"trip for 10 days starting 18th Dec"

The above should perhaps return an interval, [December 18, December 28]. I added a custom rule to implement and test this, shown below. However, the value returned with this rule adds an extra hour to the end date for some reason. Does anyone have an idea why the extra hour is added, and a potential fix?

[Entity {dim = "time", body = "for 10 days from 18th Dec", value = "{\"values\":[{\"to\":{\"value\":\"2013-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2013-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"},{\"to\":{\"value\":\"2014-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2014-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"},{\"to\":{\"value\":\"2015-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2015-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"}],\"to\":{\"value\":\"2013-12-28T01:00:00.000-02:00\",\"grain\":\"hour\"},\"from\":{\"value\":\"2013-12-18T00:00:00.000-02:00\",\"grain\":\"hour\"},\"type\":\"interval\"}", start = 0, end = 25}]

Rule:

ruleIntervalForDurationFrom :: Rule
ruleIntervalForDurationFrom = Rule
  { name = "for <duration> from <time>"
  , pattern =
    [ regex "for" 
    , dimension Duration
    , regex "(from|starting|beginning|after|starting from)"
    , dimension Time
    ]
  , prod = \tokens -> case tokens of
      (_:Token Duration dd:_:Token Time td1:_) ->
        Token Time <$> interval TTime.Open td1 (durationAfter dd td1)
      _ -> Nothing
  }

Support for parser combinators

Many of the regexes have reached the point where a parser combinator library would be a much better option. A prime example is the URL matcher, which could be defined precisely using a parser combinator, while at the moment it's fairly ad hoc and loses a lot of information (the path doesn't work for URLs which contain usernames and passwords, something users might want to match on in order to forbid or warn users who are posting URLs they shouldn't):

ruleURL :: Rule
ruleURL = Rule
  { name = "url"
  , pattern =
    [ regex "((([a-zA-Z]+)://)?(w{2,3}[0-9]*\\.)?(([\\w_-]+\\.)+[a-z]{2,4})(:(\\d+))?(/[^?\\s#]*)?(\\?[^\\s#]+)?)"
    ]
  , prod = \tokens -> case tokens of
      (Token RegexMatch (GroupMatch (m:_:_protocol:_:domain:_:_:_port:_path:_query:_)):
       _) -> Just . Token Url $ url m domain
      _ -> Nothing
  }

(For this specific example, the Network.URI package already provides parseURI :: String -> Maybe URI)

I don't have an implementation for this yet (nor a preference for combinator library) because I don't fully understand how duckling all fits together, and wanted to open this to start discussion about it.

Undocumented build dependency: PCRE

While building on OS X 10.12 I received the following build error:

--  While building package regex-pcre-0.94.4 using:
      /Users/drew/.stack/setup-exe-cache/x86_64-osx/Cabal-simple_mPHDZzAJ_1.24.2.0_ghc-8.0.2 --builddir=.stack-work/dist/x86_64-osx/Cabal-1.24.2.0 build --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure 1
    Logs have been written to: /Users/drew/oss/duckling/.stack-work/logs/regex-pcre-0.94.4.log

    Configuring regex-pcre-0.94.4...
    Building regex-pcre-0.94.4...
    Preprocessing library regex-pcre-0.94.4...
    Wrap.hsc:148:10: fatal error: 'pcre.h' file not found
    #include <pcre.h>
             ^
    1 error generated.
    compiling .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/Text/Regex/PCRE/Wrap_hsc_make.c failed (exit code 1)
    command was: /usr/bin/gcc -c .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/Text/Regex/PCRE/Wrap_hsc_make.c -o .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/Text/Regex/PCRE/Wrap_hsc_make.o -m64 -fno-stack-protector -m64 -fno-stack-protector -m64 -D__GLASGOW_HASKELL__=800 -Ddarwin_BUILD_OS=1 -Dx86_64_BUILD_ARCH=1 -Ddarwin_HOST_OS=1 -Dx86_64_HOST_ARCH=1 -DHAVE_PCRE_H -DSPLIT_BASE=1 -I.stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/autogen -include .stack-work/dist/x86_64-osx/Cabal-1.24.2.0/build/autogen/cabal_macros.h -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/bytestring-0.10.8.1/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/base-4.9.1.0/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/integer-gmp-1.0.0.1/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/include -I/Users/drew/.stack/programs/x86_64-osx/ghc-8.0.2/lib/ghc-8.0.2/include/

The error indicates the PCRE library is absent (or unable to be found). My resolution was to install PCRE with my preferred package manager: brew install pcre. The solution will vary depending on OS, and in some cases PCRE may already be installed.

Money Parsing Error

Hi Guys,

I'm using Duckling in Python using this library. For me, 1 million dollar and 0.1 million dollar return the same response. Could someone please confirm whether this is an issue with Duckling or with the Python library?

Python Library Details

Name: duckling
Version: 1.7.2
Home-page: https://github.com/FraBle/python-duckling
Author: Frank Blechschmidt
Requires: JPype1, six, python-dateutil

Great job on Duckling guys 👍

Feature Request: Support for series of consecutive dates

If the query contains a series of days/dates, an interval may be a more reasonable result. For example, "for next Thursday and Friday night" should return interval(June 29, June 30). Currently, the return value is just one of the dates, which is a bit ambiguous. Another example: the query "18th, 19th and 20th Dec" currently returns December 20th, and "18th Dec, 19th Dec and 20th Dec" currently returns December 19th. A better result would be interval(December 18, December 20) in both cases.
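
The requested behaviour can be sketched as client-side post-processing. The Python below is a sketch of that idea, not of a Duckling rule: it collapses a run of consecutive dates into a (first, last) pair. The year 2017 is an assumption (the request does not state one), and collapse_consecutive is a hypothetical helper name.

```python
from datetime import date, timedelta


def collapse_consecutive(days):
    """Return (first, last) if the dates form one unbroken daily run, else None."""
    ordered = sorted(set(days))
    if not ordered:
        return None
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier != timedelta(days=1):
            return None
    return ordered[0], ordered[-1]


# "18th, 19th and 20th Dec", with the year assumed to be 2017.
run = collapse_consecutive([date(2017, 12, 18), date(2017, 12, 19), date(2017, 12, 20)])
print(run)
```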

[feature] change json response

I feel like encoding json in a string inside of json makes it really hard to use tools such as jq or even parsers like jackson to read responses

$ curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the first Tuesday of October" | jq '.[0].value'
"{\"values\":[{\"value\":\"2017-10-03T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}],\"value\":\"2017-10-03T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}"

It would be much better to just have value be a json blob instead of a string which requires double parsing

curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the first Tuesday of October" | jq -r '.[0].value' | jq .
{
  "values": [
    {
      "value": "2017-10-03T00:00:00.000-07:00",
      "grain": "day",
      "type": "value"
    }
  ],
  "value": "2017-10-03T00:00:00.000-07:00",
  "grain": "day",
  "type": "value"
}
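
Until the response shape changes, clients in other languages need the same two-step decode. Below is a minimal Python sketch using a simulated response, so no server is required; decode_entities is a hypothetical helper name, and the isinstance check is there so the helper also tolerates a value that is already a nested object.

```python
import json

# Simulate the server's reply: `value` is a JSON document stored as a string.
inner = {
    "values": [{"value": "2017-10-03T00:00:00.000-07:00", "grain": "day", "type": "value"}],
    "value": "2017-10-03T00:00:00.000-07:00",
    "grain": "day",
    "type": "value",
}
response_text = json.dumps([{"dim": "time", "value": json.dumps(inner)}])


def decode_entities(raw):
    """Parse the response, then parse each string-encoded `value` again."""
    entities = json.loads(raw)  # first parse: the outer array
    for entity in entities:
        if isinstance(entity.get("value"), str):
            entity["value"] = json.loads(entity["value"])  # second parse
    return entities


entities = decode_entities(response_text)
print(entities[0]["value"]["grain"])
```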

Support for Hijri Dates (Islamic Calendar)

I was wondering if there is any plan to support the conversion from Hijri dates to duckling default dates (Gregorian Calendar) if so when will it be available to use?

ex:
7th Safar, 1439h -> 27th October, 2017

Phone numbers parsed as time when prefixed with at

To replicate, replace xxx-yyy-zzzz with an actual phone number.

Use cases:

  • please call me at xxx-yyy-zzzz returns {"dim":"time"
  • please call me xxx-yyy-zzzz returns {"dim":"phone-number"

A huge fan of the library but don't know Haskell well enough to fix on my own.

How to provide port number

The program uses the default port number (8000), but I need to use a custom port number. Where can I specify that?
Thanks in advance.

Noon parses to 12 AM

Any reason “Tuesday at Noon” gives me the time of 12:00 AM? I’m using Wit.ai integrated with Messenger

Support single word rule composition

Today Duckling doesn't allow creating multiple patterns to match against a single word.
As a result, we can't compose rules/dimensions and have to duplicate the regexes.

This is the case for DE (e.g. here and here), NL (e.g. here), EL (e.g. here), and RU (e.g. here).

Help out a dummie?

Hello guys,

I was wondering if you could provide a screencast/guide on how to set up Duckling as a server, add new entities, etc., à la wit.ai?

I've recorded screencasts before and am more than happy to record one for everyone's benefit, as long as someone with a better grasp of the project is willing to help me out 😄

API Documentation

Hi everyone !

I failed to find documentation for Duckling's API and I am wondering where one can find some, similar to what can be found at https://duckling.wit.ai/. Any idea?

Thanks in advance, and have a great day :)

Locales support

Currently, Duckling has two different granularity levels for rules: language-wise, and common rules across languages. As an example, AmountOfMoney has a couple of common rules, and more specific rules for English, French, etc.

Today, if we want to handle country-specific forms, we'd include them under the language umbrella. Although this has worked well so far, it is not ideal. We had to include Cantonese variations for Time within the Chinese rules (#21).

An improvement for Duckling would be to add a finer granularity for rules: scoping by locale.
This is something we've had in mind for a while, though we are not able to prioritize it today. The goal here is to open the discussion and maybe come up with an actual implementation.

Wrong entity boundaries for 'AmountOfMoney' entity in French

On the sentence 'Je veux emprunter 40 000 euros.', using the 'fr' language, duckling returns the following:

[
    {
        "body": "40 000",
        "dim": "phone-number",
        "end": 23,
        "start": 17,
        "value": {
            "value": "40000"
        }
    },
    {
        "body": "000 euros",
        "dim": "amount-of-money",
        "end": 29,
        "start": 20,
        "value": {
            "type": "value",
            "unit": "EUR",
            "value": 0
        }
    }
]

In French, spaces can be used within a number, so the number here must be read as '40 000'. Besides, "40 000" is not a valid phone number.
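
To illustrate the grouping being described, here is a small Python sketch that joins space-separated digit groups before any further parsing. It is a pre-processing illustration only, not how Duckling's rules are written; join_grouped_numbers is a hypothetical helper, and other separators used in French text (such as no-break spaces) are not handled.

```python
import re

# Digit groups separated by single spaces: 1-3 digits, then groups of exactly 3.
GROUPED_NUMBER = re.compile(r"\b\d{1,3}(?: \d{3})+\b")


def join_grouped_numbers(text):
    """Remove the grouping spaces inside numbers such as '40 000'."""
    return GROUPED_NUMBER.sub(lambda m: m.group(0).replace(" ", ""), text)


joined = join_grouped_numbers("Je veux emprunter 40 000 euros.")
print(joined)
```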

"last weekend of october"

This returns: a weekend of october from last year (2016). May be more plausible to have it return: fourth weekend of october 2017 i.e. October 27- October 30.

Quickstart with lang parameter and language detection

Hi,
Thanks a lot for this amazing software !

At first, I was really disappointed by the result I got from the api (with docker), then I discovered the lang parameter... You should probably add it to the quickstart example ;)

Are you planning on adding some language detection mechanism inside the software ?
If not, do you have a preferred tool for language detection? I might be able to create a new Dockerfile that exposes the current API but fills in the lang parameter when it is not provided.

Regards,
Thomas Pocreau.

Clean up Time rules for "ZH" lang

In "ZH", the Time rules can be updated to sync up with the EN ones. For example, the EN rules handle day-of-week and named-month elegantly in one rule, and we can update ZH to use the same approach.
Besides, instead of using an encoded character like "\x4e0b", shall we use Chinese characters like "份" directly in the code? It makes the code easier to read, and it works perfectly on my local machine.

Add Cantonese support to ZH

Situation

In Hong Kong, Cantonese is an official spoken language.
The wording differs a bit from formal written Chinese. Yet a lot of Hong Kong people love typing text in the spoken form.

What to do

Adding Cantonese keywords to ZH would help Duckling handle Cantonese wording.

Question

Which language code should the handling be added under?
For example in Time, (1) Duckling/Time/ZH or (2) Duckling/Time/ZH-YUE?
Wikipedia uses zh-yue to represent Cantonese content.

Incorrect 'to' value while parsing duration/time interval

Hi Duckling team,

I'm having some difficulty understanding Duckling's output for input that contains a duration/time interval - the end time is incorrect, and is consistently offset from the actual end time by 1 'grain of time', where 'grain' is either hour or minute.

For instance, when I run the command curl -XPOST http://0.0.0.0:8000/parse --data 'text=i am available from 6 to 8 pm today', I receive the following response:

{
  "to":{"value":"2017-11-14T21:00:00.000-08:00","grain":"hour"},
  "from":{"value":"2017-11-14T18:00:00.000-08:00","grain":"hour"},
  "type":"interval"
}

For some reason, the to is set to 1 hour (21:00) after the end time I specify in the input (20:00).

If I change the request to curl -XPOST http://0.0.0.0:8000/parse --data 'text=i am available from 6 to 8:30 pm today', the response is:

{
  "to":{"value":"2017-11-14T20:31:00.000-08:00","grain":"minute"},
  "from":{"value":"2017-11-14T18:00:00.000-08:00","grain":"minute"},
  "type":"interval"
}

Here to is set to 1 minute after the end time (20:31), instead of 20:30. Guessing this is because of the grain at which the text is interpreted.

How do I change the code so that (in the server example) I get the to value to be set to the exact end time?

Thank you for being patient and helping out always!
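
One reading of these outputs (an assumption, not confirmed in this thread) is that the to bound is exclusive and sits one grain past the last covered instant. Under that assumption, a client can recover the stated end time by subtracting one grain. The Python below is a sketch of that client-side adjustment, not a change to Duckling itself; inclusive_end is a hypothetical helper, and only the two grains seen above are handled.

```python
from datetime import datetime, timedelta

# Only the two grains that appear in the report are handled here.
GRAIN_STEP = {"hour": timedelta(hours=1), "minute": timedelta(minutes=1)}


def inclusive_end(to):
    """Shift an interval's `to` bound back by one grain."""
    end = datetime.fromisoformat(to["value"])
    return (end - GRAIN_STEP[to["grain"]]).isoformat(timespec="milliseconds")


print(inclusive_end({"value": "2017-11-14T21:00:00.000-08:00", "grain": "hour"}))
print(inclusive_end({"value": "2017-11-14T20:31:00.000-08:00", "grain": "minute"}))
```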

Extending time dimension

Hi,

I wanted to add a new rule for the time dimension in English.
It is for parsing dates in the format 28-July-1999.
I added the following new rule and it's working fine.

ruleDOMMonthYear :: Rule
ruleDOMMonthYear = Rule
  { name = "<day-of-month> (ordinal or number) <named-month> year"
  , pattern =
    [ Predicate isDOMValue
    , Predicate isAMonth
    , regex "(\\d{2,4})"
    ]
  , prod = \tokens -> case tokens of
      (token:Token Time td:Token RegexMatch (GroupMatch (match:_)):_) -> do
        intVal <- parseInt match
        dom <- intersectDOM td token
        Token Time <$> intersect dom (year intVal)
      _ -> Nothing
  }

However, when I run tests, I see that Aug 8 - Aug 16 fails. Here is the debug log.

*Duckling.Debug> debug EN "Aug 8 - Aug 12" [This Time]
intersect (Aug 8 - Aug 12)
-- August (Aug)
-- -- regex (Aug)
-- <day-of-month> (ordinal or number) <named-month> year (8 - Aug 12)
-- -- integer (numeric) (8)
-- -- -- regex (8)
-- -- August (Aug)
-- -- -- regex (Aug)
-- -- regex (12)
[Entity {dim = "time", body = "Aug 8 - Aug 12", value = "{\"values\":[],\"value\":\"2012-08-08T00:00:00.000-02:00\",\"grain\":\"day\",\"type\":\"value\"}", start = 0, end = 14}]

Is there any way to fix it?

2nd of this month is incorrectly extracted

Duckling extracts phrases like "2nd of this month" incorrectly. It extracts "2nd" as one entity and "this month" as another, when they should be extracted together. The result should be 2 Nov 2017 (reference date: 9 Nov 2017).

Update:
A few more cases that do not work in Duckling:

  1. last 5 days of may 2017
    It returns the last 5 days counted from the current day, not those of the specified month.

  2. on 23 of this month / on 23 month of last month
    It extracts only this month/last month instead of the 23rd of this month/last month.

  3. list of all movies released on two days backs
    It extracts two days as a duration instead of the date two days back.

wit/datetime, am/pm vs 24h notation collides at 12 and midnight

Hi,

When writing "12" it translates to midnight, i.e. 12:05 becomes 5 minutes past midnight. Since I'm using 24h notation (which works well, as 13, 14 etc. are all handled correctly), I'd like 12 to translate to noon, so that 12:05 becomes 5 minutes past noon. How do I go about doing that?

How it behaves today: (screenshot from 2017-07-13, not reproduced here)

Should have more logical am/pm defaults when am/pm not specified

Not sure if the following type of logic should live within duckling (but would be nice):

  • User says tomorrow at 1 and does not specify am/pm
  • Duckling currently assumes am

I would argue that there are more cases where the user means pm rather than am and that should be the default for values 1 - 5.

Thoughts?

inconsistent parsing for time with numbers

this one is just wrong

$ curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the second sunday of October last year" | jq .
[
  {
    "dim": "time",
    "body": "the second sunday of October last year",
    "value": "{\"values\":[],\"value\":\"2016-10-02T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}",
    "start": 0,
    "end": 38
  }
]

this one gets both parts right, but why did just changing the one word change so much of the parsing?

$ curl -s -XPOST http://0.0.0.0:8000/parse --data "text=the second thursday of October last year" | jq .
[
  {
    "dim": "time",
    "body": "the second thursday of October",
    "value": "{\"values\":[{\"value\":\"2017-10-12T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}],\"value\":\"2017-10-12T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}",
    "start": 0,
    "end": 30
  },
  {
    "dim": "time",
    "body": "thursday of October last year",
    "value": "{\"values\":[],\"value\":\"2016-10-06T00:00:00.000-07:00\",\"grain\":\"day\",\"type\":\"value\"}",
    "start": 11,
    "end": 40
  }
]

"may 1-3" versus "from may 1-3"

For "may 1-3", duckling returns result as expected
dd-dd (interval) (may 1-3)
-- May (may)
-- -- regex (may)
-- regex (1)
-- regex (-)
-- regex (3)

For "from may 1-3", not so much:

from - (interval) (from may 1-3)
-- regex (from)
-- (non ordinal) (may 1)
-- -- May (may)
-- -- -- regex (may)
-- -- integer (numeric) (1)
-- -- -- regex (1)
-- regex (-)
-- time-of-day (latent) (3)
-- -- integer (numeric) (3)
-- -- -- regex (3)
