The asoteric from asoteric

Goal

The objective of the open part of the project is to implement a general-purpose grammar engine that makes understanding natural languages easier.

Specifically, when we say "understanding natural language" we mean parsing free-form utterances into structured blobs of data with precise meaning, where both the structure of the utterances to be parsed and the structure of the meaning itself are defined in the grammar. Hence the name of the project, asoteric: when it comes to the language in which the utterance is input, we intentionally erase the boundary between “parsing” and “understanding”, focusing deliberately on the (huge!) intersection of what is commonly understood by these two words.

Our grammar engine is, in a nutshell, a way to define various parsing and transformation rules. A grammar, composed of these rules, can consume utterances, presumably written in a natural language, and produce structured parse trees. For the sake of simplicity these are represented as JSON objects whose schema is part of the grammar definition.

The process is focused on the high-precision, low-recall aspect of natural language processing (NLP). We do not aim to understand all, or even the majority, of plausible and implausible inputs in probabilistic ways; rather, our goal is to ensure that whatever inputs or parts of inputs we do understand are understood unambiguously. This approach makes our engine well suited to be the key part of human-in-the-loop systems. In addition, we aim to: a) make it easy to extend the grammars, so that new kinds of inputs can be added quickly, and b) ensure the best possible user experience by providing spelling correction and real-time input suggestions as part of our core features.

From the technology standpoint, the engine consists of the grammar definition language, declarative and functional by design, several reference grammars that demonstrate its capabilities, and a high-performance, self-contained, production-ready implementation. Along with the grammar engine and examples we also provide a basic toolkit for automated testing of the grammars, as well as for query debugging. It is also worth noting that the Latin alphabet is by no means the only one supported; asoteric can work with other languages, the phonetic alphabet included, as easily as with plain, simple English words.

On the product side, the major application of the engine is to make it simple to add or extend the functionality of free-form text inputs. The immediate examples include search fields, chat bots and similar UI/UX elements, and REPL- or otherwise shell-based interfaces. In these cases our engine can sit next to the already existing processing logic, configured to kick in only when the outer product decides that its current response to a particular request is sub-par.
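The "kick in only when needed" arrangement described above can be sketched roughly as follows. This is a minimal illustration in Python, not asoteric's actual integration API: the names `answer`, `primary`, `is_subpar` and `grammar_parse` are all hypothetical, and the sketch assumes the grammar side returns `None` for anything it does not understand.

```python
from typing import Any, Callable, Optional


def answer(
    query: str,
    primary: Callable[[str], Any],                  # the product's existing logic (hypothetical)
    is_subpar: Callable[[Any], bool],               # the product's own quality check (hypothetical)
    grammar_parse: Callable[[str], Optional[dict]], # stand-in for the grammar engine (hypothetical)
) -> dict:
    """Prefer the existing logic; consult the grammar only as a fallback."""
    response = primary(query)
    if not is_subpar(response):
        return {"source": "primary", "response": response}

    parsed = grammar_parse(query)
    if parsed is None:
        # High precision, low recall: if the grammar does not understand the
        # query, keep whatever the product already had.
        return {"source": "primary", "response": response}
    return {"source": "grammar", "response": parsed}
```

The outer product keeps full control in this arrangement: the grammar is consulted only after the product's own check fails, and an utterance the grammar does not understand changes nothing.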

The Grammar Language

The major part of the secret sauce of our engine is that none of the utterance processing logic is imperative: each processing step is either declarative or functional.

Declarative processing is what, very loosely, can be described as a generalization of pattern matching. In the simplest case, the grammar definition may contain a simple clause stating that "describe X", "what is X?", and "X needs elaboration" mean the same thing; this is, in fact, a real-life example of one of the highest-level abstractions supported by our engine. A deeper technical example of declarative processing is parsing dates, where "from 4/1 to 6/30", "in Q2", and "April to June, inclusive" all mean the very same thing, and so does "[compared to [the]] same quarter last year" when asked in the appropriate context.
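To make the date example concrete, here is a rough Python sketch of the declarative idea: the rules are plain data (a pattern plus a builder for the structured result), and three differently worded utterances land on the same object. This is not asoteric's grammar language, which this document does not show; the rule table, the function names and the output schema are all assumptions made for illustration, and leap years and the context-dependent "same quarter last year" form are deliberately left out.

```python
import re

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
# Last day of each month; leap years are ignored in this sketch.
LAST_DAY = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def _range(m1, d1, m2, d2):
    # Hypothetical output schema for an inclusive date range.
    return {"from": {"month": m1, "day": d1}, "to": {"month": m2, "day": d2}}


def _quarter(q):
    last = 3 * q
    return _range(last - 2, 1, last, LAST_DAY[last - 1])


def _months(first, last):
    m1, m2 = MONTHS.index(first) + 1, MONTHS.index(last) + 1
    return _range(m1, 1, m2, LAST_DAY[m2 - 1])


_MONTH = "|".join(MONTHS)

# The "grammar": each rule is a pattern paired with a result builder.
RULES = [
    (re.compile(r"from (\d{1,2})/(\d{1,2}) to (\d{1,2})/(\d{1,2})"),
     lambda m: _range(*(int(g) for g in m.groups()))),
    (re.compile(r"in q([1-4])"),
     lambda m: _quarter(int(m.group(1)))),
    (re.compile(rf"({_MONTH}) to ({_MONTH}),? inclusive"),
     lambda m: _months(m.group(1), m.group(2))),
]


def parse_date_range(utterance):
    """Return the structured range, or None if no rule matches in full."""
    text = utterance.strip().lower()
    for pattern, build in RULES:
        match = pattern.fullmatch(text)
        if match:
            return build(match)
    return None
```

Adding a new phrasing means adding one more entry to `RULES`; the matching loop itself never changes, which is the property the paragraph above describes.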

By functional processing we mean that, instead of an imperative (or, God forbid, object-oriented) hierarchical multi-step transformation of the input utterance, our grammar “just” defines various ways to look at the same “raw” representation; by automagically exploring the multi-dimensional space of these representations, the engine finds the angle from which the “best fit” interpretation becomes clearly visible. To explain this using the example of simple arithmetic: "five plus twice three" is parsed/understood straight away, unambiguously, as 11 (or, rather, as a nested JSON object whose root node has some "evaluates_to": 11 value) at the “parsing” level, while "twice five plus three" or "five plus three twice" are better viewed as malformed sentences (some "ambiguous": true at that root level), as it is not crystal clear what exactly "twice" refers to in them.
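The arithmetic example can be imitated with a small Python sketch that enumerates every way of reading the utterance and reports a value only when all readings agree. This is an illustration of the idea under stated assumptions, not the engine's algorithm: the tiny vocabulary, the `readings` and `parse` names, and the extra `"readings"` field are hypothetical, while the `"evaluates_to"` and `"ambiguous"` keys follow the text above.

```python
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def readings(tokens):
    """Return the set of every value the token sequence could evaluate to."""
    tokens = tuple(tokens)
    if len(tokens) == 1 and tokens[0] in NUMBERS:
        return {NUMBERS[tokens[0]]}

    values = set()
    # "twice" in front may double everything that follows it.
    if len(tokens) >= 2 and tokens[0] == "twice":
        values |= {2 * v for v in readings(tokens[1:])}
    # "twice" at the end may double everything that precedes it.
    if len(tokens) >= 2 and tokens[-1] == "twice":
        values |= {2 * v for v in readings(tokens[:-1])}
    # Any "plus" may be the top-level operator.
    for i, token in enumerate(tokens):
        if token == "plus":
            for left in readings(tokens[:i]):
                for right in readings(tokens[i + 1:]):
                    values.add(left + right)
    return values


def parse(utterance):
    """Return a JSON-like dict, or None if the utterance is not understood."""
    values = readings(utterance.lower().split())
    if not values:
        return None
    if len(values) == 1:
        return {"evaluates_to": values.pop()}
    return {"ambiguous": True, "readings": sorted(values)}
```

With this sketch, `parse("five plus twice three")` has a single reading, whereas "twice five plus three" can be read as either (twice five) plus three or twice (five plus three), so it is flagged as ambiguous rather than guessed at.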
