asoteric's Introduction

Goal

The objective of the open part of the project is to implement a general-purpose grammar engine that makes understanding natural languages easier.

Specifically, by understanding natural language we mean parsing free-form utterances into structured blobs of data with precise meaning, where both the structure of the utterances to be parsed and the structure of the meaning itself are defined in the grammar. Hence the name of the project, asoteric: when it comes to the language in which the utterance is input, we intentionally erase the boundary between “parsing” and “understanding”, focusing deliberately on the (huge!) intersection of what is commonly understood by these two words.

Our grammar engine is, in a nutshell, a way to define various parsing and transformation rules. A grammar, composed of these rules, consumes utterances, presumably written in a natural language, and produces structured parse trees, represented, for the sake of simplicity, as JSON objects whose schema is part of the grammar definition.

The process is focused on the high-precision, low-recall aspect of natural language processing (NLP). We do not aim to understand all, or even the majority, of plausible and implausible inputs in probabilistic ways; rather, our goal is to ensure that whatever inputs or parts of inputs we do understand are understood unambiguously. This approach makes our engine well suited to serve as the key part of human-in-the-loop systems. In addition, we aim to: a) make it easy to extend the grammars, so that new kinds of inputs can be added quickly, and b) ensure the best possible user experience by providing spelling correction and real-time input suggestions as part of our core features.

From the technology standpoint, the engine consists of the grammar definition language, declarative and functional by design; several reference grammars that demonstrate its capabilities; and a high-performance, self-contained, production-ready implementation. Along with the grammar engine and examples, we also provide a basic toolkit for automated testing of the grammars, as well as for query debugging. It is also worth noting that Latin is by far not the only alphabet supported; asoteric can work with other languages, the phonetic alphabet included, as easily as with plain simple English words.

On the product side, the major application of the engine is to make it simple to add or extend the functionality of free-form text inputs. The immediate examples include search fields, chat bots and similar UI/UX elements, and REPL- or otherwise shell-based interfaces. In these cases our engine can be put next to the already existing processing logic and configured to kick in only when the outer product decides that its current response to a particular request is sub-par.
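
The "put next to the existing logic" arrangement can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: `existing_handler`, `grammar_parse`, and `is_subpar` are hypothetical callables supplied by the host product, and none of them is part of the asoteric API.

```python
from typing import Any, Callable, Dict, Optional


def answer(
    query: str,
    existing_handler: Callable[[str], Any],
    grammar_parse: Callable[[str], Optional[dict]],
    is_subpar: Callable[[Any], bool],
) -> Dict[str, Any]:
    """Run the product's own logic first; consult the grammar only as a fallback."""
    response = existing_handler(query)
    if not is_subpar(response):
        # The outer product is happy with its own answer.
        return {"source": "existing", "response": response}

    parsed = grammar_parse(query)  # hypothetical hook into the grammar engine
    if parsed is None:
        # High precision, low recall: the grammar stays silent when it cannot
        # understand the query, so the original response is kept.
        return {"source": "existing", "response": response}
    return {"source": "grammar", "response": parsed}
```

The design choice worth noting is that the fallback never overrides a response the product considers good, and never replaces a weak response with a guess.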

The Grammar Language

The major part of the secret sauce of our engine is that none of the utterance processing logic is imperative: each processing step is either declarative or functional.

Declarative processing can, very loosely, be described as a generalization of pattern matching. In the simplest case, the grammar definition may contain a simple clause stating that “describe X”, “what is X?”, and “X needs elaboration” mean the same thing; this, in fact, is a real-life example of one of the highest-level abstractions supported by our engine. A deeper technical example of declarative processing is parsing dates, where “from 4/1 to 6/30”, “in Q2”, and “April to June, inclusive” all mean the very same thing, and so does “[compared to [the]] same quarter last year” when asked in the appropriate context.
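
As a rough illustration of the declarative idea (in plain Python, not in the asoteric grammar language), the sketch below lists three surface patterns from the date example and maps each of them to the same structured result. The function name `parse_date_range`, the output keys, and the explicit `year` argument are all assumptions made for this example.

```python
import calendar
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def _whole_months(year, first, last):
    """Range from the first day of month `first` to the last day of month `last`."""
    last_day = calendar.monthrange(year, last)[1]
    return {"from": date(year, first, 1).isoformat(),
            "to": date(year, last, last_day).isoformat()}


def parse_date_range(utterance, year):
    """Return a {'from', 'to'} dict for a recognized phrasing, or None."""
    text = utterance.strip().lower()

    # "from 4/1 to 6/30" (month/day on both sides)
    m = re.fullmatch(r"from (\d{1,2})/(\d{1,2}) to (\d{1,2})/(\d{1,2})", text)
    if m:
        m1, d1, m2, d2 = map(int, m.groups())
        try:
            return {"from": date(year, m1, d1).isoformat(),
                    "to": date(year, m2, d2).isoformat()}
        except ValueError:  # e.g. "from 13/40 to ..."
            return None

    # "in Q2"
    m = re.fullmatch(r"in q([1-4])", text)
    if m:
        q = int(m.group(1))
        return _whole_months(year, 3 * q - 2, 3 * q)

    # "April to June, inclusive"
    m = re.fullmatch(r"([a-z]+) to ([a-z]+), inclusive", text)
    if m and m.group(1) in MONTHS and m.group(2) in MONTHS:
        return _whole_months(year, MONTHS.index(m.group(1)) + 1,
                             MONTHS.index(m.group(2)) + 1)

    return None  # not understood: stay silent rather than guess


for phrase in ("from 4/1 to 6/30", "in Q2", "April to June, inclusive"):
    print(parse_date_range(phrase, 2024))
# each line prints {'from': '2024-04-01', 'to': '2024-06-30'}
```

Adding a new phrasing here means adding one more pattern, without touching the others, which is the property the declarative approach is after.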

By functional processing we mean that, instead of an imperative (or, God forbid, object-oriented) hierarchical multi-step transformation of the input utterance, our grammar “just” defines various ways to look at the same “raw” representation, and, by automagically exploring the multi-dimensional space of these representations, the engine finds the angle from which the “best fit” interpretation becomes clearly visible. To explain this with an example from simple arithmetic, “five plus twice three” is straight away parsed/understood unambiguously as 11 (or, rather, as a nested JSON object, the root node of which has some "evaluates_to": 11 value) at the "parsing" level, while “twice five plus three” or “five plus three twice” are better viewed as malformed sentences (some "ambiguous": true at that root level), as it is not crystal clear what exactly "twice" refers to in them.
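
The arithmetic example can be reproduced with a small Python sketch that enumerates every way to read the utterance and accepts it only when all readings agree. This is an illustration under stated assumptions, not the engine's algorithm: the toy grammar (number words, "plus", and "twice" as either a prefix or a suffix), the `parse` function, and the "candidates" key are invented for this example; only "evaluates_to" and "ambiguous" come from the text above.

```python
from functools import lru_cache

NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def interpretations(tokens):
    """All values the token sequence can evaluate to under the toy grammar:
    expr := number | "twice" expr | expr "twice" | expr "plus" expr
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def values(i, j):
        # Set of values for the span tokens[i:j].
        out = set()
        if j - i == 1 and tokens[i] in NUMBERS:
            out.add(NUMBERS[tokens[i]])
        if j - i >= 2:
            if tokens[i] == "twice":        # "twice" applied to what follows
                out |= {2 * v for v in values(i + 1, j)}
            if tokens[j - 1] == "twice":    # "twice" applied to what precedes
                out |= {2 * v for v in values(i, j - 1)}
            for k in range(i + 1, j - 1):   # split at every "plus"
                if tokens[k] == "plus":
                    out |= {a + b for a in values(i, k)
                            for b in values(k + 1, j)}
        return frozenset(out)

    return values(0, len(tokens))


def parse(utterance):
    vals = interpretations(utterance.lower().split())
    if len(vals) == 1:
        return {"evaluates_to": next(iter(vals))}
    if len(vals) > 1:
        return {"ambiguous": True, "candidates": sorted(vals)}
    return None  # not a sentence of the toy grammar


print(parse("five plus twice three"))  # {'evaluates_to': 11}
print(parse("twice five plus three"))  # {'ambiguous': True, 'candidates': [13, 16]}
print(parse("five plus three twice"))  # {'ambiguous': True, 'candidates': [11, 16]}
```

Note that this sketch treats an utterance as ambiguous only when its readings produce different values; readings that differ in structure but agree on the result (such as “one plus two plus three”) are accepted.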

asoteric's People

Contributors

dkorolev, avertel
