liqe

Lightweight and performant Lucene-like parser, serializer and search engine.

Motivation
Usage
Query Syntax
Serializer
AST
Utilities
Compatibility with Lucene
Recipes
- Handling syntax errors
- Highlighting matches
Development
Tutorials

Motivation

I originally built Liqe to enable Roarr log filtering via the CLI. I have since been polishing this project as a hobby and intellectual exercise, and I have seen it adopted by various CLI and web applications that require advanced search. To my knowledge, it is currently the most complete Lucene-like syntax parser and serializer in JavaScript, as well as a compatible in-memory search engine.

Liqe use cases include:

- parsing search queries
- serializing parsed queries
- searching JSON documents using the Liqe query language (LQL)

Note that the Liqe AST is treated as a public API, i.e., you can implement your own search mechanism that uses the Liqe query language (LQL).

Usage

import {
  filter,
  highlight,
  parse,
  test,
} from 'liqe';

const persons = [
  {
    height: 180,
    name: 'John Morton',
  },
  {
    height: 175,
    name: 'David Barker',
  },
  {
    height: 170,
    name: 'Thomas Castro',
  },
];

Filter a collection:

filter(parse('height:>170'), persons);
// [
//   {
//     height: 180,
//     name: 'John Morton',
//   },
//   {
//     height: 175,
//     name: 'David Barker',
//   },
// ]

Test a single object:

test(parse('name:John'), persons[0]);
// true
test(parse('name:David'), persons[0]);
// false

Highlight matching fields and substrings:

highlight(parse('name:john'), persons[0]);
// [
//   {
//     path: 'name',
//     query: /(John)/,
//   }
// ]
highlight(parse('height:180'), persons[0]);
// [
//   {
//     path: 'height',
//   }
// ]

Query Syntax

Liqe uses Liqe Query Language (LQL), which is heavily inspired by Lucene but extends it in various ways that allow a more powerful search experience.

Liqe syntax cheat sheet

# search for "foo" term anywhere in the document (case insensitive)
foo

# search for "foo" term anywhere in the document (case sensitive)
'foo'
"foo"

# search for "foo" term in `name` field
name:foo

# search for "foo" term in `full name` field
'full name':foo
"full name":foo

# search for "foo" term in `first` field, member of `name`, i.e.
# matches {name: {first: 'foo'}}
name.first:foo

# search using regex
name:/foo/
name:/foo/i

# search using wildcard
name:foo*bar
name:foo?bar

# boolean search
member:true
member:false

# null search
member:null

# search for height =, >, >=, <, <=
height:=100
height:>100
height:>=100
height:<100
height:<=100

# search for height in range (inclusive, exclusive)
height:[100 TO 200]
height:{100 TO 200}

# boolean operators
name:foo AND height:=100
name:foo OR name:bar

# unary operators
NOT foo
-foo
NOT foo:bar
-foo:bar
name:foo AND NOT (bio:bar OR bio:baz)

# implicit AND boolean operator
name:foo height:=100

# grouping
name:foo AND (bio:bar OR bio:baz)
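
The `name.first:foo` form in the cheat sheet addresses a nested field. As a rough illustration of what that means for a document, the sketch below resolves a path such as `['name', 'first']` against a plain object. `resolvePath` is a hypothetical helper written for this example, not part of Liqe's API, and Liqe's own traversal may differ (for example, in how it treats arrays).

```typescript
// Hypothetical helper: walk a document along a list of keys.
// Returns undefined as soon as a segment is missing or is not an object.
const resolvePath = (document: unknown, path: readonly string[]): unknown => {
  let current: unknown = document;

  for (const key of path) {
    if (typeof current !== 'object' || current === null) {
      return undefined;
    }

    current = (current as Record<string, unknown>)[key];
  }

  return current;
};

const person = {
  name: {
    first: 'foo',
  },
};

resolvePath(person, ['name', 'first']);
// 'foo'
resolvePath(person, ['name', 'last']);
// undefined
```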

Keyword matching

Search for word "foo" in any field (case insensitive).

foo

Search for word "foo" in the name field.

name:foo

Search for name field values matching /foo/i regex.

name:/foo/i

Search for name field values matching f*o wildcard pattern.

name:f*o

Search for name field values matching f?o wildcard pattern.

name:f?o

Search for phrase "foo bar" in the name field (case sensitive).

name:"foo bar"

Number matching

Search for value equal to 100 in the height field.

height:=100

Search for value greater than 100 in the height field.

height:>100

Search for value greater than or equal to 100 in the height field.

height:>=100
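
To make the comparison forms concrete, here is a minimal sketch of how each one could be applied to a numeric field value. `compare` and the `Comparison` type are hypothetical names for this example; the operator strings follow the query syntax and are not claimed to match Liqe's internal representation.

```typescript
// Operators as they appear in the query syntax (for example `height:>100`).
// Treating them as string keys is an assumption of this sketch.
type Comparison = ':=' | ':>' | ':>=' | ':<' | ':<=';

// Hypothetical helper: apply one comparison to a numeric field value.
const compare = (
  operator: Comparison,
  actual: number,
  expected: number,
): boolean => {
  switch (operator) {
    case ':=':
      return actual === expected;
    case ':>':
      return actual > expected;
    case ':>=':
      return actual >= expected;
    case ':<':
      return actual < expected;
    case ':<=':
      return actual <= expected;
  }
};

compare(':>', 180, 170);
// true
compare(':>=', 100, 100);
// true
compare(':=', 175, 100);
// false
```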

Range matching

Search for value greater than or equal to 100 and less than or equal to 200 in the height field.

height:[100 TO 200]

Search for value greater than 100 and less than 200 in the height field.

height:{100 TO 200}
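
The bracket style decides whether the bounds themselves match. A minimal sketch of that rule, assuming a hypothetical `inRange` helper (not a Liqe export) that takes the two bounds and an inclusivity flag:

```typescript
// Hypothetical helper mirroring the two range forms:
// [min TO max] includes both bounds, {min TO max} excludes them.
const inRange = (
  value: number,
  min: number,
  max: number,
  inclusive: boolean,
): boolean => {
  return inclusive
    ? value >= min && value <= max
    : value > min && value < max;
};

inRange(100, 100, 200, true);
// true, as with height:[100 TO 200]
inRange(100, 100, 200, false);
// false, as with height:{100 TO 200}
inRange(150, 100, 200, false);
// true
```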

Wildcard matching

Search for any word that starts with "foo" in the name field.

name:foo*

Search for any word that starts with "foo" and ends with "bar" in the name field.

name:foo*bar

Search for any word in the name field that starts with "foo" and is followed by a single arbitrary character.

name:foo?

Search for any word that starts with "foo", followed by a single arbitrary character and immediately ends with "bar" in the name field.

name:foo?bar
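
One way to reason about these patterns is to translate them into regular expressions. The sketch below does that under two assumptions that are not confirmed by this document: `*` stands for zero or more characters, and the pattern is anchored to the whole value. `wildcardToRegExp` is a hypothetical helper, not Liqe's implementation.

```typescript
// Hypothetical helper: translate a wildcard pattern into an anchored RegExp.
// Assumes `*` stands for zero or more characters and `?` for exactly one.
const wildcardToRegExp = (pattern: string): RegExp => {
  const source = pattern
    // Escape every regex metacharacter, including * and ?.
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    // Then turn the escaped wildcards into their regex equivalents.
    .replace(/\\\*/g, '.*')
    .replace(/\\\?/g, '.');

  return new RegExp('^' + source + '$');
};

wildcardToRegExp('foo*bar').test('foo-and-bar');
// true
wildcardToRegExp('foo?bar').test('fooXbar');
// true
wildcardToRegExp('foo?bar').test('foobar');
// false, because `?` requires exactly one character
```

Escaping first and restoring the wildcards afterwards keeps characters such as `.` or `(` in the pattern literal.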

Boolean operators

Search for phrase "foo bar" in the name field AND the phrase "quick fox" in the bio field.

name:"foo bar" AND bio:"quick fox"

Search for either the phrase "foo bar" in the name field AND the phrase "quick fox" in the bio field, or the word "fox" in the name field.

(name:"foo bar" AND bio:"quick fox") OR name:fox
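
To show how grouping shapes evaluation, here is a minimal sketch that evaluates a small boolean tree against a document. The node shapes, `QueryNode`, and `evaluate` are hypothetical and deliberately simplified; they are not Liqe's AST types, and exact string equality stands in for Liqe's richer matching rules.

```typescript
// Hypothetical, simplified node shapes for illustration only.
type QueryNode =
  | { type: 'term'; field: string; value: string }
  | { type: 'not'; operand: QueryNode }
  | { type: 'and'; left: QueryNode; right: QueryNode }
  | { type: 'or'; left: QueryNode; right: QueryNode };

type Doc = Record<string, string>;

const evaluate = (node: QueryNode, doc: Doc): boolean => {
  switch (node.type) {
    case 'term':
      // Exact equality keeps the sketch short.
      return doc[node.field] === node.value;
    case 'not':
      return !evaluate(node.operand, doc);
    case 'and':
      return evaluate(node.left, doc) && evaluate(node.right, doc);
    case 'or':
      return evaluate(node.left, doc) || evaluate(node.right, doc);
  }
};

// (name:"foo bar" AND bio:"quick fox") OR name:fox
const query: QueryNode = {
  type: 'or',
  left: {
    type: 'and',
    left: { type: 'term', field: 'name', value: 'foo bar' },
    right: { type: 'term', field: 'bio', value: 'quick fox' },
  },
  right: { type: 'term', field: 'name', value: 'fox' },
};

evaluate(query, { name: 'fox', bio: '' });
// true, the right-hand branch matches
evaluate(query, { name: 'foo bar', bio: 'slow fox' });
// false, neither branch matches
```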

Serializer

The serializer converts Liqe tokens back to the original search query.

import {
  parse,
  serialize,
} from 'liqe';

const tokens = parse('foo:bar');

// {
//   expression: {
//     location: {
//       start: 4,
//     },
//     quoted: false,
//     type: 'LiteralExpression',
//     value: 'bar',
//   },
//   field: {
//     location: {
//       start: 0,
//     },
//     name: 'foo',
//     path: ['foo'],
//     quoted: false,
//     type: 'Field',
//   },
//   location: {
//     start: 0,
//   },
//   operator: {
//     location: {
//       start: 3,
//     },
//     operator: ':',
//     type: 'ComparisonOperator',
//   },
//   type: 'Tag',
// }

serialize(tokens);
// 'foo:bar'

AST

import {
  type BooleanOperatorToken,
  type ComparisonOperatorToken,
  type EmptyExpression,
  type FieldToken,
  type ImplicitBooleanOperatorToken,
  type ImplicitFieldToken,
  type LiteralExpressionToken,
  type LogicalExpressionToken,
  type RangeExpressionToken,
  type RegexExpressionToken,
  type TagToken,
  type UnaryOperatorToken,
} from 'liqe';

The AST token types imported above describe a parsed Liqe query.

If you are building a serializer, you must handle all of them to cover every possible query input. Refer to the built-in serializer for an example.
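
As a starting point, the sketch below serializes only the single `Tag` shape shown in the Serializer section (`foo:bar`). The types are local, trimmed-down stand-ins rather than Liqe's exported types, `serializeTag` and `quoteIf` are hypothetical names, and always re-quoting with double quotes is an assumption. A complete serializer would need a branch for every token type.

```typescript
// Local, simplified types that mirror only the `foo:bar` AST shown in the
// Serializer section. Liqe's real token types carry more fields and variants.
type FieldNode = {
  name: string;
  quoted: boolean;
  type: 'Field';
};

type LiteralNode = {
  quoted: boolean;
  type: 'LiteralExpression';
  value: string;
};

type TagNode = {
  expression: LiteralNode;
  field: FieldNode;
  operator: {
    operator: string;
    type: 'ComparisonOperator';
  };
  type: 'Tag';
};

// Wraps text in double quotes when the node was quoted in the source query.
// Assumption: the original quote style is not tracked in this sketch.
const quoteIf = (text: string, quoted: boolean): string => {
  return quoted ? '"' + text + '"' : text;
};

const serializeTag = (tag: TagNode): string => {
  return (
    quoteIf(tag.field.name, tag.field.quoted) +
    tag.operator.operator +
    quoteIf(tag.expression.value, tag.expression.quoted)
  );
};

serializeTag({
  expression: { quoted: false, type: 'LiteralExpression', value: 'bar' },
  field: { name: 'foo', quoted: false, type: 'Field' },
  operator: { operator: ':', type: 'ComparisonOperator' },
  type: 'Tag',
});
// 'foo:bar'
```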

Utilities

import {
  isSafeUnquotedExpression,
} from 'liqe';

/**
 * Determines if an expression requires quotes.
 * Use this if you need to programmatically manipulate the AST
 * before using a serializer to convert the query back to text.
 */
isSafeUnquotedExpression(expression: string): boolean;

Compatibility with Lucene

The following Lucene abilities are not supported:

Recipes

Handling syntax errors

In case of a syntax error, Liqe throws SyntaxError.

import {
  parse,
  SyntaxError,
} from 'liqe';

try {
  parse('foo bar');
} catch (error) {
  if (error instanceof SyntaxError) {
    console.error({
      // Syntax error at line 1 column 5
      message: error.message,
      // 4
      offset: error.offset,
      // 1
      line: error.line,
      // 5
      column: error.column,
    });
  } else {
    throw error;
  }
}

Highlighting matches

Consider using the highlight-words package to highlight Liqe matches.
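
If you would rather see the mechanics without a dependency, the sketch below splits a field value into matching and non-matching chunks from one `highlight()` entry. It relies only on the entry shape shown in the Usage section (`path` plus an optional `query` regex). `toChunks` and the `Chunk` type are hypothetical, this is not the highlight-words API, and only the first match is handled.

```typescript
// Shape of one entry, as shown in the Usage section's highlight() output.
type Highlight = {
  path: string;
  query?: RegExp;
};

type Chunk = {
  match: boolean;
  text: string;
};

// Hypothetical helper: split `text` into matching and non-matching chunks.
// When no query is present, the whole value is treated as the match.
const toChunks = (text: string, highlight: Highlight): Chunk[] => {
  if (!highlight.query) {
    return [{ match: true, text }];
  }

  const matched = highlight.query.exec(text);

  if (!matched || matched[0] === '') {
    return [{ match: false, text }];
  }

  const start = matched.index;
  const end = start + matched[0].length;

  return [
    { match: false, text: text.slice(0, start) },
    { match: true, text: text.slice(start, end) },
    { match: false, text: text.slice(end) },
  ].filter((chunk) => chunk.text !== '');
};

toChunks('John Morton', { path: 'name', query: /(John)/ });
// [
//   { match: true, text: 'John' },
//   { match: false, text: ' Morton' },
// ]
```

Each chunk can then be rendered however the UI requires, for example by wrapping matching chunks in a `mark` element.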

Development

Compiling Parser

If you are going to modify the parser, use npm run watch to run the compiler in watch mode.

Benchmarking Changes

Before making any changes, capture the current benchmark on your machine using npm run benchmark. Run the benchmark again after making your changes. Before committing, ensure that performance is not negatively impacted.

Tutorials

Building advanced SQL search from a user text input

