tokee / lucene

Experiments with low memory overhead sorting and faceting. Read the Wiki for details.

License: Apache License 2.0

Shell 0.01% JavaScript 0.42% Perl 0.45% Java 99.12%

lucene's Introduction

Lucene README file

INTRODUCTION

Lucene is a Java full-text search engine.  Lucene is not a complete
application, but rather a code library and API that can easily be used
to add search capabilities to applications.

The Lucene web site is at:
  http://lucene.apache.org/

Please join the Lucene-User mailing list by sending a message to:
  [email protected]

FILES

lucene-core-XX.jar
  The compiled lucene library.

lucene-demos-XX.jar
  The compiled simple example code.

luceneweb.war
  The compiled simple example Web Application.

contrib/*
  Contributed code which extends and enhances Lucene, but is not
  part of the core library.  Of special note are the JAR files in the
  analyzers directory, which contain various analyzers that people may
  find useful in place of the StandardAnalyzer.


docs/index.html
  The contents of the Lucene website.

docs/api/index.html
  The Javadoc Lucene API documentation.  This includes the core
  library, the demo, as well as all of the contrib modules.

src/java
  The Lucene source code.

src/demo
  Some example code.

lucene's Issues

Startup time is too long

For 10 million terms, the startup time is about 20 minutes on a fast machine. The primary cause is non-sequential term lookup and cache misses. Increasing the cache to hold all terms in memory works fine, but takes just as much memory as a standard Lucene sort.

Solution:

1. Process the array of term ordinals in cache-sized chunks.
2. Sort each chunk with merge sort. Because the terms of a chunk fit in the cache, the lookups are sequential and there are no cache misses.
3. Merge all the chunks, using a heap to determine the next term ordinal.

Provided the number of chunks is lower than the cache size, the end result is perfect cache usage and perfect sequential access for the merge sort step. The term access pattern of the heap-based merge at the end depends on how well the locale sort order aligns with Unicode sort order. This should work fairly well for a number of languages and probably not well at all for others. Experiments will show how this turns out.
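
The chunk-sort-then-merge idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code from this repository: the class name `ChunkedOrdinalSort`, the method `sortOrdinals` and the `lookup` function (ordinal to term) are hypothetical, and the terms are served from memory here instead of from an index with a bounded term cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

public class ChunkedOrdinalSort {

    /**
     * Returns the term ordinals 0..termCount-1 ordered by their terms.
     * 'lookup' is a hypothetical stand-in for the (cached) term lookup.
     */
    public static int[] sortOrdinals(int termCount, int chunkSize,
                                     IntFunction<String> lookup,
                                     Comparator<? super String> order) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Comparator<Integer> byTerm =
            (a, b) -> order.compare(lookup.apply(a), lookup.apply(b));

        // Steps 1 and 2: split the ordinals into chunks and sort each chunk.
        // Arrays.sort on object arrays is a stable merge sort variant (TimSort).
        List<Integer[]> chunks = new ArrayList<>();
        for (int start = 0; start < termCount; start += chunkSize) {
            int end = Math.min(start + chunkSize, termCount);
            Integer[] chunk = new Integer[end - start];
            for (int i = 0; i < chunk.length; i++) {
                chunk[i] = start + i;
            }
            Arrays.sort(chunk, byTerm);
            chunks.add(chunk);
        }

        // Step 3: merge the sorted chunks with a heap.
        // Each heap entry is {chunk index, position inside that chunk}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> byTerm.compare(chunks.get(x[0])[x[1]],
                                     chunks.get(y[0])[y[1]]));
        for (int c = 0; c < chunks.size(); c++) {
            heap.add(new int[] {c, 0});
        }

        int[] result = new int[termCount];
        int out = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            Integer[] chunk = chunks.get(head[0]);
            result[out++] = chunk[head[1]];
            if (++head[1] < chunk.length) {
                heap.add(head); // re-insert with the chunk's next ordinal
            }
        }
        return result;
    }
}
```

Any `Comparator` over strings can be passed as `order`, including a `java.text.Collator` for a locale-aware sort. In this sketch the heap only ever holds one entry per chunk, which is why the number of chunks needs to stay below the cache size for the merge to avoid misses.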

Indexes with deletions do not sort correctly

The unit test for Exposed (aka LUCENE-2335) is now capable of testing sorting for indexes with deleted documents. The test fails if there are deletions, stating that the order differs between Exposed and the default Lucene sorter. The cause has yet to be determined.

DocID -> Term might be faulty

While experimenting on an in-house index with ~5M terms in the sort field, the mapping from docIDs to Term seems wrong. The culprit is of course the docID->termOrder mapper.
