tokee / lucene
Experiments with low memory overhead sorting and faceting. Read the Wiki for details.
License: Apache License 2.0
Lucene README file

INTRODUCTION

Lucene is a Java full-text search engine. Lucene is not a complete
application, but rather a code library and API that can easily be used
to add search capabilities to applications.

The Lucene web site is at:
  http://lucene.apache.org/

Please join the Lucene-User mailing list by sending a message to:
  [email protected]

FILES

lucene-core-XX.jar
  The compiled lucene library.

lucene-demos-XX.jar
  The compiled simple example code.

luceneweb.war
  The compiled simple example Web Application.

contrib/*
  Contributed code which extends and enhances Lucene, but is not
  part of the core library. Of special note are the JAR files in the
  analyzers directory which contain various analyzers that people may
  find useful in place of the StandardAnalyzer.

docs/index.html
  The contents of the Lucene website.

docs/api/index.html
  The Javadoc Lucene API documentation. This includes the core
  library, the demo, as well as all of the contrib modules.

src/java
  The Lucene source code.

src/demo
  Some example code.
For 10 million terms, the startup time is about 20 minutes on a fast machine. The primary cause is non-sequential term lookup and cache misses. Increasing the cache to hold all terms in memory works fine, but takes just as much memory as a standard Lucene sort.
Solution:
Provided the number of chunks is lower than the cache size, the end result is perfect cache usage and perfect sequential access for the merge sort step. The term access pattern for the heap-based merge at the end will depend on how well the locale sort order aligns with the Unicode sort order. This should work fairly well for a number of languages and probably not well at all for some languages. Experiments will show how this turns out.
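The chunk-then-merge idea can be illustrated with a minimal Java sketch. It is not the Exposed/LUCENE-2335 code: the class name `ChunkedTermSort`, the method `sortOrdinals`, and the use of an in-memory `String[]` as a stand-in for term lookup by ordinal are all assumptions made for illustration. In a real index each `terms[i]` access would be a disk-backed term lookup, which is where chunk size and cache size matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ChunkedTermSort {

    /** Cursor into one sorted chunk; the heap orders cursors by their current term. */
    private static final class Cursor {
        final int[] chunk; // term ordinals, already sorted by the comparator
        int pos = 0;
        Cursor(int[] chunk) { this.chunk = chunk; }
        int current() { return chunk[pos]; }
    }

    /**
     * Returns all term ordinals ordered by the given comparator
     * (for example a java.text.Collator for locale-aware order).
     * terms[i] is a hypothetical stand-in for "look up term with ordinal i".
     */
    static int[] sortOrdinals(String[] terms, int chunkSize,
                              Comparator<? super String> order) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Comparator<Integer> byTerm = (a, b) -> order.compare(terms[a], terms[b]);

        // Step 1: sort each chunk of consecutive ordinals on its own.
        // Only the terms of one chunk are touched at a time.
        List<Cursor> cursors = new ArrayList<>();
        for (int start = 0; start < terms.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, terms.length);
            Integer[] boxed = new Integer[end - start];
            for (int i = 0; i < boxed.length; i++) {
                boxed[i] = start + i;
            }
            Arrays.sort(boxed, byTerm);
            int[] chunk = new int[boxed.length];
            for (int i = 0; i < chunk.length; i++) {
                chunk[i] = boxed[i];
            }
            cursors.add(new Cursor(chunk));
        }

        // Step 2: heap-based k-way merge; the heap holds one cursor per chunk.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (x, y) -> byTerm.compare(x.current(), y.current()));
        heap.addAll(cursors);

        int[] result = new int[terms.length];
        int out = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            result[out++] = c.current();
            if (++c.pos < c.chunk.length) {
                heap.add(c); // re-insert with its next term as the key
            }
        }
        return result;
    }
}
```

Calling `ChunkedTermSort.sortOrdinals(terms, chunkSize, Collator.getInstance(locale))` yields the ordinals in locale order. During the merge, the heap compares only the current term of each chunk, so at most one term per chunk needs to be resident at a time, which is presumably why the number of chunks has to stay below the cache size.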
The unit-test for Exposed aka LUCENE-2335 is now capable of testing sorting for indexes with deleted documents. The unit-test fails if there are deletions, stating that the order differs for Exposed vs. the default Lucene sorter. The cause has yet to be determined.
While experimenting on an in-house index with ~5M terms in the sort field, the mapping from docIDs to Term seems wrong. The culprit is of course the docID->termOrder mapper.