LuceneUtils

Apache Lucene utilities.

Auto Commit/Optimize

Performs maintenance operations (commit and optimize) on an IndexWriter at a fixed interval, or as soon as the number of documents held in memory reaches a given threshold.

//Commit every 30 minutes or when there are 10k docs in memory
//Optimize every day
//Wake the thread every 5 minutes to check whether a commit or optimize is due
AutoCommitOptimize optimizer = new AutoCommitOptimize(
		writer, 
		TimeUnit.MINUTES.toMillis(30), 
		TimeUnit.DAYS.toMillis(1), 
		TimeUnit.MINUTES.toMillis(5), 
		10000
);

//Start the optimizer in a Thread
new Thread(optimizer).start();

//Stop the optimizer and the thread
optimizer.stop();
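
The scheduling rule described above (commit on a timer or on a document threshold, optimize on a longer timer) can be sketched without Lucene. This is only an illustration of the decision logic, not the actual AutoCommitOptimize implementation: `MaintenancePolicy`, its `Action` enum, and the choice to treat an optimize pass as also resetting the commit timer are all assumptions made for this example.

```java
// Hypothetical helper, not part of LuceneUtils: decides which maintenance
// action is due, given the current time and the number of buffered documents.
final class MaintenancePolicy {
    enum Action { NONE, COMMIT, OPTIMIZE }

    private final long commitIntervalMs;
    private final long optimizeIntervalMs;
    private final int maxDocsInMemory;
    private long lastCommitMs;
    private long lastOptimizeMs;

    MaintenancePolicy(long commitIntervalMs, long optimizeIntervalMs,
                      int maxDocsInMemory, long nowMs) {
        this.commitIntervalMs = commitIntervalMs;
        this.optimizeIntervalMs = optimizeIntervalMs;
        this.maxDocsInMemory = maxDocsInMemory;
        this.lastCommitMs = nowMs;
        this.lastOptimizeMs = nowMs;
    }

    // Returns the action due at nowMs and records it as performed.
    Action next(long nowMs, int docsInMemory) {
        if (nowMs - lastOptimizeMs >= optimizeIntervalMs) {
            lastOptimizeMs = nowMs;
            lastCommitMs = nowMs; // assumption of this sketch: optimizing also commits
            return Action.OPTIMIZE;
        }
        boolean timerElapsed = nowMs - lastCommitMs >= commitIntervalMs;
        boolean bufferFull = docsInMemory >= maxDocsInMemory;
        if (timerElapsed || bufferFull) {
            lastCommitMs = nowMs;
            return Action.COMMIT;
        }
        return Action.NONE;
    }
}
```

A background thread would call `next(System.currentTimeMillis(), pendingDocs)` each time it wakes up and invoke the matching IndexWriter operation. Passing the clock in as a parameter keeps the rule easy to test.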

DocID Collector

Lucene Collector that gathers the IDs of all matching documents into an OpenBitSet.

//Create collector
DocIDCollector collector = new DocIDCollector();

//Search
searcher.search(query, collector);

//Get the DocIDs returned by the search
OpenBitSet docs = collector.getDocs();
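
To show what gathering IDs into a bit set amounts to, here is a minimal sketch that does not depend on Lucene. It is not the DocIDCollector source: `BitSetDocIdGatherer` and `startSegment` are hypothetical names, and `java.util.BitSet` stands in for OpenBitSet. The sketch assumes the usual per-segment pattern, where IDs arrive relative to a segment and are shifted by that segment's base offset.

```java
import java.util.BitSet;

// Hypothetical stand-in, not the LuceneUtils class: java.util.BitSet replaces
// OpenBitSet so the sketch compiles without Lucene on the classpath.
final class BitSetDocIdGatherer {
    private final BitSet docs = new BitSet();
    private int docBase = 0;

    // Called when the search moves to a new segment; docBase is the offset
    // that turns segment-relative IDs into index-wide IDs.
    void startSegment(int docBase) {
        this.docBase = docBase;
    }

    // Called once per matching document with a segment-relative ID.
    void collect(int doc) {
        docs.set(docBase + doc);
    }

    // Index-wide IDs of every document collected so far.
    BitSet getDocs() {
        return docs;
    }
}
```

Because a bit set stores one bit per possible ID, the result stays compact even when a query matches a large share of the index, and it discards scores and ordering.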

Document Iterator

Iterator over all the Documents in a DocIdSet.

//Build iterator
DocumentIterator iterator = new DocumentIterator(reader, docIdSet);

//Iterate over Documents
while(iterator.hasNext()) {
	Document doc = iterator.next();
}
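
The idea behind iterating over documents by ID can be sketched as follows. This is an illustration under stated assumptions rather than the DocumentIterator implementation: `BitSetDocumentIterator` is a hypothetical name, `java.util.BitSet` stands in for a DocIdSet, and the `loader` function stands in for whatever call resolves an ID to a document on the reader.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical sketch: walks the set bits of a BitSet in ascending order and
// resolves each ID to a document through a caller-supplied loader.
final class BitSetDocumentIterator<D> implements Iterator<D> {
    private final BitSet ids;
    private final IntFunction<D> loader;
    private int next; // next ID to return, or -1 when exhausted

    BitSetDocumentIterator(BitSet ids, IntFunction<D> loader) {
        this.ids = ids;
        this.loader = loader;
        this.next = ids.nextSetBit(0);
    }

    @Override
    public boolean hasNext() {
        return next >= 0;
    }

    @Override
    public D next() {
        if (next < 0) {
            throw new NoSuchElementException();
        }
        D doc = loader.apply(next);      // load lazily, one document at a time
        next = ids.nextSetBit(next + 1); // advance to the following set bit
        return doc;
    }
}
```

Loading documents lazily means only the ID set is held in memory, not every stored document.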

ID Filter

Lucene Filter that only accepts documents with one of the given values in a certain field.

//Create filter
IDFilter filter = new IDFilter(values, "field");

//Search
searcher.search(query, filter, collector);

SubReader Filter

Lucene Filter that only accepts documents contained in a given DocIdSet.

//Create filter
SubReaderFilter filter = new SubReaderFilter(reader, docIdSet);

//Search
searcher.search(query, filter, collector);

Unique ID Writer

Checks and maintains the uniqueness of a certain field's values in an Index. Useful when adding a new document triggers other expensive actions, like updating other documents. Uses an internal real-time IndexReader and a cache of IDs to check the uniqueness of values.

//Maintain uniqueness of field "id"
//Refresh the internal IndexReader when the cache reaches 1k IDs or every 30 minutes
UniqueIDWriter unique = new UniqueIDWriter(
		writer, 
		"id", 
		1000, 
		TimeUnit.MINUTES.toMillis(30)
);

//Check if index contains document with the same id. If not, insert it
if(!unique.contains(doc.get("id"))) {
	unique.insert(doc);
	updateOtherDocuments();
}

//Release internal resources
unique.close();
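
The combination of a reader view and an ID cache described above can be sketched in plain Java. This is a simplified model, not the UniqueIDWriter source: `UniqueIdTracker` and its methods are hypothetical, a `HashSet` stands in for the IDs visible through the IndexReader, and "refreshing" just merges the cache into that set where the real class would presumably reopen its reader.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the uniqueness check: IDs are looked up first in a
// cache of recent inserts, then in a snapshot standing in for the reader.
final class UniqueIdTracker {
    private final Set<String> snapshot = new HashSet<>(); // IDs visible to the "reader"
    private final Set<String> cache = new HashSet<>();    // IDs inserted since last refresh
    private final int maxCacheSize;
    private final long refreshIntervalMs;
    private long lastRefreshMs;
    private int refreshCount = 0;

    UniqueIdTracker(int maxCacheSize, long refreshIntervalMs, long nowMs) {
        this.maxCacheSize = maxCacheSize;
        this.refreshIntervalMs = refreshIntervalMs;
        this.lastRefreshMs = nowMs;
    }

    boolean contains(String id) {
        return cache.contains(id) || snapshot.contains(id);
    }

    // Records the ID if unseen; returns false when it is a duplicate.
    boolean insert(String id, long nowMs) {
        if (contains(id)) {
            return false;
        }
        cache.add(id);
        boolean cacheFull = cache.size() >= maxCacheSize;
        boolean timerElapsed = nowMs - lastRefreshMs >= refreshIntervalMs;
        if (cacheFull || timerElapsed) {
            refresh(nowMs);
        }
        return true;
    }

    // Assumption: a real implementation would reopen its reader here.
    private void refresh(long nowMs) {
        snapshot.addAll(cache);
        cache.clear();
        lastRefreshMs = nowMs;
        refreshCount++;
    }

    int refreshCount() { return refreshCount; }

    int cachedIds() { return cache.size(); }
}
```

The cache covers the window in which a freshly inserted document is not yet visible through the reader, so a duplicate arriving in that window is still detected.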
