instaclustr / cassandra-lucene-index

This project forked from stratio/cassandra-lucene-index

Lucene based secondary indexes for Cassandra

License: Apache License 2.0

Java 85.86% Scala 14.14%
cassandra search lucene index sasi geo geolocation fuzzy 2i third-party

cassandra-lucene-index's Issues

Compatibility with Cassandra 4.0

Sorry if this is not the right place for my question.
Will the cassandra-lucene-index support the Cassandra 4.0 version?
Hugs.

Problem to build cassandra-lucene-index 4.1.0

Hi

I've been following the instructions on how to build the plugin and I'm getting a Java exception error (see below and the attached file).

Is anyone able to tell me what I'm doing wrong as I've tried this on both Windows and the Linux server I have?


[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile (scala-compile-first) on project cassandra-lucene-index-plugin: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile failed: An API incompatibility was encountered while executing net.alchim31.maven:scala-maven-plugin:4.4.0:compile: java.lang.NoSuchMethodError: 'java.io.OutputStream org.fusesource.jansi.AnsiConsole.wrapOutputStream(java.io.OutputStream)'

C:\Laptop\Source\Cassandra\cassandra-lucene-index>java -version
java version "1.8.0_333"
Java(TM) SE Runtime Environment (build 1.8.0_333-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.333-b02, mixed mode)

C:\Laptop\Source\Cassandra\cassandra-lucene-index>javac -version
javac 18.0.1.1

C:\Laptop\Source\Cassandra\cassandra-lucene-index>mvn -version
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: C:\Maven\apache-maven-3.8.6
Java version: 18.0.1.1, vendor: Oracle Corporation, runtime: C:\Program Files\Java\jdk-18.0.1.1
Default locale: en_GB, platform encoding: UTF-8
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

Full Error File
Error.txt
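
One detail visible in the output above: `java -version` reports 1.8.0_333, whereas `javac -version` and `mvn -version` both report 18.0.1.1, so Maven is running on a different JDK than the `java` found on the PATH. Whether that mismatch is what triggers the NoSuchMethodError is not confirmed in this report. The sketch below only shows how the major versions can be compared from those strings; `JdkVersionCheck`, `major` and `sameMajor` are hypothetical names introduced for illustration.

```java
final class JdkVersionCheck {

    /** Extracts the major Java version from strings such as "1.8.0_333" or "18.0.1.1". */
    static int major(String version) {
        String[] parts = version.trim().split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Legacy scheme: "1.8.x" means Java 8. Modern scheme: "18.x" means Java 18.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** True when both version strings refer to the same major Java release. */
    static boolean sameMajor(String a, String b) {
        return major(a) == major(b);
    }

    public static void main(String[] args) {
        String runtime = "1.8.0_333"; // from `java -version` in the report
        String maven = "18.0.1.1";    // from `mvn -version` in the report
        System.out.println("java: " + major(runtime)
                + ", maven: " + major(maven)
                + ", same major: " + sameMajor(runtime, maven));
    }
}
```

If the two differ, pointing JAVA_HOME at the intended JDK before running Maven makes the build use a single, known Java version.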

TaskQueueAsync lock loss

Recently we had a problem with a 3 node Cassandra cluster running 3.11.4 where multiple threads were attempting to lock the ReentrantReadWriteLock in TaskQueueAsync (via the submitAsynchronous() method). In looking at the output from jstack, there were 63 threads attempting to acquire the lock, but no thread held it. Similarly, the lucene-indexer-1 threads (which serve as the executors behind the lock) in the same stack were all idle.

Looking at the code in question (which seems unchanged since 2016), it shouldn't be possible for it to fail to unlock unless the thread holding the lock was interrupted. I also suspect that the original author was a bit overzealous, as everything inside the submitAsynchronous method appears to be thread safe (except the passed variable "id"). Is there some reason I am not seeing why we need to be so protective of the contents? Could that function simply not lock at all?
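
For readers weighing that question, here is a minimal sketch of one pattern in which such a lock does matter. It is an assumption about a possible design, not the plugin's actual code: submitters share the read lock, while a separate drain step takes the write lock so that no new work can be queued while it waits. `GuardedTaskQueue` and its methods are hypothetical names. If no exclusive step of this kind exists, the read lock in the submit path protects nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for an asynchronous task queue; not the plugin's code. */
final class GuardedTaskQueue {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ExecutorService[] pools;

    GuardedTaskQueue(int threads) {
        pools = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            pools[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Many submitters may hold the read lock at the same time. */
    void submit(Object id, Runnable task) {
        lock.readLock().lock();
        try {
            // The same id always maps to the same single-threaded pool, keeping per-id order.
            pools[Math.floorMod(id.hashCode(), pools.length)].execute(task);
        } finally {
            lock.readLock().unlock(); // released even if execute() throws
        }
    }

    /** The write lock keeps new submissions out while already queued work drains. */
    void await() {
        lock.writeLock().lock();
        try {
            List<Future<?>> barriers = new ArrayList<>();
            for (ExecutorService pool : pools) {
                barriers.add(pool.submit(() -> { })); // no-op runs after all earlier tasks
            }
            for (Future<?> barrier : barriers) {
                barrier.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void shutdown() {
        for (ExecutorService pool : pools) {
            pool.shutdown();
        }
    }
}
```

Because both paths release the lock in a finally block, an exception inside the guarded section cannot leave the lock held in this sketch.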

Problem to build cassandra-lucene-index 3.11.10.0


-- Check Java Version

java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)


-- Check Java Compiler Version

javac -version
javac 1.8.0_291


-- Check Maven Version

mvn -version
Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
Maven home: C:\src\apache-maven-3.8.1\bin..


-- Process checkout cassandra-lucene-index

git clone https://github.com/instaclustr/cassandra-lucene-index.git
cd cassandra-lucene-index
git checkout 3.11.10.0
mvn clean package


-- Error

C:\Users\user\AppData\Local\Temp\scala-maven-plugin-compiler-bridge-sources7168432815935638631\scala\ZincCompat.scala:23: error: type mismatch;
found : pf.type (with underlying type scala.ZincCompat.PlainNioFile)
required: ?{def getClass: ?}
val f = pf.getClass.getDeclaredField("nioPath") // it's not val'd in 2.12 :-/
^
two errors found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Cassandra Lucene index 3.11.10.0:
[INFO]
[INFO] Cassandra Lucene index ............................. SUCCESS [ 6.662 s]
[INFO] Cassandra Lucene Index builder ..................... SUCCESS [ 12.936 s]
[INFO] Cassandra Lucene Index plugin ...................... FAILURE [ 16.341 s]
[INFO] Cassandra Lucene Index distribution ................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36.255 s
[INFO] Finished at: 2021-05-06T09:54:57-03:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.5.1:compile (scala-compile-first) on project cassandra-lucene-index-plugin: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.5.1:compile failed.: CompileFailed -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :cassandra-lucene-index-plugin


Lucene Plugin installation error for Cassandra 4.0.7 in Ubuntu 22.04

Hi,

While testing the Lucene plugin with Cassandra 4.0.7 on Ubuntu 22.04, I hit an error at the third step, i.e.:

~/cassandra-lucene-index$ git checkout 4.0.7
error: pathspec '4.0.7' did not match any file(s) known to git

image

Subsequently, the step below also failed:

mvn clean package

image

I may be running the wrong command; kindly correct me.

Keyspace system_views related table not able to select with cassandra-lucene-index-plugin-4.1.0-1.0.0.jar

CQL Query

 select * from system_views.settings;

Error in CQL

NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.0.0.1:9042 datacenter1>: <Error from server: code=0000 [Server error] message="java.lang.IllegalStateException: Cannot initialize Keyspace with virtual metadata system_views">})

Error in system.log file

ERROR [Native-Transport-Requests-1] 2023-06-01 15:55:28,514 QueryMessage.java:129 - Unexpected error during query
java.lang.IllegalStateException: Cannot initialize Keyspace with virtual metadata system_views
	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:342)
	at org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:163)
	at org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)
	at org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:230)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:163)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:152)
	at com.stratio.cassandra.lucene.IndexQueryHandler.luceneExpressions(IndexQueryHandler.scala:129)
	at com.stratio.cassandra.lucene.IndexQueryHandler.processStatement(IndexQueryHandler.scala:103)
	at com.stratio.cassandra.lucene.IndexQueryHandler.process(IndexQueryHandler.scala:91)
	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
	at org.apache.cassandra.transport.Message$Request.execute(Message.java:254)
	at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:122)
	at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:141)
	at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:168)
	at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:82)
	at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
	at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
	at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:120)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
ERROR [Native-Transport-Requests-1] 2023-06-01 15:55:28,515 ErrorMessage.java:457 - Unexpected exception during request
java.lang.IllegalStateException: Cannot initialize Keyspace with virtual metadata system_views
	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:342)
	at org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:163)
	at org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:105)
	at org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:230)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:163)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:152)
	at com.stratio.cassandra.lucene.IndexQueryHandler.luceneExpressions(IndexQueryHandler.scala:129)
	at com.stratio.cassandra.lucene.IndexQueryHandler.processStatement(IndexQueryHandler.scala:103)
	at com.stratio.cassandra.lucene.IndexQueryHandler.process(IndexQueryHandler.scala:91)
	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
	at org.apache.cassandra.transport.Message$Request.execute(Message.java:254)
	at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:122)
	at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:141)
	at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:168)
	at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:82)
	at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
	at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
	at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:120)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
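
The trace shows IndexQueryHandler.luceneExpressions calling Keyspace.open for system_views, which Cassandra rejects because it is a virtual keyspace. One possible direction, sketched here under stated assumptions, is to skip the Lucene inspection for virtual keyspaces before trying to open them. The names `VirtualKeyspaceGuard`, `isVirtual` and `needsLuceneInspection` are hypothetical, and the hard-coded set lists the two virtual keyspaces of a stock Cassandra 4.x node. This is an illustration, not the plugin's actual fix.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class VirtualKeyspaceGuard {
    // Assumption: the virtual keyspaces present on a stock Cassandra 4.x node.
    private static final Set<String> VIRTUAL =
            new HashSet<>(Arrays.asList("system_views", "system_virtual_schema"));

    /** True when the keyspace name refers to one of the known virtual keyspaces. */
    static boolean isVirtual(String keyspace) {
        return keyspace != null && VIRTUAL.contains(keyspace.toLowerCase(Locale.ROOT));
    }

    /** True only when the keyspace can be opened to look for Lucene expressions. */
    static boolean needsLuceneInspection(String keyspace) {
        return keyspace != null && !isVirtual(keyspace);
    }

    public static void main(String[] args) {
        System.out.println(needsLuceneInspection("system_views")); // query from the report
        System.out.println(needsLuceneInspection("my_keyspace"));  // hypothetical user keyspace
    }
}
```

A query handler using such a guard would pass statements against virtual keyspaces straight to the default Cassandra processing path.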

Get count of matching results

I just started investigating this project after having some issues with ElasticSearch.
After doing some prototype work I have to say this project is nothing short of amazing.
It incorporates directly into Cassandra and makes so many hard things simple.

I seriously hope I don't find any showstoppers because I really want this project to work.

In some of my search locations I like to show the matches along with the total possible data items.
However I am not sure how to retrieve the total count of document matches along with the matching data.
It appears I could do 1 query for the data and another for the counts. However this seems like a double use of resources.

Select * from tweets;
Select count(*) from tweets;

Is there a way to do it without the additional count query?

Performance concerns

I have a 3 node cluster I am using for testing.
CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Mem: 94GB
Disk: SSD

Currently, doing about 1000 updates a second seems to bring all the nodes to 100% CPU, and operations start timing out.
Is this normal?

What should I check or update?

How to add a new column to the Lucene index?

Hi experts,
It's just a question. Might be a silly one.
I am not sure how to update a Lucene index. It seems I need to drop the index and create it again when I need to add a new column to the index.

For example, if I have an index defined as follows:
CREATE CUSTOM INDEX tweets_index ON tweets ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         id: {type: "integer"},
         user: {type: "string"}
      }
   }'
};

And now I have another column to add to the index, as follows:
body: {type: "text", analyzer: "english"}

Should I drop the old index and create a new one as follows?
CREATE CUSTOM INDEX tweets_index ON tweets ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         id: {type: "integer"},
         user: {type: "string"},
         body: {type: "text", analyzer: "english"}
      }
   }'
};

Thanks!
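
CQL has no ALTER INDEX statement, so a drop followed by a re-create is the usual route for changing a custom index definition; expect the new index to be built again over the existing rows. The sketch below, which is an illustration rather than a tool shipped with the plugin, generates the two statements from an ordered field map. `IndexRebuild`, `dropStatement` and `createStatement` are hypothetical names; the index, table, and field definitions are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class IndexRebuild {

    static String dropStatement(String index) {
        return "DROP INDEX " + index + ";";
    }

    /** Builds the CREATE CUSTOM INDEX statement; fields are joined without a trailing comma. */
    static String createStatement(String index, String table, Map<String, String> fields) {
        StringJoiner joined = new StringJoiner(",\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joined.add("         " + field.getKey() + ": " + field.getValue());
        }
        return "CREATE CUSTOM INDEX " + index + " ON " + table + " ()\n"
                + "USING 'com.stratio.cassandra.lucene.Index'\n"
                + "WITH OPTIONS = {\n"
                + "   'refresh_seconds': '1',\n"
                + "   'schema': '{\n"
                + "      fields: {\n"
                + joined + "\n"
                + "      }\n"
                + "   }'\n"
                + "};";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("id", "{type: \"integer\"}");
        fields.put("user", "{type: \"string\"}");
        fields.put("body", "{type: \"text\", analyzer: \"english\"}"); // the new column
        System.out.println(dropStatement("tweets_index"));
        System.out.println(createStatement("tweets_index", "tweets", fields));
    }
}
```

Running the DROP first and the CREATE second in cqlsh applies the new schema; searches against the index are unavailable between the two statements and may be incomplete until the rebuild finishes.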
