saaay71 / solr-vector-scoring
Vector Plugin for Solr: calculate dot product / cosine similarity on documents
License: Apache License 2.0
I am hitting a NullPointerException
at https://github.com/saaay71/solr-vector-scoring/blob/master/src/com/github/saaay71/solr/VectorScoreQuery.java#L48.
The term vector is null, and checking its size throws the NullPointerException. I tried to debug this but could not find why the term vectors are null.
Possibly not every document in Solr has a vector; for those documents the vector field is never indexed, so there is no term vector.
There should be a null check: if the term vector is null, the score should be 0.
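A minimal sketch of the requested guard, in plain Java. The class `NullSafeVectorScore` and its `Map<Integer, Float>` argument are hypothetical stand-ins for whatever the plugin reads from the term vector; this is not the code in VectorScoreQuery.java, only an illustration of returning 0 when a document has no vector.

```java
import java.util.Map;

/** Hypothetical stand-in for the scoring step; not the plugin's actual code. */
final class NullSafeVectorScore {

    /**
     * Returns the dot product between the query vector and a document's
     * stored vector, or 0 when the document has no vector at all.
     *
     * @param docVector   vector ordinal -> weight for one document, or null
     *                    if the vector field was never indexed for it
     * @param queryVector dense query vector
     */
    static float score(Map<Integer, Float> docVector, float[] queryVector) {
        // Guard first: a missing vector must never reach size() or iteration.
        if (docVector == null || docVector.isEmpty()) {
            return 0f;
        }
        float sum = 0f;
        for (Map.Entry<Integer, Float> entry : docVector.entrySet()) {
            int ordinal = entry.getKey();
            // Ignore ordinals that fall outside the query vector.
            if (ordinal >= 0 && ordinal < queryVector.length) {
                sum += entry.getValue() * queryVector[ordinal];
            }
        }
        return sum;
    }
}
```

With this shape, documents without a vector still match the query but contribute a score of 0 instead of aborting the whole request.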
Lucene term vectors are a bit heavy for the way this plugin uses them. And why encode/decode the vector ordinal numbers as terms at all? I propose the following instead:
Add a new special text field with payloads enabled and no term vectors. This field only ever indexes one nominal term, say the empty string or the single letter 'X'; it doesn't matter which. Each vector ordinal (0 through 5, or however long the vector is) becomes a term position of this term for a document. The payload encodes the number as a 4-byte float. The home page of this plugin shows the vectors as dense, but this approach (and the term vector one) could easily be sparse. This would be somewhat slower than a custom BinaryDocValues (another implementation path), but it leverages Lucene more and is less custom, for whatever benefit that brings (e.g. easier debugging).
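To make the payload idea concrete, here is a rough sketch in plain Java of packing each vector component into a 4-byte payload and scoring a sparse document from (position, payload) pairs. The class `FloatPayloadSketch` and its method names are assumptions for illustration; it uses `java.nio.ByteBuffer` rather than any Lucene API, and a real implementation would read positions and payloads from the postings of the single nominal term.

```java
import java.nio.ByteBuffer;

/** Illustrative only: 4-byte float payloads keyed by term position. */
final class FloatPayloadSketch {

    /** Packs one vector component into a 4-byte big-endian payload. */
    static byte[] encode(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    /** Reads a component back from its 4-byte payload. */
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    /**
     * Dot product for a sparse document: positions[i] is the vector ordinal
     * stored as a term position, payloads[i] is its encoded value.
     * Ordinals absent from the document contribute nothing.
     */
    static float dot(int[] positions, byte[][] payloads, float[] query) {
        float sum = 0f;
        for (int i = 0; i < positions.length; i++) {
            int ordinal = positions[i];
            if (ordinal >= 0 && ordinal < query.length) {
                sum += decode(payloads[i]) * query[ordinal];
            }
        }
        return sum;
    }
}
```

A document that only stores ordinals 0 and 3 simply has two positions for the term, which is what makes the sparse case cheap.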
Ideally a FieldType would be added to encapsulate the implementation details of analysis. It could even be used to query without any other top-level classes or plugins, since a FieldType works with most query parsers, including the default/standard/lucene one, and you can do some neat things this way, e.g. q=vecField:"0.1,4.75,0.3,1.2,0.7,4.0"
(taken from the example)
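For illustration, a FieldType handling such a query would first have to turn the quoted value into numbers and then compute the similarity. The sketch below shows only that part in plain Java; `VectorQuerySketch`, `parse`, `dot` and `cosine` are hypothetical names, and nothing here touches Solr's FieldType API.

```java
/** Illustrative only: parse a comma-separated vector and compare two vectors. */
final class VectorQuerySketch {

    /** Parses a value such as "0.1,4.75,0.3,1.2,0.7,4.0" into a float array. */
    static float[] parse(String csv) {
        String[] parts = csv.trim().split(",");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vector[i] = Float.parseFloat(parts[i].trim());
        }
        return vector;
    }

    /** Dot product, accumulated in double to limit rounding error. */
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity; defined here as 0 when either vector has zero norm. */
    static double cosine(float[] a, float[] b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot(a, b) / (normA * normB);
    }
}
```

Returning 0 for a zero-norm vector is a design choice made for this sketch so that empty vectors rank last rather than producing NaN.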
After implementing the custom scorer, search became too slow.
My index has around 4,000,000 docs. A query takes around 50 seconds.
Is there any way to make it faster, so that search runs in milliseconds?
Any thoughts on how to get this running on Solr 8.5? CustomScoreQuery and FunctionScoreQuery both seem to be unavailable in Solr 8.5.
When following your installation procedure and running your examples, I hit the following error with Query 1:
java.lang.UnsupportedOperationException: Query {! type=vp f=vector vector=0.1,4.75,0.3,1.2,0.7,4.0 v=} does not implement createWeight