Comments (6)
I think it's possible. In a very very early version I was actually using Painless directly in the plugin. It's not very efficient but has its place at times.
Can you give more detail on how you'd like to use it? For example, do you just want access to the raw vector values? It should be possible to support the same or a similar interface to the one Elasticsearch provides for its native vectors. I'm pretty sure I actually did this at one point early on, so hopefully I can dig through the git history and find it. I'm trying to think of a different use case or a different type of operation on vectors in a script, but I'm struggling. Let me know what you think.
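For reference, here is a rough sketch of what the native interface looks like today: Elasticsearch exposes its own dense_vector fields to Painless through built-in functions such as cosineSimilarity. The request body is built as a Python dict. The field name `my_vec`, the helper name, and the example filter are all hypothetical, and this shows the native API only, not something elastiknn is known to provide.

```python
import json


def native_script_score_query(query_vector, filter_clauses, field="my_vec"):
    """Build a script_score request body that scores only the documents
    matching the filter, using Elasticsearch's native dense_vector function.

    `field` is a hypothetical dense_vector field name.
    """
    return {
        "query": {
            "script_score": {
                # Only documents matching this inner query reach the script.
                "query": {"bool": {"filter": filter_clauses}},
                "script": {
                    # +1.0 keeps the score non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }


body = native_script_score_query([0.1, 0.2, 0.3], [{"term": {"color": "red"}}])
print(json.dumps(body, indent=2))
```

An equivalent for elastiknn vectors would presumably look similar, with the script reading from the elastiknn field instead.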
from elastiknn.
Yeah, let's back up and talk about what I'm trying to do. You might have better suggestions. Basically I want to combine your stuff (a score based on cosine similarity of vectors) with filters, and I want it to be efficient. Some observations:
- The simplest thing is to combine a filter and an "elastiknn_nearest_neighbors" clause in the main body of the query. For example, something like:
"bool": {
  "filter": [some filters],
  "must": [{"elastiknn_nearest_neighbors": ...}]
}
What I've noticed is that the performance doesn't really depend on how many documents match the filter. It looks like Elasticsearch is evaluating elastiknn_nearest_neighbors on all the documents, regardless of whether they satisfy the filter. That's an inference based on all the timings being the same.
- That led me to think I should use a rescore query, as you mention in the documentation. Then elastiknn_nearest_neighbors only gets evaluated on documents that match the filter. There are two issues: a) I think there's possibly a bug, or at least I couldn't get it to work as described elsewhere; b) you can only pass up to 10,000 documents from the main query to the rescore query (that's the max "window_size"). Sometimes our filters are broad and match more than 10,000 documents.
- Finally, I've noticed in other queries that a Painless script in a function score query only gets evaluated for documents that match the filters, and there's no limit of 10,000 documents. So it seems like one solution is to evaluate elastiknn_nearest_neighbors from within a Painless function.
Maybe you have better suggestions.
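To make the first two options concrete, here is a minimal sketch of the two request bodies, built as Python dicts. The elastiknn parameter names ("field", "vec", "model", "similarity") are my reading of the elastiknn docs and may differ by plugin version; the helper names, the field name, and the example filter are hypothetical.

```python
import json


def knn_clause(field, values):
    # Parameter names assumed from the elastiknn docs; verify against your version.
    return {
        "elastiknn_nearest_neighbors": {
            "field": field,
            "vec": {"values": values},
            "model": "exact",
            "similarity": "cosine",
        }
    }


def filtered_knn_query(filters, field, values):
    """Option 1: filter and nearest-neighbors clause side by side in one bool query."""
    return {"query": {"bool": {"filter": filters, "must": [knn_clause(field, values)]}}}


def rescore_knn_query(filters, field, values, window_size=10000):
    """Option 2: the filter selects candidates, then the top `window_size`
    hits per shard are rescored with the nearest-neighbors clause."""
    return {
        "query": {"bool": {"filter": filters}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": knn_clause(field, values),
                # Ignore the filter's score and keep only the similarity score.
                "query_weight": 0,
                "rescore_query_weight": 1,
            },
        },
    }


filters = [{"term": {"category": "shoes"}}]
print(json.dumps(rescore_knn_query(filters, "my_vec", [0.1, 0.2, 0.3]), indent=2))
```

Setting "query_weight" to 0 is a choice I made for the sketch so that the final ranking depends only on the vector similarity.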
from elastiknn.
> What I've noticed is that the performance doesn't really depend on how many documents match the filter. It looks like Elasticsearch is evaluating elastiknn_nearest_neighbors on all the documents, regardless of whether they satisfy the filter. That's an inference based on all the timings being the same.
Yeah that seems like a bug internally. The "correct" thing to do here would be to make sure only the filtered vectors get evaluated.
from elastiknn.
> b) you can only pass up to 10,000 documents from the main query to the rescore query (that's the max "window_size"). Sometimes our filters are broad and match more than 10,000 documents.
I guess even if we solve the issue of evaluating all vectors, it won't get around this issue. Will take that into consideration.
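One note on the 10,000 figure: as far as I know it comes from the index setting index.max_rescore_window, which defaults to 10000 and can be raised per index, at the cost of more work and memory per shard. A small sketch of the settings body one would send to the index's _settings endpoint, plus a check against the limit; the helper names are hypothetical.

```python
import json

# Default value of index.max_rescore_window in Elasticsearch.
DEFAULT_MAX_RESCORE_WINDOW = 10000


def rescore_window_settings(max_window):
    """Build the settings body that raises the rescore window limit for an index."""
    if max_window < 1:
        raise ValueError("max_window must be at least 1")
    return {"index": {"max_rescore_window": max_window}}


def fits_window(window_size, max_window=DEFAULT_MAX_RESCORE_WINDOW):
    """Return True if a rescore window_size is within the configured limit."""
    return 1 <= window_size <= max_window


print(json.dumps(rescore_window_settings(50000)))
print(fits_window(20000))                     # exceeds the default limit
print(fits_window(20000, max_window=50000))   # allowed after raising the limit
```

Raising the limit does not change the fact that rescoring only touches the top hits of the first-pass query on each shard, so a very broad filter still needs a sensible first-pass ordering.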
from elastiknn.
As I think about it more, this issue may not be a total blocker for us (the fact that only 10,000 results can be rescored). If I could get rescore queries working with elastiknn, that would be very useful.
from elastiknn.
Hi @ejackson-eb. I'm gonna close this just to tidy up a bit. Please comment and tag me if you'd like to discuss more and I'll re-open.
from elastiknn.