I think this is a feature request, but maybe it is already possible.

Perform nearest neighbors search from Painless script? (elastiknn issue, closed)

alexklibisz commented on August 26, 2024

Perform nearest neighbors search from Painless script?

from elastiknn.

Comments (6)

alexklibisz commented on August 26, 2024

I think it's possible. In a very very early version I was actually using Painless directly in the plugin. It's not very efficient but has its place at times.

Can you give more detail on how you'd like to use it? For example, do you just want access to the raw vector values? It should be possible to support the same or a similar interface to the one Elasticsearch provides for its native vectors. I'm pretty sure I actually did this at one point early on, so hopefully I can dig through the git history and find it. I'm trying to think of a different use case or a different type of operation on vectors in a script, but struggling. Let me know what you think.

ejackson-eb commented on August 26, 2024

Yeah, let's back up and talk about what I'm trying to do. You might have better suggestions. Basically I want to combine your stuff (a score based on cosine similarity of vectors) with filters, and I want it to be efficient. Some observations:

The simplest thing is to combine a filter and an "elastiknn_nearest_neighbors" clause in the main body of the query. For example, something like:

     "bool": {
       "filter": [some filters],
       "must": [{"elastiknn_nearest_neighbors": ...}]
     }

What I've noticed is that the performance doesn't really depend on how many documents match the filter. It looks like Elasticsearch is evaluating elastiknn_nearest_neighbors on all the documents, regardless of whether they satisfy the filter. That's an inference based on all the timings being the same.
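
For reference, the filter-plus-kNN query described above can be sketched as a full request body. This is only a sketch: the field name `my_vec`, the `color` term filter, and the helper name are made up for illustration, and the `elastiknn_nearest_neighbors` parameters (`vec.values`, `model`, `similarity`) follow my reading of the elastiknn docs and may differ between plugin versions.

```python
import json


def filtered_knn_query(field, vector, filters, similarity="cosine"):
    """Build a bool query: `filters` restrict the matches, the kNN clause scores them.

    Hypothetical helper; the elastiknn parameter names are assumptions
    based on the plugin docs and may vary by version.
    """
    return {
        "query": {
            "bool": {
                # Filter clauses do not contribute to the score.
                "filter": filters,
                # The kNN clause provides the similarity score.
                "must": [
                    {
                        "elastiknn_nearest_neighbors": {
                            "field": field,
                            "vec": {"values": vector},
                            "model": "exact",
                            "similarity": similarity,
                        }
                    }
                ],
            }
        }
    }


knn_body = filtered_knn_query(
    "my_vec", [0.1, 0.2, 0.3], [{"term": {"color": "blue"}}]
)
print(json.dumps(knn_body, indent=2))
```

The printed JSON is what would be sent to the `_search` endpoint of the index.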

That led me to think I should use a rescore query as you mention in the documentation. Then elastiknn_nearest_neighbors only gets evaluated on documents that match the filter. There are two issues: a) I think there's possibly a bug, or at least I couldn't get it to work as described elsewhere; b) you can only pass up to 10,000 documents from the main query to the rescore query (that's the max "window_size"). Sometimes our filters are broad and match more than 10,000 documents.
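
A rough sketch of the rescore approach, for concreteness. It uses the standard Elasticsearch `rescore` structure (`window_size`, `rescore_query`, `query_weight`, `rescore_query_weight`); the helper name, field name, and filter are hypothetical, and the elastiknn parameters are again assumptions from the plugin docs. The 10,000 cap corresponds to the default of the `index.max_rescore_window` index setting.

```python
import json

# Default value of the index.max_rescore_window setting in Elasticsearch.
DEFAULT_MAX_RESCORE_WINDOW = 10_000


def rescore_knn_query(field, vector, filters, window_size=DEFAULT_MAX_RESCORE_WINDOW):
    """Filter in the main query, then score the top `window_size` hits by kNN.

    Hypothetical helper; the elastiknn parameter names are assumptions.
    """
    if window_size > DEFAULT_MAX_RESCORE_WINDOW:
        raise ValueError("window_size exceeds the default index.max_rescore_window")
    return {
        # Only documents matching the filters reach the rescore phase.
        "query": {"bool": {"filter": filters}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "elastiknn_nearest_neighbors": {
                        "field": field,
                        "vec": {"values": vector},
                        "model": "exact",
                        "similarity": "cosine",
                    }
                },
                # Ignore the filter's score and keep only the kNN score.
                "query_weight": 0,
                "rescore_query_weight": 1,
            },
        },
    }


rescore_body = rescore_knn_query(
    "my_vec", [0.1, 0.2, 0.3], [{"term": {"color": "blue"}}]
)
print(json.dumps(rescore_body, indent=2))
```

Note that rescoring runs per shard on the top `window_size` hits, so documents outside that window never get a kNN score, which is the limitation in point b).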

Finally, I've noticed in other queries that a Painless script in a function score query will only get evaluated for documents that match the filters. And there's no limit of 10,000 documents. So it seems like a solution is to evaluate elastiknn_nearest_neighbors from within a Painless function.
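
To illustrate the pattern I mean, here is a sketch using Elasticsearch's native `dense_vector` support rather than elastiknn: a `script_score` query whose Painless script calls the built-in `cosineSimilarity` function, wrapped around a filter. The field name `my_dense_vector` and the filter are hypothetical, and this only works on native `dense_vector` fields; whether something equivalent can be exposed for elastiknn fields is exactly the question.

```python
import json


def script_score_cosine_query(field, vector, filters):
    """Score only the filtered documents with a Painless cosineSimilarity script.

    Assumes `field` is mapped as a native Elasticsearch dense_vector;
    this is not an elastiknn API.
    """
    return {
        "query": {
            "script_score": {
                # The script runs only on documents matched by this inner query.
                "query": {"bool": {"filter": filters}},
                "script": {
                    # Adding 1.0 keeps the score non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": vector},
                },
            }
        }
    }


script_body = script_score_cosine_query(
    "my_dense_vector", [0.1, 0.2, 0.3], [{"term": {"color": "blue"}}]
)
print(json.dumps(script_body, indent=2))
```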

Maybe you have better suggestions.

alexklibisz commented on August 26, 2024

> What I've noticed is that the performance doesn't really depend on how many documents match the filter. It looks like Elasticsearch is evaluating elastiknn_nearest_neighbors on all the documents, regardless of whether they satisfy the filter. That's an inference based on all the timings being the same.

Yeah, that seems like a bug internally. The "correct" thing to do here would be to make sure only the filtered vectors get evaluated.

alexklibisz commented on August 26, 2024

> b) you can only pass up to 10,000 documents from the main query to the rescore query (that's the max "window_size"). Sometimes our filters are broad and match more than 10,000 documents.

I guess even if we solve the issue of evaluating all vectors, that won't get around this limit. Will take that into consideration.

ejackson-eb commented on August 26, 2024

As I think about it more, this issue may not be a total blocker for us (the fact that only 10,000 results can be rescored). If I could get rescore queries working with elastiknn, that would be very useful.

alexklibisz commented on August 26, 2024

Hi @ejackson-eb. I'm gonna close this just to tidy up a bit. Please comment and tag me if you'd like to discuss more and I'll re-open.
