
Comments (6)

alexklibisz commented on August 26, 2024

I think it's possible. In a very very early version I was actually using Painless directly in the plugin. It's not very efficient but has its place at times.

Can you give more detail on how you'd like to use it? For example, do you just want access to the raw vector values? It should be possible to support the same or a similar interface to the one Elasticsearch provides for its native vectors. I'm pretty sure I actually did this at one point early on, so hopefully I can dig through the git history and find it. I'm trying to think of a different use case or a different type of operation on vectors in a script, but I'm struggling. Let me know what you think.

from elastiknn.

ejackson-eb commented on August 26, 2024

Yeah, let's back up and talk about what I'm trying to do. You might have better suggestions. Basically I want to combine your stuff (a score based on cosine similarity of vectors) with filters, and I want it to be efficient. Some observations:

  1. The simplest thing is to combine a filter and an "elastiknn_nearest_neighbors" clause in the main body of the query. For example, something like:
     "bool": {
       "filter": [some filters],
       "must": [{"elastiknn_nearest_neighbors": ...}]
     }

What I've noticed is that the performance doesn't really depend on how many documents match the filter. It looks like Elasticsearch is evaluating elastiknn_nearest_neighbors on all the documents, regardless of whether they satisfy the filter. That's an inference based on all the timings being the same.

  2. That led me to think I should use a rescore query as you mention in the documentation. Then elastiknn_nearest_neighbors only gets evaluated on documents that match the filter. There are two issues: a) I think there's possibly a bug, or at least I couldn't get it to work as described elsewhere; b) you can only pass up to 10,000 documents from the main query to the rescore query (that's the max "window_size"). Sometimes our filters are broad and match more than 10,000 documents.

  3. Finally, I've noticed in other queries that a Painless script in a function score query will only get evaluated for documents that match the filters. And there's no limit of 10,000 documents. So it seems like a solution is to evaluate elastiknn_nearest_neighbors from within a Painless function.
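
For reference, the function score approach in the last item could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Elasticsearch's built-in cosineSimilarity Painless function on a native dense_vector field rather than anything elastiknn exposes, the exact script syntax varies by Elasticsearch version, and the helper name, the field name "my_vec", and the example filter are all hypothetical.

```python
import json


def filtered_script_score_query(filters, query_vector, vector_field="my_vec"):
    """Build a function_score search body that scores only the filtered documents."""
    # Assumption: `vector_field` is a native dense_vector field, so the built-in
    # cosineSimilarity Painless function applies. This is not an elastiknn API.
    # Adding 1.0 keeps the score non-negative, which Elasticsearch requires.
    source = f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0"
    return {
        "query": {
            "function_score": {
                # The script is evaluated only for documents matching these filters.
                "query": {"bool": {"filter": filters}},
                "script_score": {
                    "script": {
                        "source": source,
                        "params": {"query_vector": query_vector},
                    }
                },
                # Use the script result as the final score.
                "boost_mode": "replace",
            }
        }
    }


# Hypothetical filter and query vector, for illustration only.
body = filtered_script_score_query(
    filters=[{"term": {"color": "blue"}}],
    query_vector=[0.1, 0.2, 0.3],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of a search request. There is no rescore window involved, so the 10,000 document limit does not apply, at the cost of running the script on every matching document.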

Maybe you have better suggestions.


alexklibisz commented on August 26, 2024

> What I've noticed is that the performance doesn't really depend on how many documents match the filter. It looks like Elasticsearch is evaluating elastiknn_nearest_neighbors on all the documents, regardless of whether they satisfy the filter. That's an inference based on all the timings being the same.

Yeah that seems like a bug internally. The "correct" thing to do here would be to make sure only the filtered vectors get evaluated.


alexklibisz commented on August 26, 2024

> b) you can only pass up to 10,000 documents from the main query to the rescore query (that's the max "window_size"). Sometimes our filters are broad and match more than 10,000 documents.

I guess even if we solve the issue of evaluating all vectors, it won't get around this issue. Will take that into consideration.


ejackson-eb commented on August 26, 2024

As I think about it more, this issue may not be a total blocker for us (the fact that only 10,000 results can be rescored). If I could get rescore queries working with elastiknn, that would be very useful.
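
A rescore query of the kind discussed here might be shaped like the following sketch. It assumes the standard Elasticsearch rescore body and the "field" / "vec" / "model" / "similarity" parameters of the elastiknn_nearest_neighbors query. The helper name, the field name "my_vec", and the example filter are hypothetical, and the 10,000 cap simply mirrors the limit mentioned earlier in the thread.

```python
import json

# Maximum rescore window as reported in this thread.
MAX_RESCORE_WINDOW = 10_000


def filtered_rescore_query(filters, query_vector, vector_field="my_vec",
                           window_size=MAX_RESCORE_WINDOW):
    """Build a search body that filters first, then rescores the top hits by cosine similarity."""
    if window_size > MAX_RESCORE_WINDOW:
        raise ValueError(f"window_size must be at most {MAX_RESCORE_WINDOW}")
    return {
        # The main query only filters; it does not look at the vectors.
        "query": {"bool": {"filter": filters}},
        "rescore": {
            "window_size": window_size,
            "query": {
                # Assumed elastiknn query shape: exact cosine similarity on one field.
                "rescore_query": {
                    "elastiknn_nearest_neighbors": {
                        "field": vector_field,
                        "vec": {"values": query_vector},
                        "model": "exact",
                        "similarity": "cosine",
                    }
                },
                # Ignore the filter score and keep only the vector score.
                "query_weight": 0,
                "rescore_query_weight": 1,
            },
        },
    }


# Hypothetical filter and query vector, for illustration only.
body = filtered_rescore_query(
    filters=[{"term": {"color": "blue"}}],
    query_vector=[0.1, 0.2, 0.3],
    window_size=1000,
)
print(json.dumps(body, indent=2))
```

Only the top window_size hits per shard from the filtered query are rescored, so any matching documents beyond that window keep their original score.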


alexklibisz commented on August 26, 2024

Hi @ejackson-eb. I'm gonna close this just to tidy up a bit. Please comment and tag me if you'd like to discuss more and I'll re-open.

