
Score documents using embedding-vectors dot-product or cosine-similarity with ES Lucene engine

License: Apache License 2.0

Java 100.00%
elasticsearch vector cosine-similarity dot-product lucene embedding-vectors

Fast Elasticsearch Vector Scoring

This Plugin allows you to score Elasticsearch documents based on embedding-vectors, using dot-product or cosine-similarity.

General

  • This plugin was inspired by this elasticsearch vector scoring plugin and this discussion, and aims to achieve processing 10 times faster than the original. Give it a try.
  • I gained this substantial speed improvement by using the Lucene index directly.
  • I developed it for my workplace, which needs to pick the KNN from a set of ~4M vectors. Our current ES setup is able to answer this in ~80ms.
  • Note: Elasticsearch introduced similar vector similarity functions in version 7.4 and above. Elasticsearch version 8.0 (https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0) includes native ANN support. This makes this plugin obsolete for new Elasticsearch versions, unless for some reason their implementation is slower than this plugin.
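
As a rough illustration of what exact (non-approximate) scoring means here, the sketch below scores every stored vector against the query and keeps the top k. It is a plain-Python model of the computation, not the plugin's Java code; the function names and the sample vectors are made up for the example.

```python
import heapq
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def top_k(query, docs, k, use_cosine=False):
    """Score every (doc_id, vector) pair and keep the k highest scores."""
    score = cosine if use_cosine else dot
    scored = ((score(query, vec), doc_id) for doc_id, vec in docs)
    return heapq.nlargest(k, scored)


# Hypothetical toy corpus of 2-dimensional vectors.
docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [2.0, 2.0])]
best = top_k([1.0, 0.0], docs, k=2)
```

Every document is visited once, so the cost grows linearly with the number of vectors; any speed-up has to come from how cheaply each vector is read and scored.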

Elasticsearch version

  • master branch is designed for Elasticsearch 5.6.9.
  • for Elasticsearch 7.9.0 use branch es-7.9.0
  • for Elasticsearch 7.5.2 use branch es-7.5.2
  • for Elasticsearch 7.5.0 use branch es-7.5.0
  • for Elasticsearch 7.2.1 use branch es-7.2.1
  • for Elasticsearch 7.1.0 use branch es-7.1
  • for Elasticsearch 6.8.1 use branch es-6.8.1
  • for Elasticsearch 5.2.2 use branch es-5.2.2
  • for Elasticsearch 2.4.4 use branch es-2.4.4

Maven configuration

  • Clone the project
  • Run mvn package to compile the plugin as a zip file
  • In the Elasticsearch root folder, run ./bin/elasticsearch-plugin install file://<PATH_TO_ZIP> to install the plugin. For example: ./bin/elasticsearch-plugin install file:///Users/lior/dev/fast-elasticsearch-vector-scoring/target/releases/elasticsearch-binary-vector-scoring-5.6.9.zip

Usage

Documents

  • Each document you score should have a field containing the base64 representation of your vector. For example:
   {
   	"id": 1,
   	....
   	"embedding_vector": "v7l48eAAAAA/s4VHwAAAAD+R7I5AAAAAv8MBMAAAAAA/yEI3AAAAAL/IWkeAAAAAv7s480AAAAC/v6DUgAAAAL+wJi0gAAAAP76VqUAAAAC/sL1ZYAAAAL/dyq/gAAAAP62FVcAAAAC/tQRvYAAAAL+j6ycAAAAAP6v1KcAAAAC/bN5hQAAAAL+u9ItAAAAAP4ckTsAAAAC/pmkjYAAAAD+cYpwAAAAAP5renEAAAAC/qY0HQAAAAD+wyYGgAAAAP5WrCcAAAAA/qzjTQAAAAD++LBzAAAAAP49wNKAAAAC/vu/aIAAAAD+hqXfAAAAAP4FfNCAAAAA/pjC64AAAAL+qwT2gAAAAv6S3OGAAAAC/gfMtgAAAAD/If5ZAAAAAP5mcXOAAAAC/xYAU4AAAAL+2nlfAAAAAP7sCXOAAAAA/petBIAAAAD9soYnAAAAAv5R7X+AAAAC/pgM/IAAAAL+ojI/gAAAAP2gPz2AAAAA/3FonoAAAAL/IHg1AAAAAv6p1SmAAAAA/tvKlQAAAAD/I2OMAAAAAP3FBiCAAAAA/wEd8IAAAAL94wI9AAAAAP2Y1IIAAAAA/rnS4wAAAAL9vriVgAAAAv1QxoCAAAAC/1/qu4AAAAL+inZFAAAAAv7aGA+AAAAA/lqYVYAAAAD+kNP0AAAAAP730BiAAAAA="
   }
  • Use this field mapping:
   "embedding_vector": {
       "type": "binary",
       "doc_values": true
   }
  • The vector can be of any dimension
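
Since the vector can be of any dimension, it can help to check a stored value before indexing it. The sketch below uses only the Python standard library and assumes the big-endian float32 layout used by the conversion helpers in this README; the function names are illustrative and not part of the plugin.

```python
import base64
import struct


def encode_vector(values):
    """Pack floats as big-endian float32 and return the base64 text."""
    packed = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(packed).decode("ascii")


def decode_vector(base64_string):
    """Decode base64 text into a list of floats, checking the byte length."""
    raw = base64.b64decode(base64_string)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(">%df" % count, raw))


encoded = encode_vector([1.0, -0.5, 0.25])
decoded = decode_vector(encoded)
dimension = len(decoded)
```

A length that is not a multiple of 4 bytes usually means the vector was encoded with a different element type or was truncated.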

Converting a vector to Base64

To convert an array of float32 values to a base64 string (and back), we use these example methods:

Java

import java.nio.ByteBuffer;
import java.nio.FloatBuffer;
import java.util.Base64;

public static float[] convertBase64ToArray(String base64Str) {
    final byte[] decode = Base64.getDecoder().decode(base64Str.getBytes());
    final FloatBuffer floatBuffer = ByteBuffer.wrap(decode).asFloatBuffer();
    final float[] dims = new float[floatBuffer.capacity()];
    floatBuffer.get(dims);

    return dims;
}

public static String convertArrayToBase64(float[] array) {
    final int capacity = Float.BYTES * array.length;
    final ByteBuffer bb = ByteBuffer.allocate(capacity);
    for (float v : array) {
        bb.putFloat(v);
    }
    bb.rewind();
    final ByteBuffer encodedBB = Base64.getEncoder().encode(bb);

    return new String(encodedBB.array());
}

Python

import base64
import numpy as np

dfloat32 = np.dtype('>f4')

def decode_float_list(base64_string):
    # Avoid shadowing the built-in `bytes` type.
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dfloat32).tolist()

def encode_array(arr):
    # Serialize as big-endian float32 bytes before base64-encoding.
    base64_str = base64.b64encode(np.array(arr).astype(dfloat32).tobytes()).decode("utf-8")
    return base64_str

Ruby

require 'base64'

def decode_float_list(base64_string)
  Base64.strict_decode64(base64_string).unpack('g*')
end

def encode_array(arr)
  Base64.strict_encode64(arr.pack('g*'))
end

Go

import(
    "math"
    "encoding/binary"
    "encoding/base64"
)

func convertArrayToBase64(array []float32) string {
	bytes := make([]byte, 0, 4*len(array))
	for _, a := range array {
		bits := math.Float32bits(a)
		b := make([]byte, 4)
		binary.BigEndian.PutUint32(b, bits)
		bytes = append(bytes, b...)
	}

	encoded := base64.StdEncoding.EncodeToString(bytes)
	return encoded
}

func convertBase64ToArray(base64Str string) ([]float32, error) {
	decoded, err := base64.StdEncoding.DecodeString(base64Str)
	if err != nil {
		return nil, err
	}

	length := len(decoded)
	array := make([]float32, 0, length/4)

	for i := 0; i < len(decoded); i += 4 {
		bits := binary.BigEndian.Uint32(decoded[i : i+4])
		f := math.Float32frombits(bits)
		array = append(array, f)
	}
	return array, nil
}

Querying

  • For querying the 100 KNN documents use this POST message on your ES index:

    For ES 5.X and ES 7.X:

{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "script": {
          "source": "binary_vector_score",
          "lang": "knn",
          "params": {
            "cosine": false,
            "field": "embedding_vector",
            "vector": [
               -0.09217305481433868, 0.010635560378432274, -0.02878434956073761, 0.06988169997930527, 0.1273992955684662, -0.023723633959889412, 0.05490724742412567, -0.12124507874250412, -0.023694118484854698, 0.014595639891922474, 0.1471538096666336, 0.044936809688806534, -0.02795785665512085, -0.05665992572903633, -0.2441125512123108, 0.2755320072174072, 0.11451690644025803, 0.20242854952812195, -0.1387604922056198, 0.05219579488039017, 0.1145530641078949, 0.09967200458049774, 0.2161576747894287, 0.06157230958342552, 0.10350126028060913, 0.20387393236160278, 0.1367097795009613, 0.02070528082549572, 0.19238869845867157, 0.059613026678562164, 0.014012521132826805, 0.16701748967170715, 0.04985826835036278, -0.10990987718105316, -0.12032567709684372, -0.1450948715209961, 0.13585780560970306, 0.037511035799980164, 0.04251480475068092, 0.10693439096212387, -0.08861573040485382, -0.07457160204648972, 0.0549330934882164, 0.19136285781860352, 0.03346432000398636, -0.03652812913060188, -0.1902569830417633, 0.03250952064990997, -0.3061246871948242, 0.05219300463795662, -0.07879918068647385, 0.1403723508119583, -0.08893408626317978, -0.24330253899097443, -0.07105310261249542, -0.18161986768245697, 0.15501035749912262, -0.216160386800766, -0.06377710402011871, -0.07671763002872467, 0.05360138416290283, -0.052845533937215805, -0.02905619889497757, 0.08279753476381302
             ]
          }
        }
      }
    }
  },
  "size": 100
}
For ES 2.X:
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "lang": "knn",
        "params": {
          "cosine": false,
          "field": "embedding_vector",
          "vector": [
               -0.09217305481433868, 0.010635560378432274, -0.02878434956073761, 0.06988169997930527, 0.1273992955684662, -0.023723633959889412, 0.05490724742412567, -0.12124507874250412, -0.023694118484854698, 0.014595639891922474, 0.1471538096666336, 0.044936809688806534, -0.02795785665512085, -0.05665992572903633, -0.2441125512123108, 0.2755320072174072, 0.11451690644025803, 0.20242854952812195, -0.1387604922056198, 0.05219579488039017, 0.1145530641078949, 0.09967200458049774, 0.2161576747894287, 0.06157230958342552, 0.10350126028060913, 0.20387393236160278, 0.1367097795009613, 0.02070528082549572, 0.19238869845867157, 0.059613026678562164, 0.014012521132826805, 0.16701748967170715, 0.04985826835036278, -0.10990987718105316, -0.12032567709684372, -0.1450948715209961, 0.13585780560970306, 0.037511035799980164, 0.04251480475068092, 0.10693439096212387, -0.08861573040485382, -0.07457160204648972, 0.0549330934882164, 0.19136285781860352, 0.03346432000398636, -0.03652812913060188, -0.1902569830417633, 0.03250952064990997, -0.3061246871948242, 0.05219300463795662, -0.07879918068647385, 0.1403723508119583, -0.08893408626317978, -0.24330253899097443, -0.07105310261249542, -0.18161986768245697, 0.15501035749912262, -0.216160386800766, -0.06377710402011871, -0.07671763002872467, 0.05360138416290283, -0.052845533937215805, -0.02905619889497757, 0.08279753476381302
             ]
        },
        "script": "binary_vector_score"
      }
    }
  },
  "size": 100
}
  • The example above shows a vector of 64 dimensions

  • Parameters:

    1. field: The field containing the base64 vector.
    2. cosine: Boolean. If true, use cosine-similarity; otherwise use dot-product.
    3. vector: The vector (comma separated) to compare to.
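
    The three parameters map directly onto the request body for ES 5.X and 7.X shown earlier. A minimal sketch of a helper that builds that body follows; the helper name and its default values are assumptions for illustration, not part of the plugin.

```python
import json


def build_knn_query(vector, field="embedding_vector", cosine=False, size=100):
    """Build the ES 5.X / 7.X request body for the binary_vector_score script."""
    return {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "script_score": {
                    "script": {
                        "source": "binary_vector_score",
                        "lang": "knn",
                        "params": {
                            "cosine": cosine,
                            "field": field,
                            "vector": list(vector),
                        },
                    }
                },
            }
        },
        "size": size,
    }


body = build_knn_query([0.1, -0.2, 0.3], cosine=True, size=10)
print(json.dumps(body, indent=2))
```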
  • Note for Elasticsearch 6 and 7 only: because scores produced by the script_score function must be non-negative on Elasticsearch 7, we convert the dot-product score and the cosine-similarity score using these simple equations:

    (changed dot product) = e^(original dot product)
    (changed cosine similarity) = ((original cosine similarity) + 1) / 2

    We can use these simple equations to convert them back to the original scores:

    (original dot product) = ln(changed dot product)
    (original cosine similarity) = (changed cosine similarity) * 2 - 1
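
    The inverse conversions above can be applied client-side to the _score values returned by Elasticsearch. A small sketch, with function names chosen only for this example:

```python
import math


def original_dot_product(changed_score):
    """Invert changed = e^(original) for dot-product scores."""
    return math.log(changed_score)


def original_cosine(changed_score):
    """Invert changed = (original + 1) / 2 for cosine-similarity scores."""
    return changed_score * 2.0 - 1.0


# Round-trip check on an arbitrary dot-product value.
recovered = original_dot_product(math.exp(0.75))
```

    Note that math.log raises ValueError for a score of 0, which can occur if e^(original dot product) underflows for a very negative dot product.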

  • Question: I've encountered the error java.lang.IllegalStateException: binaryEmbeddingReader can't be null while running the query. What should I do?

    Answer: This error happens when the plugin fails to access the field you specified in the field parameter in at least one of the documents.

    To solve it:

    • Make sure that all the documents in your index contain the field you specified in the field parameter. See more details here
    • Make sure that the field you specified in the field parameter has a binary type in the index mapping
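
One way to catch the first cause before it reaches the index is to validate documents client-side. The sketch below is an assumption-laden example rather than part of the plugin: it presumes documents are plain dicts with an "id" key and flags those whose vector field is missing, empty, or not valid base64.

```python
import base64
import binascii


def find_invalid_docs(docs, field):
    """Return the ids of documents whose `field` is missing or not valid base64."""
    invalid = []
    for doc in docs:
        value = doc.get(field)
        if not isinstance(value, str) or not value:
            invalid.append(doc.get("id"))
            continue
        try:
            base64.b64decode(value, validate=True)
        except (binascii.Error, ValueError):
            invalid.append(doc.get("id"))
    return invalid


# Hypothetical documents: one valid, one missing the field, one malformed.
docs = [
    {"id": 1, "embedding_vector": "P4AAAA=="},
    {"id": 2},
    {"id": 3, "embedding_vector": "not base64!"},
]
bad_ids = find_invalid_docs(docs, "embedding_vector")
```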

fast-elasticsearch-vector-scoring's People

Contributors

cakirmuha, dependabot[bot], lior-k, ran22, zewelor

fast-elasticsearch-vector-scoring's Issues

plugin not installing

Hi, I am using ES 5.6.9 and want to use this plugin. I am facing this error while installing:

Downloading file:/home/vljklkj/Desktop/fast-elasticsearch-vector-scoring/target/releases/elasticsearch-binary-vector-scoring-5.6.9.zip
[=================================================] 100%  
ERROR: elasticsearch directory is missing in the plugin zip

I cloned the project, changed ES version to 5.6.9 in POM files and did mvn clean. Got a zip file and ran this command to install the plugin.
"sudo bin/elasticsearch-plugin install file:/home/vljklkj/Desktop/fast-elasticsearch-vector-scoring/target/releases/elasticsearch-binary-vector-scoring-5.6.9.zip"

Please help me solve this issue as early as possible. Thanks in advance.

Algorithm behind this plugin

I am curious about the algorithm this plugin uses to achieve its speed. Does it implement some sort of approximate nearest-neighbour algorithm, or does it actually compute scores over the entire corpus by brute force, sped up by a low-level implementation? Thanks!

How to achieve ANDing on nested Vectors?

I will start by saying thank you for the amazing work!

Is there a way to achieve multi-vector ANDing when the vectors are saved as nested?

Having the following mapping:

"Faces": {
  "type": "nested",
  "properties": {
    "Features": {
      "type": "binary",
      "doc_values": true
    }
  }
}

we use the score_mode: max attribute to get the documents containing the KNN vectors

When looking for multiple vectors to match any of the given vectors (ORing) we run the function_score for each vector separately like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "function_score": {
                "boost_mode": "replace",
                "script_score": {
                  "script": {
                    "source": "binary_vector_score",
                    "lang": "knn",
                    "params": {
                      "cosine": true,
                      "field": "Faces.Features",
                      "vector": [
                        -0.5,
                        10.0,
                        10.0
                      ]
                    }
                  }
                }
              }
            },
            "path": "Faces",
            "score_mode": "max"
          }
        },
        {
          "nested": {
            "query": {
              "function_score": {
                "boost_mode": "replace",
                "script_score": {
                  "script": {
                    "source": "binary_vector_score",
                    "lang": "knn",
                    "params": {
                      "cosine": true,
                      "field": "Faces.Features",
                      "vector": [
                        0.5,
                        10.0,
                        6.0
                      ]
                    }
                  }
                }
              }
            },
            "path": "Faces",
            "score_mode": "max"
          }
        }
      ]
    }
  },
  "size": 10
}

Is there a way to add a wrapper around this saying I want my documents to be scored by the score of the first function (+) the score from the second one so we can achieve scoring by more than one vector?

Error: binaryEmbeddingReader can't be null

I'm using Elasticsearch as a Docker container with the binary-vector-scoring plugin installed, but I'm getting an intermittent error when searching with the following query:

{
  "function_score": {
    "boost": 1,
    "score_mode": "avg",
    "boost_mode": "multiply",
    "min_score": 0,
    "script_score": {
      "script": {
        "source": "binary_vector_score",
        "lang": "knn",
        "params": {
          "cosine": true,
          "field": "image_embedding",
          "vector": "MY_VECTOR_HERE"
        }
      }
    }
  }
}

The search runs OK for a while (the first dozen requests) and then it starts returning the following error:

Caused by: java.lang.IllegalStateException: binaryEmbeddingReader can't be null
elasticsearch    | 	at com.liorkn.elasticsearch.script.VectorScoreScript.setBinaryEmbeddingReader(VectorScoreScript.java:67) ~[?:?]
elasticsearch    | 	at com.liorkn.elasticsearch.service.VectorScoringScriptEngineService$1.getLeafSearchScript(VectorScoringScriptEngineService.java:65) ~[?:?]
elasticsearch    | 	at org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.getLeafScoreFunction(ScriptScoreFunction.java:79) ~[elasticsearch-5.6.0.jar:5.6.0]
elasticsearch    | 	at org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorWeight.functionScorer(FunctionScoreQuery.java:140) ~[elasticsearch-5.6.0.jar:5.6.0]
...

Reindexing all documents is the only way to make the search work again. Has anybody faced the same problem?

jar file

Hi,
Could you please upload the built file (jar)?

How to filter questions based on tf-idf before applying cosine similarity.

In my Elasticsearch index, each document has these fields. I have about 200,000 such documents.

question: What is the capital of China?
answer : Beijing
embedding vector: []
client : 123  #(a unique id)

The issue I'm facing is this: for the question above, "What is the capital of China?", I get the averaged word vectors using Google's pretrained word2vec model. When I query "What is the capital of India?", it matches the question "What is the capital of China?" with a similarity score of 1 and returns a wrong response. To solve this, I'm trying to retrieve questions by TF-IDF first and then apply cosine similarity to the filtered results.

My mapping is as follows:

PUT /chat-history-old
{
  "settings": {
    "analysis": {
      "filter": {
        "stop": {
          "type":"stop",
          "stopwords": [ "a","an","the","in","on"]
        },
        "synonym" : {
          "type" : "synonym",
          "synonyms" : [
"pull_off, manage, cope, oversee, wangle, do, wield",
"replace, supplant, substitute"]
        }
       },
      "analyzer": {
        "synonym" : {
          "tokenizer" : "whitespace",
            "filter": [
            "lowercase",
            "synonym",
            "stop"
]
          }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "question": {"type": "text",
                     "analyzer": "standard", 
                     "search_analyzer": "synonym"},
        "embedding_vector": {"type": "binary",
                             "doc_values": true},
        "source_url":{ "type": "keyword" }
      }
    }
  }
}

And my query is as follows. I tried various versions of the query below, but every one gives me a score of 1.00. Can you help?

PUT /chat-history-old
{
  "mappings": {
    "doc": { 
      "properties": {
        "question": {"type": "nested"}, 
        "embedding_vector": {"type": "binary",
                         "doc_values": true},
        "source_url":    { "type": "keyword" }
      }
    }
  }
}

Document retrieval is slow

The plugin works great for similarity search on small data. Recently, for my use case, I indexed about 150,000 documents into my Elasticsearch and tried performing searches. One small change: instead of using Google's averaged word2vec, I used Facebook's InferSent (a doc2vec model) to get vectors for my sentences. The search takes between 7 and 9 seconds to retrieve answers. The mapping I use is the same as the one in the README. My search query in Python is as follows:

search = self.es.search(index=Config.faq_index, body={
                "query": {"function_score": {"query": {"bool": {"filter": {"term": {"account_id": account_id}}}},
                                             "boost_mode": "replace",
                                             "script_score": {
                                                 "script": {
                                                     "inline": "binary_vector_score",
                                                     "lang": "knn",
                                                     "params": {
                                                         "cosine": True,
                                                         "field": "embedding_vector",
                                                         "vector": vector_array
                                                     }}}}}, "size": 10})

Can you suggest an approach to make the search as fast as yours? The README reports picking the KNN from ~4M vectors in ~80ms.

plugin install error

Dear author,
when I install this plugin into Elasticsearch, I use this command on Linux:
/bin/elasticsearch-plugin file:///data//rrjia/project/elasticsearch-binary-vector-scoring-7.5.0.zip
but I get this error:
ERROR: Unknown command [file:///data//rrjia/project/elasticsearch-binary-vector-scoring-7.5.0.zip]
So I came to ask for your help.

elasticsearch 7.5.0

thanks.

Plugin is not installed on elasticsearch 7.5.2

Hello, we are currently using Elasticsearch version 7.5.2. I have compiled a zip file from the 7.5.0 branch.
When trying to install this plugin, I get the error:

Exception in thread "main" java.lang.IllegalArgumentException: Plugin [elasticsearch-binary-vector-scoring] was built for Elasticsearch version 7.5.0 but version 7.5.2 is running
at org.elasticsearch.plugins.PluginsService.verifyCompatibility(PluginsService.java:346)
at org.elasticsearch.plugins.InstallPluginCommand.loadPluginInfo(InstallPluginCommand.java:728)
at org.elasticsearch.plugins.InstallPluginCommand.installPlugin(InstallPluginCommand.java:803)
at org.elasticsearch.plugins.InstallPluginCommand.install(InstallPluginCommand.java:786)
at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:232)
at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:217)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:77)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125)
at org.elasticsearch.cli.Command.main(Command.java:90)
at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)

Please tell me how to install this plugin on elasticsearch version 7.5.2?

query size is ignored

Thank you for the great plugin! I'm seeing some unexpected behavior where, no matter what I set the query size to, the entire size of the index is reported in the hits. Using Python, my query looks like this:

def img_q(vector):
    query = {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "script_score": {
                    "script": {
                        "source": "elasticsearch-binary-vector-scoring",
                        "lang": "knn",
                        "params": {
                            "cosine": True,
                            "field": "feature",
                            "vector": vector
                        }
                    }
                }
            }
        },
        "from": 0,
        "size": 100
    }
    return query

res = es.search(index='maars', body=img_q(myvector))

And then res['hits']['total'] is always the number of vectors in the index. Any idea why that could be happening? Thank you!
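
One thing worth checking here: in an Elasticsearch search response, hits.total reports how many documents matched the query, while the hits.hits list is what the size parameter limits. The sketch below separates the two numbers. The helper name and the sample response are made up for illustration, and it handles both the integer form of total and the object form used by Elasticsearch 7.

```python
def summarize_hits(response):
    """Return (total matching documents, number of hits actually returned)."""
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7 reports total as {"value": ..., "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    return total, len(hits["hits"])


# Hypothetical response: 5000 documents matched, 2 were returned.
sample = {"hits": {"total": 5000, "hits": [{"_id": "1"}, {"_id": "2"}]}}
matched, returned = summarize_hits(sample)
```

If returned equals the requested size while matched equals the index size, the query is matching every document and size is still being honoured.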

Elastic 6.1.1

Hello!
How can I build it for Elastic version 6.1.1?

Performance Benchmarking ?

Hello, I am quite impressed by your 80ms latency for 64-dimensional floats and ~4 million items. What does your infrastructure look like? Does this include parallelization using sharding? What hardware type are you using? Is the 80ms on a single machine?

I have a similarly sized corpus, 5 million documents, 50 dimensional floats. I wrote a KNN function using a script in Elasticsearch’s Painless language, and am getting about 13 seconds to score by nearest neighbors on a single AWS i3.4xl EC2 instance.

I am curious if using a plugin rather than Painless will give me significantly better performance... but was curious how you ended up with good numbers, before I invest in the plugin approach.

Unable to query in Elastic 6.8.1

Hi @lior-k

I have set up an Elasticsearch 6.8.1 Docker image and built the plugin from the es-6.8.1 branch.

docker run -p 9200:9200 -p 9300:9300 -v ${elastic_plugin_folder}/target/releases:/tmp/cosine_sim -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.8.1

and installed it

docker exec -it trusting_banach /bin/bash

elasticsearch-plugin install file:///tmp/cosine_sim/elasticsearch-binary-vector-scoring-6.8.1.zip 

I can see that it is installed

[root@8a2db7afc779 elasticsearch]# bin/elasticsearch-plugin list
elasticsearch-binary-vector-scoring

When I try to query it, I get the following error:

query = {
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "script": {
          "source": "binary_vector_score",
          "lang": "knn",
          "params": {
            "cosine": False,
            "field": "title_vec_b64",
            "vector": [-0.11091560125350952,
 -0.012368256226181984,
 -0.05769440531730652,
 0.10475035011768341,
 0.08878926932811737,
 -0.07440587133169174,
 0.08918392658233643,
 -0.016563668847084045,
 0.05303652584552765,
 0.010436886921525002,
 -0.06147288531064987,
 0.07994012534618378,
 -0.09865640848875046,
 0.01937752775847912,
 -0.004414134658873081,
 0.04878390580415726,
 0.1500605344772339,
 -0.049730271100997925,
 0.10887563228607178,
 0.07925839722156525,
 -0.03000705875456333,
 -0.026256615296006203,
 0.01621185801923275,
 0.039685823023319244,
 -0.001498837023973465,
 -0.0267797764390707,
 0.030115759000182152,
 0.005555476061999798,
 0.046774279326200485,
 0.19266989827156067,
 -0.02019449509680271,
 0.006202386692166328,
 0.015504976734519005,
 0.06700988858938217,
 -0.019451739266514778,
 0.018186839297413826,
 0.014493770897388458,
 -0.06764443963766098,
 0.09920039027929306,
 0.08250521868467331,
 0.03336912393569946,
 0.11773338913917542,
 -0.03351331502199173,
 0.07179781794548035,
 -0.18448804318904877,
 -0.0197548046708107,
 -0.017714664340019226,
 0.12445169687271118,
 0.02748987078666687,
 0.01750885136425495,
 -0.059615299105644226,
 0.0033058077096939087,
 0.08752410113811493,
 0.0942256897687912,
 -0.1654362976551056,
 -0.06923578679561615,
 0.04007047042250633,
 0.026420842856168747,
 -0.04796523600816727,
 0.13311980664730072,
 0.018357206135988235,
 -0.01189229916781187,
 0.06303215026855469,
 0.034484561532735825,
 0.0814635381102562,
 0.08240565657615662,
 -0.003505193628370762,
 0.04957431182265282,
 -0.11623919755220413,
 0.028763558715581894,
 -0.05360335111618042,
 0.01727301813662052,
 -0.010051139630377293,
 0.08175407350063324,
 -0.02495856210589409,
 0.04708561301231384,
 0.12423346191644669,
 -0.019826268777251244,
 -0.01785324141383171,
 0.033566221594810486,
 0.03995600715279579,
 0.16404400765895844,
 -0.05547409504652023,
 -0.0010914923623204231,
 0.004697272554039955,
 0.07940614223480225,
 -0.03247788920998573,
 0.01867639645934105,
 0.06854085624217987,
 -0.06572879105806351,
 -0.024124089628458023,
 0.16388298571109772,
 -0.024828284978866577,
 0.0249160323292017,
 -0.013938387855887413,
 0.020326070487499237,
 0.006996334530413151,
 0.023765256628394127,
 -0.005187366157770157,
 0.09260062873363495,
 0.005694741848856211,
 -0.11202815920114517,
 0.023884126916527748,
 -0.051436565816402435,
 -0.04748039320111275,
 -0.019691068679094315,
 -0.11744862794876099,
 -0.056783631443977356,
 0.0620306134223938,
 -0.0157561544328928,
 -0.03777662664651871,
 0.09631360322237015,
 -0.1594366431236267,
 -0.02411549910902977,
 0.018708499148488045,
 -0.011242246255278587,
 -0.05065244063735008,
 0.0920867845416069,
 -0.045188501477241516,
 0.02484002336859703,
 -0.061890728771686554,
 0.04936341196298599,
 -0.07623309642076492,
 0.06964029371738434,
 -0.09506729245185852,
 0.1260889172554016,
 -0.05375320091843605,
 -0.028824951499700546,
 0.07088449597358704,
 -0.0563826858997345,
 0.06858667731285095,
 -0.09268361330032349,
 -0.07293982058763504,
 -0.021877095103263855,
 0.04640849679708481,
 -0.08267684280872345,
 0.1712718904018402,
 0.08969420939683914,
 0.12968459725379944,
 0.06828564405441284,
 -0.07446791976690292,
 0.04404822364449501,
 0.09619954228401184,
 0.014123346656560898,
 -0.05217517167329788,
 -0.033148810267448425,
 -0.1279505342245102,
 0.17056585848331451,
 0.07830381393432617,
 -0.02065315656363964,
 0.018212372437119484,
 -0.05478692054748535,
 0.0008633993566036224,
 -0.11331601440906525,
 -0.0970207080245018,
 -0.019216883927583694,
 0.02283022552728653,
 -0.09400290250778198,
 0.019097940996289253,
 0.09848718345165253,
 0.01893545687198639,
 0.012185772880911827,
 0.10844842344522476,
 -0.12887753546237946,
 0.06064837425947189,
 -0.06580662727355957,
 -0.007481692358851433,
 -0.03142566978931427,
 -0.06736018508672714,
 -0.052196428179740906,
 0.019975129514932632,
 0.04874351993203163,
 -0.2319813221693039,
 0.16128496825695038,
 0.07921537011861801,
 0.02356587164103985,
 0.21304066479206085,
 0.04416591674089432,
 -0.06543312966823578,
 0.10867374390363693,
 -0.059921473264694214,
 0.004250713624060154,
 -0.04853147640824318,
 -0.08667157590389252,
 0.08203355222940445,
 0.02892845682799816,
 0.11836115270853043,
 0.002109323628246784,
 0.04675440117716789,
 0.08557461202144623,
 0.03520556166768074,
 0.005658429116010666,
 0.07631449401378632,
 -0.11538150906562805,
 0.08367155492305756,
 -0.04233440011739731,
 0.026365965604782104,
 0.11810635775327682,
 -0.008726771920919418,
 -0.006178293377161026,
 -0.032966498285532,
 0.009704645723104477,
 0.10991569608449936,
 0.07687313854694366,
 0.045167192816734314,
 -0.10288700461387634,
 0.08407768607139587,
 -0.0060685910284519196,
 -0.004732324741780758,
 0.13219217956066132,
 -0.01730496622622013,
 -0.05590490251779556,
 -0.022574706003069878,
 -0.06257370859384537,
 0.025937484577298164,
 0.18327824771404266,
 0.062333844602108,
 -0.12896911799907684,
 0.016680652275681496,
 0.03501032292842865,
 -0.039892345666885376,
 -0.0010370910167694092,
 -0.059304557740688324,
 -0.007233946584165096,
 -0.008397813886404037,
 0.06103808432817459,
 -0.10880440473556519,
 -0.1236773282289505,
 -0.08680218458175659,
 0.03410463407635689,
 -0.0372888557612896,
 0.04124067723751068,
 0.053624026477336884,
 0.020399751141667366,
 -0.09959948062896729,
 0.03808313235640526,
 0.004636439960449934,
 0.033109381794929504,
 -0.009622188284993172,
 -0.12189603596925735,
 0.08236785978078842,
 -0.05138041079044342,
 -0.06599799543619156,
 -0.059065062552690506,
 0.06687866151332855,
 -0.012619949877262115,
 -0.07758430391550064,
 -0.04862333834171295,
 -0.12815603613853455,
 0.04510051757097244,
 0.06221261993050575,
 -0.01896725967526436,
 -0.11125870048999786,
 0.07486312836408615,
 0.07476145774126053,
 0.07034306228160858,
 -0.21859833598136902,
 -0.024630384519696236,
 0.02307208999991417,
 -0.060816820710897446,
 -0.0612434521317482,
 -0.037258923053741455,
 0.027293622493743896,
 -0.01798875629901886,
 -0.10089060664176941,
 -0.030837880447506905,
 -0.054076097905635834,
 0.09301923215389252,
 -0.03546946868300438,
 -0.01776343211531639,
 -0.02929227240383625,
 -0.00369332917034626,
 -0.08080127090215683,
 0.005425403825938702,
 0.006371726747602224,
 0.07843196392059326,
 -0.06922762095928192,
 0.09849295020103455,
 -0.05541716888546944,
 -0.02776404842734337,
 0.04205353558063507,
 -0.01546469610184431,
 -0.0618584007024765,
 -0.13895854353904724,
 -0.036999598145484924,
 -0.012432890012860298,
 -0.01635652780532837,
 -0.06800690293312073,
 0.07489623129367828,
 0.008075803518295288,
 -0.06403892487287521,
 0.06564713269472122,
 -0.09574733674526215,
 0.06792283803224564,
 -0.09947624057531357,
 0.056209221482276917,
 0.09886199235916138,
 0.10442547500133514,
 -0.019203905016183853,
 -0.22731733322143555]
          }
        }
      }
    }
  },
  "size": 100
}

response = requests.post(f"http://localhost:9200/infringement/_search", json=query)
response.json()

{'error': {'root_cause': [{'type': 'query_shard_exception',
    'reason': 'script_score: the script could not be loaded',
    'index_uuid': 'Y88sphZVRteeoVEj1xxvMA',
    'index': 'infringement'}],
  'type': 'search_phase_execution_exception',
  'reason': 'all shards failed',
  'phase': 'query',
  'grouped': True,
  'failed_shards': [{'shard': 0,
    'index': 'infringement',
    'node': 'FT2lVCDISOSXfUfAfxqyhw',
    'reason': {'type': 'query_shard_exception',
     'reason': 'script_score: the script could not be loaded',
     'index_uuid': 'Y88sphZVRteeoVEj1xxvMA',
     'index': 'infringement',
     'caused_by': {'type': 'illegal_argument_exception',
      'reason': 'script_lang not supported [knn]'}}}]},
 'status': 400}

The documentation has the query body for ES 5.X and ES 7.1. Is there anything different for 6.8.1?

Get NaN score for [Doc_id]

I got the same error:

org.elasticsearch.ElasticsearchException: script_score query returned an invalid score [NaN] for doc [762].
        at org.elasticsearch.common.lucene.search.function.ScriptScoreQuery$ScriptScorable.score(ScriptScoreQuery.java:268) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.search.FilterLeafCollector.collect(FilterLeafCollector.java:43) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.apache.lucene.search.MatchAllDocsQuery$1$1.score(MatchAllDocsQuery.java:64) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.elasticsearch.common.lucene.search.function.ScriptScoreQuery$ScriptScoreBulkScorer.score(ScriptScoreQuery.java:296) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:56) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:198) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:171) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]
        at org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:333) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:295) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:134) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:338) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:358) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:343) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:146) ~[elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.0.jar:7.6.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
[2020-07-30T12:07:01,049][WARN ][r.suppressed             ] [MTL343] path: /enron_first/_search, params: {index=enron_first}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:545) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:306) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:574) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:386) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$200(AbstractSearchAsyncAction.java:66) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:242) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:423) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1227) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1201) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:60) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:56) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:65) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.0.jar:7.6.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.0.jar:7.6.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]

It would be a great help if anyone has a solution.
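One possible cause, not confirmed in this thread, is a stored or query vector that is all zeros or contains non-finite values: cosine similarity divides by the vector norms, so a zero norm gives 0/0. The sketch below is a hypothetical pre-indexing check (the function name is illustrative and not part of the plugin) that flags such vectors before they are encoded and sent to Elasticsearch.

```python
import numpy as np


def find_unscorable_vectors(vectors):
    """Return the indices of vectors whose cosine score would be undefined."""
    bad = []
    for i, vec in enumerate(vectors):
        arr = np.asarray(vec, dtype=np.float64)
        norm = np.linalg.norm(arr)
        # NaN/inf components or a zero norm make (a . b) / (|a| * |b|) undefined
        if not np.isfinite(arr).all() or norm == 0.0:
            bad.append(i)
    return bad
```

Running this over the vectors before indexing, and over the query vector, would show whether doc [762] is such a case.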

How to query for two vectors with different weights?

This plugin works great for me. I have two vectors, image_vector and text_vector, and I added both of them to my ES index.

The question is: how can I query both vectors at the same time with different weights (the image vector is more important than the text vector)? Thanks very much!
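One approach that might work is Elasticsearch's standard `function_score` query with a `functions` array, where each entry carries its own `weight` and `score_mode: "sum"` adds the weighted scores. The sketch below only builds the request body. Whether the plugin's `binary_vector_score` script behaves correctly inside `functions` is an assumption that has not been tested here, and the helper name and the default weights are illustrative.

```python
def weighted_two_vector_query(image_vec, text_vec,
                              image_weight=0.7, text_weight=0.3, size=10):
    """Build a function_score body that sums two weighted vector scores."""

    def scoring_function(field, vector, weight):
        # One entry of the "functions" array: script score multiplied by weight
        return {
            "script_score": {
                "script": {
                    "source": "binary_vector_score",  # "inline" on ES 5.x
                    "lang": "knn",
                    "params": {"cosine": True, "field": field, "vector": vector},
                }
            },
            "weight": weight,
        }

    return {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "score_mode": "sum",  # add the weighted per-field scores
                "functions": [
                    scoring_function("image_vector", image_vec, image_weight),
                    scoring_function("text_vector", text_vec, text_weight),
                ],
            }
        },
        "size": size,
    }
```

The returned dict can be passed as the JSON body of a `_search` request in the same way as the single-vector queries in the README.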

internal algorithm

Hi, I just want to know which type of KNN (like HNSW, LSH, and so forth) you built in this plugin.

Read string field for Document

Hi, Lior. I would like to run a similarity search not on a vector of numbers but on a vector of objects, each with a name for its number: [{"name":"a", "value": 0.42},{"name":"b", "value": 0.52}]. Is there any way to read a string value the way BinaryDocValues are read?

Cosine Similarities are not proper

Hello,

I was trying to measure the cosine similarity between vectors with dimensions (1, 128). Here is the query.

{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "min_score": 0,
      "script_score": {
        "script": {
          "source": "binary_vector_score",
          "lang": "painless",
          "params": {
            "cosine": true,
            "field": "embedding_vector",
            "vector": [ <VECTOR> ]
          }
        }
      }
    }
  },
  "size": 170
}

But the results are meaningless. Even though two vectors are not similar, their score is over 0.9.
The similarity scores decrease only gradually.
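A way to narrow this down is to compute the cosine similarity locally for a few document pairs and compare it with the scores Elasticsearch returns. The sketch below is a plain NumPy reference, not the plugin's code. It also illustrates one possible explanation, offered here only as an assumption about the data: if all embedding components are non-negative, even unrelated vectors have a high cosine similarity, and mean-centering brings it back toward zero.

```python
import numpy as np


def cosine_similarity(a, b):
    """Reference cosine similarity; returns NaN if either vector has zero norm."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return float("nan")
    return float(np.dot(a, b) / denom)


def demo_nonnegative_bias(dim=128, seed=0):
    """Compare cosine of two independent non-negative vectors, raw vs. centered."""
    rng = np.random.default_rng(seed)
    a = rng.random(dim)  # components in [0, 1), unrelated to b
    b = rng.random(dim)
    raw = cosine_similarity(a, b)
    centered = cosine_similarity(a - a.mean(), b - b.mean())
    return raw, centered
```

If the locally computed values match the returned scores, the plugin is scoring as intended and the embeddings themselves are the place to look.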

Wrong result when searching query vector (512,)

I followed issue #25 and created an index of documents with vectors of dimension (512,). I then searched with the vector of each of those documents. Some searches return a score of 1.0 for the id I searched for, but others return a score of 0.5~0.6 for an id that is not the one I looked for. Could you tell me why this happens and help me solve the problem? I hope you can respond soon. Thanks!

Cannot make it work

Hi,
I am relatively new to Elasticsearch plugins, and I really appreciate your plugin, but I couldn't make Elasticsearch work with it.
I installed Elasticsearch version 2.4.4 as you mentioned and compiled the plugin using Maven. When I put it in the plugins/ directory and ran Elasticsearch, I got an error:

Exception in thread "main" java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [elasticsearch-binary-vector-scoring-2.4.4.jar]. Was the plugin built before 2.0?
Likely root cause: java.nio.file.FileSystemException: plugins/elasticsearch-binary-vector-scoring-2.4.4.jar/plugin-descriptor.properties: Not a directory

So I repackaged it as a folder with the following structure
__ fast-elasticsearch-vector-scoring/
__|-> elasticsearch-binary-vector-scoring-2.4.4.jar
__|-> plugin-descriptor.properties

Where plugin-descriptor.properties is a file I created manually that contains:

description=ElasticSearch Plugin for Binary Vector Scoring
version=2.4.4
name=VectorScoringPlugin
jvm=true
classname=com.liorkn.elasticsearch.plugin.VectorScoringPlugin
java.version=9
elasticsearch.version=2.4.4

Now, when I run elasticsearch it seems to load the plugin without any errors, however, when I try to use it I get the following error trace:

nested: QueryParsingException[script_score the script could not be loaded]; nested: ScriptException[scripts of type [inline], operation [search] and lang [knn] are disabled]; }
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onFirstPhaseResult(AbstractSearchAsyncAction.java:206)
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:152)
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:46)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:874)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:852)
at org.elasticsearch.transport.TransportService$4.onFailure(TransportService.java:389)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: ScriptException[scripts of type [inline], operation [search] and lang [knn] are disabled]
at org.elasticsearch.script.ScriptService.compile(ScriptService.java:244)
at org.elasticsearch.script.ScriptService.search(ScriptService.java:456)
at org.elasticsearch.index.query.functionscore.script.ScriptScoreFunctionParser.parse(ScriptScoreFunctionParser.java:104)
at org.elasticsearch.index.query.functionscore.FunctionScoreQueryParser.parse(FunctionScoreQueryParser.java:140)
at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:250)
at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:324)
at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:224)
at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:219)
at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:856)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:667)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:633)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:377)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:378)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)

Did I miss something when building a jar file?

Thanks in advance,
Best,
Zakkery

Response has no hits even though index consists of vectors

Hi @lior-k @zewelor @ran22 @cakirmuha ,

Thanks for this great plugin! I was testing it out with Elasticsearch version 6.8.1 (using the matching branch of the plugin). I was able to index the data and even get a response when I queried it. Unfortunately, the hits are empty. Here is the code I used (similar to #25).

import base64
import numpy as np
import json

_float32_dtype = np.dtype('>f4')

import elasticsearch
from elasticsearch import Elasticsearch  # client class used in __main__ below
print(elasticsearch.__version__)

def decode_float_list(base64_string):
    buffer = base64.b64decode(base64_string)
    return np.frombuffer(buffer, dtype=_float32_dtype).tolist()


def encode_array(arr):
    base64_str = base64.b64encode(np.array(arr).astype(_float32_dtype)).decode("utf-8")
    return base64_str

def create_index(name):

    # This request body works for ES 6.7 (??) or higher; for older ES versions, add "mappings" on top of "properties", like:
    """
    request_body = '''{
        "settings": {
            "number_of_shards": 2,
            "number_of_replicas": 1
        },

        'mappings':{
            "properties": {
                "embedding_vector": {
                    "type": "binary",
                    "doc_values": true
                }
            }
        }
    }'''
    """
    request_body = '''{
        "settings": {
            "number_of_shards": 2,
            "number_of_replicas": 1
        },

            "properties": {
                "embedding_vector": {
                    "type": "binary",
                    "doc_values": true
                }
            }
    }'''
    print(f"creating {name} index... {request_body}")
    es.indices.create(index = name, body = request_body)


def index_data(data):
    counter = 1
    for vector in data:
        body = {
            "id": counter,
            "embedding_vector": encode_array(vector)
        }
        es.index(index=INDEX_NAME, body=body)
        counter += 1


def search():
    # "vector": [ 0.6172189116477966, 0.4812350273132324, 0.2395150065422058, 0.41844668984413147, 0.8617216944694519, 0.12854498624801636, 0.2627895176410675, 0.22640013694763184, 0.5444879531860352, 0.52374267578125, 0.7576023936271667, 0.25305455923080444, 0.5308356285095215, 0.6852802038192749, 0.4624062180519104, 0.1816617250442505, 0.2958976626396179, 0.025580303743481636, 0.16926740109920502, 0.7047653198242188, 0.6931900978088379, 0.04226350784301758, 0.9671088457107544, 0.47195401787757874, 0.2582820653915405, 0.11039293557405472, 0.6919737458229065, 0.5618643760681152, 0.6426474452018738, 0.6258983612060547, 0.8140584826469421, 0.2586701810359955, 0.2690378725528717, 0.9467039704322815, 0.474464476108551, 0.7006123661994934, 0.3056519627571106, 0.934620201587677, 0.33563244342803955, 0.38651159405708313, 0.3424995541572571, 0.23031608760356903, 0.641241729259491, 0.01252000406384468, 0.5705199837684631, 0.24167191982269287, 0.4995182156562805, 0.9633683562278748, 0.618108868598938, 0.9971736669540405, 0.24285273253917694, 0.4431900978088379, 0.67298823595047, 0.5439957976341248, 0.5564237833023071, 0.2304188311100006, 0.4888533055782318, 0.4624284505844116, 0.788846492767334, 0.44891494512557983, 0.9873254299163818, 0.8286163806915283, 0.7455354332923889, 0.8039408326148987, 0.5274253487586975, 0.4829685688018799, 0.6627996563911438, 0.3408285975456238, 0.5105639100074768, 0.066745325922966, 0.13178864121437073, 0.35720911622047424, 0.1358930915594101, 0.5904856324195862, 0.12224390357732773, 0.7346777319908142, 0.9671003222465515, 0.48915180563926697, 0.7750203013420105, 0.14900848269462585, 0.6375364661216736, 0.21111196279525757, 0.8424895405769348, 0.13458995521068573, 0.5942713618278503, 0.6773364543914795, 0.8135702610015869, 0.33085259795188904, 0.3377285897731781, 0.9505098462104797, 0.5543105006217957, 0.9818258285522461, 0.297512948513031, 0.4442136883735657, 0.9673498868942261, 0.7054122090339661, 0.724175751209259, 0.6931982636451721, 
    # 0.8991569876670837, 0.01580190286040306, 0.11919090896844864, 0.38001662492752075, 0.5516496300697327, 0.8624045848846436, 0.13067130744457245, 0.12067067623138428, 0.642181932926178, 0.32152852416038513, 0.9839213490486145, 0.6214938759803772, 0.8877131342887878, 0.6137049198150635, 0.14480671286582947, 0.5091487169265747, 0.8738197088241577, 0.6978392004966736, 0.8988777995109558, 0.10804525017738342, 0.7366241216659546, 0.7556180357933044, 0.22851991653442383, 0.1791202872991562, 0.11619532108306885, 0.04393879696726799, 0.7954261898994446, 0.8965669870376587, 0.7234428524971008, 0.23360027372837067, 0.9665877223014832, 0.14681114256381989, 0.9289661645889282, 0.9380605816841125, 0.4196012616157532, 0.4730188846588135, 0.514502227306366, 0.5517736673355103, 0.6869121193885803, 0.8567425608634949, 0.7314034700393677, 0.9989842772483826, 0.3868770897388458, 0.9380677342414856, 0.4927084743976593, 0.7979277968406677, 0.45593059062957764, 0.0170291718095541, 0.6517185568809509, 0.5005806684494019, 0.8620452880859375, 0.5568361282348633, 0.07004088908433914, 0.5770776271820068, 0.8143753409385681, 0.8382748961448669, 0.0996832400560379, 0.5101017355918884, 0.4771038293838501, 0.9274903535842896, 0.22478686273097992, 0.9320020079612732, 0.05571257323026657, 0.6283928155899048, 0.6742311120033264, 0.0424797385931015, 0.7878830432891846, 0.5152276158332825, 0.16908106207847595, 0.5440091490745544, 0.7015048861503601, 0.25502151250839233, 0.40467849373817444, 0.432849258184433, 0.7071661353111267, 0.14723558723926544, 0.38334646821022034, 0.9520816802978516, 0.8364397287368774, 0.8559724688529968, 0.008303776383399963, 0.9050803184509277, 0.32011473178863525, 0.4527781903743744, 0.7674447298049927, 0.4480983316898346, 0.1805608868598938, 0.4140874147415161, 0.27097389101982117, 0.8837590217590332, 0.7211946845054626, 0.34096693992614746, 0.4692194163799286, 0.29635292291641235, 0.272903710603714, 0.00385366752743721, 0.17514188587665558, 0.6346434950828552]

    search = {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "script_score": {
                    "script": {
                        "source": "binary_vector_score",
                        "lang": "knn",
                        "params": {
                            "cosine": False,
                            "field": "embedding_vector",
                            "vector": [
                                -0.09217305481433868, 0.010635560378432274, -0.02878434956073761, 0.06988169997930527,
                                0.1273992955684662, -0.023723633959889412, 0.05490724742412567, -0.12124507874250412,
                                -0.023694118484854698, 0.014595639891922474, 0.1471538096666336, 0.044936809688806534,
                                -0.02795785665512085, -0.05665992572903633, -0.2441125512123108, 0.2755320072174072,
                                0.11451690644025803, 0.20242854952812195, -0.1387604922056198, 0.05219579488039017,
                                0.1145530641078949, 0.09967200458049774, 0.2161576747894287, 0.06157230958342552,
                                0.10350126028060913, 0.20387393236160278, 0.1367097795009613, 0.02070528082549572,
                                0.19238869845867157, 0.059613026678562164, 0.014012521132826805, 0.16701748967170715,
                                0.04985826835036278, -0.10990987718105316, -0.12032567709684372, -0.1450948715209961,
                                0.13585780560970306, 0.037511035799980164, 0.04251480475068092, 0.10693439096212387,
                                -0.08861573040485382, -0.07457160204648972, 0.0549330934882164, 0.19136285781860352,
                                0.03346432000398636, -0.03652812913060188, -0.1902569830417633, 0.03250952064990997,
                                -0.3061246871948242, 0.05219300463795662, -0.07879918068647385, 0.1403723508119583,
                                -0.08893408626317978, -0.24330253899097443, -0.07105310261249542, -0.18161986768245697,
                                0.15501035749912262, -0.216160386800766, -0.06377710402011871, -0.07671763002872467,
                                0.05360138416290283, -0.052845533937215805, -0.02905619889497757, 0.08279753476381302
                            ]

                        }
                    }
                }
            }
        },
        "size": 5
    }
    print(json.dumps(search))
    return es.search(index=INDEX_NAME, body=search)


if __name__ == '__main__':

    es = Elasticsearch("localhost:9200", send_get_body_as='POST', retry_on_timeout=True, timeout=5000)

    # workaround for http error for max long HTTP lines (>4096 bytes). either upgrade or downgrade es for permanent solution

    INDEX_NAME = "testindex16"
    create_index(INDEX_NAME)

    data = np.random.rand(10, 8, 8).tolist()
    #data = np.random.rand(10, 2, 2).tolist()

    #print(data)
    index_data(data)

    res = search()
    print(res)

This is the result I get:

{'took': 0, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': 0, 'max_score': None, 'hits': []}}

Please help me out with this issue.
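One possible explanation, not confirmed in this thread, is timing: Elasticsearch makes newly indexed documents searchable only after a refresh, and the script above searches immediately after indexing. The sketch below is a hypothetical variant of `index_data` that forces a refresh before returning. It assumes a client object with the same `index(index=..., body=...)` call used above and the standard `indices.refresh(index=...)` method of the Python client.

```python
def index_then_refresh(es, index_name, bodies):
    """Index each body, then refresh so the documents are visible to search."""
    count = 0
    for body in bodies:
        es.index(index=index_name, body=body)
        count += 1
    # Without a refresh, a search issued right away may see none of the documents
    es.indices.refresh(index=index_name)
    return count
```

If the hits are still empty after a refresh, the index mapping is the next thing to check, since the request body above puts "properties" at the top level rather than under "mappings".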

Can not find 6.8.1 branch ?

Sorry, but I cloned the repository, changed into the directory, and ran:
mvn package

Everything went fine, but I could not find the 6.8.1 version anywhere. What do I have to do to get the version compatible with my Elasticsearch?

I am indeed running 6.8.1

Thanks,
Steve

Does the plugin support array of vector?

As the example says, it supports this:

{
   	"id": 1,
   	....
   	"embedding_vector": "v7l48eAAAAA/s4VHwAAAAD+R7I5AAAAAv8MBMAAAAAA/yEI3AAAAAL/IWkeAAAAAv7s480AAAAC/v6DUgAAAAL+wJi0gAAAAP76VqUAAAAC/sL1ZYAAAAL/dyq/gAAAAP62FVcAAAAC/tQRvYAAAAL+j6ycAAAAAP6v1KcAAAAC/bN5hQAAAAL+u9ItAAAAAP4ckTsAAAAC/pmkjYAAAAD+cYpwAAAAAP5renEAAAAC/qY0HQAAAAD+wyYGgAAAAP5WrCcAAAAA/qzjTQAAAAD++LBzAAAAAP49wNKAAAAC/vu/aIAAAAD+hqXfAAAAAP4FfNCAAAAA/pjC64AAAAL+qwT2gAAAAv6S3OGAAAAC/gfMtgAAAAD/If5ZAAAAAP5mcXOAAAAC/xYAU4AAAAL+2nlfAAAAAP7sCXOAAAAA/petBIAAAAD9soYnAAAAAv5R7X+AAAAC/pgM/IAAAAL+ojI/gAAAAP2gPz2AAAAA/3FonoAAAAL/IHg1AAAAAv6p1SmAAAAA/tvKlQAAAAD/I2OMAAAAAP3FBiCAAAAA/wEd8IAAAAL94wI9AAAAAP2Y1IIAAAAA/rnS4wAAAAL9vriVgAAAAv1QxoCAAAAC/1/qu4AAAAL+inZFAAAAAv7aGA+AAAAA/lqYVYAAAAD+kNP0AAAAAP730BiAAAAA="
   }

Does it also support this?

{
   	"id": 1,
   	....
   	"vector_list": [
           "vector_1",
           "vector_2"
        ]
   }

Can't install the plugin on 5.6.0, how to fix it?

bin/elasticsearch-plugin install /root/vector/target/releases/elasticsearch-binary-vector-scoring-5.6.0.zip

Output:
ERROR: Unknown plugin /root/vector/target/releases/elasticsearch-binary-vector-scoring-5.6.0.zip

How can I fix this? Thanks!
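The install command in the README passes the zip as a `file://<PATH_TO_ZIP>` URL, while the command above passes a bare path, which the installer appears to treat as a plugin name. A minimal sketch of building the argument list in the README's form follows; the helper name is hypothetical.

```python
def plugin_install_command(zip_path):
    """Return the install command with the zip given as a file:// URL."""
    if not zip_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path to the plugin zip")
    # "file://" + "/root/..." yields the three-slash form used in the README
    return ["bin/elasticsearch-plugin", "install", "file://" + zip_path]
```

For the path in this report, the last argument becomes `file:///root/vector/target/releases/elasticsearch-binary-vector-scoring-5.6.0.zip`.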

Does not return Documents

Hi lior,

This is one of the documents I pushed, along with several others, to Elasticsearch version 5.6.0:
PUT /test/test/1
{
"account_id": 2000000007,
"question": " i have one number what is the process to add the second number?",
"answer": " You cannot add any additional numbers ",
"embedding_vector": "IGkgaGF2ZSBvbmUgbnVtYmVyIHdoYXQgaXMgdGhlIHByb2Nlc3MgdG8gYWRkIHRoZSBzZWNvbmQgbnVtYmVyPw=="
}

The Python (3.6) code I used for the embedding vector field is as follows:

import base64
import numpy as np

# Assumption: big-endian float64, inferred from the 8 values decoded from 64 bytes below
dbig = np.dtype('>f8')

def stringToBase64(string): # converts string to base64
    return base64.b64encode(bytes(string, 'utf-8'))

def decode_float_list(base64_string): # converts base64 string to array
    byte = base64.b64decode(base64_string)
    print("byte is "+str(byte))
    return np.frombuffer(byte, dtype=dbig).tolist()

def encode_array(arr): # converts array back to base64 string
    base64_str = base64.b64encode(np.array(arr).astype(dbig)).decode("utf-8")
    return base64_str

These are the same functions as given in the README, except for the stringToBase64() function.

stringToBase64(" i have one number what is the process to add the second number?")
returns b'IGkgaGF2ZSBvbmUgbnVtYmVyIHdoYXQgaXMgdGhlIHByb2Nlc3MgdG8gYWRkIHRoZSBzZWNvbmQgbnVtYmVyPw=='

decode_float_list("IGkgaGF2ZSBvbmUgbnVtYmVyIHdoYXQgaXMgdGhlIHByb2Nlc3MgdG8gYWRkIHRoZSBzZWNvbmQgbnVtYmVyPw==")
returns [1.4992215195544858e-152, 5.760354975542939e+228, 4.701095635989595e+180, 9.150375480313843e+199, 1.6743793267120413e+243, 1.9402257160408965e+227, 1.3332560325640997e+179, 1.8173709219006215e-152]

Now when I query using the returned array, it does not give me the original document:
POST /test/_search
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "script": {
          "inline": "binary_vector_score",
          "lang": "knn",
          "params": {
            "cosine": true,
            "field": "embedding_vector",
            "vector": [1.4992215195544858e-152, 5.760354975542939e+228, 4.701095635989595e+180, 9.150375480313843e+199, 1.6743793267120413e+243, 1.9402257160408965e+227, 1.3332560325640997e+179, 1.8173709219006215e-152]
          }
        }
      }
    }
  },
  "size": 100
}

Also, for most strings the decode_float_list() function throws the error below.
An example string: " but in case my free trial is over and i have no money to purchase yet can i still receive calls from people?"

Traceback (most recent call last):

  File "<ipython-input-39-619dfb271a22>", line 1, in <module>
    runfile('/home/robot/Desktop/base64.py', wdir='/home/robot/Desktop')

  File "/home/robot/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
    
  File "/home/robot/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/robot/Desktop/base64.py", line 28, in <module>
    b=decode_float_list(a)

  File "/home/robot/Desktop/base64.py", line 12, in decode_float_list
    return np.frombuffer(byte, dtype=dbig).tolist()

ValueError: buffer size must be a multiple of element size

My primary guess is that I'm using the wrong encoding to convert the strings to base64,
since I don't get arrays for all strings (the error above is thrown).
Can you help me out here?

Regards
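A likely source of both symptoms is that the base64 value here encodes the raw text bytes rather than a numeric embedding. The README's `encode_array` expects a list of floats, and `decode_float_list` can only decode a byte string whose length is a multiple of the element size, which is the condition the ValueError reports. The sketch below shows the round trip under the assumption that `dbig` is a big-endian float64 dtype. The sample vector is a stand-in for the output of whatever embedding model is used, which the post does not name.

```python
import base64

import numpy as np

dbig = np.dtype(">f8")  # assumption: 8-byte big-endian floats


def encode_array(arr):
    """Serialize a list of floats to the base64 string stored in the index."""
    return base64.b64encode(np.array(arr).astype(dbig)).decode("utf-8")


def decode_float_list(base64_string):
    """Inverse of encode_array; raises ValueError on a partial element."""
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dbig).tolist()


def text_bytes_are_decodable(text):
    """True only when the UTF-8 length happens to be a multiple of 8 bytes."""
    return len(text.encode("utf-8")) % dbig.itemsize == 0


# Stand-in for an embedding produced by a model (hypothetical values)
sample_vector = [0.25, -1.5, 3.0]
round_trip = decode_float_list(encode_array(sample_vector))
```

The question text in this report happens to be 64 bytes long, so it decodes into 8 meaningless floats. Other strings fail because their byte length is not a multiple of 8.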

Get the same results no matter how different the embedded vector

I queried something like this:

curl -XGET "$HOST/sappearance/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "boost": 1,
      "script_score": {
        "script": {
          "source": "binary_vector_score",
          "lang": "knn",
          "params": {
            "cosine": false,
            "field": "embedded",
            "encoded_vector":
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPd9jHwAAAAA3BiywAAAAAAAAAAAAAAAAAAAAAD2f0Q8AAAAAAAAAAD1IY4k8kpnOAAAAAAAAAAA9r5S5PNp8mgAAAAAAAAAAOpXNNwAAAAA78meIAAAAAAAAAAA7yL4/AAAAAAAAAAA9c3QgAAAAAAAAAAAAAAAAAAAAADwIS5M+JBA+PM1oEwAAAAAAAAAAPb7UFgAAAAAAAAAAAAAAAAAAAAAAAAAAPWuvPwAAAAAAAAAAPgdXbwAAAAAAAAAAAAAAAAAAAAA8vUoQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPe0Z4QAAAAAAAAAAAAAAAAAAAAA9WR3bAAAAAD4IS4kAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPbxhFzu8AFU9wkM7Pu8+/DWUJz0AAAAAAAAAAAAAAAAAAAAAPgz/cTvKBXU6sPdoO7pa0wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD1I7TE9QvIvAAAAAAAAAAAAAAAAAAAAADG9brM9v30LAAAAADwYBwAAAAAAAAAAADwkH0w9wzCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPSlb/wAAAAAAAAAAPIfoPTwH5rEAAAAAAAAAAAAAAAAAAAAAAAAAAD15pBs7yhn+AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADpW/8YAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9KR/wAAAAAD0CALg+WdvnAAAAAAAAAAA9x1dyAAAAAAAAAAAAAAAAAAAAAD0W7kcAAAAAAAAAAD1UdJAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA80xyCPKIAQjv0KwA7pOv4PO7vZwAAAAA+CdeuAAAAAD2+5c08jKHQAAAAAAAAAAAAAAAAPb3M3wAAAAAAAAAAAAAAAAAAAAAAAAAAPOrJWQAAAAAAAAAAAAAAAAAAAAA8vOEoAAAAAAAAAAAAAAAAPLulGAAAAAAAAAAAPHMhBQAAAAAAAAAAPLpOowAAAAAAAAAAAAAAAAAAAAAAAAAAPfysagAAAAAAAAAAPUgk8j0WmvA9aA/APQ4yugAAAAA+Ua3RAAAAAD14ZH0AAAAAAAAAADymFS4AAAAAPKGDGwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4pxqc9sLFDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPa7qzT0wi0A95EjtAAAAADv6DtEAAAAAAAAAADGeapUAAAAAAAAAAD0cxAUAAAAAPixwEQAAAAAAAAAAPVjWnAAAAAAAAAAAAAAAADy4S54AAAAAPgOCQwAAAAAAAAAAPC+VNQAAAAAAAAAAAAAAAAAAAAA62RwdPQAfkwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA7gRl7AAAAAAAAAAAAAAAAAAAAADs3BtgAAAAAPZdp/wAAAAA81lgkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPFfUigAAAAAAAAAAPa36iwAAAAA8/DmxAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPMlt/j0sYX0AAAAAAAAAAAAAAAAAAAAAPL7EtgAAAAAAAAAAAAAAAAAAAAA9jHwmPT1Faz1IuMgAAAAAAAAAAAAAAAA98OD7PiGCrAAAAAAAAAAAAAAAAD2Ap8MAAAAAAAAAAAAAAAAAAAAAAAAAADytj6EAAAAAAAAAAAAAAAA8bkR0AAAAAAAAAAA9wx9rAAAAAAAAAAAAAAAAAAAAAAAAAAA9uXP
QO7nOmwAAAAAAAAAAPiQipQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPofOGgAAAAAAAAAAPQXMtQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD2msr0AAAAAPI17JTzLI+AAAAAAAAAAAAAAAAA88OoNPP2TBwAAAAA8ml1bAAAAAAAAAAAAAAAAPZCNWwAAAAAAAAAAAAAAAAAAAAA9BkzEPlRJ8j1FFjgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPSTy4gAAAAA9BumrPak55wAAAAAAAAAAO4/wfwAAAAAAAAAAPNad0QAAAAAAAAAAPZl7JQAAAAA9WPtHOmZDAAAAAAAAAAAAAAAAAAAAAAA9K2YzAAAAAAAAAAAxuLrzAAAAAAAAAAAAAAAAAAAAAAAAAAA9EWHfPF86lgAAAAAAAAAAAAAAAAAAAAAAAAAAPRaP7QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADyR3nMAAAAAAAAAAD3O5ZEAAAAAPKdB3AAAAAAAAAAAAAAAAAAAAAA9I9JvAAAAAD1IMhM8HrxQAAAAAAAAAAAAAAAAAAAAADvxOS0AAAAAAAAAADwUEzsAAAAAO5RG1wAAAAAAAAAAOwzWEAAAAAA9Axy1Pi9ZcQAAAAA8nnvhAAAAAAAAAAA8tvZAAAAAAAAAAAA="
          }
        }
      }
    }

And here is the sample results:

{
       "_index" : "sappearance",
       "_type" : "_doc",
       "_id" : "GYE1_3YB81_RQuAAxOtM",
       "_score" : 1.54172978,
       "_source" : {
         "_pid" : 1,
         "timestamp" : "2021-01-14T10:52:03.269931",
         "embedded" : "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPH44hAAAAAA2tlDbAAAAAAAAAAAAAAAAAAAAADu2GI4AAAAAAAAAADzdAeUAAAAAAAAAAAAAAAA9b6oLPLjqkgAAAAAAAAAAO4ARngAAAAA7RMb5AAAAAAAAAAA+QUhjAAAAAAAAAAA92vL+AAAAAAAAAAAAAAAAAAAAAD0W+xM8nRQrAAAAAAAAAAAAAAAAPHEDYQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADzXB4EAAAAAPh0ZFgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD4vqOU9HL46AAAAAAAAAAA9Jby8AAAAADwdLPYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8723XAAAAAD4nAwcAAAAAAAAAAAAAAAAAAAAAPQvDVwAAAAAAAAAAAAAAAAAAAAAAAAAAPh9yFgAAAAA9AVv4PkSx5DWWR1sAAAAAAAAAAAAAAAAAAAAAPbCmkQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADptaK4AAAAAPPxQewAAAAA8FK5VAAAAAAAAAAAAAAAAAAAAADHDmmg+jGplAAAAAD0Bk8IAAAAAAAAAADpaBB49cp4xAAAAAAAAAAAAAAAAAAAAAD1aN9MAAAAAAAAAAAAAAAAAAAAAPIqs2gAAAAAAAAAAAAAAAAAAAAAAAAAAO5j31TyGwcI9BY5pAAAAAAAAAAAAAAAAPLWsTD3CWH4AAAAAAAAAADyVkqYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD0SScgAAAAAAAAAAAAAAAA7n+1XAAAAAD2Ds0E+B+N/AAAAAAAAAAA9UZolAAAAAAAAAAAAAAAAAAAAAD2P5sUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8zuPoAAAAAAAAAAAAAAAAPYz23QAAAAA9Prd0AAAAADygm7I9ucLlAAAAAAAAAAAAAAAAO1v98QAAAAAAAAAAAAAAAAAAAAAAAAAAPT2OfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOzdYiwAAAAA7VvQ2PruU0wAAAAAAAAAAAAAAAD2W9VUAAAAAPnezjwAAAAAAAAAAPdYWIgAAAAA9i4gFPONVsgAAAAA8WP4vAAAAADzj28kAAAAAAAAAADtBIEwAAAAAPXMdPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA7bnL6Pgf5pgAAAAA8K6R9AAAAAD2jhusAAAAAAAAAADGqyysAAAAAAAAAAAAAAAAAAAAAPXzVwAAAAAA91k8rAAAAAAAAAAAAAAAAAAAAAD4V+NoAAAAAAAAAAAAAAAAAAAAAPXA7ZgAAAAAAAAAAAAAAAAAAAAA8CLdaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA7RWH8AAAAAAAAAAAAAAAAAAAAADxDZ9EAAAAAPRxPMQAAAAA9K34AAAAAAAAAAAA9o/L3AAAAAAAAAAAAAAAAAAAAAD1d8Z0AAAAAAAAAAAAAAAAAAAAAPjnlmgAAAAAAAAAAPek/lgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA89YpRAAAAAD4ZSw4AAAAAAAAAAD3SrmYAAAAAO+5TZgAAAAAAAAAAAAAAAAAAAAA8fuj6Pe9kqQAAAAAAAAAAAAAAAD4XCz89IKRCPm3YtAAAAAAAAAAAAAAAADuWYsoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA+DdPtAAAAAAAAAAAAAAAAPg6qtgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD0b8YI8r1EXAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA840plAAAAAAAAAAAAAAAAPSibcgAAAAAAAAAAAAAAADwzzUQAAAAAPZ8IuD0OtxcAAAAAPd9RmQAAAAAAAAAAAAAAAAAAAAAAAAAAPLG+wAAAAAA+DcGNPKaStwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADyKOF48xiVyPglCuQAAAAAAAAAAAAAAAAAAAAA+PaxuAAAAAAAAAAAxuH6eAAAAAAAAAAAAAAAAAAAAADzZKPw9yr4VOyWMOgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAO88zPAAAAAAAAAAAAAAAAAAAAAAAAAAAOzSNSAAAAAAAAAAAAAAAAAAAAAAAAAAAPAiZfwAAAAAAAAAAAAAAAAAAAAA+EOQyAAAAAAAAAAA7HLBxAAAAAAAAAAAAAAAAAAAAAD3LuNcAAAAAAAAAAD07eUkAAAAAPAmuJgAAAAAAAAAAPHIMCgAAAAA7NrnXPLvIHQAAAAA8Jly5AAAAAAAAAAA9GtBEAAAAAAAAAAA=",
         "camera_id" : 100
       }
     },
     {
       "_index" : "sappearance",
       "_type" : "_doc",
       "_id" : "NoE1_3YB81_RQuAAxes0",
       "_score" : 1.54172978,
       "_source" : {
         "_pid" : 1,
         "timestamp" : "2021-01-14T10:52:05.291303",
         "embedded" : "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA23yhyAAAAAAAAAAAAAAAAAAAAADzBIC4AAAAAAAAAADyI2DoAAAAAAAAAAAAAAAA9yBa5PYqL5QAAAAAAAAAAPBttUQAAAAAAAAAAPa1SagAAAAA+mKkpAAAAAAAAAAA9l/WTAAAAAAAAAAAAAAAAAAAAAD1iQKo9ITYOAAAAAAAAAAA8cJwnPJX+FAAAAAAAAAAAPTgvuwAAAAAAAAAAAAAAAAAAAAAAAAAAPXuwvwAAAAAAAAAAPVKqqwAAAAAAAAAAAAAAAD2HtOYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9WBXjAAAAAD48xacAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9UNXzAAAAAAAAAAAAAAAAPchA5TwermA9QjEXPgIeBzWT4uwAAAAAAAAAAAAAAAAAAAAAPJpPFQAAAAA9wIsaPJZ4VgAAAAAAAAAAAAAAAAAAAAAAAAAAPTniIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADG+hcs+fYybAAAAAD0IKpYAAAAAAAAAADttBAQAAAAAAAAAAAAAAAAAAAAAAAAAADyadWAAAAAAAAAAAAAAAAAAAAAAPPErtQAAAAA7B6Q+PBHenAAAAAAAAAAAAAAAAD0Q29IAAAAAAAAAAAAAAAAAAAAAPO4hqj0Yv5E9UB1dAAAAADvguH0AAAAAAAAAAD4Z+2YAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9L5hkAAAAAD2aF3I9sdv3AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPIKrEj1E7w4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9jXcYPJcOjgAAAAAAAAAAPkQSmgAAAAA9au0CAAAAAAAAAAA6zpROAAAAAAAAAAAAAAAAAAAAADuWckcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8r05QAAAAAAAAAAA92eCCAAAAAAAAAAAAAAAAAAAAAAAAAAA7EfcdPpYRJwAAAAAAAAAAAAAAAD33VcAAAAAAPeyTAQAAAAAAAAAAPReOZAAAAAA9rqaDAAAAAAAAAAAAAAAAAAAAAD1I99sAAAAAAAAAAAAAAAAAAAAAPNFcYwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA6bC1GPcxXYj1ZYdkAAAAAAAAAAD0vkhEAAAAAPJitvTGnD6kAAAAAAAAAAAAAAAAAAAAAPQZTAAAAAAA94cfZAAAAAAAAAAAAAAAAPW4ziT3s/6YAAAAAPSbGVwAAAAAAAAAAPTrVywAAAAAAAAAAAAAAAAAAAAA9XlB6AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA7QrBCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPTXIgAAAAAA9/PqqAAAAADwgoLU+HPLnAAAAAAAAAAAAAAAAAAAAAD2FeggAAAAAAAAAADvMW/sAAAAAPZE39gAAAAAAAAAAPh2xLAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD6ktcwAAAAAAAAAAAAAAAAAAAAAPJBpHgAAAAAAAAAAAAAAAAAAAAAAAAAAPapEgzxg51cAAAAAAAAAAD3ezFc+BVLNPXtmMgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADyL7WoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA8kwECAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPNSIVwAAAAAAAAAAPHhYDQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD2HjRQ76oQrAAAAAD2eYwsAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9apjLAAAAAAAAAAAAAAAAPT328gAAAAAAAAAAAAAAAD6OjWA89K8kPmDYVz2DNZgAAAAAPOrL6AAAAAAAAAAAPAGmtQAAAAAAAAAAPOAJMAAAAAA97jIrOkbWCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADw1rCI9GCJFAAAAAAAAAAAxuHOHAAAAAAAAAAAAAAAAAAAAAD2APzg+EWdMOkfA3gAAAAAAAAAAAAAAAAAAAAAAAAAAPgjZDgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD2qvQ8AAAAAPExDSQAAAAAAAAAAAAAAAAAAAAA9dX39AAAAAAAAAAA9TPA7AAAAAAAAAAAAAAAAAAAAAD1g4ygAAAAAAAAAADwqjHAAAAAAPCe20AAAAAAAAAAAPI6GmQAAAAAAAAAAPR9I+QAAAAAAAAAAAAAAAAAAAAA7neglAAAAAAAAAAA=",
         "camera_id" : 100
       }
     },
     {
       "_index" : "sappearance",
       "_type" : "_doc",
       "_id" : "G4E1_3YB81_RQuAAxOtb",
       "_score" : 1.54172978,
       "_source" : {
         "_pid" : 1,
         "timestamp" : "2021-01-14T10:52:03.269931",
         "embedded" : "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2l/zeAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9hlGVPIS9vwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA+O1EDAAAAAAAAAAA9i7acAAAAAAAAAAAAAAAAAAAAAD3VwS09S38KAAAAAAAAAAAAAAAAPHjBHQAAAAAAAAAAAAAAAAAAAAAAAAAAOxjfrAAAAAAAAAAAPjGkgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD5H5CQ89DxFAAAAAAAAAAA9KXtfAAAAAD1u6akAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9SXprAAAAAD4N9LIAAAAAAAAAAAAAAAAAAAAAPTsULQAAAAAAAAAAAAAAAAAAAAAAAAAAPiD6FQAAAAA9VhptPlm1xTWFN6sAAAAAAAAAAAAAAAAAAAAAPTNeMzy2K58AAAAAAAAAAAAAAAAAAAAAAAAAADyMiRIAAAAAPGt9ZQAAAAAAAAAAAAAAAAAAAAAAAAAAOwkT8DGv9f0+fmVKAAAAADzYwHEAAAAAAAAAADpLvqo9ZOqHAAAAAAAAAAAAAAAAAAAAAD1i1VcAAAAAAAAAAAAAAAAAAAAAPKhAVgAAAAAAAAAAPEHewAAAAAAAAAAAPX8GuDybM14AAAAAAAAAAAAAAAAAAAAAPLKkTz3u06kAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD0Cw0oAAAAAAAAAAAAAAAA7rhq9AAAAAD2XC4w96oDRAAAAAAAAAAA8/tFlAAAAAAAAAAAAAAAAAAAAAD2YxPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA4rdd0PYzkmwAAAAAAAAAAPZK/xwAAAAA9t2zuAAAAAAAAAAA+Hr/aAAAAAAAAAAAAAAAAO94qFzyVv+QAAAAAAAAAAAAAAAAAAAAAPQN+HwAAAAAAAAAAAAAAAAAAAAA8on+4AAAAAAAAAAA9Y8HyAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPrLM5gAAAAAAAAAAAAAAAD2s3SY629qxPo4vTAAAAAAAAAAAPYOI1wAAAAA9mbYePPoOEwAAAAA88rWIAAAAADypHSsAAAAAAAAAADwdL/cAAAAAPb699wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD0DXU4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8SvTpPiUilQAAAAAAAAAAAAAAADztpJAAAAAAAAAAADGVDBIAAAAAAAAAADsMnkcAAAAAPT+g3AAAAAA9pSkEAAAAAAAAAAAAAAAAPMQDxj4mMUAAAAAAPQBy3jydD0QAAAAAPa4JtQAAAAAAAAAAAAAAAAAAAAA7lkETAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA7STIvAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPN8p/AAAAAAAAAAAAAAAAAAAAAA9DGc4AAAAAAAAAAAAAAAAAAAAAD2f8psAAAAAAAAAADwXTYYAAAAAPmPyzAAAAAA9jdetPhEd6QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8rgeGAAAAAD4TI0MAAAAAAAAAADyca14AAAAAPGkKSAAAAAAAAAAAAAAAAAAAAAA8v7OFPaT8VAAAAAAAAAAAAAAAAD4SH+k9gazbPlQ/RgAAAAAAAAAAAAAAADwoTUgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA9rm2lAAAAAAAAAAAAAAAAPeujogAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADyYPO88rhomAAAAADotqhIAAAAAAAAAAAAAAAA8wAoPAAAAAAAAAAA8lP0KAAAAAAAAAAAAAAAAOonWtgAAAAAAAAAAAAAAAD2AYDMAAAAAPVtrzT1su48AAAAAPDCxVgAAAAAAAAAAAAAAAAAAAAAAAAAAPHG1PAAAAAA91szHO10AOAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPkBlcwAAAAAAAAAAAAAAAAAAAAA9Fg1SAAAAAAAAAAAxor94AAAAAAAAAAAAAAAAAAAAADs6Ho49s6ryAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPKA2GAAAAAAAAAAAAAAAAAAAAAA+HWniAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD3mAGkAAAAAAAAAAD2i2QIAAAAAAAAAAAAAAAAAAAAAO+V7yQAAAAA8R4saPLMS8QAAAAAAAAAAAAAAAAAAAAA8i4mBAAAAAAAAAAA=",
         "camera_id" : 100
       }
     }
   ]
Could you help me out? If I use the built-in dot product function there is no problem. 

es 5.6.3

Hello, my ES version is 5.6.3. How can I build the plugin for it?
Looking forward to your reply. Thanks.

Question: Score on multiple vectors

Hi @lior-k ,

first of all, many thanks for maintaining this plugin. I haven't tried it yet, but it's something we were about to code ourselves.
I have a question about comparing documents to multiple vectors and how that would work, presumably through a fork or perhaps as something you might want to merge into the code base.

Going by the code, I think the main change is in this function: the value that is fetched would be treated as an array of vectors, and the calculations would be performed on each of them, instead of on the single vector that is used now. Is that correct? And if so, is that something you might want to merge if/when we make the code change in a fork?

Keep up the good work!
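
To make the idea concrete, here is a minimal sketch of what scoring against multiple stored vectors could look like. It is not the plugin's code. It assumes the stored value is a base64 string of big-endian float32 values (as in the decoding snippets elsewhere on this page), that several vectors of the same dimension are simply concatenated, and that the per-vector dot products are combined with max(). The function names and the max() aggregation are hypothetical choices for illustration.

```python
import base64
import struct


def decode_vectors(encoded, dim):
    """Decode base64 big-endian float32 data into a list of dim-sized vectors."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4
    if len(raw) % 4 or count % dim:
        raise ValueError("payload is not a whole number of vectors")
    flat = struct.unpack(">%df" % count, raw)
    return [flat[i:i + dim] for i in range(0, count, dim)]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def multi_vector_score(query, encoded, dim):
    """Score a document by its best-matching stored vector.

    Using max() is an assumption; a mean or a sum would be equally valid.
    """
    return max(dot(query, vec) for vec in decode_vectors(encoded, dim))


# Two 2-dimensional vectors stored back to back: (1, 0) and (0, 1).
doc_value = base64.b64encode(struct.pack(">4f", 1.0, 0.0, 0.0, 1.0)).decode("ascii")
print(multi_vector_score([0.5, 2.0], doc_value, dim=2))  # 2.0
```

The dimension has to be known at query time (here passed as dim), since a flat byte payload does not record where one vector ends and the next begins.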

Elasticsearch runs on JDK 11, but the plugin sets Java version 8. Is that OK?

I have Elasticsearch 7.5.2, which runs on JDK 11, and fast-elasticsearch-vector-scoring 7.5.2, which is built for JDK 1.8. I packaged the plugin with 'mvn package' and then installed it. The install fails with ‘ERROR: Unkown plugin .\plugins\elasticsearch-binary-vector-scoring-7.5.2.zip’.

Go code for conversion to array fails

The Go code from the README fails:

bits := binary.BigEndian.Uint64(decoded[i : i+4])
f := math.Float32frombits(bits)

Here bits is a uint64, but math.Float32frombits takes a uint32, and the slice decoded[i : i+4] holds only 4 bytes. I think the code should be:

bits := binary.BigEndian.Uint32(decoded[i : i+4])
f := math.Float32frombits(bits)
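
For cross-checking the fix outside Go, the same point can be sketched in Python with the struct module: a 64-bit read needs 8 bytes and is rejected for a 4-byte slice, while a 32-bit read of the same bytes yields the float32 bit pattern. The sample bytes are an illustrative value chosen here (the big-endian encoding of 1.0), not data from the plugin.

```python
import struct

# Big-endian IEEE 754 single-precision encoding of 1.0.
four = bytes([0x3F, 0x80, 0x00, 0x00])

# A 64-bit unsigned read (">Q") requires 8 bytes, so 4 bytes are rejected.
try:
    struct.unpack(">Q", four)
    wide_read_ok = True
except struct.error:
    wide_read_ok = False

# A 32-bit unsigned read (">I") matches the slice length.
bits = struct.unpack(">I", four)[0]

# Interpreting the same 4 bytes directly as a big-endian float32.
value = struct.unpack(">f", four)[0]

print(wide_read_ok, hex(bits), value)  # False 0x3f800000 1.0
```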

How to support ES 7.5

I'm pretty new to ES. How can I modify this plugin to make it work on ES 7.5? Thanks!

change data types

What will happen if I change all the data types to double? Will the speed change a lot?
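
The speed impact would have to be measured, and the decoding snippets on this page read 4-byte floats, so doubles would also need matching changes on the reading side. What can be stated without measuring is the storage cost: each double takes 8 bytes instead of 4, so the encoded field doubles in size. A small sketch of that arithmetic, using an arbitrary dimension of 300 as the example:

```python
import base64
import struct


def encoded_sizes(dim):
    """Return raw and base64 sizes for dim values stored as float32 vs float64."""
    values = [0.0] * dim
    as_float = struct.pack(">%df" % dim, *values)   # 4 bytes per value
    as_double = struct.pack(">%dd" % dim, *values)  # 8 bytes per value
    return {
        "float32_bytes": len(as_float),
        "float64_bytes": len(as_double),
        "float32_base64_chars": len(base64.b64encode(as_float)),
        "float64_base64_chars": len(base64.b64encode(as_double)),
    }


sizes = encoded_sizes(300)
print(sizes)
```

For 300 dimensions this gives 1200 versus 2400 raw bytes, and 1600 versus 3200 base64 characters per document.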

Should we use refresh=true on production? (Test is not suitable for 7.9)

Hello, I would like to raise an issue about the test case.

According to the fast-elasticsearch-vector-scoring/src/test/java/com/liorkn/elasticsearch/PluginTest.java file,
the test case uses params.put("refresh", "true") on the data insertion request, which makes the vector scoring results come out correctly.

public void test() throws Exception {
  final Map<String, String> params = new HashMap<>();
  params.put("refresh", "true");

However, in production we often do not use the refresh=true option on data insertion.
Without this option, the vector scoring results are wrong (for example, several result documents share the same cosine similarity score).

I think one of two options should be considered:

  1. Modify the test code (removing refresh=true) -> this may require code-level changes to the vector scoring plugin
  2. Mention in the documentation that the refresh=true option must be provided on data insertion
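
For reference, option 2 on the client side amounts to adding the refresh query parameter to the index request. The sketch below only builds the URL and sends nothing. The host, index name, and document id are placeholders, the /_doc/ path assumes a 6.x/7.x style index like the ones shown in the hits on this page, and the helper name is hypothetical. Elasticsearch accepts true, false, and wait_for as values for refresh.

```python
from urllib.parse import quote, urlencode


def index_doc_url(base_url, index, doc_id, refresh="true"):
    """Build the URL for indexing one document with an explicit refresh policy."""
    if refresh not in ("true", "false", "wait_for"):
        raise ValueError("refresh must be 'true', 'false' or 'wait_for'")
    path = "/%s/_doc/%s" % (quote(index, safe=""), quote(doc_id, safe=""))
    return base_url.rstrip("/") + path + "?" + urlencode({"refresh": refresh})


# Placeholder host and names, for illustration only.
print(index_doc_url("http://localhost:9200", "test_index", "1"))
# http://localhost:9200/test_index/_doc/1?refresh=true
```

Forcing a refresh on every write is costly under heavy indexing, so wait_for or a periodic refresh may be a better fit in production than refresh=true.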

Incorrect scores for cosine similarity

Hi there, I'm trying to use the plugin with Elasticsearch 6.8.1 but getting strange document scores when doing cosine similarity queries.

I have the following query:

{
    "query": {
        "function_score": {
            "boost_mode": "replace",
            "script_score": {
                "script": {
                    "source": "binary_vector_score",
                    "lang": "knn",
                    "params": {
                        "cosine": true,
                        "field": "embedding_vector",
                        "encoded_vector": "PVC0BjxrQug8VcbbvRqFmrz5KQy9aQ2ku7g0eL1AxVo+KQK6vGHZJz56bne9YwQNvoFb7r21yio8TIP4PZH8b71ksh+8NetEvh/vhL3+gL89pUxfPfdQ8D3j8j47/BHwvio1dL62+ak5LioAvgslND2Hy9O+BvLdvAepBD3P1fW+MjrwvaWz+D1mjWw9vSibvUqxBL09jYM+Nx6FPfTthr4safw9LmFMvT+ZCD5mid0+M9wxPUQ3VD2K8kK+Lrv2PdsAdr6FzIg9pDl3Pd+3/j2C0Q67Vw6gPiMK1b4V43o7RwUAPd8Q9b6ZqnE9wArBPY9d0rx5qUS9xDpQvdPjKj4BZSc9rvZMvb3B373G6eg+JhPOvjg0o72BcnU9/Up6vYwqzr5VTkK+oOwOPUIGXD6mofU9/95Kvkt2/DvSv0i8o6poPKxrKD3gzTA7JUFAvZRJdjnaugC9DVqJPQkg1D4VoQW9JIsQvYnK0D5oMQe9XsByPVlXdj1Hoxk866z4vgtqyjyUp/y+DROkuwS1QD3OKlS+GaOBPPq92D1Xq8i+WVwqPgsmPL2CFY2+AuraPV35uL1VjGC9+bikPKydSD5Rk+K+oYchPoe5ez5bYWE9wBhcPgm1GT1L43A8mKqIPA+NULw6xgC+T0Hova0b1jxpyYy8/RdsvKX5OLycdjA="
                    }
                }
            }
        }
    },
    "size": 100
}

Which returns the following hit:

{
    "_index": "test_index",
    "_type": "_doc",
    "_id": "wJUwGnABOtCXXTPakSP3",
    "_score": 0.94501674,
    "_source": {
        ...
        "embedding_vector": "vRfgbD01DzG9RhYYvVYFdb0z/i69n1VGvEj+Qr3nw3g+PAt3vZZsHz3uAA48eLY0vmkUzb0Lrmy9U2W4PGLMdb4gdSi9lorsvWP/rL4hyg09s1xEPQQPfj2RYjI9FC7avgOK8r6EPzK95e7IvilbiDzxH3S9xA2EPYD4Sj1rkVG+VO7qvZL7Lr0JLYY8wf2yPMpmxDzN44A9+Is6O572hr29T7Q8mEXYPM7/xD6DwzU+SUVZPbrdizznYfG9q4yiPcuGaL5BUDo9PwzWPeRaNz2BUIw85y1sPX/2dL5SsfI82nFgPRG17L5Pi+Y92SyIPXEezr0gcn69L2QpPNajKD5FR3k9FlwLveHtVL2vyNY+E5KOvl6xXztcz4A9qd25vfodoL4y2a6+jZO8PRmFhD7Uxx4+B+lGvZVW4z2MvMC+GtDwPSvnhD2ADPS6/DgAvcmDiTx8j4C982oePbJ9gD4EQj+85ABIveBOKD4nGse9Ik2oPQCXojzcDsw9x9oeviKutbwopsi982++u5f1QD1JZIy97Ul5vBff4D2hK7O+F1woPgWhF7zE2+y9hXfCvbxKDzyM2Wi+AF2gO7XkYD4eYD6+jTnoPmmgqz5GX3477NuIPkzHJDw1ZzA9dUM6vT2p+rzXI4i+AppVvcY047w2xCg9AflKPaailDydrYg="
    }
}

The score returned by the plugin is 0.94501674. However, if you decode both vectors (using the Python code provided in the README) and compute the cosine similarity with the function below, the correct answer is 0.8900337068367593:

import numpy as np
from numpy.linalg import norm


def cosine_sim(vec1, vec2):
    """
    Computes the cosine similarity between two vectors
    :param vec1:
    :param vec2:
    :return:
    """
    cos_sim = np.dot(vec1, vec2) / \
        (norm(vec1) * norm(vec2))
    return cos_sim
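
One thing worth checking, offered as a hypothesis rather than a confirmed explanation: the two numbers above are related by (cos + 1) / 2, which maps cosine similarity from [-1, 1] into [0, 1]. The sketch below is a standard-library version of the cosine function plus that rescaling. Whether the plugin actually applies this rescaling is an assumption that would need to be verified against its source.

```python
import math


def cosine_sim(vec1, vec2):
    """Cosine similarity of two equal-length sequences, without NumPy."""
    dot = sum(x * y for x, y in zip(vec1, vec2))
    norm1 = math.sqrt(sum(x * x for x in vec1))
    norm2 = math.sqrt(sum(y * y for y in vec2))
    return dot / (norm1 * norm2)


def rescale_to_unit_interval(cos):
    """Map a cosine in [-1, 1] to [0, 1]. Hypothesis only, not confirmed plugin behaviour."""
    return (cos + 1.0) / 2.0


reported_by_plugin = 0.94501674
computed_offline = 0.8900337068367593

# The difference is about 1e-7, which is within float32 rounding.
difference = abs(rescale_to_unit_interval(computed_offline) - reported_by_plugin)
print(difference < 1e-6)  # True
```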

I've tried the same with Elasticsearch 7.5.0 (and appropriate version of the plugin) and get the same result. I'm using the following Dockerfile to build/install the plugin and run Elasticsearch:

FROM maven:3.5-jdk-8-alpine AS build
COPY fast-elasticsearch-vector-scoring /opt/fast-elasticsearch-vector-scoring
RUN cd /opt/fast-elasticsearch-vector-scoring && mvn package


FROM elasticsearch:6.8.1
COPY --from=build /opt/fast-elasticsearch-vector-scoring/target/releases/elasticsearch-binary-vector-scoring-6.8.1.zip /plugins/elasticsearch-binary-vector-scoring-6.8.1.zip

# Set development mode ENV variables
ENV xpack.security.enabled=false
ENV discovery.type=single-node

# Install the plugin
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///plugins/elasticsearch-binary-vector-scoring-6.8.1.zip

Any ideas why the score is inconsistent with the calculated value? Any help is greatly appreciated!

Scoring problem

Hi,
I am using your plug-in in my IR system. However, I have a problem with the results: the scores are either all 0 or far too big, and I don't know what I am doing wrong. I use Elasticsearch 5.6 and import the data with Python, which also creates the vectors and their base64 representation. Each vector has 300 float elements.
This is my mapping:

    "mappings": {
        "QA_PAIR": {
            "properties": {
                "answer": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "embedding_vector": {
                    "type": "binary",
                    "doc_values": true
                },
                "field": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "id": {
                    "type": "long"
                },
                "question": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "vector": {
                    "type": "float"
                }
            }
        }
    }

This is my POST request:
{"query": {
"function_score": {
"script_score": {
"script": {
"source": "binary_vector_score",
"lang": "knn",
"params": {
"cosine": false,
"field": "embedding_vector",
"vector": [
-0.393664352595806,
-0.0226647332310677,
-0.00257770717144012,
-0.0262848362326622,
0.0157693792134523,
0.021204886958003,
-0.00300449808128178,
-0.0353441946208477,
-0.0108432853594422,
-0.00832815747708082,
-0.0334242396056652,
0.0286900401115417,
-0.0486984103918076,
-0.0764514692127705,
-0.098212705925107,
-0.0188445895910263,
-0.00867833849042654,
-0.00819160416722298,
0.00789748691022396,
0.00992754753679037,
0.00328949838876724,
-0.0214207042008638,
0.053695060312748,
0.0114074740558863,
0.018918838351965,
-0.00494892848655581,
0.0158535111695528,
0.000340433209203184,
-0.0416207760572433,
0.00885819271206856,
0.0203350558876991,
0.0196341946721077,
0.00805150344967842,
0.0172809381037951,
-0.0163073185831308,
-0.0242628678679466,
-0.0123006738722324,
-0.0107043273746967,
0.0132684763520956,
0.0103193828836083,
0.0209956336766481,
0.0202617701143026,
-0.00449958629906178,
-0.0305256191641092,
-0.011040105484426,
0.00017193662642967,
-0.00850920379161835,
-0.00640081474557519,
0.00420213444158435,
-0.0274321790784597,
-0.0123433945700526,
0.0274937450885773,
-0.0168586056679487,
0.0176509246230125,
-0.0169484633952379,
0.0091985696926713,
0.0104979490861297,
-0.0262205079197884,
-0.00236284523271024,
-0.00904732383787632,
-0.0107208713889122,
-0.00682731717824936,
0.00660926382988691,
0.00560688180848956,
-0.0247711259871721,
-0.0190500840544701,
0.00775287440046668,
-0.0289954785257578,
-0.00747184874489903,
0.0193586144596338,
0.0120802875608206,
0.016206843778491,
-0.0174826458096504,
0.0147814601659775,
0.017630135640502,
-0.0204884801059961,
-0.014909434132278,
0.0196892376989126,
-0.00302323698997498,
-0.0090223103761673,
0.01304164621979,
-0.014606237411499,
0.0423114001750946,
0.00438828580081463,
-0.0239541307091713,
-0.00360357551835477,
-0.0266071911901236,
-0.0276379473507404,
0.0115173496305943,
0.0411788076162338,
0.00390775268897414,
0.00776109704747796,
-0.0186252873390913,
-0.0338746309280396,
-0.00381790637038648,
-0.0157410316169262,
-0.00173758645541966,
-0.0276267547160387,
-0.0256764013320208,
-0.0143972961232066,
0.00422005029395223,
0.00174226681701839,
-0.00529280072078109,
-0.00298483157530427,
0.0237466674298048,
-0.000381538906367496,
0.0214440282434225,
-0.00530762178823352,
0.000689058564603329,
-0.0125927813351154,
-0.00102692015934736,
-0.0095093185082078,
0.00104711460880935,
-0.0184073429554701,
-0.0241216663271189,
-0.00370215787552297,
-0.0137345250695944,
-0.0212880503386259,
0.0143173299729824,
-0.0115857478231192,
-0.0470556654036045,
0.0245862007141113,
-0.0331159941852093,
0.030479134991765,
-0.0119145521894097,
-0.0107079083099961,
0.0156362932175398,
0.00851063057780266,
-0.0122106913477182,
-0.0186388660222292,
-0.0129399606958032,
-0.00317808636464179,
0.0182554833590984,
0.0426450856029987,
0.000623300904408097,
-0.0300357472151518,
-0.00161296606529504,
-0.0180671066045761,
-0.00666029332205653,
0.00505791231989861,
0.0265592858195305,
0.0255843754857779,
-0.0100457491353154,
0.0253202132880688,
-0.012139892205596,
-0.0362259522080421,
-0.0152987893670797,
-0.0245484430342913,
-0.00930904969573021,
0.0191434323787689,
-0.0159061457961798,
-0.0140967750921845,
0.0159904416650534,
0.0179337114095688,
0.0332022458314896,
0.0305974595248699,
-0.0322530381381512,
0.0296993963420391,
-0.0249535031616688,
-0.0145818749442697,
-0.00493068667128682,
-0.0129649350419641,
-0.00838589575141668,
-0.00907904282212257,
-0.0218371748924255,
-0.00743792951107025,
0.0131364790722728,
-0.0295483488589525,
0.0190455988049507,
-0.0349388867616653,
0.0131504293531179,
-0.0179475024342537,
-0.0248983968049288,
-0.00160752714145929,
-0.027133621275425,
0.0055695753544569,
-0.00650271447375417,
-0.0160584971308708,
-0.0184302050620317,
-0.00805380754172802,
0.0180833451449871,
-0.0151601936668158,
0.0300506465137005,
0.00744215678423643,
0.00804111268371344,
-0.0188703071326017,
0.0101022161543369,
-0.0489775165915489,
0.0232952125370502,
-3.43503925250843e-05,
0.0263597127050161,
-0.00297580193728209,
-0.0150877563282847,
0.0113939428701997,
-0.0141372494399548,
0.00690645119175315,
0.00420609675347805,
-0.00289567932486534,
-0.0260642115026712,
0.023167734965682,
-0.0119347721338272,
-0.0317580290138721,
-0.00838809460401535,
0.014795234426856,
0.00265358691103756,
0.00478976033627987,
-0.00687875179573894,
-0.0365001782774925,
-0.00282259122468531,
0.0045535396784544,
0.0161611028015614,
-0.0309465117752552,
-0.0457352623343468,
-0.00723987026140094,
0.0160336326807737,
-0.0495293997228146,
-0.0121549246832728,
-0.0176614001393318,
0.0323552377521992,
-0.0234786756336689,
0.0126905748620629,
-0.0129129076376557,
-0.0250209085643291,
-0.0159870591014624,
0.0168980434536934,
-0.00574642140418291,
-0.0127127803862095,
-0.0235360134392977,
0.0177464168518782,
-0.0410195142030716,
0.0124300876632333,
-0.0408283472061157,
-0.00202183122746646,
0.0129581457003951,
0.00769506394863129,
0.0152833797037601,
0.00773142138496041,
0.0372184813022614,
-0.00283751101233065,
0.0116243157535791,
0.0340076796710491,
-0.0178979858756065,
0.00190313707571477,
0.0128652043640614,
0.0319913737475872,
0.0199989080429077,
0.00583306839689612,
0.0433395206928253,
-0.00441714935004711,
-0.0122670559212565,
0.00676291109994054,
-0.00459515443071723,
-0.0100066177546978,
-0.028498774394393,
0.00900082103908062,
0.0115390392020345,
0.0213581789284945,
-0.021892013028264,
-0.0315549038350582,
-0.00541313784196973,
0.00368455122224987,
0.0329747684299946,
-0.0160139333456755,
0.0159961935132742,
-0.0186245664954185,
0.012709497474134,
-0.00321295321919024,
-0.03536007553339,
-0.00312552158720791,
-0.00993746519088745,
0.0135975889861584,
0.0121281202882528,
0.0401832796633244,
-0.0233429968357086,
-0.0272878929972649,
-0.00900955405086279,
-0.0143176056444645,
0.0258419327437878,
-0.0159984435886145,
0.00148292630910873,
-0.0245296824723482,
-0.0071764518506825,
-0.0401290319859982,
-0.0263967737555504,
0.010935009457171,
-0.0164317842572927,
-0.0263612680137157,
0.00294899567961693,
0.00230222591198981,
0.000246382056502625,
-0.0147611405700445,
-0.0356217697262764,
-0.0434923432767391,
-0.00176502345129848,
-0.034493625164032,
-0.0177724678069353,
0.00524684647098184,
0.0187848526984453,
-0.00780178792774677,
-0.00951886456459761]
}
}
}
}
},"size": 2
}

The result:
{
"_score": 0,
"_source": {
"id": 1,
...
},
{
"_score": 0,
"_source": {
"id": 2,
...
}
Please help me; I don't know why this happens.
Thanks
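
One thing worth ruling out when scores come back as 0 or implausibly large is the byte order used when the vectors were encoded in Python. The decoding snippets on this page read big-endian 4-byte floats. The sketch below, with hypothetical helper names, shows what happens when little-endian bytes are read back as big-endian: the values no longer resemble the originals. This is only a possible cause, not a diagnosis of the request above.

```python
import base64
import struct


def encode_array(values, byte_order=">"):
    """Encode floats as base64 float32. '>' is big-endian, '<' is little-endian."""
    packed = struct.pack("%s%df" % (byte_order, len(values)), *values)
    return base64.b64encode(packed).decode("ascii")


def decode_big_endian(encoded):
    """Decode base64 data as big-endian float32 values."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))


good = decode_big_endian(encode_array([1.0, -0.5]))
bad = decode_big_endian(encode_array([1.0, -0.5], byte_order="<"))

print(good)  # [1.0, -0.5]
print(bad)   # values that no longer match the input
```

Checking that the decoded payload is exactly 4 bytes per dimension (1200 bytes for 300 floats) is a second quick sanity test.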
