Binary vectors and Index (pgvecto.rs, closed)

tensorchord commented on September 26, 2024
Binary vectors and Index

from pgvecto.rs.

Comments (7)

gaocegege commented on September 26, 2024

We wrote a post for the feature: https://blog.pgvecto.rs/my-binary-vector-search-is-better-than-your-fp32-vectors

VoVAllen commented on September 26, 2024

Thank you for your interest; contributions are greatly appreciated.

Binary support actually has two parts: how to store binary vectors inside Postgres, and how to implement binary vector search in the vector search engine. We can also reuse the binary vector search part for binary quantization, as in Qdrant's latest blog post.
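To make the storage half concrete, here is a minimal Rust sketch of one way to pack a binary vector at one bit per dimension. The BinaryVector type, its methods, and the 64-bit word layout are hypothetical illustrations, not pgvecto.rs's actual storage format.

```rust
/// Hypothetical packed binary vector: one bit per dimension, stored in
/// 64-bit words. This is an illustration, not the pgvecto.rs on-disk format.
#[derive(Debug, Clone, PartialEq)]
pub struct BinaryVector {
    dims: usize,
    words: Vec<u64>,
}

impl BinaryVector {
    /// Packs a slice of booleans, least-significant bit first within each word.
    pub fn from_bools(bits: &[bool]) -> Self {
        let mut words = vec![0u64; (bits.len() + 63) / 64];
        for (i, &bit) in bits.iter().enumerate() {
            if bit {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        BinaryVector { dims: bits.len(), words }
    }

    /// Reads back the bit stored for dimension `i`.
    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.dims, "dimension out of range");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Bytes used by the packed bits alone (no header or length field).
    pub fn payload_bytes(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}
```

For a 768-dimensional vector the packed payload is 96 bytes, versus 3,072 bytes for the same vector stored as f32, a 32× reduction before any per-row overhead.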

I think we can start with the vector search part by supporting binary vector search first, and then work on storing binary vectors efficiently in Postgres. @usamoi can share more of the technical details.

AmineDiro commented on September 26, 2024

Thanks for the response, @VoVAllen. You're correct; I did miss the quantization part of the distance computation. Distance does implement scalar_quantization_distance and scalar_quantization_distance2, which take two &[u8] arguments. However, I don't think we can reuse those to compute the Hamming distance between binary data unless the data is stored as binary, or at least I don't know how to do so. Hamming should be a separate distance from L2 and should be added as a variant of Distance:

pub enum Distance {
    L2,
    Cosine,
    Dot,
    Hamming,
}
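A Hamming distance over bit-packed data needs only XOR and a popcount on the raw bytes, with no Scalar involved. A minimal sketch, assuming 8 dimensions per byte and zeroed padding bits in the last byte; hamming_distance is a hypothetical free function, not existing pgvecto.rs code:

```rust
/// Hypothetical Hamming kernel over bit-packed bytes (8 dimensions per byte).
/// Assumes both inputs have equal length and that unused padding bits are 0,
/// so padding never contributes to the count.
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same byte length");
    a.iter()
        .zip(b)
        // XOR leaves a 1 wherever the bits differ; count_ones counts them.
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```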

But the distance implementation is tightly coupled with the Scalar struct, and Hamming distance shouldn't have to operate on Scalar data. Maybe a generic T: Data, where Data resembles the RawData trait 🤔, could resolve this?
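One way the generic approach could look, sketched under assumptions: VectorData, DenseVector, BitVector, and nearest are hypothetical names (not the pgvecto.rs RawData API), and "L2" here means squared Euclidean distance. Each representation brings its own distance kernel, so search code written against the trait never touches Scalar for binary data.

```rust
/// Hypothetical trait (not the pgvecto.rs RawData API): each vector
/// representation supplies its own distance, so binary data never has to
/// pass through a scalar element type.
pub trait VectorData {
    fn distance(&self, other: &Self) -> f32;
}

/// Dense f32 vector using squared Euclidean (L2) distance.
pub struct DenseVector(pub Vec<f32>);

impl VectorData for DenseVector {
    fn distance(&self, other: &Self) -> f32 {
        self.0
            .iter()
            .zip(&other.0)
            .map(|(a, b)| (a - b) * (a - b))
            .sum()
    }
}

/// Bit-packed binary vector using Hamming distance.
pub struct BitVector(pub Vec<u64>);

impl VectorData for BitVector {
    fn distance(&self, other: &Self) -> f32 {
        self.0
            .iter()
            .zip(&other.0)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum::<u32>() as f32
    }
}

/// Brute-force nearest neighbour, written once and reused for both types.
pub fn nearest<T: VectorData>(query: &T, candidates: &[T]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| query.distance(a).total_cmp(&query.distance(b)))
        .map(|(i, _)| i)
}
```

The design choice here is static dispatch: nearest is monomorphized per vector type, so the binary path shares code with the dense path without a runtime cost.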

gaocegege commented on September 26, 2024

@usamoi could you please help answer the question?

usamoi commented on September 26, 2024

Binary vectors require a lot of work. Maybe we should implement binary quantization. It is simpler and saves memory too.
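Binary quantization keeps the original f32 vectors and derives a compact bit code per vector that is cheap to compare. A minimal sketch, assuming a sign-threshold scheme (a value above 0 maps to 1, similar to the approach popularized by Qdrant) and an exact dot-product rerank; quantize, hamming, and search are hypothetical names, and this is not necessarily how #368 implements it:

```rust
/// Hypothetical sign-threshold quantizer: bit i is 1 if v[i] > 0.
pub fn quantize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two quantized codes.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Two-stage search: keep the `shortlist` closest candidates by Hamming
/// distance on the codes, then rerank them by exact dot product (higher is
/// better) and return the top `k` indices.
pub fn search(query: &[f32], data: &[Vec<f32>], shortlist: usize, k: usize) -> Vec<usize> {
    let q_code = quantize(query);
    let mut ids: Vec<usize> = (0..data.len()).collect();
    // Codes are recomputed here for brevity; a real index would store them.
    ids.sort_by_key(|&i| hamming(&q_code, &quantize(&data[i])));
    ids.truncate(shortlist);
    ids.sort_by(|&a, &b| dot(query, &data[b]).total_cmp(&dot(query, &data[a])));
    ids.truncate(k);
    ids
}
```

The shortlist size trades recall for speed: a larger shortlist gives the exact rerank more chances to recover neighbours that the 1-bit codes ranked poorly.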

usamoi commented on September 26, 2024

Completed in #368

gaocegege commented on September 26, 2024

Here is an example:

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding bvector(3) NOT NULL
);

INSERT INTO items (embedding) VALUES ('[1,0,1]'), ('[0,1,0]');
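To see what a distance between the two rows above means, here is a hypothetical Rust parser for the '[1,0,1]' text form, plus a Hamming distance over the packed bits. parse_bvector is an illustrative assumption, not pgvecto.rs's input function, and it does not reproduce whatever validation the extension itself performs.

```rust
/// Hypothetical parser for the text form used in the INSERT, e.g. "[1,0,1]".
/// Returns (dimension count, bits packed into u64 words, LSB first).
/// Illustration only; not pgvecto.rs's actual input function.
pub fn parse_bvector(s: &str) -> Result<(usize, Vec<u64>), String> {
    let inner = s
        .trim()
        .strip_prefix('[')
        .and_then(|r| r.strip_suffix(']'))
        .ok_or_else(|| format!("expected [...], got {s:?}"))?;
    if inner.trim().is_empty() {
        return Ok((0, Vec::new()));
    }
    let mut words = Vec::new();
    let mut dims = 0;
    for (i, tok) in inner.split(',').enumerate() {
        if i % 64 == 0 {
            words.push(0u64); // start a new 64-bit word
        }
        match tok.trim() {
            "1" => words[i / 64] |= 1u64 << (i % 64),
            "0" => {}
            other => return Err(format!("invalid bit {other:?} at position {i}")),
        }
        dims += 1;
    }
    Ok((dims, words))
}

/// Hamming distance between two packed bit vectors.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

The two example rows differ in every dimension, so their Hamming distance is 3.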
