Comments (7)
We wrote a post for the feature: https://blog.pgvecto.rs/my-binary-vector-search-is-better-than-your-fp32-vectors
from pgvecto.rs.
Thank you for your interest; contributions are greatly appreciated.
Binary support actually has two parts: how to store the binary vector inside Postgres, and the implementation of binary vector search in the vector search component. We can also reuse the binary vector search part for binary quantization, as described in Qdrant's latest blog post.
I think we can start with the vector search part by supporting binary vector search first, and then work on storing binary vectors efficiently in Postgres. @usamoi can go into more of the technical details.
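To make the binary quantization idea concrete, here is a minimal sketch of turning an f32 vector into a packed bit vector. It is only an illustration under assumptions: the function name binary_quantize, the "positive component sets the bit" rule, and the byte packing order are all hypothetical and are not taken from pgvecto.rs.

```rust
/// Hypothetical helper (not part of pgvecto.rs): quantize an f32 vector to
/// one bit per dimension, packed into bytes with the lowest dimension in
/// the least significant bit of each byte.
/// Assumption: a bit is set when the component is strictly positive.
fn binary_quantize(v: &[f32]) -> Vec<u8> {
    // One byte holds 8 dimensions; round the byte count up.
    let mut bytes = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bytes[i / 8] |= 1u8 << (i % 8);
        }
    }
    bytes
}
```

With this packing, a 3-dimensional vector fits in a single byte and a 9-dimensional vector needs two, so the storage cost drops from 32 bits to 1 bit per dimension.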
from pgvecto.rs.
Thanks @VoVAllen for the response. You're correct, I did miss the quantization part of the distance computation. The Distance enum does implement scalar_quantization_distance and scalar_quantization_distance2, which take two &[u8]. However, I don't think we can reuse those to compute the Hamming distance between binary data unless the data is stored as binary, or at least I don't know how to do so. Hamming should be a separate distance from L2 and should be added as a variant to Distance:
pub enum Distance {
    L2,
    Cosine,
    Dot,
    Hamming,
}
But the Distance impl is tightly coupled with the Scalar struct. For Hamming distance, you shouldn't have to operate on Scalar data. Maybe a generic T: Data, where Data resembles the RawData trait, could resolve this? 🤔
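One possible shape for that decoupling, as a rough sketch only: each distance declares the element type it operates on, so Hamming can work on packed bytes while L2 keeps working on floats. The trait name DistanceKind, the associated type Element, and the unit structs are hypothetical and do not reflect the actual pgvecto.rs code; the sketch also uses an associated type rather than a generic T: Data parameter, which is a variation on the idea above.

```rust
/// Hypothetical trait (not the pgvecto.rs API): a distance that names its
/// own element type instead of being tied to a single Scalar type.
trait DistanceKind {
    type Element;
    fn distance(lhs: &[Self::Element], rhs: &[Self::Element]) -> f32;
}

struct L2;
struct Hamming;

impl DistanceKind for L2 {
    type Element = f32;
    /// Squared Euclidean distance over float components.
    fn distance(lhs: &[f32], rhs: &[f32]) -> f32 {
        assert_eq!(lhs.len(), rhs.len());
        lhs.iter().zip(rhs).map(|(a, b)| (a - b) * (a - b)).sum()
    }
}

impl DistanceKind for Hamming {
    type Element = u8;
    /// Number of differing bits between two packed bit vectors.
    fn distance(lhs: &[u8], rhs: &[u8]) -> f32 {
        assert_eq!(lhs.len(), rhs.len());
        // XOR leaves a 1 wherever the bits differ; count_ones tallies them.
        lhs.iter()
            .zip(rhs)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum::<u32>() as f32
    }
}
```

Under this sketch, Hamming::distance never touches float data, and an index could be made generic over any DistanceKind. Whether that fits the existing index code is an open question.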
from pgvecto.rs.
@usamoi could you please help answer the question?
from pgvecto.rs.
Binary vectors require a lot of work. Maybe we should implement binary quantization. It is simpler and saves memory too.
from pgvecto.rs.
Completed in #368
from pgvecto.rs.
Here is an example:
CREATE TABLE items (
    id bigserial PRIMARY KEY,
    embedding bvector(3) NOT NULL
);
INSERT INTO items (embedding) VALUES ('[1,0,1]'), ('[0,1,0]');
from pgvecto.rs.