Comments (5)
The result you get is consistent with the current query logic: each predicate returns a subset of the records, and the result is the intersection of these two subsets:
flag = true
returns [pts:1, pts:3, pts:5, pts:7]
point <|2|> $pt
returns [pts:4, pts:5]
The intersection of both results is [pts:5], which is technically correct.
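To make the intersection concrete, here is a minimal sketch in plain Python that models the behaviour described above. It is not SurrealDB code: the `records` list, the `knn_ids` helper and the absolute-difference distance are assumptions chosen to mirror the one-dimensional example in this thread.

```python
# Plain-Python model of "evaluate each predicate, then intersect".
# Hypothetical data mirroring the thread: points 1..7, flag is true for odd ids.
records = [
    {"id": f"pts:{i}", "point": float(i), "flag": i % 2 == 1}
    for i in range(1, 8)
]
pt = 4.5  # stands in for $pt


def knn_ids(rows, target, k):
    """Return the ids of the k rows closest to target (1-D absolute distance)."""
    nearest = sorted(rows, key=lambda r: abs(r["point"] - target))[:k]
    return {r["id"] for r in nearest}


# Subset 1: rows where flag = true.
flag_ids = {r["id"] for r in records if r["flag"]}
# Subset 2: the 2 nearest neighbours of pt, ignoring the flag entirely.
knn_set = knn_ids(records, pt, 2)
# The query result is the intersection of both subsets.
result = sorted(flag_ids & knn_set)
print(result)  # ['pts:5']
```

Because the two nearest neighbours are picked before the flag is considered, only one of them survives the filter, which is why the query returns a single record instead of two.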
That said, it is true that we don't currently have a way to apply a filter inside the KNN operation so that it returns the N nearest results that also match the filter.
As a workaround, you could store two tables. One table would store points where the flag is true, and the other would store the points where the flag is false.
DELETE FROM pts_true;
DELETE FROM pts_false;
DEFINE INDEX mt_pt1 ON pts_true FIELDS point MTREE DIMENSION 1;
DEFINE INDEX mt_pt1 ON pts_false FIELDS point MTREE DIMENSION 1;
INSERT INTO pts_true [
{ id: pts_true:1, point: [ 1f ], flag: true },
{ id: pts_true:3, point: [ 3f ], flag: true },
{ id: pts_true:5, point: [ 5f ], flag: true },
{ id: pts_true:7, point: [ 7f ], flag: true }
];
INSERT INTO pts_false [
{ id: pts_false:2, point: [ 2f ], flag: false },
{ id: pts_false:4, point: [ 4f ], flag: false },
{ id: pts_false:6, point: [ 6f ], flag: false },
];
LET $pt = [4.5f];
SELECT
id,
vector::similarity::cosine(point, $pt) AS similarity
FROM
pts_true
WHERE
point <|2|> $pt
ORDER BY
similarity DESC;
That would work for this simple example, but I agree that it would not be suitable for something more complex involving more intricate filters.
To meet this requirement we could introduce the following syntax:
SELECT
id,
flag,
vector::similarity::cosine(point, $pt) AS similarity
FROM
pts
WHERE
flag = true &&
point <||> $pt
ORDER BY
similarity DESC
LIMIT 2
In this case, the KNN operator would not cap the number of results itself. It would keep providing candidates until the LIMIT is reached. That would be compatible with the way SurrealDB executes queries, allowing for any complexity in the filtering as well as pagination.
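The proposed semantics can be sketched in plain Python as follows. This is only a model of the idea, not SurrealDB's implementation: `by_distance` and `filtered_knn` are hypothetical helpers, the distance is a simple absolute difference, and the candidates are produced by sorting the whole list, whereas a real index would presumably yield them incrementally.

```python
from itertools import islice

# Hypothetical data mirroring the thread: points 1..7, flag is true for odd ids.
records = [
    {"id": f"pts:{i}", "point": float(i), "flag": i % 2 == 1}
    for i in range(1, 8)
]


def by_distance(rows, target):
    """Yield (row, distance) pairs, nearest first, with no cap on the count."""
    for row in sorted(rows, key=lambda r: abs(r["point"] - target)):
        yield row, abs(row["point"] - target)


def filtered_knn(rows, target, predicate, limit):
    """Walk candidates nearest first, keep those passing the filter, stop at limit."""
    matches = (
        (row["id"], dist)
        for row, dist in by_distance(rows, target)
        if predicate(row)
    )
    return list(islice(matches, limit))


top = filtered_knn(records, 4.5, lambda r: r["flag"], 2)
print(top)  # [('pts:5', 0.5), ('pts:3', 1.5)]
```

Here the filter is applied to each candidate as it arrives, so the limit counts only matching records and the query returns two flagged points rather than one.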
Would that work for you / Would you be happy with this syntax?
from surrealdb.
Yes, this would be amazing, thank you! And thank you for mentioning it in the stream :)
Will it be the only syntax, replacing the current one? Having it work only by LIMIT may simplify things.
from surrealdb.
We have started working on the implementation. For this to work efficiently, we need to make sure that the query is primarily ordered by the KNN distance. So, we will add a vector::distance::knn() function, which must be placed in the ORDER BY clause. It will also return the computed distance, so the distance does not need to be recomputed.
SELECT
id,
flag,
vector::distance::knn() AS distance
FROM
pts
WHERE
flag = true &&
point <||> $pt
ORDER BY
vector::distance::knn() DESC
LIMIT 2
I think both syntaxes may coexist.
from surrealdb.
Assigning to @emmanuel-keller who is able to answer this better.
from surrealdb.
Awesome! Will it be possible to order by the distance alias? Will it work for cosine? And will it be possible to build an index on multiple fields, e.g. vector + boolean?
from surrealdb.