Haha @rushmorem this is an interesting discussion. Just before 1.0.0-beta.1 we had a VOID type too. It all got a bit confusing!
I'll just summarise why we have both a NONE and a NULL type, and then we can go from there.
Setting a field to NULL explicitly sets the field on the document (if the FIELD definition allows it) so that it exists, but its value is null (in JSON terms). We also need to deal with null values from JSON when working with JavaScript libraries.
Setting a field to NONE explicitly deletes the field from the document (if the FIELD definition allows it) so that it no longer exists on the record. This means the record takes up less space.
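Here is a minimal Rust sketch of the two behaviours. It assumes a record can be modelled as a map from field name to value. `Value`, `Record`, `set_null` and `set_none` are hypothetical names for illustration only, not SurrealDB internals.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified value type; SurrealDB's real value type is richer.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Str(String),
}

// A record modelled as a map from field name to value (assumption).
type Record = BTreeMap<String, Value>;

// `SET field = NULL`: the field stays on the record and holds a null value.
fn set_null(rec: &mut Record, field: &str) {
    rec.insert(field.to_string(), Value::Null);
}

// `SET field = NONE`: the field is removed from the record entirely.
fn set_none(rec: &mut Record, field: &str) {
    rec.remove(field);
}

fn main() {
    let mut rec = Record::new();
    rec.insert("email".to_string(), Value::Str("jane@example.com".to_string()));

    set_null(&mut rec, "email");
    println!("after NULL: {:?}", rec); // the key is still present, holding Null

    set_none(&mut rec, "email");
    println!("after NONE: {:?}", rec); // the key is gone, so the record is smaller
}
```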
When searching for values we effectively want three different queries (and there may be an issue here, because currently NONE = NULL, although we do have an == operator that also checks the value type):
- SELECT all records where the field doesn't exist at all, or does exist and the value is null.
- SELECT all records where the field doesn't exist at all.
- SELECT all records where the field exists and the value is null.
Currently you can do this using...
SELECT * FROM person WHERE email = NONE;
or
SELECT * FROM person WHERE email IS NONE;
SELECT * FROM person WHERE email == NONE;
SELECT * FROM person WHERE email == NULL;
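The three query shapes can be modelled as three predicates over a record, one per bullet above. This is a sketch under the same assumption as before (a record is a map from field name to value). The mapping to `= NONE`, `== NONE` and `== NULL` follows the description in this comment, and all names are hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified value type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Str(String),
}

type Record = BTreeMap<String, Value>;

// 1. Field missing, or present and null (what `= NONE` / `IS NONE` match).
fn missing_or_null(rec: &Record, field: &str) -> bool {
    matches!(rec.get(field), None | Some(Value::Null))
}

// 2. Field missing entirely (`== NONE`).
fn missing(rec: &Record, field: &str) -> bool {
    !rec.contains_key(field)
}

// 3. Field present and holding null (`== NULL`).
fn present_and_null(rec: &Record, field: &str) -> bool {
    matches!(rec.get(field), Some(Value::Null))
}

// Count the records in `table` whose `field` satisfies `pred`.
fn count(table: &[Record], field: &str, pred: fn(&Record, &str) -> bool) -> usize {
    table.iter().filter(|r| pred(r, field)).count()
}

// One record with an email, one with a null email, one with no email field.
fn sample_people() -> Vec<Record> {
    vec![
        Record::from([("email".to_string(), Value::Str("jane@example.com".to_string()))]),
        Record::from([("email".to_string(), Value::Null)]),
        Record::new(),
    ]
}

fn main() {
    let people = sample_people();
    println!("missing or null: {}", count(&people, "email", missing_or_null));
    println!("missing:         {}", count(&people, "email", missing));
    println!("present & null:  {}", count(&people, "email", present_and_null));
}
```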
I think it's also important to note that, although we can have schema-full tables with fields which have specific types, we can also have schema-full tables without any specific type defined on a field, and schema-less tables where any field values can be inserted. As a result, an index on a field doesn't necessarily know the type of that field, or know that all values in that field are of a certain type.
Finally, instead of EMPTY I think we should use NONE for consistency. That is, IGNORE NONE not IGNORE EMPTY. Also, empty is ambiguous. For example, users might think that empty refers to that particular type's representation of an empty value rather than NONE. For example, if that field is of type string they might think we are referring to "".
@rushmorem I agree with this. Couchbase has a MISSING keyword which can be used to find fields which exist but are empty, or fields which are not set (don't exist on the record). This is kind of what I meant with EMPTY, but I absolutely agree that users might expect "" to also equate to EMPTY!
Hi @rushmorem this is an interesting issue.
Looking at the FAQs for SQLite (https://sqlite.org/faq.html#q26), there are definitely two different opinions on this.
I'd like to continue supporting the current approach either way, so my thoughts are:
- Default to including null/none values, and add the ability to specify that unique indexes shouldn't include them...
DEFINE INDEX email ON TABLE user FIELDS email UNIQUE;
DEFINE INDEX email ON TABLE user FIELDS email UNIQUE IGNORE EMPTY;
- Default to not including null/none values, and add the ability to specify that unique indexes should include them...
DEFINE INDEX email ON TABLE user FIELDS email UNIQUE;
DEFINE INDEX email ON TABLE user FIELDS email UNIQUE INCLUDE EMPTY;
I personally prefer the first approach, as that is how it currently works, and it's easier to 'ignore' something from the default, rather than 'include' it. However, it would be great to get other thoughts on this.
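The first approach can be sketched as a unique index with a flag that decides whether missing values take part in the uniqueness check. `UniqueIndex` and `ignore_none` are hypothetical names standing in for the proposed `UNIQUE IGNORE ...` clause, not an existing SurrealDB API. On a conflict, the sketch reports the id of the record that already holds the key.

```rust
use std::collections::HashMap;

// Hypothetical unique index over a single field.
struct UniqueIndex {
    // Models the proposed `IGNORE` clause: when true, missing values are not indexed.
    ignore_none: bool,
    // Indexed value (None = field missing) -> id of the record holding it.
    entries: HashMap<Option<String>, String>,
}

impl UniqueIndex {
    fn new(ignore_none: bool) -> Self {
        UniqueIndex { ignore_none, entries: HashMap::new() }
    }

    // Returns Err naming the existing record if the value is already taken.
    fn insert(&mut self, record_id: &str, value: Option<&str>) -> Result<(), String> {
        if value.is_none() && self.ignore_none {
            return Ok(()); // missing values are skipped, so they never collide
        }
        let key: Option<String> = value.map(str::to_string);
        if let Some(existing) = self.entries.get(&key) {
            return Err(format!("index already contains `{}`", existing));
        }
        self.entries.insert(key, record_id.to_string());
        Ok(())
    }
}

fn main() {
    // Default (option 1): missing values are included, so a second one collides.
    let mut strict = UniqueIndex::new(false);
    println!("{:?}", strict.insert("foo:john", None));
    println!("{:?}", strict.insert("foo:jane", None));

    // With the ignore flag, any number of records may omit the field.
    let mut lenient = UniqueIndex::new(true);
    println!("{:?}", lenient.insert("foo:john", None));
    println!("{:?}", lenient.insert("foo:jane", None));
}
```

The second approach is the same structure with the default flipped, so that missing values are skipped unless an `INCLUDE` clause asks for them.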
Hi @tobiemh, yes NULLs are definitely an interesting topic to say the least. For the purpose of just getting the job done I think either approach is fine. However, I think the most suitable approach in terms of correctness and least surprising behaviour depends on the semantics we decide on. That is, whether we adopt SQL's NULL semantics or NONE with Rust's Option semantics.
According to this Stanford article:
A boolean comparison between two values involving a NULL returns neither true nor false, but unknown in SQL's three-valued logic. [3] For example, neither NULL equals NULL nor NULL not-equals NULL is true. Testing whether a value is NULL requires an expression such as IS NULL or IS NOT NULL.
This means NULL values cannot be compared against each other and as such should be treated as distinct. If we translate this to Rust, it's analogous to comparing two Nones of different types, which is not possible by default. That is, None::<T> == None::<U> doesn't even compile by default.
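SQL's three-valued logic can be sketched in Rust by letting `Option<bool>` stand for a truth value where `None` means UNKNOWN, and `Option<i64>` stand for a nullable column value. This is an illustrative model of the quoted rules, not how any particular database implements them; the function names are hypothetical.

```rust
// Equality under SQL three-valued logic: None (UNKNOWN) if either side is NULL.
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None, // any comparison involving NULL is UNKNOWN
    }
}

// Not-equals is the negation of equals, and NOT UNKNOWN is still UNKNOWN.
fn sql_ne(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    sql_eq(a, b).map(|eq| !eq)
}

// `IS NULL` is the only test that gives a definite answer for NULL.
fn is_null(a: Option<i64>) -> bool {
    a.is_none()
}

// A WHERE clause keeps a row only when its predicate is TRUE, not UNKNOWN.
fn where_keeps(pred: Option<bool>) -> bool {
    pred == Some(true)
}

fn main() {
    println!("NULL = NULL  -> {:?}", sql_eq(None, None));
    println!("NULL <> NULL -> {:?}", sql_ne(None, None));
    println!("1 = 1        -> {:?}", sql_eq(Some(1), Some(1)));
    println!("NULL IS NULL -> {}", is_null(None));
}
```

Under these rules a unique index cannot conclude that two NULL keys collide, so it admits both, which is the behaviour of option 2.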
If we take this approach, option 2 becomes the most intuitive one. This is the mindset I opened this issue in.
On the other hand, if we choose to go with Rust semantics, given that we know a field is of the same type T, then NONE becomes comparable. That is, None::<T> == None::<T>, which is always true. That makes option 1 the most intuitive implementation.
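In Rust, `Option<T>` implements `PartialEq` whenever `T: PartialEq`, comparing only against another `Option<T>` of the same `T`. The generic helper `nones_equal` below is a hypothetical name used to show that two `None`s of the same type always compare equal.

```rust
// For any T: PartialEq, None::<T> == None::<T> is true.
fn nones_equal<T: PartialEq>() -> bool {
    None::<T> == None::<T>
}

fn main() {
    println!("{}", nones_equal::<String>());
    println!("{}", nones_equal::<u32>());
    // A None and a Some of the same type are never equal.
    println!("{}", None::<u32> == Some(0));
    // This line would not compile, because Option<String> and Option<u32>
    // are different types with no PartialEq impl between them:
    // let _ = None::<String> == None::<u32>;
}
```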
Given that nulls in general, as they are implemented in SQL and other languages, are very confusing and error prone, I would like to propose that we adopt Rust's semantics of None and stick to them. Since we are not fully SQL compliant anyway, I think that strategy will pay off for us. It can eliminate a whole lot of confusion. To benefit from this we should strictly stick to NONE and not recognise the SQL keyword NULL at all.
Finally, instead of EMPTY I think we should use NONE for consistency. That is, IGNORE NONE not IGNORE EMPTY. Also, empty is ambiguous. For example, users might think that empty refers to that particular type's representation of an empty value rather than NONE. For example, if that field is of type string they might think we are referring to "".
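The ambiguity can be illustrated by modelling a string field as `Option<String>`. `None` is the field being NONE (absent), while `Some(String::new())` is a present, empty string. An `IGNORE NONE` filter would skip only the first, whereas someone reading `IGNORE EMPTY` could reasonably expect both to be skipped. The helper names are hypothetical.

```rust
// NONE: the field does not exist on the record.
fn is_none(field: &Option<String>) -> bool {
    field.is_none()
}

// What a reader might take "EMPTY" to mean for a string field: the value "".
fn is_empty_string(field: &Option<String>) -> bool {
    matches!(field, Some(s) if s.is_empty())
}

fn main() {
    let absent: Option<String> = None;
    let blank: Option<String> = Some(String::new());
    println!("absent: none={} empty_string={}", is_none(&absent), is_empty_string(&absent));
    println!("blank:  none={} empty_string={}", is_none(&blank), is_empty_string(&blank));
}
```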
This is also related to another issue (of yours) and a comment that I have recently added... #73 (comment)
Thanks for the explanation @tobiemh! Seeing NULL mentioned in some places and NONE in others was confusing me. I thought perhaps you had started off using NONE but now wanted to move to NULL, which would be more familiar to people coming from SQL. This explains it.
This situation is definitely confusing. I'm glad you got rid of VOID! 😄 It would be even more confusing now. Now let's close the loop and get rid of NULL. As per that article I linked earlier, SELECT * FROM person WHERE email == NULL; shouldn't even be possible, only SELECT * FROM person WHERE email IS NULL;. This will likely confuse users.
When searching for values we effectively want three different queries
So if we get rid of NULL, that will reduce those queries to just one:
SELECT all records where the field doesn't exist at all
right? Coupled with the space savings of not having those fields on the document at all, that sounds awesome! Perhaps I'm missing something but I don't see any reason why one would want NULL over NONE in this case. After all, whether we explicitly set the field on the document or skip it altogether should be an implementation detail, right?
We also need to deal with null values from JSON when working with JavaScript libraries.
I can see how this can be a problem for fields or tables with no defined schema if we want to give users exactly the same documents they inserted (are we not potentially already altering the documents in some cases by casting them to our internal types?). I'm not familiar with how JavaScript handles missing fields. Doesn't it deserialise them to null anyway, or is it something the developer would have to handle manually?
If it's the former, then this shouldn't be a problem, right? If it's the latter, would it help to document that JSON fields with null values will not be returned when the data is queried?
Another way we could handle this, if I'm understanding the problem correctly, is to not set those fields at all for defined fields. Since we still know about the field, we can still return it as NONE on SELECTs (or null in JSON documents). For schema-less tables where the field is not defined, we explicitly set it to null so that we can return it when queried.
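The proposal above can be sketched as an output step that fills in `null` for any defined field that is missing from the stored record, while undefined (schema-less) fields only appear if they were stored explicitly. To stay dependency-free, this sketch assumes values are already serialised JSON fragments and that field names need no escaping; `to_json` and `defined_fields` are hypothetical names.

```rust
use std::collections::BTreeMap;

// Render a record as JSON, reporting absent *defined* fields as null.
// Assumes values are valid JSON fragments and keys need no escaping.
fn to_json(record: &BTreeMap<String, String>, defined_fields: &[&str]) -> String {
    // Start from the fields actually stored on the record.
    let mut out: BTreeMap<&str, &str> = record
        .iter()
        .map(|(k, v)| (k.as_str(), v.as_str()))
        .collect();
    // A defined field that is not stored is still returned, as null.
    for &field in defined_fields {
        out.entry(field).or_insert("null");
    }
    let body: Vec<String> = out
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}

fn main() {
    let mut rec = BTreeMap::new();
    rec.insert("name".to_string(), "\"John\"".to_string());
    // `email` is defined but never stored, so it is rendered as null.
    println!("{}", to_json(&rec, &["name", "email"]));
}
```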
As a result, an index on a field doesn't necessarily know the type of that field, or know that all values in that field are of a certain type.
Would it make sense to handle cases where the index does know the type differently from cases where it doesn't? This is what Rust does.
For fields with a defined type, that translates to None::<T> == None::<T>, so a uniqueness index can compare these and treat them as equal. To filter out NONE values, one would have to use IGNORE NONE, as per your first option.
For fields with no defined type, that translates to None::<T> == None::<U>. We can't presume to know whether they are equal or not. Just as Rust refuses to compile these by default, how about refusing to index them? When someone tries to define an index on a field with no defined type, we return a descriptive error telling them that they need to define the field's type first.
We can even extend this by warning users when they define indexes on fields that the database knows have no size restriction or are not small, so they don't shoot themselves in the foot by indexing large fields. That is, warn when defining an index on fields that do not have ASSERT is::uuid($value), ASSERT is::domain($value), etc.
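The two proposals (reject an index on an untyped field, warn on an index over an unbounded one) could look roughly like the check below. `FieldType`, `IndexCheck` and `check_define_index` are hypothetical; `Uuid` and `Domain` stand in for fields constrained by `ASSERT is::uuid($value)` or `ASSERT is::domain($value)`, and `Text` for an unconstrained string.

```rust
// Hypothetical field types: only what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Uuid,   // bounded in size, e.g. ASSERT is::uuid($value)
    Domain, // bounded in size, e.g. ASSERT is::domain($value)
    Text,   // no size restriction
}

#[derive(Debug, PartialEq)]
enum IndexCheck {
    Accept,
    Warn(String),
}

// `field_type` is None when the field has no defined type.
fn check_define_index(field: &str, field_type: Option<FieldType>) -> Result<IndexCheck, String> {
    match field_type {
        // Like None::<T> == None::<U>: equality is unknowable, so refuse.
        None => Err(format!("cannot index `{}`: define the field's type first", field)),
        // Allowed, but the user is warned about indexing potentially large values.
        Some(FieldType::Text) => Ok(IndexCheck::Warn(format!(
            "`{}` has no size restriction; indexing it may be costly",
            field
        ))),
        Some(FieldType::Uuid) | Some(FieldType::Domain) => Ok(IndexCheck::Accept),
    }
}

fn main() {
    println!("{:?}", check_define_index("email", None));
    println!("{:?}", check_define_index("id", Some(FieldType::Uuid)));
    println!("{:?}", check_define_index("bio", Some(FieldType::Text)));
}
```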
@tobiemh Hi! This may not be fully related to this problem, but even here I see the wrong record ID when it creates a duplicate.
# The first person with no national_id will be accepted
> CREATE foo SET name = "John Doe";
[{"time":"266.761µs","status":"OK","result":[{"id":"foo:yki2758ba7dwzastfsk1","name":"John Doe","national_id":"NONE"}]}]
So the id is "foo:yki2758ba7dwzastfsk1".
# Any subsequent records without a national_id will be rejected
> CREATE foo SET name = "Jane Doe";
[{"time":"168.721µs","status":"ERR","detail":"Database index `national_id_idx` already contains `foo:4vgq4l8u4y211c3ekgn9`"}]
But here it refers to a non-existent id, "foo:4vgq4l8u4y211c3ekgn9".
Thank you in advance for answer and for this awesome project!