Comments (9)

tobiemh avatar tobiemh commented on April 28, 2024 3

Haha @rushmorem this is an interesting discussion. Just before 1.0.0-beta.1 we had a VOID type too. It all got a bit confusing!

I'll just summarise why we have both a NONE and a NULL type, and then we can go from there.

Setting a field to NULL explicitly sets the field on the document (if the FIELD definition allows it) so that it exists, but its value is null (in JSON terms). We also need to deal with null values from JSON when working with JavaScript libraries.

Setting a field to NONE explicitly deletes a field on the document (if the FIELD definition allows it) so that it no longer exists on the record. This means the record takes up less space.

When searching for values we effectively want three different queries (and there may be an issue here, because currently NONE = NULL, but we do have an == operator to check on value type as well).

  1. SELECT all records where the field doesn't exist at all, or does exist and the value is null.
  2. SELECT all records where the field doesn't exist at all.
  3. SELECT all records where the field exists and the value is null.

Currently you can do this using...

  1. SELECT * FROM person WHERE email = NONE; or SELECT * FROM person WHERE email IS NONE;
  2. SELECT * FROM person WHERE email == NONE;
  3. SELECT * FROM person WHERE email == NULL;
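As a rough illustration of the three cases above, here is a minimal Rust sketch. The `Field` enum and the `loose_eq` / `exact_eq` helpers are hypothetical names made up for this example and are not SurrealDB's internal types; the sketch only assumes that `=` treats NONE and NULL as interchangeable while `==` also checks which of the two it is, as described above.

```rust
/// Hypothetical model of a field's state on a record (not SurrealDB's internal type).
#[derive(Debug, Clone, PartialEq)]
enum Field {
    /// The field does not exist on the record (NONE).
    None,
    /// The field exists and holds a JSON-style null (NULL).
    Null,
    /// The field exists and holds a value.
    Value(String),
}

/// Loose comparison, modelled on `=`: NONE and NULL match each other.
fn loose_eq(a: &Field, b: &Field) -> bool {
    match (a, b) {
        (Field::None | Field::Null, Field::None | Field::Null) => true,
        _ => a == b,
    }
}

/// Exact comparison, modelled on `==`: the variant has to match as well.
fn exact_eq(a: &Field, b: &Field) -> bool {
    a == b
}
```

Under this model, query 1 corresponds to `loose_eq(&field, &Field::None)`, query 2 to `exact_eq(&field, &Field::None)`, and query 3 to `exact_eq(&field, &Field::Null)`.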

I think it's also important to note that although we can have schema-full tables, with fields which have specific types, we can also have schema-full tables without any specific type defined on a field, and schema-less tables where any field values can be inserted. As a result, an index on a field doesn't necessarily know the type of that field, or know that all values in that field are of a certain type.

from surrealdb.

tobiemh avatar tobiemh commented on April 28, 2024 1

Finally, instead of EMPTY I think we should use NONE for consistency. That is IGNORE NONE not IGNORE EMPTY. Also empty is ambiguous. For example, users might think that empty is referring to that particular type's representation of an empty value rather than NONE. For example, if that field is of type string they might think we are referring to "".

@rushmorem I agree with this. Couchbase has a MISSING keyword which can be used to find fields which exist but are empty, or fields which are not set (don't exist on the record). This is kind of what I meant with EMPTY, but I absolutely agree that users might expect "" to also equate to EMPTY!

tobiemh avatar tobiemh commented on April 28, 2024

Hi @rushmorem this is an interesting issue.

Looking at the FAQs for SQLite (https://sqlite.org/faq.html#q26), there are definitely two different opinions on this.

I'd like to continue supporting the current approach either way, so my thoughts are:

  1. Default to including null/none values, and add the ability to specify that unique indexes shouldn't include them...
     DEFINE INDEX email ON TABLE user FIELDS email UNIQUE;
     DEFINE INDEX email ON TABLE user FIELDS email UNIQUE IGNORE EMPTY;
  2. Default to not including null/none values, and add the ability to specify that unique indexes should include them...
     DEFINE INDEX email ON TABLE user FIELDS email UNIQUE;
     DEFINE INDEX email ON TABLE user FIELDS email UNIQUE INCLUDE EMPTY;

I personally prefer the first approach, as that is how it currently works, and it's easier to 'ignore' something from the default, rather than 'include' it. However, it would be great to get other thoughts on this.
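To make the difference between the two defaults concrete, here is a minimal Rust sketch of a unique index with an opt-out for missing values. `UniqueIndex`, `ignore_none` and `insert` are hypothetical names invented for this example; this is an in-memory model under those assumptions, not how SurrealDB implements indexes.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory unique index; a key of `None` stands for a missing field.
struct UniqueIndex {
    /// When true, missing values are left out of the index entirely.
    ignore_none: bool,
    /// Maps an indexed value to the id of the record that holds it.
    entries: HashMap<Option<String>, String>,
}

impl UniqueIndex {
    fn new(ignore_none: bool) -> Self {
        Self { ignore_none, entries: HashMap::new() }
    }

    /// Tries to index `value` for `record_id`.
    /// On a conflict, returns the id of the record that already holds the value.
    fn insert(&mut self, value: Option<String>, record_id: &str) -> Result<(), String> {
        if value.is_none() && self.ignore_none {
            return Ok(()); // skipped: missing values never conflict
        }
        if let Some(existing) = self.entries.get(&value) {
            return Err(existing.clone());
        }
        self.entries.insert(value, record_id.to_string());
        Ok(())
    }
}
```

With `UniqueIndex::new(false)` a second record without the field is rejected, which matches the current behaviour described above; with `UniqueIndex::new(true)` any number of records may omit the field while real values stay unique.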

rushmorem avatar rushmorem commented on April 28, 2024

Hi @tobiemh, yes NULLs are definitely an interesting topic to say the least. For the purpose of just getting the job done I think either approach is fine. However, I think the most suitable approach in terms of correctness and least surprising behaviour depends on the semantics we decide to go for. That is, whether we decide to adopt the SQL NULL or NONE with Rust's semantics.

According to this Stanford article:-

A boolean comparison between two values involving a NULL returns neither true nor false, but unknown in SQL's three-valued logic. [3] For example, neither NULL equals NULL nor NULL not-equals NULL is true. Testing whether a value is NULL requires an expression such as IS NULL or IS NOT NULL.

This means NULL values cannot be compared against each other and as such should be treated as different. If we translate this to Rust, it's analogous to comparing two Nones of different types, which is not possible by default. That is, None::<T> == None::<U> which doesn't even compile by default.
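For readers less familiar with three-valued logic, the following Rust sketch models it with `Option<bool>`, where `None` plays the role of "unknown". The helper names `sql_eq` and `where_keeps` are made up for this illustration, and the sketch only covers equality on integers.

```rust
/// SQL-style equality where `None` models NULL:
/// the result is `None` (unknown) as soon as either side is NULL.
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// A WHERE clause keeps a row only when the condition is definitely true,
/// so both "false" and "unknown" filter the row out.
fn where_keeps(condition: Option<bool>) -> bool {
    condition == Some(true)
}
```

Here `sql_eq(None, None)` is unknown rather than true, so a filter built on it never matches, which is why SQL needs a separate `IS NULL` test.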

If we take this approach, option 2 becomes the most intuitive one. This is the mindset I opened this issue in.

On the other hand, if we choose to go with Rust semantics, given that we know a field is of the same type T, then NONE becomes comparable. That is None::<T> == None::<T> which is always true. That makes option 1 the most intuitive implementation.
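The Rust behaviour referred to here can be checked with a small sketch. The function name `same_type_nones_equal` is just for this example.

```rust
/// Two missing values of the same type compare equal in Rust.
fn same_type_nones_equal() -> bool {
    let a: Option<String> = None;
    let b: Option<String> = None;
    a == b
}

// Comparing missing values of different types is rejected at compile time,
// so the line below is left commented out:
// let _ = None::<i32> == None::<String>; // error: mismatched types
```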

Given that nulls in general, as they are implemented in SQL and other languages, are very confusing and error prone I would like to propose that we adopt Rust semantics of None and stick to that. Since we are not fully SQL compliant anyway, I think that strategy will pay off for us. It can eliminate a whole lot of confusion. To benefit from this we should strictly stick to NONE and not recognise the SQL keyword NULL at all.

Finally, instead of EMPTY I think we should use NONE for consistency. That is, IGNORE NONE not IGNORE EMPTY. Also empty is ambiguous. For example, users might think that empty is referring to that particular type's representation of an empty value rather than NONE. For example, if that field is of type string they might think we are referring to "".

tobiemh avatar tobiemh commented on April 28, 2024

This also is related to another issue (of yours) and a comment that I have recently added... #73 (comment)

rushmorem avatar rushmorem commented on April 28, 2024

Thanks for the explanation @tobiemh! Seeing NULL mentioned in some places and NONE in others was confusing me. I thought perhaps you had started off using NONE but now want to move to NULL, which would be more familiar to people coming from SQL. This explains it.

This situation is definitely confusing. I'm glad you got rid of VOID! 😄 It would be even more confusing now. Now let's close the loop and get rid of NULL. As per that article I linked earlier SELECT * FROM person WHERE email == NULL; shouldn't even be possible, only SELECT * FROM person WHERE email IS NULL;. This will likely confuse users.

When searching for values we effectively want three different queries

So if we get rid of NULL, that will reduce those queries to just one:

SELECT all records where the field doesn't exist at all

right? Coupled with the space savings of not having those fields on the document at all, that sounds awesome! Perhaps I'm missing something but I don't see any reason why one would want NULL over NONE in this case. After all, whether we explicitly set the field on the document or skip it altogether should be an implementation detail, right?

rushmorem avatar rushmorem commented on April 28, 2024

We also need to deal with null values from JSON when working with JavaScript libraries.

I can see how this can be a problem for fields or tables with no defined schema if we want to give users exactly the same documents they inserted (are we not potentially already altering the documents in some cases by casting them to our internal types?). I'm not familiar with how JavaScript handles missing fields. Doesn't it deserialise them to null anyway or is it something the developer would have to handle manually?

If it's the former then this shouldn't be a problem, right? If it's the latter, would documenting that JSON fields with null values will not be returned when the data is queried help?

Another way we could handle this, if I'm understanding the problem correctly, is to not set those fields at all for defined fields. Since we still know about this field, we can still return it as NONE on SELECTs (or null in JSON documents). For schema-less tables where this field is not defined, we explicitly set it to null so that we can return it when queried.

rushmorem avatar rushmorem commented on April 28, 2024

As a result, an index on a field doesn't necessarily know the type of that field, or know that all values in that field are of a certain type.

Would it make sense to handle cases where the index does know differently from cases where it doesn't? This is what Rust does.

For fields with a defined type that translates to None::<T> == None::<T> so a uniqueness index can compare these and treat them as equal. To filter out NONE values, one would have to IGNORE NONE, as per your first option.

For fields with no defined type that translates to None::<T> == None::<U>. We can't presume to know if they are equal or not. Just like Rust refuses to compile these by default, how about refusing to index them? When one tries to define an index on an undefined type field we return a descriptive error telling them that they need to define the field's type first.
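A minimal sketch of that proposal, assuming a hypothetical `FieldDef` record whose `kind` is `None` when no type was declared. `FieldDef`, `define_unique_index` and the message wording are invented for this example and do not reflect an existing SurrealDB API.

```rust
/// Hypothetical field definition: `kind` is `None` when no type was declared.
struct FieldDef {
    name: String,
    kind: Option<String>,
}

/// Accepts the index only when the field has a declared type;
/// otherwise returns a descriptive error, as suggested above.
fn define_unique_index(field: &FieldDef) -> Result<String, String> {
    match &field.kind {
        Some(kind) => Ok(format!("unique index on `{}` ({})", field.name, kind)),
        None => Err(format!(
            "cannot index `{}`: define the field's type first",
            field.name
        )),
    }
}
```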

We can even extend this by warning users when they define indexes on types that the database knows to not have size restrictions or are not small so they don't shoot themselves in the foot by indexing large fields. That is, warn when defining an index on fields that do not have ASSERT is::uuid($value), ASSERT is::domain($value) etc.

Noc0r avatar Noc0r commented on April 28, 2024

@tobiemh Hi! This may not be fully related to this problem, but even here I see the wrong record ID when it creates a duplicate.

# The first person with no national_id will be accepted
> CREATE foo SET name = "John Doe";
[{"time":"266.761µs","status":"OK","result":[{"id":"foo:yki2758ba7dwzastfsk1","name":"John Doe","national_id":"NONE"}]}]

So, id is "foo:yki2758ba7dwzastfsk1"

# Any subsequent records without a national_id will be rejected
> CREATE foo SET name = "Jane Doe";
[{"time":"168.721µs","status":"ERR","detail":"Database index `national_id_idx` already contains `foo:4vgq4l8u4y211c3ekgn9`"}]

But here it refers to the non-existent ID "foo:4vgq4l8u4y211c3ekgn9".

Thank you in advance for your answer and for this awesome project!
