GithubHelp home page GithubHelp logo

Comments (43)

miguelzapaton avatar miguelzapaton commented on June 2, 2024 3

...
Only thing left: does this break the API or not? (a la semver defintion)

@gabywh Great resume!

Maybe the logic of semver is simply not the right one for schema versioning.

A very recommended read:

https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/

from eu-dcc-schema.

dirkx avatar dirkx commented on June 2, 2024 2

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024 2

The schema is a formal specification of the business rules and it is correct that the schema is not the arbiter. However, the schema is the technical control and changing the allowed syntax for property to a more permissive format is indeed a breaking change as validating a dob´ such as 2011with schema 1.0.0 will fail. Following semver, it's fine to removeformat: date`, but we have to bump the major version.

Code generators use the JSON schema to generate their data model, and when they find format: date they interpret that as a ISO8601 date per RFC3339 (section 5.6, full-date). Trying to parse 2011 or 2021-05 with such code will fail.

We either move to schema v2 or continue to require a full date. I'm fine with either, but the change has to be done correctly.

from eu-dcc-schema.

martin-lindstrom avatar martin-lindstrom commented on June 2, 2024 1

Imo there's no breaking change in this case as dob it is still type: 'string' (it was never a real date type) and the change is fully backwards compatible.

It most certainly was a real date type. And most of the code generated from the schema expects a real date.

from eu-dcc-schema.

ryanbnl avatar ryanbnl commented on June 2, 2024 1

This regex will match dates from 1900 to 2099 in the ISO 8601 acceptable formats (YYYY, YYYY-MM, YYYY-MM-DD)

^([1|2][0|9][\d]{2}|[1|2][0|9][\d]{2}-(01|02|03|04|05|06|07|08|09|10|11|12)|[1|2][0|9][\d]{2}-(01|02|03|04|05|06|07|08|09|10|11|12)-(01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31))$

I have a large set of test cases should you wish to use it. Let me know :)

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024 1

Such a breaking change of the schema will require changes to all existing implementations. The schema uses format date and therefore requires a full date as YYYY-MM-DD.

no

This will make the schema version go to 2.0.0 as the change is not backwards compatible.

no

It was found that it is somewhat common for dates of birth to not be exactly known; e.g. when just a year is known; or the UNHCR style "summer" or "winter" (in NL - about 70k people). As a result - it is common for passports not to contain these. Some countries round this to 01-01 and/or 06-01 -- but not all do. A variation with just the year and month known - but not the date is also happening.

I think you mean "day" and not "date" here in

but not the date is also happening

?

The goal of the DOB is binding with the passport. As the passport ALSO does NOT contain these - the regex needs adjustment.

Yes. Also once again tech people losing focus: the JSON schema is not the final arbirter here, despite the temptation for coders to consider it as such.

From the README.md https://github.com/ehn-digital-green-development/ehn-dgc-schema/blob/next/README.md

The schema in conjunction with business rules (which may also be specific to a Member State) shall ensure conformity to the EU legislation, whereby the burden of conformity lies with the business rules and the DGC JSON schema plays a supporting role in allowing the data to be serialized and deserialized in a flexible, yet structured, manner

from eu-dcc-schema.

martin-lindstrom avatar martin-lindstrom commented on June 2, 2024

So you mean that a date of birth only containing the year should be allowed? When was this decision made?

Informing @sondaica also...

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

Such a breaking change of the schema will require changes to all existing implementations. The schema uses format date and therefore requires a full date as YYYY-MM-DD.

This will make the schema version go to 2.0.0 as the change is not backwards compatible.

from eu-dcc-schema.

sondaica avatar sondaica commented on June 2, 2024

How is that according to the regulation? It states date of birth.

from eu-dcc-schema.

sondaica avatar sondaica commented on June 2, 2024

So then we maybe should use regexp according to ICAO Machine Readable Travel Document recommendation with the use of X for unknown elements. For example YYYY-XX-XX when only YYYY is known.

from eu-dcc-schema.

martin-lindstrom avatar martin-lindstrom commented on June 2, 2024

We also need to remove the "format": "date" from the definition of dob.

This small change will have rather big impact on all implementations. And I agree with @jschlyter that this is a breaking change.

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

This one should work for all and is fully backwards compatible:


"dob": {
      "title": "Date of birth",
      "description": "Date of Birth of the person addressed in the DGC. ISO 8601 date format restricted to range 1900-2099",
      "type": "string",
      "oneOf": [
        {
          "format": "date",
          "pattern": "^(?:((19|20)\\d{2}-\\d{2}-\\d{2})|((19|20)\\d{2}))$",
          "examples": [
            "1979-04-14",
            "1979"
          ]
        },
        {
          "pattern": "^((19|20)\\d{2}-XX-XX)$",
          "examples": [
            "1979-XX-XX"
          ]
        }
      ]
    },

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

@miguelzapaton that is better, but since the date format is no longer mandatory it is still a breaking change. I see no other option than to move to 2.0.0 (or break the semantic version scheme).

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

@miguelzapaton that is better, but since the date format is no longer mandatory it is still a breaking change. I see no other option than to move to 2.0.0 (or break the semantic version scheme).

If format: 'date' is removed then a date like 1979-99-44 would be validated correctly.

Imo there's no breaking change in this case as dob it is still type: 'string' (it was never a real date type) and the change is fully backwards compatible.

oneOf would no longer be needed and it could be defined like this (untested!):

"dob": {
      "title": "Date of birth",
      "description": "Date of Birth of the person addressed in the DGC. ISO 8601 date format restricted to range 1900-2099",
      "type": "string",
      "pattern": "^(?:((19|20)\\d{2}-\\d{2}-\\d{2})|((19|20)\\d{2})|((19|20)\\d{2}-XX-XX))$",
      "examples": [
        "1979-04-14",
        "1979",
        "1979-XX-XX",
      ]
    },

To avoid wrong dates like 1979-99-44 the first regexp part could still be strengthened.

Here you can find an extreme example:

https://stackoverflow.com/questions/15491894/regex-to-validate-date-format-dd-mm-yyyy

You could even try to find the exact regexp that corresponds to the definition of format: 'date' to make it semantically equal:

https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.7.3.1

Miguel

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

Such a breaking change of the schema will require changes to all existing implementations. The schema uses format date and therefore requires a full date as YYYY-MM-DD.

This will make the schema version go to 2.0.0 as the change is not backwards compatible.

this is not breaking back compat

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

This regex will match dates from 1900 to 2099 in the ISO 8601 acceptable formats (YYYY, YYYY-MM, YYYY-MM-DD)

^([1|2][0|9][\d]{2}|[1|2][0|9][\d]{2}-(01|02|03|04|05|06|07|08|09|10|11|12)|[1|2][0|9][\d]{2}-(01|02|03|04|05|06|07|08|09|10|11|12)-(01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31))$

I have a large set of test cases should you wish to use it. Let me know :)

Stop. This is going completely the wrong way here!
Once again: the JSON is not the final arbiter.

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

I'll re-state Postel's law once more: "generous with what you accept, strict with what you send"

from eu-dcc-schema.

asitplus-pteufl avatar asitplus-pteufl commented on June 2, 2024

@gabywh agree with all the things above, but not with the aspect that the schema should only play a supporting rule. The schema is the schema, it is the minimum structure that is defined, having a fuzzy landscape for the schema is very difficult to handle. after the min. structure is established, the business rules apply.

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

III) What means "backwards compatibility"exactly - I think there are different perspectives:

While staying at the same major version, it must be possible to verify all future payloads with any earlier versions of the schema.

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

So then we maybe should use regexp according to ICAO Machine Readable Travel Document recommendation with the use of X for unknown elements. For example YYYY-XX-XX when only YYYY is known.

which is not valid ISO8601 and there breaks EU eHN spec

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

III) What means "backwards compatibility"exactly - I think there are different perspectives:

While staying at the same major version, it must be possible to verify all future payloads with any earlier versions of the schema.

So if it is like this, new minor schema versions can only be stricter with respect to existing properties or property-enhancing (as additionalProperties is true by default)?

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

@gabywh agree with all the things above, but not with the aspect that the schema should only play a supporting rule.

That is its role here, as defined right from the beginning.

The schema is the schema,

umm.. yes agree

it is the minimum structure that is defined,

yes agree

having a fuzzy landscape for the schema is very difficult to handle.

what "fuzzy landscape" ? The schema is defined as it is.

after the min. structure is established, the business rules apply.

yes agree

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

III) What means "backwards compatibility"exactly - I think there are different perspectives:

While staying at the same major version, it must be possible to verify all future payloads with any earlier versions of the schema.

So if it is like this, new minor schema versions can only be stricter with respect to existing properties or property-enhancing (as additionalProperties is true by default)?

semver is clear about when to increment major version: "MAJOR version when you make incompatible API changes"
IOW: do you need to re-write the client?
In this case no, we still use ISO8601 date. The valid cases that were valid before will still be valid. The relaxation of the spec in the schema means that some cases that were previously invalid will now be considered valid - which is the whole purpose of relaxing the dob regex

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

The schema is a formal specification of the business rules and it is correct that the schema is not the arbiter.

In which case it cannot be the formal specification of the business rules, by definition.

However, the schema is the technical control

Which means what? I make the decision in my business rules what I am willing to generate / accept.

and changing the allowed syntax for property to a more permissive format is indeed a breaking change as validating a dob´ such as 2011with schema 1.0.0 will fail. Following semver, it's fine to removeformat: date`, but we have to bump the major version.

Code generators use the JSON schema to generate their data model, and when they find format: date they interpret that as a ISO8601 date per RFC3339 (section 5.6, full-date). Trying to parse 2011 or 2021-05 with such code will fail.

Yes, this the internal contradiction that @miguelzapaton pointed out: it says it accepts ISO8601, but then drops it to only full-date i.e. a subset of ISO8601. That the tooling will always interpret as full date is what @miguelzapaton already pointed out (but unfortunately almost that post was edited away, I'll restore it). So if we really want to keep with minimal changes to tooling, then we drop date - which returns us to my * original * spec of string + regex only.

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

original post from @miguelzapaton which for unfortunately was almost completely edited away:


Three more thoughts regarding this issue and the related PR:

I) JSON Schema property definitions like the following have an "inner conflict" as they let one (code generators) expect an ISO 8601 date and thwart it in the next line by an incompatible regex:

"type": "string",
"format": "date"
"pattern": " .. some regex that allows non ISO 8601 compatible values .."

-> either "format": "date" has to be removed or the pattern has to be adapted to be compatible

II) Values like "1979" or "1979-10" can be parsed as dates but they will most likely be treated as "1979-01-01T00:00:00.000Z" or "1979-10-01T00:00:00.000Z"

-> If a passport states only "1979", a DGC with "1979-01-01" seems contra productive to me

III) What means "backwards compatibility"exactly - I think there are different perspectives:

a) A JSON data artefact created with a former JSON Schema revision still validates successfully with a later revision

-> When "format": "date" is removed and the pattern is still compatible there's no breaking change here

b) A piece of software that handles JSON data artefacts created with a former JSON Schema revision will still be able to handle artefacts created with a later revision

-> When "format": "date" is removed and the pattern doesn't enforce a compatible date format then there's probably a breaking change here

c) ??

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

Summary on this one so far:

  1. We need to relax the date spec to allow only year, regardless.

  2. Code generation tooling interprets date as full-date even if a full date is not present and that although rfc3339 also defines date-fullyear which would be more suitable in this case.

This looks more like a code generation / tooling issue to me. Assuming you are using ajv for validation then indeed ajv-formats maps date to full-date and has this hard-coded as a regex for full-date in the typescript, so without changing source this can't be modified directly (and I also think this isn't a good approach for other reasons). However, the ajv mechanism can be extended using keyword-with-code-generation-function which may be an option. Yet this is a tool-specific fix. Similarly, the ajv option allowDate also doesn't help us much here.

We can either push this out to tooling and say "tooling" issue, or attempt to accomodate the current state of tooling support and try to make lives easier for implementers. I go for the latter, thus dropping format:date and back to my original format of string + regex only.

Only thing left: does this break the API or not? (a la semver defintion)

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

We either move to schema v2 or continue to require a full date. I'm fine with either, but the change has to be done correctly.

Requiring full date is not an option

Schema version I didn't touch at all in this PR, assuming it would come up as a topic.

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

Requiring full date is not an option

Then I suggest move the schema to v2.0.0 and do anyOf (string with pattern ^(19|20)\d\d$, string with format:date and pattern ^(19|20)\d\d-\d\d-\d\d).

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

Requiring full date is not an option

Then I suggest move the schema to v2.0.0

am still not convinced...

and do anyOf (string with pattern ^(19|20)\d\d$, string with format:date and pattern ^(19|20)\d\d-\d\d-\d\d).

you would probably want something like ^(19|20)\d\d-\d\d(-\d\d)? as that last regex (or {0,1} instead of ? if preferred) as we still need to cater for YYYY-MM (or repeat as two patterns in anyOf(), and also be consistent with $ too here)

from eu-dcc-schema.

ryanbnl avatar ryanbnl commented on June 2, 2024

Ahh now I get what @gabywh and @miguelzapaton are saying: the JSON schema is the foundation for everything else. It must be designed to avoid changes. Then the business rules build on top of it - the business rules are much easier to change, so all of the messy real-world things (like different countries using different representations of missing dates) can go there.

It's not designed to be used by an arbitrary code generator.

The schemas do need cleaning up - as Gaby suggests, removing the confusingly detailed specifications (the regex, format of the date) would remove lots of confusion for the developers here.

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

and do anyOf (string with pattern ^(19|20)\d\d$, string with format:date and pattern ^(19|20)\d\d-\d\d-\d\d).

simplify to
string with pattern (19|20)\d\d(-\d\d){0,2}`

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

A very recommended read:

https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/

From above link (thanks @miguelzapaton) and very concretely, I would like to propose we move to using: https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/#schemaver
as this just fits a lot better (at least, IMHO) for the task at hand.
I can update README.md / wiki FAQ where appropriate

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

I think there's one important difference:

The guys of SNOWPLOW don't seem to target offline working systems.

If this is still a requirement for DGC systems (?) it would also require the "forward compatibility" that I understand @jschlyter described with:

While staying at the same major version, it must be possible to verify all future payloads with any earlier versions of the schema.

So I'm not sure if schemaver does adjust 100% to the needs either as schemaver's main concern is the "backwards-compatibility between the new schema and existing data" and not the "forward-compatibility between an older schema and future data".

So instead of using semver, schemaver, whatever you could just write an own "schema versioning policy" in the FAQ, more or less like SNOWPLOW did in the description of schemaver (chapter 2):

https://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

@gabywh if you generate code based on schema v1 it will not be able to read data with dob containing only a year, or year+month. Nor can schema v1 be used to validate such data.

However, one could handle this is a misinterpretation of the requirements and just do the change anyway with 1.1.0. At this point it is perhaps more important to move forward with a stable schema than to discuss.

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

@gabywh No trailing $ ?

https://regexr.com/5ti7a

https://regexr.com/5tgqa

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

Yes, we need the trailing $

from eu-dcc-schema.

dirkx avatar dirkx commented on June 2, 2024

Folks - no matter what we do - keep in mind that it is not unusual for people their Identity cards, passports and other civic (or medical) records to not have a full date of birth - but just the year (in The Netherlands; for about 70k people). Simply as it is not known.

It is perfectly possible for a technical standard to be a bit of an ill fit with reality.

But as we issue these DGC's to real people - we need to accommodate them.

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

@dirkx we agree, the discussion is whether we need to bump to schema v2 because of this breaking change. I suggest that we turn a blind eye and do v1.1.0.

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

@dirkx we agree, the discussion is whether we need to bump to schema v2 because of this breaking change. I suggest that we turn a blind eye and do v1.1.0.

Is not turning a blind eye and we need to do this properly rather than hack something together. An argument could even be made that this is a patch fix so we could go to 1.0.2 since the original regex was simply broken with respect to the incoming data. Thanks for your suggestion in any case, appreciate your input as something I can take into my own deliberations when making the final decision.

from eu-dcc-schema.

gabywh avatar gabywh commented on June 2, 2024

@gabywh if you generate code based on schema v1 it will not be able to read data with dob containing only a year, or year+month. Nor can schema v1 be used to validate such data.

Right. So this is about old generated code not handling a newer schema. So you generate the new code. You don't need to change the client side and you also do not need to change any data since the new generated code will also handle old schema data. That is what gives you the back-compatibility.

from eu-dcc-schema.

jschlyter avatar jschlyter commented on June 2, 2024

Nor can schema v1 be used to validate such data.

This is the vital part. If you've generated a DGC with schema version NEXT (with the relaxed date format), it is not accepted when validating with schema version 1.0.0. Verifiers who are written for schema version 1.0.0 will reject such a DGC unless updated. Hence, all clients doing payload validation must be updated with new code.

If the schema would have been backwards compatible payload from schema version NEXT would have been accepted by schema version 1.0.0.

from eu-dcc-schema.

miguelzapaton avatar miguelzapaton commented on June 2, 2024

Interested folks may have a look at my Proof of Concept for DGC JSON Data Validation with value-extended JSON Schemas:

https://github.com/miguelzapaton/ehn-dgc-schema-json-validator

After two quick validation runs with currently 507 DGC JSON test files you can also check the effect of the schema change from 1.0.1 to 1.1.0 there:

Validation run with schema version 1.0.1

https://github.com/miguelzapaton/ehn-dgc-schema-json-validator/blob/main/log/dgc-testdata-latest/validation_example_run_20210526_schema_version_1-0-1/validation_run_20210526_111304.log

Validation run with schema version 1.1.0

https://github.com/miguelzapaton/ehn-dgc-schema-json-validator/blob/main/log/dgc-testdata-latest/validation_example_run_20210526_schema_version_1-1-0/validation_run_20210526_111214.log

The log file sizes already give an idea of what happened ... less validation errors :)

A diff shows all the details..

from eu-dcc-schema.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.