Comments (15)

alexhenrie commented on June 10, 2024

I would love that. Can you submit a MIME type application on our behalf? There are no security issues.

from specifications.

charles-plessy commented on June 10, 2024

After double-checking the information before starting the registration, I have extra questions or comments:

  • Required/Optional Parameters: as the GFF3 specification does not define a default character set, I think that a charset parameter is required. RFC 6838 states that in the absence of a default, the charset parameter should be required rather than optional.

  • Interoperability: are Windows CR LF and Macintosh CR line endings allowed, or only Unix LF?

  • Contact point: Sequence Ontology <[email protected]>?

alexhenrie commented on June 10, 2024

I think the only sane thing to do is to treat text/gff3 the same as text/tab-separated-values, which requires the charset parameter and allows system-dependent line endings.
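
To illustrate what "requires the charset parameter and allows system-dependent line endings" could mean for a consumer, here is a minimal Python sketch. The function names `require_charset` and `split_records` are hypothetical and not part of any GFF3 tool; dropping empty lines is also an assumption made for brevity.

```python
import re
from email.message import EmailMessage


def require_charset(content_type: str) -> str:
    """Return the charset of a text/gff3 Content-Type value, or raise if absent."""
    msg = EmailMessage()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/gff3":
        raise ValueError(f"not text/gff3: {content_type!r}")
    charset = msg.get_param("charset")
    if charset is None:
        # Mirrors the proposal that charset is required, not optional.
        raise ValueError("text/gff3 requires a charset parameter")
    return str(charset)


def split_records(body: bytes, charset: str) -> list[str]:
    """Decode the body and split on CR LF, bare CR, or LF line endings."""
    text = body.decode(charset)
    # Assumption: empty lines carry no data and can be dropped.
    return [line for line in re.split(r"\r\n|\r|\n", text) if line]
```

For example, `require_charset("text/gff3; charset=UTF-8")` returns the declared charset, while `require_charset("text/gff3")` raises `ValueError`.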

As far as I know the SourceForge mailing list is still the best generic contact email for the Sequence Ontology, but I am waiting to hear back from other contributors to see if there is a better option.

charles-plessy commented on June 10, 2024

Hi Alex, did you hear back from your colleagues?

alexhenrie commented on June 10, 2024

@charles-plessy Sorry this took so long. We decided to move the mailing list from SourceForge to the University of Utah, so please use [email protected] as the contact address for the text/gff3 media type. Thanks!

charles-plessy commented on June 10, 2024

Submitted ticket #1171656 to the IANA.

charles-plessy commented on June 10, 2024

I had questions back from the IANA, one of which is whether Sequence Ontology is planning to submit more registrations in the future, and another about the magic string:

Magic number(s): This supposed to be a length and description of an octet sequence.
Since it's a text format, I think they probably don't actually need this, but if they want
to include it, this should be corrected. In the example in the specification, it's actually
a different string (ends in "3.2.1").

To be honest, I have only seen ##gff-version 3.2.1 in the spec and nowhere else. Is that a way to suggest that minor and patch versions can be added to the version number, or is that really the current version number of the spec? In any case, can I answer that only the major number matters for format detection?

alexhenrie commented on June 10, 2024

We are not planning to submit more registrations in the future.

##gff-version 3.2.1 is an example; you're supposed to replace the 2.1 with the current spec version. I have updated the examples to say ##gff-version 3.1.25 instead to eliminate the confusion. Nevertheless, you are correct that ##gff-version 3 is sufficient. Quoting from the spec:

The GFF version follows the format of 3.#.# in this spec. This directive must be present, must be the topmost line of the file. The version number always begins with 3, the second and third numbers are optional and indicate a major revision and a minor revision respectively.
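
The rule quoted above can be sketched as a small Python check. The function name `parse_gff_version` is hypothetical, and the regular expression assumes spaces or tabs between the directive name and the version number, which the quoted passage does not spell out.

```python
import re

# "##gff-version 3", optionally followed by ".<major revision>.<minor revision>".
# Assumption: one or more spaces or tabs separate the directive from the version.
_VERSION_RE = re.compile(r"^##gff-version[ \t]+3(?:\.(\d+))?(?:\.(\d+))?[ \t]*$")


def parse_gff_version(first_line: str):
    """Return (3, major_rev, minor_rev), using None for omitted parts.

    Returns None if the line is not a valid GFF3 version directive.
    """
    match = _VERSION_RE.match(first_line.rstrip("\r\n"))
    if match is None:
        return None
    major_rev, minor_rev = match.groups()
    return (
        3,
        int(major_rev) if major_rev is not None else None,
        int(minor_rev) if minor_rev is not None else None,
    )
```

Under these assumptions, both `##gff-version 3` and `##gff-version 3.1.25` are accepted, while `##gff-version 2` is rejected.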

alexhenrie commented on June 10, 2024

Actually, come to think of it, in the future we might want an IANA registration for the GVF file format as well, which is almost identical to the GFF3 format.

charles-plessy commented on June 10, 2024

Our submission was reviewed and we have two questions (slightly rephrased):

  1. Does the magic number signature (##gff-version 3) use only the ASCII subset, and is it constant irrespective of the encoding used in the rest of the file?

  2. For the recommended encodings, as neither "Latin-1" nor "Unicode" is a valid charset parameter value, can they be changed to ISO-8859-1 and UTF-8 respectively?

For question 1), if we answer yes, the consequence is that for encodings that are not backwards compatible with ASCII, we cannot detect the gff3 media type via this definition of its magic number. Given that such encodings are rarely used in bioinformatics, I assume that it is not a big problem. (Note that "magic numbers" are binary.)

For 2) I assume we all agree that ISO-8859-1 and UTF-8 are correct.
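
The consequence described for question 1 can be checked with a short Python sketch: the ASCII byte signature is found at the start of the data in ASCII-compatible encodings but not in UTF-16. The names `MAGIC`, `starts_with_magic`, and `probe` are hypothetical helpers for illustration only.

```python
# ASCII bytes of the proposed signature.
MAGIC = b"##gff-version 3"


def starts_with_magic(data: bytes) -> bool:
    """Report whether the raw bytes begin with the ASCII signature."""
    return data.startswith(MAGIC)


def probe(encodings):
    """Encode a version directive in each encoding and test for the signature."""
    header = "##gff-version 3\n"
    return {enc: starts_with_magic(header.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for enc, found in probe(["ascii", "iso-8859-1", "utf-8", "utf-16"]).items():
        print(f"{enc}: {found}")
```

ASCII, ISO-8859-1, and UTF-8 all report a match; UTF-16 does not, because each character occupies two bytes and Python's `utf-16` codec also prepends a byte order mark.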

alexhenrie commented on June 10, 2024

  1. The "magic number" should be treated the same way as in text/xml:

    Magic number(s): None.

    Although no byte sequences can be counted on to always be
    present, XML MIME entities in ASCII-compatible character sets
    (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C
    ("<?xml"), and those in UTF-16 often begin with hexadecimal FE
    FF 00 3C 00 3F 00 78 00 6D 00 6C or FF FE 3C 00 3F 00 78 00 6D
    00 6C 00 (the Byte Order Mark (BOM) followed by "<?xml"). For
    more information, see Appendix F of [XML].

  2. I have removed the reference to "Latin-1" from the spec. UTF-8 is now the only explicitly recommended character encoding, although any character encoding is still valid.

charles-plessy commented on June 10, 2024

Good morning,

and good news: the text/gff3 media type has been registered:

https://www.iana.org/assignments/media-types/text/gff3

alexhenrie commented on June 10, 2024

That's great news! Thank you for all your effort, Charles. I just bumped the GFF version up to 1.26 so that we have a nice point of reference for exactly what the specification said when the media type was registered.

alexhenrie commented on June 10, 2024

Actually, there's a typo in the media type registration:

GFF3 data in ASCII-compatible character sets (including UTF-8) often begin with hexadecimal 23 23 67 66 66 2d 76 65 72 73 69 6f 6e 20 33 ("##gff-version3").

The hex sequence is correct, but the plaintext equivalent is missing the space before the 3. Is that something that can be corrected, or is this now set in stone?
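
For anyone who wants to confirm the mismatch, a quick Python check decodes the hex sequence from the registration. The variable names are illustrative only.

```python
# Hex sequence as printed in the registration text.
hex_sequence = "23 23 67 66 66 2d 76 65 72 73 69 6f 6e 20 33"

# bytes.fromhex skips the spaces between byte pairs.
decoded = bytes.fromhex(hex_sequence).decode("ascii")

# The byte 0x20 before the final "3" is the space missing from the plaintext.
print(repr(decoded))
```

The decoded string contains a space (hex 20) between "version" and "3", which the quoted plaintext omits.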

charles-plessy commented on June 10, 2024

Thanks a lot for spotting my typo. It has been promptly corrected by the IANA!
