Comments (15)
I would love that. Can you submit a MIME type application on our behalf? There are no security issues.
from specifications.
After double-checking the information before starting the registration, I have a few extra questions and comments:
- Required/Optional Parameters: as the GFF3 specification does not define a default character set, I think that a charset parameter is required. RFC 6838 states that in the absence of a default, the charset parameter should be required rather than optional.
- Interoperability: are Windows CR LF and Macintosh CR line endings allowed, or only Unix LF?
- Contact point: Sequence Ontology <[email protected]>?
I think the only sane thing to do is to treat text/gff3 the same as text/tab-separated-values, which requires the charset parameter and allows system-dependent line endings.
As far as I know the SourceForge mailing list is still the best generic contact email for the Sequence Ontology, but I am waiting to hear back from other contributors to see if there is a better option.
Hi Alex, did you hear back from your colleagues?
@charles-plessy Sorry this took so long. We decided to move the mailing list from SourceForge to the University of Utah, so please use [email protected] as the contact address for the text/gff3 media type. Thanks!
Submitted ticket #1171656 to the IANA.
I had questions back from the IANA, one of which is whether Sequence Ontology is planning to submit more registrations in the future, and another about the magic string:
Magic number(s): This supposed to be a length and description of an octet sequence.
Since it's a text format, I think they probably don't actually need this, but if they want
to include it, this should be corrected. In the example in the specification, it's actually
a different string (ends in "3.2.1").
To be honest, I have only seen ##gff-version 3.2.1 in the spec and nowhere else. Is that a way to suggest that minor and patch versions can be added to the version number, or is that really the current version number of the spec? In any case, can I answer that only the major number matters for format detection?
We are not planning to submit more registrations in the future.
##gff-version 3.2.1 is an example; you're supposed to replace the 2.1 with the current spec version. I have updated the examples to say ##gff-version 3.1.25 instead to eliminate the confusion. Nevertheless, you are correct that ##gff-version 3 is sufficient. Quoting from the spec:
The GFF version follows the format of 3.#.# in this spec. This directive must be present, must be the topmost line of the file. The version number always begins with 3, the second and third numbers are optional and indicate a major revision and a minor revision respectively.
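As a rough illustration of the rule quoted above, the directive could be checked along these lines. This is only a sketch: the helper name `parse_gff_version` and the regular expression are assumptions made for this example and do not come from the spec, and the tolerance for one or more spaces or tabs before the version number is likewise an assumption.

```python
import re

# Hypothetical pattern based on the quoted "3.#.#" rule: the version always
# begins with 3, and the second and third numbers are optional.
# Allowing one or more spaces/tabs before the version is an assumption.
_VERSION_RE = re.compile(
    r"^##gff-version[ \t]+(3)(?:\.(\d+))?(?:\.(\d+))?[ \t]*$"
)


def parse_gff_version(first_line: str):
    """Return (3, second, third), with None for omitted parts.

    Returns None if the line is not a GFF3 version directive.
    """
    match = _VERSION_RE.match(first_line.rstrip("\r\n"))
    if match is None:
        return None
    return tuple(int(g) if g is not None else None for g in match.groups())
```

With this sketch, `##gff-version 3`, `##gff-version 3.1` and `##gff-version 3.1.26` are all accepted, while `##gff-version 2` is rejected.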
Actually, come to think of it, in the future we might want an IANA registration for the GVF file format as well, which is almost identical to the GFF3 format.
Our submission was reviewed and we have two questions (slightly rephrased):
- Does the magic number signature (##gff-version 3) use only the ASCII subset, and is it constant irrespective of the encoding used in the rest of the file?
- For the recommended encodings, as neither "Latin-1" nor "Unicode" is a valid charset parameter value, can they be changed to ISO-8859-1 and UTF-8 respectively?
For question 1), if we answer yes, then the consequence is that for encodings that are not backwards compatible with ASCII, we cannot detect the gff3 media type via this definition of its magic number. Given that such encodings are rarely used in bioinformatics, I assume that it is not a big problem. (Note that "magic numbers" are binary.)
For 2), I assume we all agree that ISO-8859-1 and UTF-8 are correct.
- The "magic number" should be treated the same way as in text/xml:

  Magic number(s): None.
  Although no byte sequences can be counted on to always be
  present, XML MIME entities in ASCII-compatible character sets
  (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C
  ("<?xml"), and those in UTF-16 often begin with hexadecimal FE
  FF 00 3C 00 3F 00 78 00 6D 00 6C or FF FE 3C 00 3F 00 78 00 6D
  00 6C 00 (the Byte Order Mark (BOM) followed by "<?xml"). For
  more information, see Appendix F of [XML].

- I have removed the reference to "Latin-1" from the spec. UTF-8 is now the only explicitly recommended character encoding, although any character encoding is still valid.
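By analogy with the text/xml wording, content sniffing for GFF3 could be sketched as below. This is a heuristic under stated assumptions, not a definitive detector: the function name `looks_like_gff3` is hypothetical, the UTF-16 branch simply mirrors the BOM-prefixed pattern described for XML, and files in other encodings that are not ASCII-compatible would go undetected, as discussed earlier in the thread.

```python
import codecs

# The signature discussed in this thread.
MAGIC_TEXT = "##gff-version 3"


def looks_like_gff3(head: bytes) -> bool:
    """Heuristically test the first bytes of a file (hypothetical helper).

    A prefix match only; it does not validate the rest of the directive.
    """
    # ASCII-compatible encodings (ASCII, ISO-8859-1, UTF-8) share these bytes.
    if head.startswith(MAGIC_TEXT.encode("ascii")):
        return True
    # UTF-16 with a Byte Order Mark, mirroring the text/xml description.
    for bom, encoding in (
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ):
        if head.startswith(bom + MAGIC_TEXT.encode(encoding)):
            return True
    return False
```

Because no byte sequence is guaranteed to be present, a `False` result here would not prove that a file is not GFF3.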
Good morning,
and good news: the text/gff3 media type has been registered:
https://www.iana.org/assignments/media-types/text/gff3
That's great news! Thank you for all your effort, Charles. I just bumped the GFF version up to 1.26 so that we have a nice point of reference for exactly what the specification said when the media type was registered.
Actually, there's a typo in the media type registration:
GFF3 data in ASCII-compatible character sets (including UTF-8) often begin with hexadecimal 23 23 67 66 66 2d 76 65 72 73 69 6f 6e 20 33 ("##gff-version3").
The hex sequence is correct, but the plaintext equivalent is missing the space before the 3. Is that something that can be corrected, or is this now set in stone?
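The mismatch can be confirmed by decoding the registered hex sequence. This is a quick check; the variable names are chosen here for illustration only:

```python
# Hex sequence copied from the registration text quoted above.
registered_hex = "23 23 67 66 66 2d 76 65 72 73 69 6f 6e 20 33"

# bytes.fromhex ignores the spaces between byte pairs.
decoded = bytes.fromhex(registered_hex).decode("ascii")

print(repr(decoded))  # → '##gff-version 3'
```

The byte 20 is the ASCII space, so the plaintext should read "##gff-version 3" with a space before the 3.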
Thanks a lot for spotting my typo. It has been promptly corrected by the IANA!