whatwg / infra

Infra Standard

Home Page: https://infra.spec.whatwg.org/

License: Other

infra's Introduction

This repository hosts the Infra Standard.

Code of conduct

We are committed to providing a friendly, safe, and welcoming environment for all. Please read and respect the Code of Conduct.

Contribution opportunities

Folks notice minor and larger issues with the Infra Standard all the time and we'd love your help fixing those. Pull requests for typographical and grammar errors are also most welcome.

Issues labeled "good first issue" are a good place to get a taste for editing the Infra Standard. Note that we don't assign issues and there's no reason to ask for availability either; just provide a pull request.

If you are thinking of suggesting a new feature, read through the FAQ and Working Mode documents to familiarize yourself with the process.

We'd be happy to help you with all of this on Chat.

Pull requests

In short, change infra.bs and submit your patch, with a good commit message.

Please add your name to the Acknowledgments section in your first pull request, even for trivial fixes. The names are sorted lexicographically.

To ensure your patch meets all the necessary requirements, please also see the Contributor Guidelines. Editors of the Infra Standard are expected to follow the Maintainer Guidelines.

Tests

Tests are an essential part of the standardization process and will need to be created or adjusted as changes to the standard are made. Tests for the Infra Standard can be found in the infra/ directory of web-platform-tests/wpt.

A dashboard showing the tests running against browser engines can be seen at wpt.fyi/results/infra.

Building "locally"

For quick local iteration, run make; this will use a web service to build the standard, so that you don't have to install anything. See more in the Contributor Guidelines.

Formatting

Use a column width of 100 characters.

Do not use newlines inside "inline" elements, even if that means exceeding the column width requirement.

<p>The
<dfn method for=DOMTokenList lt=remove(tokens)|remove()><code>remove(<var>tokens</var>&hellip;)</code></dfn>
method, when invoked, must run these steps:

is okay and

<p>The <dfn method for=DOMTokenList
lt=remove(tokens)|remove()><code>remove(<var>tokens</var>&hellip;)</code></dfn> method, when
invoked, must run these steps:

is not.

Using newlines between "inline" element tag names and their content is also forbidden. (This actually alters the content, by adding spaces.) That is

<a>token</a>

is fine and

<a>token
</a>

is not.

An <li> element always has a <p> element inside it, unless it's a child of <ul class=brief>.

If a "block" element contains a single "block" element, do not put it on a newline.

Do not indent for anything except a new "block" element. For instance

 <li><p>For each <var>token</var> in <var>tokens</var>, in given order, that is not in
 <a>tokens</a>, append <var>token</var> to <a>tokens</a>.

is not indented, but

<ol>
 <li>
  <p>For each <var>token</var> in <var>tokens</var>, run these substeps:

  <ol>
   <li><p>If <var>token</var> is the empty string, <a>throw</a> a {{SyntaxError}} exception.

is.

End tags may be included (if done consistently) and attributes may be quoted (using double quotes), though the prevalent theme is to omit end tags and not quote attributes (unless they contain a space).

Place one newline between paragraphs (including list elements). Place three newlines before <h2>, and two newlines before other headings. This does not apply when a nested heading follows the parent heading.

<ul>
 <li><p>Do not place a newline above.

 <li><p>Place a newline above.
</ul>

<p>Place a newline above.


<h3>Place two newlines above.</h3>

<h4>Placing one newline is OK here.</h4>


<h4>Place two newlines above.</h4>

Use camel-case for variable names and "spaced" names for definitions, algorithms, etc.

<p>A <a for=/>request</a> has an associated
<dfn export for=request id=concept-request-redirect-mode>redirect mode</dfn>,...

<p>Let <var>redirectMode</var> be <var>request</var>'s <a for=request>redirect mode</a>.

infra's Issues

Tuples

We use tuples in a couple places and they're very much like immutable ordered sets. The syntax is typically (element1, element2).

Needs a logo!

Ideas:

Something indicating "foundations" (a house? A lego-ish building block?)
Something very abstract (examples)

Provide a way to iterate/get values of a map

Bikeshed complains if a variable is unused, which happens if you iterate over a map but don't use the key.

This can be worked around with <var ignore>, but maybe it's better to have an explicit way to handle just values? There's already a way to get just the keys.
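As a rough analogy for what iterating over values alone could look like, here is a minimal Python sketch. The `headers` map and its contents are hypothetical, and a Python `dict` (which preserves insertion order) merely stands in for an ordered map.

```python
# Hypothetical ordered map; the keys are never read in the loop below.
headers = {"accept": "text/html", "accept-language": "en"}

values = []
# Analogous to "for each value of the map's values": no unused key variable.
for value in headers.values():
    values.append(value)

print(values)
```

Because no key variable is introduced, there is nothing for a linter to flag as unused.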

Control flow in algorithms

Definitions for "abort these steps" and "abort these sub-steps" would be useful (unless they're considered bad practice and should be replaced by "return" and "throw," in which case a note saying so would be great).

In particular, while the meaning of "abort these steps" is obvious when it's in the top-level of steps, it's not super explicit what it means when nested.

Similarly, does "abort these sub-steps" return control to the set of steps right above it, or to the caller of the algorithm?

Define list/truncate

[=list/Truncate=] |list| to [=list/size=] |n|.

or even:

[=list/Truncate=] |list| to |n|.

Is a lot more readable than:

[=list/Remove=] all items from |list| except the first |n|, so that |list|'s [=list/size=] is now |n|.
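A minimal Python sketch of what such a truncate operation could do. The function name and the behavior when n exceeds the list's size (the list is left unchanged) are assumptions for illustration, not something this issue specifies.

```python
def truncate(items: list, n: int) -> None:
    """Remove all items from `items` except the first `n`, in place."""
    if n < 0:
        # Assumption: a negative size is treated as a caller error.
        raise ValueError("n must be non-negative")
    # Slice deletion is a no-op when n >= len(items).
    del items[n:]


tokens = ["a", "b", "c", "d"]
truncate(tokens, 2)
print(tokens)
```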

Lists should contain items, not elements

Otherwise having a list of elements is confusing ("each element of the queue is an HTML element").

Suggestions for the typography section

First, maybe separate out block-level styles (definition, requirement, explanation through CSS fragment; maybe also switches) from inline styles (defining instance through variables).

For inline styles, in general all of these would benefit from examples. Maybe multiple constructs per example.

This one I'm less sure about... But I think phrasing like

Other code fragments are marked up like this.

is a bit less good than

Other code fragments are marked up in monospace

with an example showing the actual usage. Otherwise it's kind of like the infrastructure standard is violating itself, by using the monospace style for things that are not actually code fragments :P

Control flow for loops

It would be great to have a less awkward way to phrase moving on to the next item of a loop: https://dom.spec.whatwg.org/#concept-event-listener-inner-invoke. Basically something like "continue".

Move Web IDL conversions to Web IDL

Infra should not depend on Web IDL IMO; it should stick with things that are universally applicable to all specs, including non-Web IDL-based ones.

Byte sequences

We should also mention that byte sequences can be represented using 0x00 0xFF syntax and maybe flesh out the whole concept a bit more with examples and such.

Credit: @foolip.

Comparison

Split from #6. I'd rather not define a case-sensitive match, as to me that seems like something an "equal" or "is" operation would also cover, and we already use those far more.

URL uses "equal" to define comparison operations for URL and host structs. Should we use "equal" as well here to define it for strings? Or maybe allow both equal and is?

Suggestion:

Allow both "is" and "equal"
Define them for strings (code points; works for JavaScript and scalar value strings) and byte sequences
Use dfn and accept that not all callers will use that (for now)

Do we need this for other data types?

Define increment

Define increment as:

Set |i| to |i| + 1;

So you can say:

[=Increment=] |i|.

or alternatively:

[=Increment=] |i| by 1.

Data structures section

Distinct from data types, I think.

In all cases we want clear instructions and examples around the verbiage for adding to/removing from/looking up in the collection.

Known used types:

Map (see module map)
List
- Ordered
- Can be indexed into, maybe with some notation
- Maybe re-use ES's notation for "list literals"? Or not; we don't so far.
- Easy conversion to/from Web IDL sequences, as explained in Web IDL somewhat informally.
Set (see ... CustomElementsRegistry? Not sure, that's kind of a map with lots of keys)
- Can also be ordered (see DOMTokenList); default to insertion order
- In an ordered set, does adding something that already exists replace, or does it remove and append at the end?

"For each key → value of map"

https://infra.spec.whatwg.org/commit-snapshots/f817d690ee9f1a7556805d4796a6ebbfe6eb127f/#map-iterate

"For each key → value of map"

The "For each" links to [=list/for each=], rather than [=map/for each=] as intended.

WhatsApp encryption

https://infra.spec.whatwg.org/commit-snapshots/8e8d83d4035e82b82e007ec26c1feecb565fb871/

Add basic JavaScript types

We should add undefined, null, and boolean (true, false). We haven't made much type-value distinction thus far so I'm not quite sure how to formulate this. Any ideas?

Tracking vector tracking

I'm not sure where exactly we'd want to put this. Thoughts?

Operation to map/transform lists

I frequently want to build a new list based on an old one by modifying the old list's elements using substeps. Perhaps:

Let newList be the result of transforming each item of oldList through the following steps:

Return item + 2.

as shorthand for:

Let newList be a new list.

For each item of oldList:

Append item + 2 to newList.

"Return" could be "include" or "append" or some other term.

The text I'm proposing isn't much shorter, but it keeps the logical operation in a single step instead of spreading it across 2.
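The equivalence between the proposed one-step phrasing and the longhand loop can be sketched in Python. The name `transform` is hypothetical and only mirrors the wording proposed above.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def transform(old_list: Iterable[T], steps: Callable[[T], U]) -> List[U]:
    """Build a new list by running `steps` on each item of `old_list`."""
    new_list: List[U] = []        # "Let newList be a new list."
    for item in old_list:         # "For each item of oldList:"
        new_list.append(steps(item))  # "Append <result> to newList."
    return new_list


# The single-step form from the proposal: "Return item + 2."
new_list = transform([1, 2, 3], lambda item: item + 2)
print(new_list)
```

The input list is left untouched, which matches the "build a new list" intent of the proposal.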

Define string size

In particular for JavaScript string, see #73, we need something like code-unit length from HTML (and then remove that from HTML and use our new concept).

Either we define size and for JavaScript string it's the number of code units and for scalar value string it's the number of scalar values, or size is always code points and we have code-unit size just for JavaScript strings. The latter is probably slightly better since it makes it more explicit?
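To make the two candidate notions of size concrete, here is a small Python sketch. Python strings are sequences of code points, so the code-unit count is derived here by encoding to UTF-16; the function names are hypothetical.

```python
def code_point_length(s: str) -> int:
    """Number of code points in the string."""
    return len(s)


def code_unit_length(s: str) -> int:
    """Number of UTF-16 code units, as a JavaScript string would report."""
    # "surrogatepass" lets lone surrogates through instead of raising.
    return len(s.encode("utf-16-le", "surrogatepass")) // 2


text = "a\U0001F600"  # "a" followed by one code point outside the BMP
print(code_point_length(text), code_unit_length(text))
```

The two lengths differ only for code points above U+FFFF, each of which takes two code units.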

Use of "one of"

When referencing one of the items from a list, is "and" or "or" more accurate? Or does it not matter much? Both are in use.

One of "uninstantiated", "errored", or "instantiated", used to prevent reinvocation of ModuleDeclarationInstantiation on modules that failed to instantiate previously.

If header list contains a header whose name is one of If-Modified-Since, If-None-Match, If-Unmodified-Since, If-Match, and If-Range, ...

If origin’s host component matches one of the CIDR notations 127.0.0.0/8 or ::1/128

A job is an abstraction of one of register, update, and unregister request for a service worker registration.

Deal with list[n] access where n is negative or >= size

We should probably say that it's not possible, or actually define what it would return. If we return something, it would have to be a value like "none" or some such that doesn't mean anything else and needs to be explicitly dealt with.
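One of the two options, returning a dedicated value that cannot be confused with a real item, could look like the following Python sketch. The names `NOTHING` and `item_at` are hypothetical.

```python
# A unique sentinel that can never be equal to a real list item.
NOTHING = object()


def item_at(items: list, n: int):
    """Return items[n], or NOTHING when n is negative or >= the size."""
    # The explicit check matters in Python, where a negative index would
    # otherwise silently count from the end of the list.
    if n < 0 or n >= len(items):
        return NOTHING
    return items[n]


letters = ["a", "b", "c"]
print(item_at(letters, 1))
print(item_at(letters, 3) is NOTHING)
```

Callers then have to test for the sentinel explicitly, which is the "needs to be explicitly dealt with" property described above.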

byte sequence backtick representation handling of C0 controls

https://infra.spec.whatwg.org/#byte-sequences

In this section is the text:

Byte sequences with bytes in the range 0x00 to 0x7F, inclusive, can alternately be written as a string, but using backticks instead of quotation marks, to avoid confusion with an actual string.

This is intended for showing ASCII byte sequences as strings, but ignores that control characters such as NUL, escape, newline, etc. are not printable or would mess up the display (or show as tofu boxes and cannot be discerned). I'd suggest making the range go from 0x20 to 0x7F instead.

Is the backtick byte sequence representation really that useful anyway?
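A hedged Python sketch of a formatter that uses the backtick form only when every byte is printable, and falls back to the 0x00 0xFF style otherwise. The function name is hypothetical, and the upper bound of 0x7E is this sketch's own assumption: it narrows the suggested 0x20 to 0x7F range further because 0x7F (DEL) is itself a control character.

```python
def format_byte_sequence(data: bytes) -> str:
    """Render bytes in backticks if all are printable ASCII, else as hex."""
    # Assumption: "printable" means 0x20 to 0x7E, inclusive.
    if all(0x20 <= byte <= 0x7E for byte in data):
        return "`" + data.decode("ascii") + "`"
    return " ".join(f"0x{byte:02X}" for byte in data)


print(format_byte_sequence(b"GET"))
print(format_byte_sequence(b"\x00\xff"))
```

The sketch does not escape a backtick byte (0x60) inside the sequence, so that case would still be ambiguous.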

Using generics for bytes / code units / code points

See #1 for some discussion on code units.

A term like "ASCII digit" and others like it are equally meaningful for all three primitives, since the primitives are defined as integers. Should we define these terms as generics so they can apply to each primitive?

Alternatively, we could change the phrasing, e.g., "An ASCII digit is a byte, code unit, or code point in the range 0x30 to 0x39, inclusive." This would also require slight tweaking of how we define "byte" and "code point".

Avoid "easy", "simply"

These words do not add anything, but risk making the reader feel dumb if they don't understand something supposedly simple.

Record-like data structure

I'd like URL record, request, and response to just be some data structure so you can more easily address their members.

They're basically maps with fixed keys or what JavaScript calls records. The values are mostly mutable still, but thus far they don't have things similar to methods.

Move algorithms into their own top-level section

I think there's enough there to warrant that now.

Sketch out prose for algorithm definitions.

It would be lovely if we could agree upon a standard way of describing algorithms in specs. For instance, it's helpful to understand expected inputs and outputs, but there's no commonly shared way of spelling those out. Some examples:

WebAuthn has note blocks describing inputs: https://w3c.github.io/webauthn/#makeCredential, and describes outputs in prose.
CSP describes both inputs and outputs in prose, usually in the form 'Given a request’s cryptographic nonce metadata (nonce) and a source list (source list), this algorithm returns "Matches" if the nonce matches one or more source expressions in the list, and "Does Not Match" otherwise:'.
ECMAScript describes inputs but not outputs: "The abstract operation PerformEval with arguments x, evalRealm, strictCaller, and direct performs the following steps:"

And so on. It would be great if we could align this to enough of an extent that we could start building tooling support for the callsite as well.

Are Web IDL sequences lists?

Or do you convert lists to Web IDL sequences?

I think the big difference is that as defined here, lists can contain abstract things. Whereas Web IDL sequences can only contain things which are properly part of the Web IDL type system.

Maybe what we want to do here is state something like "often we use lists in a place that expects sequences, or treat sequences like lists. This kind of implicit conversion is OK, as long as the type systems match up."

Define character as alias of code point or stop using it

We're currently using the term character to define syntax. We should probably stop doing that and use the syntax we outline for code points. Slightly weird to be informal here while we require much more of others.

See also #6 on the topic of whether or not to stop using character altogether as something that means code point all or some of the time (it seems somewhat silly to make it mean code point for something where Unicode says the code point is a non-character, but not out of the question).

What was the motive in choosing algorithm description approach?

The Infra Standard defines a pseudocode and algorithm description approach that is different from what I usually find in academic papers or classic books like Introduction to Algorithms. The standard also notes that the described algorithms aren't intended to be performant:

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be easy to follow, and not intended to be performant.)

What was the motive in choosing this specific style of algorithm description?

"May" in a note

implementations may optimize based on the fact that the order is not observable.

Describe the switch construct

<dl class=switch>

Define numbers (waiting on Number / BigInt)

Should we define numbers and their various notation schemes? (Mathematical operators?)

It might also make sense to define null as being roughly analogous to JavaScript's null and a good initial value for variables.

Rethink strings

I need to study the various dependencies of strings and figure out what we want to do. It seems there are a couple of kinds of strings that probably need to be distinguished and named somehow:

JavaScript strings - each code point is in the range U+0000 to U+FFFF
scalar value strings - each code point is a scalar value
byte strings - each code point is in the range U+0000 to U+00FF
ASCII strings - each code point is an ASCII code point
strings - each code point is a code point (I don't think we really have this in the platform even though Encoding defines this kind of string; we have a variant of this where valid surrogate pairs are treated as their own code point)

Term to reference internal concepts

For discussion, we tend to call definitions of spec concepts a concept and their fields an internal slot, which is an ECMAScript specification device. Can we clarify the terminology used to reference these internal spec definitions?

Add VoidFunction

Several specs end up needing to define IDL that accepts a function which is called for its side effects, which means they use something like:

callback VoidFunction = void ();

It would be nice to have this defined in Infra so we didn't have to worry about colliding global names for this trivial concept.

Define string sorting by "code unit order"

Background: whatwg/url#199

We want to update specs to be unambiguous about this, so I think infra is a good place to define it. It should include some examples (similar to the URLSearchParams WPTs).
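The difference between code unit order and code point order only shows up when a string contains code points above U+FFFF. The Python sketch below illustrates this; the helper name is hypothetical, and big-endian UTF-16 bytes are used as the sort key because comparing them byte by byte is equivalent to comparing 16-bit code units.

```python
def code_unit_key(s: str) -> bytes:
    """Sort key whose byte-wise order matches UTF-16 code unit order."""
    return s.encode("utf-16-be", "surrogatepass")


# U+FB03 is one code unit (0xFB03); U+1F308 is a surrogate pair
# (0xD83C 0xDF08), whose lead surrogate sorts below 0xFB03.
names = ["\uFB03", "\U0001F308"]

by_code_point = sorted(names)                    # Python's default order
by_code_unit = sorted(names, key=code_unit_key)  # JavaScript-style order

print([hex(ord(s)) for s in by_code_point])
print([hex(ord(s)) for s in by_code_unit])
```

The two results are reversed relative to each other, which is the kind of ambiguity the issue wants specs to rule out.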

Editorial issues I noticed that I need to write a PR for

At least one "must" in a note.

List shouldn't use contents, but just refer to items consistently.

Which WebIDL types are maps?

Like #14, but plausible WebIDL maps include at least dictionaries, objects, and records. I care in order to iterate over them.

Define pairs

In #79 @mikewest brought up pairs. I think we should consider defining them as a special case of tuples (fixed size of two) with their own / syntax.

I also think the <dfn> convention he mentions there is expected, but I'm not sure how to put <dfn> conventions into prose.

Replace for lists

This is a thing DOM does. Reasonable?

https://dom.spec.whatwg.org/#concept-element-attributes-replace step 4.

Define initializing variables and setting them?

I.e., let and set. We could do that with <dfn>, but I don't think we want to require documents to link instances.

Add "context object"

I think we should move https://dom.spec.whatwg.org/#context-object to Infra. But consider renaming it at the same time, as has been suggested somewhere, "this" is probably less confusing than "context". HTML uses "this element" etc (without cross-referencing) in some places.

Publishing

Logo, see #8
infra.spec.whatwg.org domain, requires @Hixie (put it under "annevankesteren")
Twitter account
Blog post

Immediately after publishing:

Get into Shepherd
Start PRing various specs to use these concepts
Update biblio.json
Update the https://spec.whatwg.org/ index (requires @Hixie)

Anything else? Please modify this list.

"An ASCII lower alpha is a code point in the ran..."

https://infra.spec.whatwg.org/commit-snapshots/208e4e04632d0c8514a8b5f26f99c8472d7e836d/#example-code-point-notation

An ASCII lower alpha
is a code point in the range U+0041 to U+005A, inclusive.

An ASCII upper alpha
is a code point in the range U+0061 to U+007A, inclusive.

U+0041 is Latin Capital Letter A
U+005A is Latin Capital Letter Z
U+0061 is Latin Small Letter a
U+007A is Latin Small Letter z

So the first range should be ASCII upper alpha; the second range should be ASCII lower alpha.
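The corrected ranges can be checked with a short Python sketch. The function names are hypothetical and simply mirror the two terms.

```python
def is_ascii_upper_alpha(code_point: int) -> bool:
    """True for U+0041 (A) to U+005A (Z), inclusive."""
    return 0x41 <= code_point <= 0x5A


def is_ascii_lower_alpha(code_point: int) -> bool:
    """True for U+0061 (a) to U+007A (z), inclusive."""
    return 0x61 <= code_point <= 0x7A


print(is_ascii_upper_alpha(ord("A")), is_ascii_lower_alpha(ord("A")))
print(is_ascii_upper_alpha(ord("z")), is_ascii_lower_alpha(ord("z")))
```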

Mention how to convert between strings and byte sequences

I.e. by using the Encoding Standard. See e.g. w3c/webauthn#258
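As a rough illustration of the round trip, the Python sketch below uses Python's built-in UTF-8 codec as a stand-in for the Encoding Standard's algorithms. The choice of UTF-8, the function names, and the replacement-on-error behavior are assumptions of this sketch; Python's codec is not guaranteed to match the Encoding Standard in every detail (byte order mark handling, for example).

```python
def string_to_bytes(s: str) -> bytes:
    """Encode a string to a byte sequence using UTF-8."""
    return s.encode("utf-8")


def bytes_to_string(data: bytes) -> str:
    """Decode a byte sequence as UTF-8, replacing invalid bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


encoded = string_to_bytes("caf\u00e9")
print(encoded)
print(bytes_to_string(encoded))
```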

Add stacks and queues

I remember now that HTML uses these for custom elements and more. After or as part of #7.

Something like:

A list is sometimes called a stack or a queue. These are just other names for list, but come with their own conventional terminology.

To push onto a stack is to...

To pop from a stack is to...

To enqueue in a queue is to...

To dequeue from a queue is to...

Also be sure it's clearly defined what happens when you pop or dequeue from an empty stack/queue.
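A minimal Python sketch of the four operations as thin wrappers over a plain list. Returning `None` when popping or dequeuing from an empty list is this sketch's assumption; the issue leaves that behavior open.

```python
def push(stack: list, item) -> None:
    """Push onto a stack: append to the end of the list."""
    stack.append(item)


def pop(stack: list):
    """Pop from a stack: remove and return the last item, if any."""
    # Assumption: an empty stack yields None rather than raising.
    return stack.pop() if stack else None


def enqueue(queue: list, item) -> None:
    """Enqueue in a queue: append to the end of the list."""
    queue.append(item)


def dequeue(queue: list):
    """Dequeue from a queue: remove and return the first item, if any."""
    # Assumption: an empty queue yields None rather than raising.
    return queue.pop(0) if queue else None


stack, queue = [], []
for item in (1, 2, 3):
    push(stack, item)
    enqueue(queue, item)
print(pop(stack), dequeue(queue))
```

Both structures share the same underlying list; only the end from which items are removed differs.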

While true / loop until break

See https://github.com/whatwg/fullscreen/pull/72/files#r101013520 for a need for this.

Do we have cases like this in other specs, and what do they say?

String / byte sequence instance manipulation

For a byte sequence it probably makes sense, e.g., https://fetch.spec.whatwg.org/#concept-method-normalize (and also the uppercase/lowercase operations), but for strings it might be a little unexpected given JavaScript. We do it all over though so maybe we should just make that a little bit more clear.

Tracker for things to move here

https://html.spec.whatwg.org/#encoding-terminology
- code unit, character?, Unicode character?, code-unit length
https://html.spec.whatwg.org/#case-sensitivity-and-string-comparison
- case-sensitive comparison, prefix match
- defaulting of string comparisons to case-sensitive
https://html.spec.whatwg.org/#common-parser-idioms
- many things already done but under different names; HTML will need updating
- White_Space characters, control characters, uppercase/lowercase hex digits
- Algorithms like "collect a sequence of characters" and friends
- Update: everything moved except White_Space and control characters
https://html.spec.whatwg.org/#numbers
- Maybe only the definitions, not the parsing algorithms? Since the parsing algorithms seem kind of HTML specific?
https://html.spec.whatwg.org/#dates-and-times ??? wait for a second consumer?
https://html.spec.whatwg.org/#colours honestly this feels like it should go in some CSS spec?
https://html.spec.whatwg.org/#space-separated-tokens (concepts left in HTML, parsing moved here)
https://html.spec.whatwg.org/#comma-separated-tokens (concepts left in HTML, parsing moved here)
https://dom.spec.whatwg.org/#ordered-sets
- Has redundancies with a few other things
https://html.spec.whatwg.org/#namespaces
- shared between HTML and DOM it seems
MIME type stuff at https://html.spec.whatwg.org/#resources
- Moved to MIMESNIFF
https://html.spec.whatwg.org/#terminology
- Definition of "or"