spec's Introduction

The Rust Specification

The Spec and the Rust Reference

The Rust specification is currently being developed as part of the Rust Reference.

The t-spec team, in consultation with interested parties from the Rust Project, decided on a go-forward plan that makes the Rust Reference the source of truth; the specification will be based on that content.

It is currently unclear what happens to both this repository and the Rust Reference once we reach a steady state of having a usable specification. It is likely we will consolidate the two repositories into one that ends up being called the specification. Stay tuned for more information on that.

License

The Rust Specification is distributed under the terms of both the MIT license and the Apache license (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.

spec's Issues

Policy: What can be left undefined?

Should we ever allow anything to be left undefined?
If so, should the particular undefined parts be explicitly stated, and how?

  • ehuss's preference is to never leave anything undefined, but there will definitely be some gray areas that make that difficult.

Add contributor guidelines

It is usually best practice to include a CONTRIBUTING.md file with guidelines for contribution. What should those guidelines be? Do we want to steer people wanting to write new content to first work with the team? Probably should point them at the authoring guide, but what else do they need to know?

Policy: Use of MiniRust/DSLs to specify Semantics

I'd like for us to have a full discussion about the use of MiniRust and other DSLs as a normative part of the specification: how much of the spec should be written in MiniRust or another DSL vs. prose, and when it is allowed to be used in place of prose.

The Operational Semantics Team (or members thereof) should be involved in this discussion as applied to the Dynamic Semantics chapter.

Markdown style guide

I think it would be good to have some guidance on preferences for how the Markdown text is written. These are some suggestions, many of which are from the Reference.

  • Restrict sources to .md or .html. We should make intentional decisions to include other types of sources as a team.

  • Prevent #![feature] in examples (the spec should not be documenting unstable)

  • Reject CRLF

    • This might have complications for Windows users based on their core.autocrlf setting.
  • Reject tab

  • File must end with a newline

  • Lines must not end in spaces

    • This is important because trailing spaces are significant in Markdown, but I think that is a flaw.
  • Avoid double blank lines

  • Do not use indented code blocks (use fenced code blocks with 3+ backticks instead)

  • Code blocks should have an explicit language

  • Use ATX-style headings (not Setext)

  • Use sentence case for headings

  • Line wrapping
    Word wrapping column? 80? 90? 100? No wrapping? No preference? Semantic linefeeds?
    If not semantic linefeeds, then what is the policy for PRs? Should they rewrap any content they touch?

    • ehuss's preference: Semantic linefeeds are extremely helpful for dealing with diffs. Essentially, just one sentence per line.
  • Link kinds

    • There are different ways to link to things, such as reference links, or inline links.
      ehuss's preference is to only use reference links.
      Inline links disrupt the flow of the source text, so it is recommended to avoid them.
      Use reference link shortcuts if appropriate.

      How should the references be organized? Within the section they are first used? At the bottom of the file? Should they be sorted?

      • ehuss's preference: Either at the bottom of a section, or at the bottom of the file, it doesn't really matter. Keep it sorted (case-insensitive?), which can help if there is a long list.
  • Always use relative links to .md files

    • Using .md relative links allows them to work in GitHub's rendering.
    • For now, we will not be able to do relative links to std and other Rust docs. These are important for properly supporting offline viewing of the Rust documentation. Also, our link validation only works with relative links. We may want to investigate better tooling for that.
  • Use smart punctuation instead of Unicode characters.
    For example, use --- for an em-dash instead of the Unicode character.

  • List style

    • Bullet lists: prefer -, *, or +?
      • ehuss's preference: Author's choice.
    • Ordered lists: Prefer 1. or 1)?
      • Prefer repeating 1 so you don't have to renumber things? Or prefer to use the numbers 1, 2, 3, ...?
      • ehuss's preference: Author's choice.
  • Defining abbreviations

    <abbr title="dynamically sized types">DSTs</abbr>

    These are very helpful to people who may not know all abbreviations.

    Also, I think we should avoid Rust-specific abbreviations if possible (like APIT).

Where should "lowering" from Rust to Minirust go.

I think this is a crucial question for the open PR for the layout chapter, as well as for future content in the Dynamic Semantics chapter.

Where should we specify the lowering of a Rust program to MiniRust, and where do we start talking about a concrete instance of the abstract machine, rather than the parameterizable AM?

The current layout chapter assumes both the lowering and the semantics of that lowering are part of the dynamic section (it talks about both layout, which is a property of the parameterized AM instance, and representation, which is part of the semantics of the resulting AM). The only other place that makes sense is Static Semantics, but that section seems to specifically be handling well-formedness and type system constraints, which wouldn't make it a good place either.

How to handle Editions?

There should be guidelines for how to handle Editions.

My recommendation: The primary text of the document should document the latest edition. When something is only available in a specific edition (like "async"), mention that up-front. When there is a change in behavior between Editions, edition-specific rules should be added which specify what the change is to the older edition relative to the present.

However, that is not a perfect solution. It can create some awkwardness (like "For the 2015 Edition, disregard rules X, Y, and Z, instead …").

Alternate approaches we struggled with in the Reference are:

  • Only document 2015 within the main text, and then anything added in later editions is described as "extensions" on top of that.
  • Any situation where there is a change in edition behavior is shown as an alternation ("in the 2015 Edition, .... In the 2018 Edition, ....").

I do not particularly like those options, though.

Guidelines or policy on grammar and style

Should there be guidelines or policy on English grammar and style? Some examples:

  • Should there be a preference for a specific external style guide, like AP Style, APA, Chicago, MLA, etc.?
  • Grammatical person guidelines? Voice?
  • Oxford commas.
    • ehuss's preference is yes.
  • Avoid slashes for alternatives ("program/binary"), use conjunctions or rewrite it ("program or binary")
    • ehuss's preference is yes.
  • Avoid qualifying something as "in Rust".
    • ehuss's preference is yes. Contributors to the Reference often start their sentences "In Rust, ...". That should always be removed. Almost every sentence in the spec is about Rust.
  • Should the spec be agnostic about whether code is "compiled" or "interpreted"? Should it avoid using the term "compile"?
    • ehuss's preference: I think it would be good to avoid talking about "the compiler" or "... is compiled to ..." unless absolutely necessary.
  • Any guidance on contractions?
  • Phrasing for editions should be "the 20xx Edition", not "Rust 2024", not "Edition 2024"

Example code guidelines

The following are some questions about including code samples in the text.
See https://github.com/rust-lang/reference/blob/master/STYLE.md#code-examples for the reference guide.

  • Will there be illustrative examples in the spec?
    • ehuss's preference: Yes, I think illustrations are very helpful.
    • Consider that some readers will skip or skim the text and look at the code first.
  • How often should there be examples? Are there any guidelines for when they should and should not be used?
  • If we have a separate testsuite, can some of those examples be offloaded to the testsuite?
    That is, instead of showing some examples inline, just allow the user to somehow view the testsuite for a specific rule.
  • Should there be naming conventions, such as avoiding nonsense terms like foo/bar/baz and using realistic terms instead?
    • ehuss's preference: I would encourage not using nonsense terms, and trying to use names that are illustrative of the concept whenever possible (example; see also the sketch after this list).
  • Should the examples prefer to be realistic of what a user would actually write?
    That can be difficult, since that often requires longer examples. Unrealistic or trivial examples can be confusing.
  • Are there guidelines for balancing length versus clarity? There are times when to illustrate some concept, it may require a significant amount of code. At what point is too much? mdBook supports hiding irrelevant portions of code, but that has limitations.
  • Should example code be inline within the text, or outline (like TRPL) and use includes? Inline is easier to author, outline is easier to test.
  • If using rustdoc (mdbook test) to test, then there should be guidelines on using tags like no_run, ignore, E errors, etc.
  • Should inline code samples be tested with mdbook test, or via a separate test infrastructure?
    mdbook test is very easy to use, and makes it easy to see the code in the source text. However, it is very limited in its capabilities.
    • ehuss's preference: I think we should (eventually) invest in a test infrastructure using compiletest or ui_test. We can perhaps have some kind of hybrid inline/outline model, but I don't know what that might look like.
  • Should examples be formatted with rustfmt?
    • Any non-default options?
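
As a rough sketch of the naming question above (the function names here are invented purely for illustration):

```rust
// Nonsense names force the reader to track which placeholder is which:
fn foo(bar: &str) -> usize {
    bar.len()
}

// Illustrative names let the example carry part of the explanation:
fn title_length(title: &str) -> usize {
    title.len()
}

fn main() {
    assert_eq!(foo("spec"), title_length("spec"));
}
```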

Review policy

What is the policy for reviewing changes to the spec?

Policy: Implementation-specific rules?

Should we ever allow implementation-specific rules?
If so, how should that be represented?

In the past, there has been a sentiment from some in the project to never allow that. I largely agree with that. However, I think there will be gray areas that will be difficult to navigate with such a prohibition.

Policy: Referring to `rustc` lints

Generally the Reference has avoided ever mentioning rustc lints, but unfortunately there are a few lints that are really relevant and important to certain sections of the language.

  • ehuss's preference: Have a policy against referring to them except in extreme situations where an Editor's note provides some very useful context. For example unsafe_op_in_unsafe_fn, and in particular its relationship to the new Edition.

Specification Chapter/Topics

  • Front Matter
    • Introduction
    • Specification Scope
    • Terms and Definitions
  • Source code and Rust syntax tree (graph?) - T-lang
    • Lexing/tokenization (@m-ou-se)
    • Grammar, AST
    • Crates, modules, source files
    • Macro invocations
    • Macro expansion and conditional compilation
    • Name/Path resolution of (mod-level) items
  • Static semantics - mixed T-lang/T-types
    • type checking
    • associated item resolution
    • existential (impl Trait) resolution
    • borrow checking
    • unsafe checking
    • const eval - T-opsem?
    • type inference
  • Dynamic Semantics - T-opsem
    • high level expression form
    • pattern matching and binding
    • dyn traits and dynamic method dispatch
    • memory layout and value representation (@chorman0773)
    • low level (MIR-like) statement form
    • memory model (borrowing; atomics)
    • ABIs and FFI linkage (@chorman0773)
  • The Core library crate - T-libs-api (@pietroalbini ??)
    • builtin types' traits and methods
    • core::* items
    • alloc

Determine Specification Stakeholders

Who are the "stakeholders"? What is the purpose and role of these parties, and is that even the right term for that group? Do we want to rename "stakeholders" (some suggested that on Zulip)? Do we need all the stakeholders now, or are they based upon what we plan for the first version?

Policy: Linking to other documentation

Should there be a policy on what external documentation the spec will link or refer to?

There are some resources that could be useful to typical Rust developers, like TRPL, or other books or guides. However, having lots of links can be noisy.

There are also non-Rust references, which might be relevant (like other standards). But where do you draw the line?

What is the policy for linking to historical information, such as RFCs, PRs, GitHub issues, blog posts, etc? These sources are mostly static and not updated, and thus may provide outdated or incorrect information. However, as historical context they are useful.

  • ehuss's preference: Don't ever link to RFCs, PRs, GitHub issues, blog posts, and similar things. However, we inevitably will hit sticky issues where we know something is wrong, and not linking to the relevant issue is withholding useful information.

Policy: How to deal with bugs in rustc?

Should the spec document actual rustc behavior (including bugs)? Or should it document the intended behavior? Or both?

Related: If there is a known issue, should it be encouraged or discouraged to link to the GitHub issue?

  • ehuss's preference is to never link to issues.

Related: Should it have specific callouts for future-incompatible warnings?
How should these known problems be documented? Should it document how it is intended to work? What if the intention is not yet known?

Meta Policy: Adopt Semantic issue labels

I'd like to formally recommend that we adopt the semantic issue labels format used by the rest of the rust-lang org for labelling issues. It will probably be easier to migrate both labels and issues now while we have a limited set (vs. later when we have a few hundred), and the labels are fairly helpful in filtering things.

It would also make it easier to set up labeling from triagebot (I did that set up a bit ago for the unsafe-code-guidelines repo used by T-opsem), and allow contributors to more immediately identify information about the issue/PR. This can be useful particularly for PRs that touch the spec chapters, by making it easy for contributors to distinguish stylistic/maintenance changes from salient textual changes. With some work, it can also be useful to interface with the rest of the project, by raising potential issues in spec PRs to the relevant teams when something may be unclear (using standard labels like I-lang-nominated).

How do we validate the specification?

Do we solely rely on tests that live under rust-lang/rust? Are there other validation mechanisms to consider in the short and long term (e.g., formal methods)?

Custom colors needs adjusting for dark themes

The custom CSS changes the default styling of blockquotes to have a yellowish background color. This color does not come across very well in the dark themes (coal, navy, ayu). I think it should probably use a different color when using a darker theme.

Also, the color of inline code spans inside blockquotes will probably need adjustment as well.

Set up bot for t-spec team to approve PRs, etc.

Similar to rfcbot where you have checkboxes next to members of the t-spec team that get checked off for approval/review. This would probably be done only on certain types of issues - rfc like issues, maybe. That would need to be discussed.

Spelling

Should there be a policy to use American or British English spelling (or other similar style differences)? Some parts of the Rust project have standardized on American spelling. The ISO house style is Oxford (British).

Or just leave it up to the author? I can predict that being inconsistent will annoy or confuse some people.

Should there be tooling to validate spelling?

  • ehuss's preference: I think spell check tooling would be great, but most of the tools I have used have been underwhelming.

Policy: Target-specific rules

What should the policy be around documenting target-specific behavior?

There are several areas where we generally have to refer to targets, but should the spec go so far as to completely specify targets (I suspect no?). If not, then where do we draw the line? How do we say "yes" to one thing, and "no" to another?

Some examples of target-specific things from the reference:

  • Conditional-compilation keys (target_os and such)
  • target_feature
  • instruction_set
  • debugger_visualizer
  • Type layout
  • Linking and symbols
  • inline assembly
  • windows_subsystem
  • ABIs
  • main behavior (maybe)

CI style validator

I would recommend adding a style-checker in CI.
Here is the one used by the Reference, and I would recommend just grabbing that one and extending it with whatever style checks we want.

Test suite

I think it would be very helpful to have a test suite integrated with the spec.

Benefits

There are several reasons:

  • It serves as a validation for changes or regressions in rustc that may get out of sync with the spec.
  • It helps illustrate what the rules are trying to convey. This can serve several audiences:
    • The spec author. During the process of writing the text, I have found that I usually write a large series of tests to validate my understanding.
    • The spec reviewer. This can help the reviewer validate their own understanding matches the code to the text, and to help check how complete the coverage is.
    • The spec reader. People often skip to examples first rather than reading the text, since they can understand things more quickly that way. Additionally, tests can help them validate their understanding of tricky concepts.

As an example, Ferrocene has a traceability matrix report that connects the rustc testsuite to sections within the spec. The Reference uses the rustdoc test functionality to verify examples within the source text.

Where are the tests?

My suggestion would be to have an independent test suite that lives with the spec (instead of, for example, linking to rustc's test suite). That has a few benefits:

  • The tests can be specifically tailored to illustrate concepts within the spec. Some rustc tests may not directly demonstrate some narrow concept or rule.
  • The tests can include comments and a style that are approachable to a typical Rust programmer. Some rustc tests can be very terse, or it can be unclear what they are testing.
  • More easily add tests. Pushing to a separate repository can be a pain.
  • Easier to review changes and additions to the spec. With the tests within a PR, the reviewer can directly see what is being changed without referencing an external site.
  • Prevents tests from getting out of sync. If rustc changes, renames, or removes tests, the spec will be constantly needing to chase those down. Additionally, a test could be changed to not cover what the spec expects it to cover.
  • Provides an independent validation of regressions in rustc. This has happened a very small number of times in the past.

Of course, the major downside is actually writing the tests. However, I expect that authors will need to write test code anyway to verify their own understanding.

Viewing the tests

I would recommend making it easy to view the test files from the spec. I don't know exactly what that would look like, but maybe there could be some kind of link or icon next to each rule name that takes you to the tests linked with that rule? It will be hard to do this without making it too noisy.

Inline or outline tests

I expect there will be two kinds of tests:

  • Simple illustrations shown within the text to help the reader see some examples.
  • More complex or exhaustive tests to validate the behavior of specific rules, but not necessarily shown directly in the text.

"Inline" and "outline" have different meanings in different contexts:

  • How you write the code: Code samples to be shown to the user in the text can be inline via markdown code blocks, or outline via mdbook includes.
  • How the reader sees the code: Code that validates a rule can be shown inline in the text to the user (as illustrations), or can be placed in a separate test suite which requires the user to take some action to view (like the FLS test matrix).

Inline test benefits:

  • Easier to author. No need to create another file, think of a name for it, etc.
  • Easier to review. GitHub's PR preview would not show outline tests, and it can be difficult to correlate which test is being shown.

Outline test benefits:

  • Support multiple files and other more complex build requirements.
  • Easier to show a subset of a large code sample (using include "anchors").

I would recommend integrating the test suite with inline examples, to make it easier to author and review simple code snippets. Larger examples should use mdbook includes if desired (see how TRPL does this, and #19). However, most rules should not have their tests displayed in the text, but instead accessed through some relatively easy means (like a link). #19 covers what the guidelines should be for showing in-text examples.

Tooling

I would expect the tests to be roughly the same as what rustc uses. Whether that is ui_test or compiletest, I don't know. I'm only vaguely aware of ui_test, so I'm not sure what its limitations are.

Policy: Documenting *why* something works in a particular way

Should the spec ever mention why something works in a particular way?
In the Reference, we have generally avoided that, but I think that is a detriment to some readers, since a bare specification of behavior can make it extremely hard to understand why something matters, or how it is relevant to a Rust programmer.
That kind of information can provide useful and interesting context.

Graydon spoke highly of the Ada rationale (the 1979 version specifically), using it while working on Rust.

A very minor example: the documentation for the type_length_limit attribute explains what it does, but not why it is there. I think it could be useful to have a note like: type_length_limit is used to prevent the compiler from hanging and to better deal with polymorphic recursion.
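
A minimal sketch of what an accompanying example for such a note might look like, assuming the suggestion were adopted (the attribute value below is only illustrative):

```rust
// Crate-level attribute bounding how long a monomorphized type may grow;
// the exact value here is illustrative, not a recommendation.
#![type_length_limit = "1048576"]

fn main() {
    // Deeply nested generic types, e.g. from long iterator adapter chains or
    // polymorphic recursion, are the kind of growth the limit is meant to bound.
    let nested = Some(Some(Some(Some(0u8))));
    println!("{:?}", nested);
}
```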

Regardless, I think it would be good to have a policy, as contributors sometimes include this information that we then need to tell them to remove. Guidelines could bring clarity on what we expect.

Determine and document the admonitions to use

The current mdbook plugin allows writing blockquotes with tags to indicate the type of admonition using a syntax similar to GitHub markdown (see docs). However, there is no restriction on the labels, and they map directly to CSS classes.

I would recommend deciding which kinds you want, decide on the CSS styling, and then change the plugin to reject any other kind (to check for typos).
I would also recommend creating guidelines for when to use the different kinds (note, warning, tip, etc.).

These should be documented in the authoring guide.

Policy: Heading rule style

Currently rule ids for headings have to be specified using a separate rule id, either before or after the heading. This has two primary issues:

  • The rule will be formatted separately from the heading, and the rule style makes it seem like there should be attached content after it (rather than a heading or another rule).
  • The heading will be assigned a separate anchor id rather than the rule id.

I'd like for us to support rule IDs in headings, either using the standard cmark anchor id or an extension for mdbook-spec.

@rustbot label +C-meta-policy

Output of the lexer

If the plan is to document the lexer separately from the grammar, there are choices to make about how to describe its output.

One choice is whether the lexer's output for punctuation should be described as consisting of fine-grained (single-character) tokens (like the ones procedural macros use), compound tokens (like those the tt fragment specifier in macros-by-example uses), or some mixture.

If the lexer doesn't use compound tokens, there's then a choice of how to allow the lexer's clients to distinguish (say) && from & & in cases where it's necessary.

There's also a choice of whether lifetimes use fine-grained or compound tokens.

Clients of the lexer

There are three parts of Rust which consume the lexer's output:

  • the main Rust parser
  • the Procedural Macros system
  • the "Macros by example" system

The proc-macro system works in terms of fine-grained tokens; macros-by-example (when using the tt fragment specifier) works in terms of compound tokens.

The parser could be specified using either fine-grained or compound tokens, or a mixture of the two.

I think in practice this means that, whatever choice is made for the main description of the lexer, the "Lexing/tokenization" chapter should have additional text describing how to convert its output to the other form(s), which the chapters for macros could refer to.

Input to the parser

(I'm assuming that the intent is to accurately describe rustc's current behaviour, for the "descriptive" view of the spec.)

Using compound tokens

If the grammar is defined in terms of compound tokens, there are rules which need to match both a single-character token and a compound token beginning with that character:

  • && in BorrowExpression, ReferencePattern, and ReferenceType

  • || in ClosureExpression

  • < in many places (I think everywhere it appears, except comparison expressions)

  • > in many places (I think everywhere it appears, except comparison expressions)

Brian Leibig's grammar gives a reasonable idea of how this approach might end up.

With this approach, the description of how the parsed source is converted to an AST would also have to deal with this complication.
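
A short illustration of the surface forms in question; this is only a sketch of why a compound-token grammar needs these extra rules, not spec text:

```rust
fn main() {
    let value = 5;

    // Lexed as the single `&&` token, but grammatically two nested borrows
    // (BorrowExpression) and a doubly-nested reference type (ReferenceType).
    let doubly_borrowed: &&i32 = &&value;

    // Lexed as the single `||` token, but grammatically the empty parameter
    // list of a closure (ClosureExpression).
    let zero = || 0;

    // The closing `>>` must be usable as two separate `>` tokens in types.
    let nested: Vec<Vec<u8>> = Vec::new();

    println!("{} {} {}", doubly_borrowed, zero(), nested.len());
}
```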

Using fine-grained tokens

If the grammar is defined in terms of fine-grained tokens, its rules will sometimes need to make a distinction depending on whether punctuation tokens were separated by whitespace.

For example:

  • a << b must be accepted as a left-shift expression, but a < < b must not
  • a && b must be taken as a LazyBooleanExpression, not treated as ambiguous with a & &b
    • but a &- b must be treated as equivalent to a & -b

I think that for <, >, and | it's sufficient to know whether there is spacing before the next token, but the second example above shows that for & what matters is whether it's immediately followed by another &.
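
For concreteness, a small (compiling) illustration of the whitespace-sensitivity described above; the rejected spellings are noted in comments rather than included:

```rust
fn main() {
    let a: u32 = 4;
    let b: u32 = 1;

    // `a << b` is a left shift; `a < < b` would be a syntax error, so the
    // grammar needs to know whether the two `<` tokens were adjacent.
    let shifted = a << b;

    // `&&` here is the lazy boolean operator, not `p & (&q)`.
    let p = a > 0;
    let q = b > 0;
    let both = p && q;

    // With a unary minus on the right, `&-` must still parse as `& (-b)`.
    let masked = (a as i32) & -(b as i32);

    println!("{shifted} {both} {masked}");
}
```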

One approach might be to describe the lexer's output as including explicit whitespace tokens (and presumably introduce some convenience notation in the grammar so that they don't usually have to be mentioned).

Another approach might be to say that there are two distinct tokens used to represent each relevant punctuation character, as suggested by this comment:

You distinguish "> followed by >" as one token and "> followed by something else" as another kind of token.

A third approach might be to use some form of attribute grammar, and say that the input tokens have attributes like "Spacing" and "Is immediately followed by another &".

With any of these approaches, I think the description of how the parsed source is converted to an AST would remain reasonably simple.

Rejected joined forms when using fine-grained tokens

There are some cases where parses which might naturally be allowed must be rejected when punctuation characters are not separated by whitespace.

I know of the following:

  • SomeStruct { field:::std::i32::MAX } must be rejected, not treated as equivalent to SomeStruct { field: ::std::i32::MAX }

  • fn f(s:::std::string::String) must be rejected, not treated as equivalent to fn f(s: ::std::string::String)

I expect there are more of a similar sort.

This is perhaps an argument for using a compound token for :: even if fine-grained tokens are used in other cases.

There's also this:

  • a <- b must be rejected, not taken as equivalent to a < -b

but perhaps that's really analogous to a && b (an ambiguity is being resolved in favour of an obsolete unstable feature).

Input to procedural macros

Procedural macros see fine-grained Punct tokens, which have a Spacing property which indicates whether there was whitespace before the following token.

Lifetime-or-label ('foo) is represented as a Punct token with joint spacing followed by an Ident token.

The documentation as of Rust 1.76 has this to say about Punct's Spacing property:

Joint

[…] in token streams parsed from source code, the compiler will only set spacing to Joint in the following cases.

  • When a Punct is immediately followed by another Punct without a whitespace. E.g. + is Joint in += and ++.

  • When a single quote ' is immediately followed by an identifier without a whitespace. E.g. ' is Joint in 'lifetime.

This list may be extended in the future to enable more token combinations.

Alone

[…] In token streams parsed from source code, the compiler will set spacing to Alone in all cases not covered by the conditions for Joint above.
E.g. + is Alone in + =, +ident and +().
In particular, tokens not followed by anything will be marked as Alone.
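
A small sketch of how that Spacing property shows up in practice. It uses the proc-macro2 crate (assumed here as a dependency) because it mirrors the compiler's proc_macro API and can be run outside a macro crate:

```rust
use proc_macro2::{Spacing, TokenStream, TokenTree};

fn main() {
    // `+=` yields `+` with Spacing::Joint; `+ =` yields `+` with Spacing::Alone.
    let tokens: TokenStream = "a += b; c + = d;".parse().unwrap();
    for tree in tokens {
        if let TokenTree::Punct(punct) = tree {
            let spacing = match punct.spacing() {
                Spacing::Joint => "Joint",
                Spacing::Alone => "Alone",
            };
            println!("`{}` is {}", punct.as_char(), spacing);
        }
    }
}
```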

Input to "Macros by example" macros

By-example macros using the tt fragment specifier see the following combinations of punctuation as compound tokens:

  <=
  ==
  !=
  >=
  &&
  ||
  ..
  ...
  ..=
  ::
  ->
  <-
  =>
  <<
  >>
  +=
  -=
  *=
  /=
  %=
  ^=
  &=
  |=
  <<=
  >>=

Lifetime-or-label ('foo) is represented as a single token.
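
A small sketch (the macro name is invented) that checks how many token trees a tt fragment sees for a few of these forms:

```rust
// Counts how many token trees the macro input contains.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($rest:tt)*) => { 1usize + count_tts!($($rest)*) };
}

fn main() {
    // `::` and `<<` are each matched as a single compound token tree.
    assert_eq!(count_tts!(a :: b), 3);
    assert_eq!(count_tts!(a << b), 3);
    // Separating the characters with whitespace produces distinct tokens.
    assert_eq!(count_tts!(a < < b), 4);
    // A lifetime-or-label is a single token.
    assert_eq!(count_tts!('outer), 1);
}
```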

Sources of information

The current rustc implementation

(As of rustc 1.76)

The lower-level lexer in rustc_lexer emits fine-grained tokens for punctuation (but compound tokens for lifetime-or-label). It emits explicit tokens representing whitespace.

The higher-level lexer in rustc_parse::lexer emits compound tokens for punctuation. It represents whitespace as 'Spacing' information describing the relationship to the following token.

The parser breaks those compound tokens up again where necessary.

(As I understand it, there is consensus that ideally rustc would switch to using fine-grained tokens internally.)

The breaking-up takes place in the following functions in rustc_parse, which might be used to audit for cases where a grammar based on compound tokens would need special cases.

  • expect_and()
  • expect_or()
  • eat_lt()
  • expect_lt()
  • expect_gt()
  • eat_plus()

In particular, rustc takes care to split += when the + appears in generic bounds, but I think that isn't currently making any difference (I don't think a following = can ever parse).

Currently maintained documentation

The Reference

The Lexical structure chapter of the Reference uses compound tokens for punctuation and lifetime-or-label.

The grammar used in the Reference is written in a mixed style.

In most places it's written as if it's working with fine-grained tokens; notation like && can be taken as representing two & tokens without intervening whitespace.

In three cases (BorrowExpression, ReferencePattern, and ClosureExpression) it lists both the fine-grained and the compound tokens explicitly, sometimes with an explanation in the text.

It treats LIFETIME_OR_LABEL as a single token.

The Ferrocene spec

The Ferrocene spec's lexical description and grammar are close to the Reference's.

The lexical part was forked from the Reference sometime in late 2021, and has no interesting additions.

Its grammar doesn't include the separately-listed compound tokens that the Reference has.

Older formalisations

Rustypop

Rustypop from 2016 used fine-grained tokens to deal with <<, >>, &&, and ||.

Otherwise it used compound tokens (in particular, for <=, <<=, <-, >=, >>=, and ::).

Its README describes its approach to dealing with multiple-punctuation symbols in the grammar.

Brian Leibig's grammar

Brian Leibig's grammar (last updated in 2017) uses compound tokens throughout.

It's the only grammar of that sort I've found that tries to deal with all cases where compound tokens need to be accepted as if they were multiple fine-grained ones.

wg-grammar

The wg-grammar grammar from 2019 appears to have used compound tokens for both punctuation and lifetimes.

I don't think that project got as far as dealing with the resulting complications. rust-lang/wg-grammar#47 has a little discussion.

rust-lang/wg-grammar#3 includes extensive discussion on how the lexer might be modelled.

One result of that discussion was https://github.com/CAD97/rust-lexical-spec , which would have output fine-grained punctuation tokens and explicit whitespace tokens.

Policy on unstable features

We should write down the policy for dealing with unstable features. I would recommend using something similar to what the Reference has:

This book also only serves as a reference to what is available in stable Rust.
For unstable features being worked on, see the [Unstable Book].

In particular, I think the main branch of the spec should only include what is stabilized in the master branch of rustc.

Policy: Should the spec annotate or have callouts based on Rust release versions?

Should the spec ever mention Rust release versions, differences between versions, or when something was introduced?

My preference is to never mention Rust versions. I think it generates clutter, which can balloon to a large amount of text. (Imagine annotating the entire document with descriptions of what changed and in which version; that would likely mark up the majority of the document.)

Policy on referring to `rustc` and other implementations

I think there should be a policy on if and when it is acceptable to refer to rustc.

One option is to have a policy of never referring to it directly. However, I think that would result in a loss of information or assistance to Rust users. There are some gray areas where there is a close coupling between the language and rustc, and having a small tip or note can save some readers considerable time trying to find the relationships.

If we do allow it, I would recommend having a policy of avoiding it as much as possible, and only including it when there is a high level of relevance, and only as a side-note or tip.

See also #32 about referring to bugs.

There is also a closely related concern about whether or not anything should ever be "implementation specific", see #29.

Some examples from the Reference:

  • rustc currently only allows the clippy and rustfmt tool attributes. It has not yet been decided if the tool attribute space should be extensible (and if so, how).
  • The feature attribute links to the Unstable book.
  • The description of user extensible cfg values mentions the way to do that is via rustc --cfg.
  • The test cfg is set via rustc --test.
  • Identifiers that start with an underscore are a convention to silence an unused warning in rustc.
  • The dynamic and static C-runtime mentions rustc -C target-feature=+crt-static
  • A note about the hazards of using #[inline] and the relationship to how rustc works.
  • How target-features and target-cpu's are related via rustc options.
  • Note about how you can find the available lints in the lint attribute docs.
  • Note about what #[deprecated] does in rustc.
  • Defaults for recursion_limit and type_length_limit.
  • Mentions of certain hazards, like overflowing literals will generate a warning by rustc.
  • Lots of documentation related to #[link] needs to cross-link with the rustc docs.
  • The behavior of unwinding across a nounwind ABI uses an illegal instruction in rustc.
  • An explanation of the requirement to use extern crate alloc; due to the way things work.
  • How to write a proc-macro with Cargo (this shouldn't be in the spec, but there should be some place where we have that kind of documentation).
  • Note about security concerns of proc-macros.
  • Source files discuss filesystems specifically, but in theory there could be other ways that source is loaded. How rustc-specific is that?

Support glossary auto-links

It would be nice to make it easier to link to the glossary.

Ferrocene has ferrocene_autoglossary which gathers the list of terms in the glossary, and then automatically creates links whenever those terms appear in the text. It also checks for glossary terms that are not mentioned anywhere.

I'm uncertain if that is how we want it to work, but seems worth exploring.

Settle on a documentation format

  • What output formats do we want?
  • What capabilities do we need in the source format (e.g., interlinking, other base capabilities, etc.)?

Typst is a document format written in Rust.

Policy: How to deal with undecided rules?

There will likely be many cases where the language team (or other teams) have not decided on how a particular part of the language should work, and we (the spec authors) know that there is ambiguity. How should we handle that?

I think waiting for a response from the lang team (or other teams) could significantly slow things down, and make it difficult to make progress.

Should the spec specifically mention things that are not yet resolved? Should we just not mention them at all, and track those via GitHub issues?

Codify working relationships

  • What's the working relationship between the editor, the team, and others with respect to the editing workflow?
  • What's the relationship between the spec / spec-team and the Rust Reference?

Policy: Should deprecated things be called out?

Should the spec ever mention that some language construct is deprecated? For example, things that we know are deprecated, but will not be removed in an Edition, or whose replacement is not yet implemented or certain.

Similarly, if something is removed or changed in an Edition, should the Edition-specific docs mention that a particular thing is deprecated?

  • ehuss's preference is to not mention that for Edition-specific changes. Non-edition changes are a bit more difficult to make a judgement on.

Guidelines on a glossary

We should have some guidelines on what goes into the glossary.
We may also consider deferring to external sources for terms of art instead of trying to define things ourselves. For example, DADS, ISO 2382, FOLDOC, etc.
