
Introduction

Overview

This repository hosts the core TLA⁺ command line interface (CLI) Tools and the Toolbox integrated development environment (IDE). Its development is managed by the TLA⁺ Foundation. See http://tlapl.us for more information about TLA⁺ itself. For the TLA⁺ proof manager, see http://proofs.tlapl.us.

Versioned releases can be found on the Releases page. Currently, every commit to the master branch is built & uploaded to the 1.8.0 Clarke pre-release. If you want the latest fixes & features you can use that pre-release.

Use

The TLA⁺ tools require Java 11+ to run. The tla2tools.jar file contains multiple TLA⁺ tools. They can be used as follows:

java -cp tla2tools.jar tla2sany.SANY -help  # The TLA⁺ parser
java -cp tla2tools.jar tlc2.TLC -help       # The TLA⁺ finite model checker
java -cp tla2tools.jar tlc2.REPL            # Enter the TLA⁺ REPL
java -cp tla2tools.jar pcal.trans -help     # The PlusCal-to-TLA⁺ translator
java -cp tla2tools.jar tla2tex.TLA -help    # The TLA⁺-to-LaTeX translator

If you add tla2tools.jar to your CLASSPATH environment variable then you can skip the -cp tla2tools.jar parameter. Running java -jar tla2tools.jar is aliased to java -cp tla2tools.jar tlc2.TLC.

Developing & Contributing

The TLA⁺ Tools and Toolbox IDE are both written in Java. The TLA⁺ Tools source code is in tlatools/org.lamport.tlatools. The Toolbox IDE is based on Eclipse Platform and is in the toolbox directory. For instructions on building & testing these as well as setting up a development environment, see DEVELOPING.md.

We welcome your contributions to this open source project! TLA⁺ is used in safety-critical systems, so we have a contribution process in place to ensure quality is maintained; read CONTRIBUTING.md before beginning work.

Copyright History

Copyright (c) 199?-2003 HP Corporation
Copyright (c) 2003-2020 Microsoft Corporation

Licensed under the MIT License

Contributors

@lemmy


Issues

Remove CHOOSE and recursive operators from TLA+ in favor of new FoldSet language primitive

@konnov:

[It] makes me wonder, whether we need CHOOSE, if we assume that we are working only with finite sets (as both TLC and Apalache do). I have an impression that “FoldSet” together with the other set operators of TLA+ could be powerful enough to replace reasonable recursive operators and CHOOSE over finite sets.

@xxyzzn:

I think we do. How would you write a recursive definition of the cardinality of a set without being able to choose a single element from the set? Since specs are Boolean-valued formulas, it's possible that one could rewrite any spec written in terms of Cardinality using only its properties and removing its definition. (Stephan might know if that's always possible.) But it would produce unreadable specs.

In practice, you can get pretty far using CHOOSE to avoid recursive definitions--e.g., for defining the maximum of a set of numbers or even the length of a sequence. This is useful because engineers who have never used a functional programming language have a hard time dealing with recursion.
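The maximum-of-a-set example mentioned above can be written with CHOOSE alone, no recursion needed (a standard sketch; Max is not a builtin name):

```tla
\* The element of S that is >= every element of S.
\* Well-defined for any non-empty finite set of numbers.
Max(S) == CHOOSE x \in S : \A y \in S : x >= y
```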

@konnov:

Right. What I meant is that we could have FoldSet as a primitive operator in the set of operators as an alternative to bounded CHOOSE and recursive operators. Currently, FoldSet is defined as follows in [1]; I have just inlined the auxiliary definition:

RECURSIVE FoldSet(_,_,_)
FoldSet( Op(_,_), v, S ) ==
  IF S = {} THEN v
  ELSE LET w == CHOOSE x \in S: TRUE IN
    LET T == S \ {w} IN FoldSet( Op, Op(v, w), T )

We would still have to impose a fixed but unknown order of iteration over a set. If we had FoldSet as a primitive, which means we would have to define its semantics without using CHOOSE and recursive operators, we could define bounded CHOOSE as follows:

MyChoose(S, P(_)) ==
    LET Iter(res, elem) ==
        IF res[1]
        THEN res
        ELSE <<P(elem), elem>>
    IN    
    LET res == FoldSet(Iter, <<FALSE>>, S) IN
    res[2]

The good thing about FoldSet is that it is obviously terminating and one still can write plenty of things with iteration. I know that FoldSet would not be expressive enough to replace all terminating recursive operators. I guess, we would not be able to express the Ackermann function with it. But I have always been curious about how powerful a single transition of a state machine should be.
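As a concrete illustration, the cardinality question raised earlier has a direct FoldSet answer (a sketch, assuming the FoldSet signature shown above; Card is a hypothetical name):

```tla
\* Count the elements of a finite set S by folding a
\* "+1 per element" accumulator over it, starting from 0.
Card(S) == FoldSet(LAMBDA acc, x : acc + 1, 0, S)
```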

In practice, you can get pretty far using CHOOSE to avoid recursive definitions--e.g., for defining the maximum of a set of numbers or even the length of a sequence. This is useful because engineers who have never used a functional programming language have a hard time dealing with recursion.

I agree that CHOOSE can help a lot when someone is stuck. It is just that recursive operators and CHOOSE have been a pain point in Apalache from the beginning. I understand that this is how TLA+ was designed. The question is whether you absolutely need CHOOSE and recursive operators to write a good and understandable specification.

@xxyzzn:

The fundamental goal of TLA+ is not to provide tools for finding bugs. It's to teach people a better way to think about systems. Tools are needed to accomplish that, but they are a means not an end. We were recently told that Amazon has realized this, and that they want people to use TLA+ because it improves the way they think.

The reason TLA+ does that has to do with its simplicity and elegance. CHOOSE is a TLA+ primitive and FoldSet isn't because CHOOSE is simpler and more elegant than FoldSet. [...]

@lemmy:

For what it's worth, out of the ~20 TLA+ projects I sampled on GitHub (via a search for "TLA+ language"), about 1/3 use recursive operators.

Proposal: allow Unicode alternatives for module delimiters

Instead of:

---- MODULE Demo ----
EXTENDS Foo, Bar
---------------------
Spec == TRUE
=====================

We can allow Unicode/DOS box-drawing characters to have it resemble the generated LaTeX modules:

┌─── MODULE Demo ───┐
EXTENDS Foo, Bar
─────────────────────
Spec ≜ TRUE
└───────────────────┘

This translation is easily performed by TLAUC. For as-you-type translation, if somebody types ---- it can be replaced with ┌──┐, and if they type ==== it can be replaced by └──┘. The user can then copy/paste these characters to get a line of the desired length, and remove the end caps if they only want an internal delimiter.

Proposal to add Unicode support to the TLA+ language standard and tools

TLA+ Unicode Support Proposal

Motivation

TLA+ specifications can be translated into a "pretty-printed" form with LaTeX, but this is not how developers experience them when writing a spec. Within the past decade, UTF-8 has become so widely supported that any program limited to ASCII can be seen as deficient. Supporting Unicode in TLA+ provides two main benefits:

  1. Greater inclusivity of cultures where English is not the dominant language
  2. Improved readability while writing a spec

Proposed Changes

  1. Allowing a broad but restricted set of Unicode codepoints in identifiers, as in id == ...
  2. Supporting a finite set of LaTeX-like \name symbols in identifiers, for example to indicate Greek alphabet characters
  3. Allowing arbitrary Unicode codepoints in strings
  4. Allowing specified alternative Unicode symbols for various keywords and operators

Challenges

  1. SANY does not currently support Unicode, and previous attempts to add it were met with difficulty
  2. It is difficult (although not impossible) to switch between ASCII and Unicode spec representations while maintaining the vertical alignment of conjunction & disjunction lists (henceforth called "jlists")
  3. Unlike ASCII, which renders well in essentially any monospace font, not all commonly-used monospace fonts have aesthetically appealing renderings of the relevant Unicode codepoints, nor do they render most Unicode codepoints with a fixed width
  4. Calculation of column position is important when parsing jlists; however, calculating the column position of a character is significantly more difficult in UTF-8 than ASCII, since Unicode codepoints have variable byte length and some codepoints (such as accent modifiers) have zero displayed width
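To illustrate challenge 4, here is a minimal jlist whose parse depends entirely on the column alignment of its bullets (Op, A, B, C, D are hypothetical names):

```tla
Op ==
  /\ A
  /\ \/ B      \* this inner disjunction list belongs to the
     \/ C      \* second /\ bullet only because its \/ bullets
  /\ D         \* start in a column to the right of the /\ column
```

If the \/ bullets shifted left of the /\ column, the expression would parse differently, so any ASCII-to-Unicode conversion must preserve these column positions exactly.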

Required Work

  1. Decide on canonical Unicode equivalents of ASCII keywords & operators (proposal here)
  2. Decide whether ASCII symbols should have a single canonical Unicode equivalent, or whether multiple possible codepoints should be accepted (for example, more than one codepoint for -+->)
  3. Decide which subset of Unicode codepoints should be admitted in identifiers (ex. CJK character sets)
  4. Decide on the finite set of LaTeX-like symbols admitted in identifiers, and their Unicode equivalents (ex. \sigma and σ)
  5. Create a tool to convert specs between ASCII and Unicode while maintaining jlist structure (implementation using tree-sitter grammar complete)
  6. Create tools for various popular editors to rewrite ASCII symbols to their Unicode equivalents as the user types them (implemented for Neovim)
  7. Update syntax highlighting tools to accept Unicode symbol alternatives (implemented in tree-sitter-tlaplus)
  8. Update the SANY and TLAPM parsers to accept Unicode symbols

Prior Work & Discussion

  1. Recent discussion associated with tree-sitter grammar [link]
  2. Original thread by Ron Pressler several years ago [link]
  3. Toolbox beta release with Unicode support [link]
  4. Issue tracking prior work on the tlaplus tools [link]
  5. Vim plugin which displays symbols in Unicode instead of rewriting them [link]

Proposal: combine set filter and set map language constructs

I often find myself wanting to both filter and map a set. The way to do this in TLA+ is currently:

op == { x \in { f(x) : x \in S } : p(x) }

or

op == { f(x) : x \in { x \in S : p(x) } }

I think it would be nicer to make this a single operation:

{ f(x) : x \in S : p(x) }

One possible semantic issue is that set mapping supports multiple quantifier bounds:

op == { f(x, y) : x \in S, y \in P }

while set filtering only supports a single quantifier:

op == { x \in S : p(x) }

because after all, what would { x \in S, y \in P : p(x, y) } even mean?

Fortunately having a map operation ensures that these bounds will coalesce into a single stream of elements. However, it does make things more difficult when trying to define the semantics of this combined map/filter operation, since you can't easily decompose it into a set map then a set filter. What does this mean, for example?

{ f(x, y) : x \in S, y \in P : p(x, y) }

It cannot be easily written in terms of the existing map and filter constructs. I believe the translation would have to be something like this:

op == {
  f(x, y) : <<x, y>> \in {
    <<x, y>> \in
      {<<x, y>> : x \in S, y \in P}
    : p(x, y)
  }
}

So it would be a map that wraps the multiple quantifier bounds in a tuple, nested inside the filter that recovers their names with a tuple destructuring, nested inside the map that recovers their names using tuple destructuring.
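A concrete single-variable example of the proposed construct, next to today's equivalent (EvenSquares is a hypothetical name):

```tla
\* Today: filter, then map.
EvenSquares == { x * x : x \in { y \in 1..10 : y % 2 = 0 } }

\* Under the proposal, the same set in one construct:
\* EvenSquares == { x * x : x \in 1..10 : x % 2 = 0 }
```

Both denote the set {4, 16, 36, 64, 100}.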

Support `|S|` as alternative to `Cardinality(S)`

I and others occasionally wish to write the more concise |S| instead of Cardinality(S). Perhaps this is a feature that could be piggybacked on the Unicode-related changes, although |S| would be a breaking change for all existing parsers.
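For comparison, a sketch of the two forms side by side (Quorum and Node are hypothetical names; the current form assumes EXTENDS FiniteSets):

```tla
\* Today:
Quorum == { Q \in SUBSET Node : 2 * Cardinality(Q) > Cardinality(Node) }

\* With the proposal:
\* Quorum == { Q \in SUBSET Node : 2 * |Q| > |Node| }
```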

Proposal to resolve grammar ambiguities

At this point there are three known ambiguities in the TLA+ grammar, where ambiguity is defined as "syntax requiring unreasonable amounts of lookahead to disambiguate". This proposal hopes to resolve these ambiguities in favor of keeping parsing simple, which has the added benefit of not requiring any changes to the current behavior of the tree-sitter grammar or SANY beyond possibly improved error messages. Philosophically, we might ask whether we want to add complexity to the language specification by defining these as special cases of general rules, or add complexity to TLA+ parsing by requiring these cases be handled according to a straightforward reading of the language spec.

The ambiguities are as follows:

  1. The use of /\ or \/ in nonfix form, as in /\(a, b) (see tlaplus/tlaplus#637 and tlaplus-community/tree-sitter-tlaplus#4)
  2. Infix operators (+), (-), and (/) which conflict with calling an operator with higher-order parameters as in f(+), f(-), and f(/) (see tlaplus/tlaplus#625 and tlaplus-community/tree-sitter-tlaplus#5)
  3. The inclusion of the block comment start token (* as a valid sequence of characters in the language, as in f(*) or (*(a, b)) (see tlaplus/tlaplus#626 and tlaplus-community/tree-sitter-tlaplus#6)

The proposal to disambiguate these is as follows:

  1. Disallow use of /\ and \/ in nonfix form; users can use \land(a, b) or \lor(a, b) in ASCII implementations of TLA+ if they really wish to use these operators in nonfix form.
  2. Treat any contiguous string of characters (+), (-), or (/) as an infix operator symbol; users can use the +, -, and / operators as higher-order parameters by surrounding them with spaces as in f( + ), f( - ), and f( / ).
  3. Treat these character sequences as the beginning of a block comment; users can add spaces to avoid them being parsed as such, as in f( * ) and ( *(a, b)).
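The three workarounds, sketched side by side (f, a, b, and the definition names are hypothetical; the text above notes nonfix use is only meaningful for operators from other modules):

```tla
c == \land(a, b)    \* 1. use the synonym rather than nonfix /\(a, b)
d == f( + )         \* 2. spaces keep + a higher-order parameter,
                    \*    not the (+) infix operator symbol
e == f( * )         \* 3. spaces prevent (* from starting a block comment
```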

The justification for the proposals is as follows:

  1. Since nonfix operators are only useful when referring to operators from other modules, and /\ and \/ are defined in TLA+ builtins so cannot be redefined in other modules, there is no real use for this feature. Allowing their use in nonfix form would considerably complicate the already-complicated logic for parsing conjunction & disjunction lists.
  2. Higher-order parameters are rarely the only parameter to an operator. The cost of disambiguating these tokens at the parser rather than lexical level would be extremely high given this niche application.
  3. Similar justification to (2).

Accepting this proposal will close all issues linked above.

[Meta] Specify the TLA+ enhancement process

What should the TLA+ enhancement process look like?

  • What kind of change "requires" an RFC (where do bugs stop)?
  • How to include the voices of the community at large?
  • Who gets to vote?
  • How is related work organized and tracked that potentially spans multiple organizations/companies?
  • Do we want to and, if yes, how to unite research- and engineering-focused efforts (research doesn't want to be scooped)?
  • ...

Goals

  • Foster wider adoption of TLA+ (the idea of high-level specs) in industry and education
  • Provide a way for the community to express their support for enhancements
  • Consistency of the TLA+ ecosystem
  • Guarantee "vendor neutrality"
  • Inclusiveness
  • Foster research
  • ...

TODO

  • Lay out rules of engagement => code of conduct

Publish fixed version of TLA+ 2 grammar spec

On the TLA+ Version 2 webpage you can download a TLA+ 2 BNF grammar, appropriately written in TLA+ itself. However, I found a number of bugs in this grammar as I was developing the tree-sitter grammar. This resulted in much painstaking testing of SANY & examination of the JavaCC grammar to see what actually counts as valid syntax. I have finally taken the effort to backport my findings to the TLA+ 2 BNF grammar itself. If accepted, this updated spec should be provided for download instead of the current one. This will be a great boon to any future writers of TLA+ tooling, as the month or so I spent figuring out the "true" language spec was by far the least pleasant part of writing the tree-sitter grammar.

You can see the updated spec here, with comments added on each change (as requested by Leslie): https://github.com/tlaplus-community/tlaplus-standard/blob/main/grammar/TLAPlus2Grammar.tla

You can see the actual changes in this PR: tlaplus/tlaplus-standard#1

I received signoff on these changes from Leslie Lamport via email. He suggested also putting it through the foundation RFC process.

Proposal to add `?` postfix operator

The ? symbol is not currently defined as an operator in TLA+ 2. It was defined as an infix operator in TLA+ v1 (see grammar). I believe it would be useful to define ? as a postfix operator. TLA+ does not have many user-definable postfix operators, and this seems a natural fit for several use cases. Two which come to mind:

Case 1: checking for null

Specs often have a "null" model type which needs to be checked:

---- MODULE Test ----
CONSTANTS Key, Value
VARIABLE kvs
NoValue == CHOOSE v : v \notin Value
Increment(key) ==
  /\ kvs[key] /= NoValue
  /\ kvs' = [kvs EXCEPT ![key] = @ + 1]
=================

The ? operator could be a nice fit for this:

val ? == val /= NoValue
Increment(key) ==
  /\ kvs[key]?
  /\ kvs' = [kvs EXCEPT ![key] = @ + 1]

Case 2: indicating value is optional

The progenitor of this idea was when I was working with the TLA+ grammar itself. It has constructs like:

set_literal == tok("{") & (Nil | CommaList(G.Expression)) & tok("}") 

This is kind of hard to read. At first I wanted to define an operator Optional, but it could be written more nicely as:

rule ? == Nil | rule
set_literal == tok("{") & CommaList(G.Expression)? & tok("}") 

I'm sure others can come up with even more creative uses.

[tool] Pretty Print to HTML

TLA+ has a great pretty-printer to TeX (tla2tex), but HTML is becoming a de-facto document standard, especially for content shared online. HTML also has other advantages, such as the ability to automatically add hyperlinks from symbols to their definitions, and allow for collapsing and expanding proofs. The existing tla2tex code already contains most of the necessary parsing and typesetting pre-processing (like alignment), and could serve as a basis for an HTML pretty-printer. A prototype already exists.

Copied from: https://github.com/tlaplus/tlaplus/blob/master/general/docs/contributions.md#pretty-print-to-html-difficulty-easy-skills-java-html. Originally proposed by @pron.

Proposal to add module package management & versioning to TLA+

Currently there are four sources of external TLA+ specifications that can be imported:

  1. The standard modules (Sequences, FiniteSets, Naturals, etc.)
  2. Modules distributed with the toolbox (we might call these the extended standard modules, like Randomization.tla)
  3. The community modules
  4. The TLAPS modules

Recent experience validating the TLA+ examples repo has shown that changes in these dependencies are a source of bitrot. We should do what every other language has done and come up with a package management scheme where:

  1. Specs can declare a dependency on specific versions of each of these categories, independently
    • In the case of the community modules, every single spec should be independently versioned & addressable
  2. Specs included as dependencies can themselves recursively have specific versions of dependencies

There are a number of possible ways this could be achieved. The simplest would be to have an optional tlaplus.config file in the spec root directory encoding key/value pairs of module name and version number in some standard config format (json, toml, yaml, whatever). Tools would then be responsible for acquiring and using the correct module when they are run. This will have a number of challenges:

  1. Currently releases of the standard & extended-standard modules are coupled to releases of TLC. They should be decoupled, and whatever version of TLC is run should be able to use specified versions of the standard & extended-standard modules. It would also be necessary for TLAPS to implement this for its included modules.
  2. Diamond-shaped dependency version conflicts. There are a number of ways of handling this like namespace unification (C# style) where one of the dependencies is forced to use the version of the other and errors might manifest at runtime, or the node.js method where multiple versions of the same dependency can coexist.
  3. Hosting & update of community modules. This is currently done with a git repository. We can continue using this until the number of specifications becomes unwieldy or we start hitting GitHub traffic limits (both unlikely to occur for quite a while). The easiest method of bolting versioning onto the community modules would be to add a top-level versioning.json file associating some version number of each module with a specific git commit hash. This has the drawback that updates to community modules require two commits, one to update the module and another to update the versioning.json file (since the commit hash of the first can't be known ahead of time). The versioning.json file would also need to pin dependency versions.
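A sketch of what such a tlaplus.config file could look like, in JSON (all keys and version numbers here are hypothetical illustrations, not a proposed standard; SequencesExt and Bitwise are existing community modules):

```json
{
  "standard": "1.8.0",
  "extended-standard": "1.8.0",
  "community": {
    "SequencesExt": "1.2.0",
    "Bitwise": "1.0.3"
  },
  "tlaps": "1.4.5"
}
```

Tools reading this file would fetch the pinned module versions before parsing or model-checking the spec, mirroring how lockfiles work in other ecosystems.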
