estree / estree

The ESTree Spec

License: Other

estree javascript ast parsing specification

estree's Introduction

The ESTree Spec

Once upon a time, an unsuspecting Mozilla engineer created an API in Firefox that exposed the SpiderMonkey engine's JavaScript parser as a JavaScript API. Said engineer documented the format it produced, and this format caught on as a lingua franca for tools that manipulate JavaScript source code.

Meanwhile JavaScript is evolving. This site will serve as a community standard for people involved in building and using these tools to help evolve this format to keep up with the evolution of the JavaScript language.

AST Descriptor Syntax

The spec uses a custom syntax to describe its structures. For example, at the time of writing, 'es2015.md' contained the following description of Program:

extend interface Program {
    sourceType: "script" | "module";
    body: [ Statement | ImportOrExportDeclaration ];
}

ESTree Steering Committee

Copyright and License

Licensed under Creative Commons Sharealike.

Philosophy

Suggested additions and modifications must follow these guidelines:

Backwards compatible: Non-additive modifications to existing constructs will not be considered unless there is immense support in favor of such changes. (e.g. #65)
Contextless: Nodes should not retain any information about their parent. For example, a FunctionExpression should not be aware of whether it is a concise method. (e.g. #5)
Unique: Information should not be duplicated. For example, a kind property should not be present on Literal if the type can be discerned from the value. (e.g. #61)
Extensible: New nodes should be specced to easily allow future spec additions. This means expanding the coverage of node types, for example MetaProperty over NewTarget to cover future meta properties. (e.g. #32)

Acknowledgements

ESTree has benefited from the contributions of many people over the years. We'd like to thank these folks for their significant contributions to this project:

Sebastian McKenzie (Babel), Kyle Simpson (@getify), Mike Sherov (Esprima), Ariya Hidayat (Esprima), Adrian Heine (Acorn), Dave Herman (SpiderMonkey), Michael Ficarra (@michaelficarra).

estree's People

Contributors


mikesherov dmitrysoshnikov dherman rreverser arianra marquisknox caridy benjamn constellation nzakas gibson042 taijiweb leebyron btmills austinfath tuchida rictic cybernetics msand toshok jibesh cos jasonlaster jamalahmedmaaz briandipalma amsardesai hzoo simudream brianhartsock lhorie imbhargav5 jairajs89 alfhub jnan77 bthallion jmm xiaohan2013 logchi bryanjos amasad shinout mngogo gkz iamkritika zenhumany zevenzeng 4u wabain pilagod xiemaisi justinfagnani sizappaaigwat langri-sha sheweichun danharper mickael9 universal-it-systems gyaneman ayato1995 nitinreddy3 jdvorak dead-claudia poudelprakash nitin42 davidyaha loganfsmyth languageplayground m1nd muglug aladdin-add olsonpm not-an-aardvark cpcallen jamiebuilds futpib lemures-t rapidhere varenytskyi seanastephens ukutaht zzsoszz eoghanmcilwaine victorhom hyj1991 vickkyy jlozano jebcat1982 undefinedlee philippevienne akiou reganbell dobringanev wjaywjay matthewmueller mysticatea robsimmons dhbaird-bbg valaxy wooodhead adrianheine

estree's Issues

Formalize SpreadElement

ESTree currently doesn't define SpreadElement, which Esprima and Acorn use in ArrayPatterns and ArrayExpressions.

Example:

// [a, ...b] = x;
{
  "type": "AssignmentExpression",
  "operator": "=",
  "left": {
    "type": "ArrayPattern",
    "elements": [{
      "type": "Identifier",
      "name": "a"
    }, {
      "type": "SpreadElement",
      "argument": {
        "type": "Identifier",
        "name": "b"
      }
    }]
  },
  "right": {
    "type": "Identifier",
    "name": "x"
  }
}
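Whatever name is chosen, consumers have to handle both plain elements and the spread/rest element inside an ArrayPattern. A minimal sketch, assuming the element shape from the example above (a node with an `argument`), that collects the bound names and accepts either `SpreadElement` or `RestElement`; `arrayPatternNames` is a hypothetical helper and skips nested patterns:

```javascript
// Hypothetical helper: collect identifier names bound by an ArrayPattern.
// Accepts "SpreadElement" (as in the example) or "RestElement" for `...b`,
// since parsers have not agreed on one name. Nested patterns are skipped.
function arrayPatternNames(pattern) {
  const names = [];
  for (const element of pattern.elements) {
    if (element === null) continue; // elision, e.g. `[, a] = x`
    if (element.type === "Identifier") {
      names.push(element.name);
    } else if (element.type === "SpreadElement" || element.type === "RestElement") {
      if (element.argument.type === "Identifier") names.push(element.argument.name);
    }
  }
  return names;
}

// Left-hand side of `[a, ...b] = x;`
const left = {
  type: "ArrayPattern",
  elements: [
    { type: "Identifier", name: "a" },
    { type: "SpreadElement", argument: { type: "Identifier", name: "b" } },
  ],
};
console.log(arrayPatternNames(left)); // [ 'a', 'b' ]
```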

Edit:

I'm not suggesting/implying that it has to be SpreadElement or that it has to be used in both cases, only that we have to properly define it (them) for ArrayPatterns and ArrayExpressions.

Distinguish yield vs yield*

Currently YieldExpression does not tell the difference between yield and yield*.
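One way to carry the distinction is a boolean flag on YieldExpression. The sketch below assumes a `delegate` field (the name ESTree's ES2015 document later adopted) and uses it to print the right keyword; `printYield` is a hypothetical helper:

```javascript
// Sketch: with a boolean `delegate` field, a printer can tell `yield expr`
// from `yield* expr` without looking at the source text.
function printYield(node, printExpression) {
  const keyword = node.delegate ? "yield*" : "yield";
  return node.argument === null ? keyword : `${keyword} ${printExpression(node.argument)}`;
}

const printIdentifier = (id) => id.name;
console.log(printYield({ type: "YieldExpression", delegate: true,
  argument: { type: "Identifier", name: "inner" } }, printIdentifier)); // yield* inner
console.log(printYield({ type: "YieldExpression", delegate: false, argument: null },
  printIdentifier)); // yield
```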

Support new.target

Apologies if this is already in the spec; I didn't see it anywhere. Mozilla is implementing this now; the current patch converts new.target to this:

{ type: "NewTargetExpression" }

https://bugzilla.mozilla.org/show_bug.cgi?id=1141865
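For comparison, the MetaProperty shape favoured by the Extensible guideline models `new.target` as a `meta` identifier and a `property` identifier. A sketch of a check that accepts both the patch's `NewTargetExpression` and a MetaProperty-style node; `isNewTarget` is hypothetical:

```javascript
// Sketch: recognise `new.target` in either proposed representation.
function isNewTarget(node) {
  if (node.type === "NewTargetExpression") return true; // shape from the Mozilla patch
  return node.type === "MetaProperty" &&
    node.meta.name === "new" && node.property.name === "target";
}

const asMetaProperty = {
  type: "MetaProperty",
  meta: { type: "Identifier", name: "new" },
  property: { type: "Identifier", name: "target" },
};
console.log(isNewTarget({ type: "NewTargetExpression" })); // true
console.log(isNewTarget(asMetaProperty)); // true
```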

Super should not be an Expression

In fact, it should be a node of its own, and we need to extend MemberExpression and CallExpression to allow Super at that position.

In ImportSpecifier, the two Identifiers can have overlapping location information

For example:

import {a} from "a";

will parse into an ImportSpecifier with duplicate imported and local Identifiers, which hold overlapping location information.
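A tool that walks both Identifiers will see the same source range twice. A minimal sketch, assuming Acorn-style numeric `start`/`end` offsets on each node, that detects the unaliased case so a tool can skip the duplicate; `sharesRange` is hypothetical:

```javascript
// Sketch: an unaliased specifier (`import {a}`) yields `imported` and `local`
// nodes covering the same characters; an aliased one (`import {a as b}`) does not.
function sharesRange(specifier) {
  const { imported, local } = specifier;
  return imported.start === local.start && imported.end === local.end;
}

// import {a} from "a";      -> `a` spans offsets 8..9
const shorthand = {
  type: "ImportSpecifier",
  imported: { type: "Identifier", name: "a", start: 8, end: 9 },
  local: { type: "Identifier", name: "a", start: 8, end: 9 },
};
// import {a as b} from "a"; -> `a` spans 8..9, `b` spans 13..14
const aliased = {
  type: "ImportSpecifier",
  imported: { type: "Identifier", name: "a", start: 8, end: 9 },
  local: { type: "Identifier", name: "b", start: 13, end: 14 },
};
console.log(sharesRange(shorthand), sharesRange(aliased)); // true false
```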

`kind` field for Literal

It is very useful for basic type checking, and IMO is a bit less hackish (and easier) than typeof node.value. Here's my idea:

interface Literal <: Node, Expression {
  type: "Literal";
  value: string | boolean | null | number | RegExp;
  // addition, the type
  kind: "string" |
    "boolean" |
    "null" |
    "undefined" |
    "number" |
    "regexp";
}
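Per the Unique guideline, the proposed kind can already be derived from value. A sketch of that derivation, which also honours the separate `regex` property some parsers emit, so a RegExp literal whose value is null is still classified; `literalKind` is hypothetical:

```javascript
// Sketch: derive a Literal's kind from its value instead of storing it.
function literalKind(node) {
  if (node.regex) return "regexp"; // value may be null if the engine lacks a flag
  const value = node.value;
  if (value === null) return "null";
  if (value instanceof RegExp) return "regexp";
  return typeof value; // "string" | "boolean" | "number"
}

console.log(literalKind({ type: "Literal", value: 42 })); // number
console.log(literalKind({ type: "Literal", value: null })); // null
console.log(literalKind({ type: "Literal", value: null, regex: { pattern: "a", flags: "u" } })); // regexp
```

The sketch never returns "undefined": in source code, `undefined` is an Identifier, not a Literal.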

Versioning

We've been introducing many changes without shared guiding principles, in what feels to me like a hurry. I'd like to use GitHub version tags to indicate stability, and in particular to identify some commit as documenting the greatest spec subset shared amongst pre-ESTree technologies (think of it as analogous to DOM0) to make clear the impact of changing various aspects on a scale from "unnoticeable" to "break the web".

Further, we should make such versioning semantic and document a process for releasing new major/minor/patch increments.

Zero-indexed lines and columns

We're in the process of speccing out error objects for the Broccoli build tool (which powers Ember CLI, among other things), and we'd like to solicit your opinion on whether Broccoli should make lines and columns 0-indexed or 1-indexed:

err.broccoliInfo.firstLine // 0-indexed? 1-indexed?
err.broccoliInfo.firstColumn // 0-indexed? 1-indexed?

The choice here mostly affects whether Broccoli's API is easy to use and understand - plugins will simply add or subtract 1 to convert between conventions as necessary.

My first intuition was to make line and column both 1-indexed, because that's what we want to display. But @sebmck pointed out to me that ESTree (and thus Esprima and Acorn) make line 1-indexed and column 0-indexed. This seems a bit surprising to me; and it's not universal - e.g. libsass uses 1-indexed line and column, and coffee-script uses 0-indexed line and column. But if ESTree's "1-indexed line and 0-indexed column" thing is common enough, it might be worth copying for us.

So I'd like to know:

How common is ESTree's "1-indexed line and 0-indexed column" convention in and outside of JavaScript land?
Do you have any opinion on which convention Broccoli should use?
Are there any conventions for whether ranges are inclusive or exclusive (e.g. represented by firstLine, firstColumn, lastLine, lastColumn)?
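If a tool follows ESTree's convention (1-based line, 0-based column, with `end` pointing just past the last character), converting to 1-based, inclusive columns for display is a small shift. A sketch under those assumptions, reusing the field names from the Broccoli example; `toDisplayRange` is hypothetical:

```javascript
// Assumes ESTree-style positions: line 1-based, column 0-based, and `end`
// pointing at the first character after the node (exclusive).
function toDisplayRange(loc) {
  return {
    firstLine: loc.start.line,
    firstColumn: loc.start.column + 1, // 0-based -> 1-based
    lastLine: loc.end.line,
    // An exclusive 0-based end equals an inclusive 1-based end, so no shift.
    // (An end at column 0 of a following line needs extra handling,
    // which this sketch omits.)
    lastColumn: loc.end.column,
  };
}

// `foo` at the start of line 3 occupies 0-based columns 0..2.
const r = toDisplayRange({ start: { line: 3, column: 0 }, end: { line: 3, column: 3 } });
console.log(r.firstColumn, r.lastColumn); // 1 3
```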

Shift initiative

I'm interested in the group's opinion on the Shift AST, which is heavily influenced by SpiderMonkey but breaks compatibility by reducing the number of invalid programs that the AST can represent.

I wonder whether the goal of ESTree is simply to have the community standardize the SpiderMonkey AST, or whether it aims to address issues and build a refined and more convenient AST for ECMAScript tooling.

.raw property of literal nodes, or something similar

Discussed some in #6. The raw property of literal nodes is an Esprima extension.

I have a specific use case for the raw property, or at least for more information than is provided by value: in asm.js, the distinction between an int literal and float literal is significant and distinguished by the type system. Writing an asm.js validator in JS requires the ability to distinguish between e.g. 17.0 and 17, but the value property does not distinguish these.

OTOH, @michaelficarra points out that AST nodes might be produced by tools that don't want to have to specify the raw source. A few options I can see:

just leave raw unspecified and treat asm.js as a special case that is adequately served by Esprima's extended behavior
add a spec for additional node data to be produced by parsers but optional for other tools, and include raw in that
specify just the coarse-grained "type" of the lexeme, like type: "int" | "float" | "string" | "boolean" | "null" | "RegExp"
specify finer-grained lexical class information, something like type: "hex" | "octal" | "decimal" | "float" | "boolean" | "null" | "RegExp"

I am hesitant to generalize based on the one use case of asm.js. Without knowing of other use cases, I think the most conservative step is just to document the existing practice of raw in an optional spec.

Thoughts?
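To make the asm.js case concrete: with `raw` available, a validator can classify numeric literals by their source text, which `value` alone cannot do (both `17` and `17.0` have value 17). A sketch, assuming an Esprima-style `raw` string and treating any literal containing a `.` as a float; exponent forms such as `1e3` are not modelled:

```javascript
// Sketch: classify a numeric Literal as "int" or "float" from its raw text.
function numericLexemeClass(node) {
  if (typeof node.value !== "number") return null; // not a numeric literal
  if (typeof node.raw !== "string") return "unknown"; // producer omitted raw
  return node.raw.includes(".") ? "float" : "int";
}

console.log(numericLexemeClass({ type: "Literal", value: 17, raw: "17" })); // int
console.log(numericLexemeClass({ type: "Literal", value: 17, raw: "17.0" })); // float
console.log(numericLexemeClass({ type: "Literal", value: 17 })); // unknown
```

The "unknown" case is the trade-off discussed above: a tool that does not emit `raw` leaves the validator without the information it needs.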

Destructuring assignment failed in live demo

None of the following work at http://esprima.org/demo/parse.html:

[a, b] = [1, 2]
[a, b, ...rest] = [1, 2, 3, 4, 5]
{a, b} = {a:1, b:2}

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment

May I join?

I would like to be a member of this effort. I have on several occasions in the past tried to build support in the tooling community around similar efforts, most notably with this:

https://github.com/getify/concrete-syntax-tree

It didn't really get very far, but I have continued working on such changes in my own fork of esprima since then. I would very much like to represent the needs of a tree format that can (optionally) preserve the concrete syntax elements (whitespace, comments, extraneous ( and ) pairs, etc.).

Merging procedure

We need a more formal ratification process beyond the current informal one of just getting a few people to 👍.

Questions:

How are voting rights going to be handled? Who are the necessary people required to approve a change or addition?
How are we going to handle disagreements if a consensus can't be achieved? Majority rules?
What exactly is a participating member?

Standardize new RegExp literal AST format

In Esprima, /foo/gi now gives:

{
  type: "Literal",
  value: /foo/gi,
  regex: {
    pattern: "foo",
    flags: "gi"
  }
}

The advantage of having the separate regex property is that even in environments that don’t support a given flag (e.g. the u flag) the regex can still be represented in the AST. The value property is null in such a case.

See https://code.google.com/p/esprima/issues/detail?id=557#c4.

cc @sebmck @michaelficarra
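A consumer can rebuild the RegExp from `regex` when the engine supports it and fall back to `null` otherwise, mirroring how the parser fills `value`. A sketch; `regexFromLiteral` is hypothetical:

```javascript
// Sketch: rebuild a RegExp from the `regex` property, or null if this engine
// rejects the pattern or flags (e.g. a flag it does not implement).
function regexFromLiteral(node) {
  if (!node.regex) return null; // not a regular-expression literal
  try {
    return new RegExp(node.regex.pattern, node.regex.flags);
  } catch (error) {
    return null; // SyntaxError: unsupported flag or pattern
  }
}

const re = regexFromLiteral({ type: "Literal", value: null, regex: { pattern: "foo", flags: "gi" } });
console.log(re.source, re.flags); // foo gi
console.log(regexFromLiteral({ type: "Literal", value: null, regex: { pattern: "foo", flags: "z" } })); // null
```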

Proposal: estree/tests

Now that we have most of ES6-related issues resolved, it became obvious that every implementor benefits from working together on a single spec, as developers can focus on making tools better, cleaner and faster instead of resolving interop issues.

I think the next step towards making interop easier would be creating a repo under the ESTree org with automated tests and an exported API that could be used by any spec-compliant parser. That way, any representation already agreed on in this repo will be reflected in tests, and parsers could use them either as a devDependency or as a git submodule to make sure that everyone parses things the same way or throws an error when needed. (That said, tests should still be flexible enough to allow different implementors to throw different error messages and not to include parser-specific node extensions.)

Thoughts?

/cc @mikesherov @ariya @dherman
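One way such shared tests could tolerate parser-specific extensions is to compare only the keys a fixture lists and to ignore location data. A minimal sketch of that comparison; the fixture format and the ignored key names are assumptions, not an agreed design:

```javascript
// Location keys differ between parsers today, so this sketch ignores them.
const IGNORED_KEYS = new Set(["loc", "range", "start", "end"]);

// True if `actual` contains everything in `expected`. Extra keys on `actual`
// (parser-specific extensions) are allowed; arrays must match element-wise.
function matchesFixture(actual, expected) {
  if (expected === null || typeof expected !== "object") return actual === expected;
  if (actual === null || typeof actual !== "object") return false;
  if (Array.isArray(expected)) {
    return Array.isArray(actual) && actual.length === expected.length &&
      expected.every((item, i) => matchesFixture(actual[i], item));
  }
  return Object.keys(expected).every(
    (key) => IGNORED_KEYS.has(key) || matchesFixture(actual[key], expected[key]));
}

const fixture = { type: "Identifier", name: "x" };
console.log(matchesFixture({ type: "Identifier", name: "x", start: 0, end: 1 }, fixture)); // true
console.log(matchesFixture({ type: "Identifier", name: "y" }, fixture)); // false
```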

Acorn: AssignmentPattern/RestElement vs Esprima: defaults/rest

Currently Acorn uses AssignmentPattern and RestElement, while Esprima uses a defaults array and a rest property.

function foo(bar = "", ...items) {}

Esprima

{
  type: "FunctionDeclaration",
  params: [{ type: "Identifier", name: "bar" }],
  defaults: [{ type: "Literal", value: "" }],
  rest: { type: "Identifier", name: "items" },
  body: {
    type: "BlockStatement",
    body: []
  }
}

Acorn

{
  type: "FunctionDeclaration",
  params: [
    {
      type: "AssignmentPattern",
      left: { type: "Identifier", name: "bar" },
      right: { type: "Literal", value: "" }
    },
    {
      type: "RestElement",
      argument: { type: "Identifier", name: "items" }
    }
  ],
  body: {
    type: "BlockStatement",
    body: []
  }
}

I had to adapt 6to5 to the Acorn behaviour, and in my opinion it makes the AST far easier to work with.
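For tools that must accept both shapes in the meantime, the Esprima form can be rewritten into the Acorn form. A sketch, assuming `defaults[i]` belongs to `params[i]` and is missing or null when that parameter has no default; `toAcornParams` is hypothetical:

```javascript
// Sketch: convert Esprima-style `defaults`/`rest` into Acorn-style params.
function toAcornParams(fn) {
  const defaults = fn.defaults || [];
  const params = fn.params.map((param, i) =>
    defaults[i]
      ? { type: "AssignmentPattern", left: param, right: defaults[i] }
      : param);
  if (fn.rest) params.push({ type: "RestElement", argument: fn.rest });
  return params;
}

// function foo(bar = "", ...items) {}  (the Esprima shape shown above)
const esprimaFn = {
  type: "FunctionDeclaration",
  params: [{ type: "Identifier", name: "bar" }],
  defaults: [{ type: "Literal", value: "" }],
  rest: { type: "Identifier", name: "items" },
  body: { type: "BlockStatement", body: [] },
};
console.log(toAcornParams(esprimaFn).map((p) => p.type)); // [ 'AssignmentPattern', 'RestElement' ]
```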

Better define source location expectations

I'm noticing that there are some slight location differences between Esprima and Acorn.

Consider the following:

function bar(a) {
    switch (a) {
        case 2:
            break;
        case 1:
            break;
        //no default
    }
}

In case it's not obvious, I have an intentional empty line at the end of that code (automatically inserted by my editor at the end of files).

In this case, Esprima says that the program range is [0, 133] while Acorn says the range is [0, 134].

For locations, the start is the same but the end is different. Esprima says the program ends on { column: 1, line: 9 } while Acorn says the program ends on { column: 0, line: 10 }.

From these two differences, it seems that Acorn is including the trailing whitespace as part of the program while Esprima is not. I'm not sure which is correct, but it would be nice to have consistency. :)
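The two results differ only in whether trailing whitespace belongs to Program. A sketch computing both candidate end offsets from the source text; it only trims whitespace, so trailing comments are not considered, and `programRanges` is hypothetical:

```javascript
// Sketch: the two candidate Program ranges from the comparison above.
function programRanges(source) {
  const withoutTrailingWhitespace = source.replace(/\s+$/, "");
  return {
    includingTrailing: [0, source.length], // matches what Acorn reported
    excludingTrailing: [0, withoutTrailingWhitespace.length], // matches what Esprima reported
  };
}

const ranges = programRanges("foo();\n");
console.log(ranges.includingTrailing[1], ranges.excludingTrailing[1]); // 7 6
```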

ASTs should be JSON compatible

Literal#value appears to be the only potentially incompatible property, and even there string/boolean/null are naïvely covered. Regular expressions are not valid JSON, but that's already been dealt with as well. So we're left with numbers, which fortunately are specified as double-precision 64-bit IEEE 754 values and fit comfortably within JSON's number syntax, which places no limit on precision. The singular exception is numeric literals greater than approximately 2^1024, which evaluate to Infinity in JavaScript.

Embracing 27d0e17 would probably mean introducing NumberLiteral <: Literal with properties like number: { infinite: true } or number: { precise: ".2e309" }/number: { from: "0xFFF…FFF" }/etc. (Non-exhaustive) alternatives include:

formalizing raw
eliminating the wrapper object (e.g., infiniteNumber: boolean instead of number: { infinite: boolean })
closing this ticket to let each toolchain make independent decisions about this edge case

At any rate, though, the topic has been raised frequently enough that I considered it worth documenting.
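The edge cases can be checked mechanically: a literal is JSON-safe if its `value` survives a `JSON.stringify`/`JSON.parse` round trip. A sketch; note that `2e308` evaluates to `Infinity`, which `JSON.stringify` writes as `null`, and a RegExp serializes as `{}`:

```javascript
// Sketch: does a Literal's value survive a JSON round trip unchanged?
function survivesJSON(literal) {
  const copy = JSON.parse(JSON.stringify(literal));
  return Object.is(copy.value, literal.value);
}

console.log(survivesJSON({ type: "Literal", value: 0.1 }));   // true
console.log(survivesJSON({ type: "Literal", value: "s" }));   // true
console.log(survivesJSON({ type: "Literal", value: 2e308 })); // false: Infinity becomes null
console.log(survivesJSON({ type: "Literal", value: /a/g }));  // false: RegExp becomes {}
```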

About correctness: let+const VariableDeclaration vs Statement

Sorry if this was already discussed, I couldn't find anything about it.

Currently a Declaration is a Statement, and VariableDeclaration is extended to represent let and const as well:

interface Declaration <: Statement { }

interface VariableDeclaration <: Declaration {
    type: "VariableDeclaration";
    declarations: [ VariableDeclarator ];
    kind: "var";
}

extend interface VariableDeclaration {
    kind: "var" | "let" | "const";
}

That allows the following AST (or similar), which is not a valid JS program:

// if (x) let y = 42;
{
  "type": "IfStatement",
  "test": {...},
  "consequent": {
    "type": "VariableDeclaration",
    "declarations": [...],
    "kind": "let"
  },
  "alternate": null
}

But of course a parser could easily detect this and throw accordingly.

I am trying to get an idea of the direction/principles of estree. Is a simple API favored over correctness/spec compliance, with correctness offloaded to parser implementations?

Or is this something that should be fixed?
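If correctness is left to parsers, the check is small. A sketch that reports `let`/`const` declarations in positions that only accept a single Statement; the list of positions is an assumption covering the common cases rather than an exhaustive reading of the grammar:

```javascript
// Statement slots that cannot hold a lexical declaration (`let`/`const`).
const SINGLE_STATEMENT_SLOTS = {
  IfStatement: ["consequent", "alternate"],
  WhileStatement: ["body"],
  DoWhileStatement: ["body"],
  ForStatement: ["body"],
  ForInStatement: ["body"],
  ForOfStatement: ["body"],
  WithStatement: ["body"],
  LabeledStatement: ["body"],
};

// Sketch: return the misplaced lexical declarations directly under `node`.
function misplacedLexicalDeclarations(node) {
  const slots = SINGLE_STATEMENT_SLOTS[node.type] || [];
  return slots
    .map((key) => node[key])
    .filter((child) => child && child.type === "VariableDeclaration" && child.kind !== "var");
}

// if (x) let y = 42;   (declarators omitted)
const ifNode = {
  type: "IfStatement",
  test: { type: "Identifier", name: "x" },
  consequent: { type: "VariableDeclaration", declarations: [], kind: "let" },
  alternate: null,
};
console.log(misplacedLexicalDeclarations(ifNode).length); // 1
```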

Concise methods node

Esprima currently represents concise methods as Property whose value is a FunctionExpression. This is pretty confusing because ES5 style properties with function values are represented the same way. Even though the method flag is true, that still means the FunctionExpression node represents different syntax in each situation, and that can cause errors such as eslint/eslint#1677

Include JSX Extension

I thought I'd open the discussion about moving the JSX extension to the ESTree repo. It could go into an experimental extensions section, since it is not part of the spec nor of any active proposal. However, it is fairly stable and implemented by several downstream parsers.

Currently this extension lives in the JSX spec repo:

https://github.com/facebook/jsx/blob/master/AST.md

It would be convenient for tooling implementors to have a single place to look at.

Class AST modification proposals

Collected from comments on #19: some thoughts on what we could change before it's too late.

1. `Class::body::body` -> `Class::body::methods`

interface ClassBody <: Node {
    type: "ClassBody";
    body: [ MethodDefinition ];
}

How about renaming this to methods instead? For BlockStatement we can't change it, but here I think we could turn the classStmt.body[0].body[1].value chain into something more meaningful.

2. `""` in `MethodDefinition::kind`

interface MethodDefinition <: Node {
    type: "MethodDefinition";
    key: Identifier;
    value: FunctionExpression;
    kind: "" | "get" | "set";

How about using "method" here instead of empty string?

3. `constructor`

Do you think we need to parse constructor as a separate node type (e.g. ConstructorDefinition or something), or at least as a separate MethodDefinition::kind? Or do we continue to treat it as a regular method and leave AST processors to check method.key.type === 'Identifier' && method.key.name === 'constructor'?

Treating it as a regular method feels inconvenient, since it has different semantics.

Thoughts?

@dherman @ariya @sebmck @michaelficarra
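Until there is a dedicated node or kind, the check processors need is slightly longer than the one quoted above: a string-literal key also names the constructor, while static and computed methods never do. A sketch, assuming `static` and `computed` booleans on MethodDefinition (they are not in the interface quoted above); `findConstructor` is hypothetical:

```javascript
// Sketch: find the constructor in a ClassBody. `static` and `computed` are
// assumed fields; they are not in the MethodDefinition interface quoted above.
function findConstructor(classBody) {
  return classBody.body.find((method) =>
    !method.static && !method.computed &&
    ((method.key.type === "Identifier" && method.key.name === "constructor") ||
     (method.key.type === "Literal" && method.key.value === "constructor"))) || null;
}

const body = {
  type: "ClassBody",
  body: [
    // static constructor() {}  -> a static method, not the constructor
    { type: "MethodDefinition", key: { type: "Identifier", name: "constructor" },
      static: true, computed: false, kind: "" },
    // 'constructor'() {}       -> the constructor
    { type: "MethodDefinition", key: { type: "Literal", value: "constructor" },
      static: false, computed: false, kind: "" },
  ],
};
console.log(findConstructor(body) === body.body[1]); // true
```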

[ES6] TemplateLiteral

SPEC:

https://people.mozilla.org/~jorendorff/es6-draft.html#sec-template-literal-lexical-components
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-template-literals

We need to represent both tagged and untagged template literals.

For example:

raw`x${y}z`

should have:

raw as an expression
x as a template literal element
y as an expression
z as a template literal element

SpiderMonkey Parser API:

Not implemented

Esprima-harmony:

interface TemplateElement <: Node {
  type: "TemplateElement";
  value: {
    raw: string;
    cooked: string;
  };
  tail: boolean;
}

interface TemplateLiteral <: Expression {
  type: "TemplateLiteral";
  expressions: [ Expression ];
  quasis: [ TemplateElement ];
}

interface TaggedTemplateExpression <: Expression {
  type: "TaggedTemplateExpression";
  tag: Expression;
  quasi: TemplateLiteral;
}

Acorn:

Same as Esprima-harmony
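Because `quasis` always has one more entry than `expressions`, consumers can interleave the two. A sketch that rebuilds the cooked string of an untagged template from the Esprima-harmony shape; `evaluate` stands in for whatever the tool uses to compute an expression's value and is hypothetical:

```javascript
// Sketch: quasis[0], expressions[0], quasis[1], ..., quasis[n], in order.
function cookTemplate(templateLiteral, evaluate) {
  let result = templateLiteral.quasis[0].value.cooked;
  templateLiteral.expressions.forEach((expression, i) => {
    result += String(evaluate(expression)) + templateLiteral.quasis[i + 1].value.cooked;
  });
  return result;
}

// `x${y}z` with y = 1
const quasi = {
  type: "TemplateLiteral",
  quasis: [
    { type: "TemplateElement", value: { raw: "x", cooked: "x" }, tail: false },
    { type: "TemplateElement", value: { raw: "z", cooked: "z" }, tail: true },
  ],
  expressions: [{ type: "Identifier", name: "y" }],
};
console.log(cookTemplate(quasi, (node) => ({ y: 1 })[node.name])); // x1z
```

For a tagged template such as raw`x${y}z`, the same quasis would instead be handed to the tag function as arrays of cooked and raw strings, alongside the expression values.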

Shift-API:

Kinda bad naming...

interface TemplateString : Expression {
  attribute Expression? tag;
  attribute (Expression or TemplateLiteral)[] elements;
};

interface TemplateLiteral : Node {
  attribute string value;
};

AssignmentPattern is a wrong name

Today I realized that in the ES6 spec we have an AssignmentPattern, and it is a totally different thing:

AssignmentPattern in ES6 means the pattern that can be put on the left-hand side of an "=" assignment.
AssignmentPattern in estree means a BindingElement that has an Initializer.

Furthermore, AssignmentPattern in estree has nothing to do with assignments, except that we use a single equals sign to denote a default value, which can also be part of an assignment...

Proposal: {Pattern|Binding}With{Default|Init}, Default{Pattern|Binding}, Defaulted{Pattern|Binding}...

Thoughts about ObjectPattern (considering ES7)

The nodes that are accepted by ObjectPattern as properties are currently rather loosely defined (no type):

interface ObjectPattern <: Pattern {
    type: "ObjectPattern";
    properties: [ { key: Literal | Identifier, value: Pattern } ];
}

An advantage is that Property nodes can be reused here (and in fact, that's what esprima and acorn are doing). However, if the ES7 object property spread proposal gets accepted, ObjectPattern (and ObjectExpression) will have to be updated to allow this new node. esprima-fb currently uses SpreadProperty.

As soon as different kinds of nodes are allowed, it will be necessary to distinguish between them. This can be done by checking which properties the AST node has. However, since most parsers use Property inside patterns, tools may just check whether the node is a Property, in which case they would make themselves dependent on how that specific parser represents properties inside patterns.

It may be useful to define the nodes that are allowed inside properties more strictly.

Define spec in JSON or other format

Instead of simply defining the spec in Markdown documents, it could be defined in JSON or another format. The Markdown documents could be generated from this data.

Tools could easily consume the spec
The spec used by tools and the written spec would never diverge
Tools could automatically be tested against the spec, without having to write separate tests
All nodes, whatever the ES version, could be included, and different spec documents for ES5, ES6, etc. could be generated automatically
Varying documentation could be generated based off of the data, eg. maybe spec.md doesn't need to display any examples, but for online documentation, they could be included

Eg.

{
    "IfStatement": {
        "properties": {
            "test": "Expression",
            "consequent": "Statement",
            "alternate": "Statement | null"
        },
        "es": 1,
        "syntax": "if ($test) $consequent [else $alternate]",
        "description": "An `if` Statement",
        "example": "if (x) { foo(); }"
    },
    ...
}

I am currently using my own (outdated, not yet updated for ES6) version of something like this for my tool http://www.graspjs.com/ for traversal, query-language syntax, and documentation. It would be nice, for me and other tool creators, to simply be able to link to the spec and always be up to date (along with the other benefits I listed above).
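To illustrate the "tools could consume the spec" point, a sketch that validates a node against data in the JSON format proposed above. The supertype table is a hypothetical addition: the example entry does not say how inheritance (e.g. Identifier being an Expression) would be encoded:

```javascript
// Hypothetical spec data in the proposed format (one entry shown).
const SPEC = {
  IfStatement: {
    properties: { test: "Expression", consequent: "Statement", alternate: "Statement | null" },
  },
};

// Hypothetical, hand-written supertype table; a real spec file would need to
// encode inheritance too.
const SUPERTYPES = {
  Identifier: ["Expression", "Pattern"],
  Literal: ["Expression"],
  BlockStatement: ["Statement"],
  ExpressionStatement: ["Statement"],
  IfStatement: ["Statement"],
};

function matchesType(value, typeExpression) {
  return typeExpression.split("|").map((t) => t.trim()).some((t) => {
    if (t === "null") return value === null;
    if (value === null || typeof value !== "object") return false;
    return value.type === t || (SUPERTYPES[value.type] || []).includes(t);
  });
}

// Returns a list of problems; an empty list means the node matched.
function validate(node) {
  const definition = SPEC[node.type];
  if (!definition) return [`no spec entry for ${node.type}`];
  return Object.entries(definition.properties)
    .filter(([key, typeExpression]) => !matchesType(node[key], typeExpression))
    .map(([key, typeExpression]) => `${node.type}.${key} should be ${typeExpression}`);
}

console.log(validate({
  type: "IfStatement",
  test: { type: "Identifier", name: "x" },
  consequent: { type: "BlockStatement", body: [] },
  alternate: null,
})); // []
```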

Import Specifiers: slightly inconsistent between Acorn and Esprima

Currently, here's the output of each import declaration type (in pseudo-CSON for ease of typing). I also have relevant bugs within this repo about each, if I can find one.

Default import (import foo from 'bar'):

# Acorn
type: 'ImportSpecifier'
id: Identifier('foo')
name: null
default: true

# Esprima
type: 'ImportDefaultSpecifier'
id: Identifier('foo')

# SpiderMonkey
type: 'ImportSpecifier'
id: Identifier('default')
name: Identifier('foo')

Named import (import {foo} from 'bar'):

# Acorn
type: 'ImportSpecifier'
id: Identifier('foo')
name: null
default: false

# Esprima
type: 'ImportSpecifier'
id: Identifier('foo')
name: null

# SpiderMonkey
type: 'ImportSpecifier'
id: Identifier('foo')
name: Identifier('foo')

Namespace import (import * as foo from 'bar'):
Bugs: #9

# Acorn
type: 'ImportBatchSpecifier'
name: Identifier('foo')

# Esprima
type: 'ImportNamespaceSpecifier'
id: Identifier('foo')

# SpiderMonkey
# (unsupported)

Aliased import (import {foo as bar} from 'bar'):

# Acorn
type: 'ImportSpecifier'
id: Identifier('foo')
name: Identifier('bar')
default: false

# Esprima
type: 'ImportSpecifier'
id: Identifier('foo')
name: Identifier('bar')

# SpiderMonkey
type: 'ImportSpecifier'
id: Identifier('foo')
name: Identifier('bar')

I do wonder if the following would be better: a common interface for all of the import specifier types. There are a few backwards-compatibility breaks, but each of them is in an area where Acorn and Esprima already deviate. They are also noted explicitly inline. Any other differences are truthy/falsy compatible (just as Acorn and Esprima generally are).

// This is a virtual superclass, purely for clarification.
// No .default field, since it's generally redundant.
interface ImportSpecifierNode <: Node {
  type: string;
  id: Identifier;
  name: Identifier | null;
}

interface ImportDefaultSpecifier <: ImportSpecifierNode {
  type: "ImportDefaultSpecifier"; // Easier to test, consistent with Esprima
  id: Identifier;
  name: null;
}

interface ImportSpecifier <: ImportSpecifierNode {
  type: "ImportSpecifier";
  id: Identifier;
  name: Identifier | null; // null if and only if it's not aliased.
}

// This can also be ImportBatchSpecifier -- see my comment on issue #9 in this
// repo.
interface ImportNamespaceSpecifier <: ImportSpecifierNode {
  type: "ImportNamespaceSpecifier";
  id: Identifier; // Esprima has better consistency here.
  name: null;
}

This would make for the following examples (using above pseudo-CSON):

# import foo from 'bar';
type: 'ImportDefaultSpecifier'
id: Identifier('foo')
name: null

# import {foo} from 'bar';
type: 'ImportSpecifier'
id: Identifier('foo')
name: null

# import {foo as bar} from 'bar';
type: 'ImportSpecifier'
id: Identifier('foo')
name: Identifier('bar')

# import * as foo from 'bar';
type: 'ImportNamespaceSpecifier'
id: Identifier('foo')
name: null

WDYT?
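A sketch of how a tool could map today's Acorn and Esprima output onto the proposed common interface, based only on the shapes listed above; `normalizeSpecifier` is hypothetical:

```javascript
// Sketch: map the Acorn/Esprima shapes listed above onto the proposed interface.
function normalizeSpecifier(spec) {
  if (spec.type === "ImportBatchSpecifier") {
    // Acorn stores the namespace binding in `name`; the proposal uses `id`.
    return { type: "ImportNamespaceSpecifier", id: spec.name, name: null };
  }
  if (spec.type === "ImportDefaultSpecifier" || spec.type === "ImportNamespaceSpecifier") {
    return { type: spec.type, id: spec.id, name: null }; // Esprima already matches
  }
  if (spec.type === "ImportSpecifier" && spec.default) {
    return { type: "ImportDefaultSpecifier", id: spec.id, name: null }; // Acorn default import
  }
  // Named or aliased import: `name` is kept only when aliased.
  return { type: "ImportSpecifier", id: spec.id, name: spec.name || null };
}

const id = (name) => ({ type: "Identifier", name });
// import foo from 'bar';  (Acorn shape)
console.log(normalizeSpecifier({ type: "ImportSpecifier", id: id("foo"), name: null, default: true }).type);
// ImportDefaultSpecifier
```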

Module specifiers do not inherit from Node

ExportSpecifier, ImportSpecifier, ImportDefaultSpecifier, and ImportNamespaceSpecifier do not inherit directly or indirectly from Node. Is this intentional? I looked through #11 and #35 but couldn't find any discussion to indicate that that would be the case.

I investigated and found that jquery/esprima#1149 will attach loc to all four node types when requested. The ast-types implementation does not have modules as they are currently specified, but it does have specifiers inherit from an abstract Specifier type, which itself inherits from Node.

Normalization of line-endings in template literals

The spec calls for special processing (normalization) on line-endings in template literals:

http://people.mozilla.org/~jorendorff/es6-draft.html#sec-static-semantics-tv-s-and-trv-s

See specifically the note at the end of that section, which says:

<CR><LF> and <CR> LineTerminatorSequences are normalized to <LF> for both TV and TRV. An explicit EscapeSequence is needed to include a <CR> or <CR><LF> sequence.

So, I have several questions relating to how (if at all?) the AST spec deals with this:

Does the parser do all this normalization before creating the AST, or does the AST need to preserve the actual information in the code so it's handled post-AST (like in interpretation/code-gen/etc)?
If the parser handles the normalization (changing occurrences of U+000D and U+000DU+000A to U+000A) before producing the tree, then should it do that for both the node value and the raw?

My instinct would say that raw should preserve the original U+000D or U+000DU+000A sequences (pre-normalization). However, the spec says that the template literal's raw value is post-normalization, so perhaps the parser/AST should also normalize its raw? Will it be confusing if the AST raw property and the template literal raw property don't match?

But that would mean that you couldn't completely faithfully recreate a JS file that had such line-endings mixed into its template literals. That seems like a bad thing.
The spec says that an actual \r or \r\n escape sequence in the string is not normalized, only the U+000D / U+000DU+000A values themselves. However, the human-readable representation of the AST (which is often JSON stringification) would represent a U+000D value from the code as \r. So how would you tell the difference? Would a \r actually show up as \\r instead?

+@allenwb @RReverser
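The normalization itself is a one-line replacement over the source characters. A sketch; whether a parser also applies it to the AST `raw` is exactly the open question above, so the function is shown standalone and `normalizeTemplateLineEndings` is hypothetical:

```javascript
// Sketch: normalise actual CR LF and lone CR characters to LF, as the quoted
// note requires for template values. Escape sequences are untouched: in
// source text, `\r` is a backslash followed by "r", not U+000D.
function normalizeTemplateLineEndings(text) {
  return text.replace(/\r\n?/g, "\n");
}

console.log(JSON.stringify(normalizeTemplateLineEndings("a\r\nb\rc"))); // "a\nb\nc"
console.log(JSON.stringify(normalizeTemplateLineEndings("a\\r")));      // "a\\r"
```

The second line also shows the JSON question: a source-level escape appears as `\\r` when stringified, while an actual U+000D would appear as `\r`.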

Add a license

An explicit license would be ideal to ensure that those writing tools around estree could do so without the risk of violating any copyright.

Acorn: ImportBatchSpecifier vs Esprima: ImportNamespaceSpecifier

/cc @RReverser @caridy @IMPinball

MemberExpression is not allowed inside Patterns

With the current spec, the following code examples cannot be represented correctly:

[...obj.prop] = arr
[this.x] = arr
({a: this.x} = other)
...and others

Thinking what is the best way to fix this.

Should we add Pattern as a parent of MemberExpression? Or would that be confusing in contexts like VariableDeclarator.id and Function.params[]?
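One possible answer is to allow MemberExpression only where a pattern is an assignment target, and to reject it in binding positions such as VariableDeclarator.id and Function.params. A hypothetical validator for that rule, assuming the Acorn-style AssignmentPattern and RestElement nodes and the `{ key, value }` ObjectPattern properties quoted elsewhere:

```javascript
// Sketch: "assignment" targets (`[this.x] = arr`) may contain member
// expressions; "binding" targets (`let [this.x] = arr`) may not.
function isValidPatternTarget(node, context) {
  switch (node.type) {
    case "Identifier":
      return true;
    case "MemberExpression":
      return context === "assignment";
    case "ArrayPattern":
      return node.elements.every((el) => el === null || isValidPatternTarget(el, context));
    case "ObjectPattern":
      return node.properties.every((prop) => isValidPatternTarget(prop.value, context));
    case "RestElement":
      return isValidPatternTarget(node.argument, context);
    case "AssignmentPattern":
      return isValidPatternTarget(node.left, context);
    default:
      return false;
  }
}

// [this.x] = arr
const pattern = {
  type: "ArrayPattern",
  elements: [{
    type: "MemberExpression",
    object: { type: "ThisExpression" },
    property: { type: "Identifier", name: "x" },
    computed: false,
  }],
};
console.log(isValidPatternTarget(pattern, "assignment")); // true
console.log(isValidPatternTarget(pattern, "binding"));    // false
```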

Destructuring Defaults Spec

Basically an issue to make sure the latest discussion from this thread is included in the docs:
https://bugzilla.mozilla.org/show_bug.cgi?id=932080

File node

For Program to be visited in the common node-parent pattern, it must be wrapped in a File node. This node is used by Babel, recast, etc. Would there be any objection to including it in the ESTree specification? It doesn't have to be compulsory and can be purely optional.
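A sketch of the node-parent pattern in question: a generic visitor that passes each node's parent. With a bare Program as the root, Program is the one node visited without a parent; wrapping it in a File (Babel stores it under a `program` property) removes that special case. `visit` is a hypothetical helper:

```javascript
// Sketch: depth-first visitor that reports each node with its parent.
// Any object with a string `type` is treated as a node.
function visit(node, parent, callback) {
  callback(node, parent);
  for (const key of Object.keys(node)) {
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child.type === "string") visit(child, node, callback);
    }
  }
}

const program = { type: "Program", sourceType: "script", body: [] };
const file = { type: "File", program };
visit(file, null, (node, parent) => {
  console.log(node.type, "parent:", parent && parent.type);
});
// File parent: null
// Program parent: File
```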

Module declarations not allowed in Program.body

After the discussions in #38 and #39, module declarations inherit directly from Node instead of Declaration. Because Program.body is defined as [ Statement ], this has the effect of disallowing module declarations in Program.body.

What is the best solution? A couple possibilities:

Define Program.body to be [ Statement | ImportDeclaration | ExportNamedDeclaration | ExportDefaultDeclaration | ExportAllDeclaration ], and add a note saying module declarations are only valid when sourceType is "module".
Create an abstract ModuleDeclaration type from which module declarations inherit, and define Program.body to be [ Statement | ModuleDeclaration ], adding a note saying ModuleDeclaration is only valid when sourceType is "module".
Something else entirely?

I'm happy to put together a PR after a decision is made.
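Both options come with the same note, that module declarations are only valid when sourceType is "module". A sketch of how a consumer could enforce that note; `moduleDeclarationErrors` is hypothetical:

```javascript
// Sketch: module declarations are only valid when sourceType is "module".
const MODULE_DECLARATION_TYPES = new Set([
  "ImportDeclaration",
  "ExportNamedDeclaration",
  "ExportDefaultDeclaration",
  "ExportAllDeclaration",
]);

function moduleDeclarationErrors(program) {
  if (program.sourceType === "module") return [];
  return program.body
    .filter((statement) => MODULE_DECLARATION_TYPES.has(statement.type))
    .map((statement) => `${statement.type} requires sourceType "module"`);
}

const script = {
  type: "Program",
  sourceType: "script",
  body: [{ type: "ImportDeclaration", specifiers: [], source: { type: "Literal", value: "a" } }],
};
console.log(moduleDeclarationErrors(script)); // [ 'ImportDeclaration requires sourceType "module"' ]
```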

Do we need SuperExpression?

For the this keyword, there's ThisExpression in the AST.

Do we need a SuperExpression to represent calls to super?

Eliminate empty .guardedHandlers

I filed a premature drive-by fix for this in this failed PR, but we need consensus first. I propose we eliminate the empty .guardedHandlers from the spec since it serves no purpose. The deprecated spec includes the full definition, and meanwhile tools can harmlessly produce an empty .guardedHandlers array and still be spec-compliant, so it seems like a harmless fix.

SourceLocation and byte position

I think that it would be good to extend Position to include an offset:

function Position() {
  this.line = 1;   // number >= 1
  this.column = 0; // number >= 0
  this.offset = 0; // number >= 0
}

Or create a new type OffsetPosition (or another, better name) that extends Position with an offset. Then we get start and end bytes with minimal disruption and without creating more objects (Esprima's ranges seem nasty for memory).
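To show the extended Position in use, a sketch that computes `line`, `column` and `offset` for a character index. It assumes `\n` line endings only (ECMAScript also treats `\r\n`, `\r`, U+2028 and U+2029 as line terminators), and the offset counts UTF-16 code units, as JavaScript string indices do, rather than bytes; `positionAt` is hypothetical:

```javascript
// Sketch: ESTree-style Position extended with the proposed `offset`.
function positionAt(source, offset) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (source[i] === "\n") {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart, offset };
}

console.log(positionAt("var a;\nfoo();", 7)); // { line: 2, column: 0, offset: 7 }
```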

Extend ExpressionStatement to indicate a directive prologue

In the current syntax tree, a directive such as "use strict" will appear as ExpressionStatement. A tool that consumes the tree and needs to be aware of the strict mode will have to perform an extra step to figure out whether such an ExpressionStatement is representing a directive or not.

In this proposal, a new property, directive, is added to ExpressionStatement. Its value is the directive as written in the source text.

The current state:

> esprima.parse('"use strict"').body[0]
{ type: 'ExpressionStatement',
  expression: 
   { type: 'Literal',
     value: 'use strict' } }

If this proposal is implemented, it becomes:

> esprima.parse('"use strict"').body[0]
{ type: 'ExpressionStatement',
  expression: 
   { type: 'Literal',
     value: 'use strict' },
  directive: 'use strict' }

Additional resources:

ECMAScript 5.1 Section 14.1 on Directive Prologues and the Use Strict Directive
Previous Esprima discussion: https://code.google.com/p/esprima/issues/detail?id=330.
SpiderMonkey issue: https://bugzilla.mozilla.org/show_bug.cgi?id=791294
See discussion on jquery/esprima#1006 for more information
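A sketch of the extra step a consumer performs to read the directive prologue, using the proposed `directive` field when present. The fallback to the literal's value is an approximation: it cannot tell `"use strict"` from an escaped spelling such as `"use\x20strict"`, which is not a Use Strict Directive. `directivePrologue` is hypothetical:

```javascript
// Sketch: read the directive prologue of a Program or function body.
function directivePrologue(body) {
  const directives = [];
  for (const statement of body) {
    const isStringStatement = statement.type === "ExpressionStatement" &&
      statement.expression.type === "Literal" &&
      typeof statement.expression.value === "string";
    if (!isStringStatement) break; // the prologue ends at the first other statement
    directives.push(typeof statement.directive === "string"
      ? statement.directive            // proposed field: exact source text
      : statement.expression.value);   // fallback: cooked value (approximate)
  }
  return directives;
}

const body = [
  { type: "ExpressionStatement",
    expression: { type: "Literal", value: "use strict" },
    directive: "use strict" },
  { type: "ExpressionStatement", expression: { type: "Identifier", name: "x" } },
];
console.log(directivePrologue(body).includes("use strict")); // true
```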

ArrowExpression vs ArrowFunctionExpression

Currently both esprima and acorn use ArrowFunctionExpression.

Import{Batch,Namespace}Specifier identifier field name

Note: this issue has nothing to do with the title of the actual node itself. Any discussion of that is better suited to issue #9.

Here's a quick visual description of the problem.

# Acorn
type: 'ImportBatchSpecifier'
name: Identifier('foo')

# Esprima
type: 'ImportNamespaceSpecifier'
id: Identifier('foo')

Shouldn't we pick either .id or .name?

Firefox, Safari, and Chrome all use 1-based column numbers -- estree should too

Or somehow convince everyone else to use 0-based columns (if it doesn't break the web).

Tracking early proposals

Some early proposals need to be parsed before they are finalised, for use in linters and, most notably, transpilers. To ensure interoperability, an AST for each proposal needs to be agreed upon before finalisation.

How should these early proposals be regulated?

Allow literals as method names in MethodDefinition

https://github.com/estree/estree/blob/master/es6.md#methoddefinition

key: Identifier;

MethodDefinition's key is an Identifier.

However, in the ES6 draft specification a method's name has become a PropertyName.
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-method-definitions

Therefore, a NumericLiteral or StringLiteral can also be used as a method name, as in class Foo { 123() { } 'abc'() { } }:
https://babeljs.io/repl/#?experimental=true&evaluate=true&loose=false&spec=false&code=class%20Foo%20{%0A%20%20123%28%29%20{%0A%20%20}%0A%20%20%0A%20%20%27abc%27%28%29%20{%0A%20%20}%0A}
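If the key may be a Literal, every tool that reads method names has to handle it. A hedged sketch of what that looks like: `staticMethodName` is a hypothetical helper, and it assumes a `computed` flag on MethodDefinition marks keys written as [expr]() { }, whose names cannot be known statically.

```javascript
// Sketch: compute the static name of a MethodDefinition whose key may be an
// Identifier or a string/numeric Literal.
// Assumes `computed: true` marks [expr]() {} keys (name known only at runtime).
function staticMethodName(def) {
  const key = def.key;
  if (def.computed) {
    return null;
  }
  if (key.type === "Identifier") {
    return key.name;
  }
  if (key.type === "Literal" && (typeof key.value === "string" || typeof key.value === "number")) {
    // Property names are strings at runtime, so 123() {} defines "123".
    return String(key.value);
  }
  throw new TypeError(`unexpected method key type: ${key.type}`);
}
```

For the class above, this returns "123" for the first method and "abc" for the second.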

Add example documents

A while ago, when I was working on esformatter-phonetic, I was frustrated by the lack of examples in the SpiderMonkey API documentation. To resolve my frustration and reduce the potential for error, I wound up creating ecma-scopes to programmatically determine when a scope is lexical/block level.

https://github.com/twolfson/esformatter-phonetic

https://github.com/twolfson/ecma-variable-scope

https://github.com/twolfson/ecma-scopes

I considered going all the way and writing a full suite of examples covering every basic case for future developers. However, I realized how large a mountain that was. ESTree seems like a great place to keep examples alongside the core documentation. Would you be interested in adding such examples?

One example of the proposed examples would be:

Identifier

A name for a reference or property of an object. Given the following example:

var foo = 'bar';
function baz(xyz) { }
foo.wat;
foo.huh();

Each of foo, baz, xyz, wat, and huh is an identifier. For example, the AST representation of foo is:

{
  type: 'Identifier',
  name: 'foo'
}
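A companion snippet could show how such nodes are found in a larger tree. This is a sketch: `collectIdentifiers` is a hypothetical helper, and the tree for foo.huh(); is written by hand in the ESTree shape (CallExpression with a non-computed MemberExpression callee), without location data.

```javascript
// Sketch: collect the names of all Identifier nodes in an ESTree-shaped tree
// by recursively visiting every object- or array-valued property.
function collectIdentifiers(node, out = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectIdentifiers(child, out);
  } else if (node !== null && typeof node === "object") {
    if (node.type === "Identifier") out.push(node.name);
    for (const key of Object.keys(node)) {
      if (key !== "type") collectIdentifiers(node[key], out);
    }
  }
  return out;
}

// Hand-written tree for `foo.huh();`.
const callStatement = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "foo" },
      property: { type: "Identifier", name: "huh" },
      computed: false,
    },
    arguments: [],
  },
};
```

collectIdentifiers(callStatement) returns ["foo", "huh"]: both the object and the non-computed property are Identifier nodes, matching the list in the example text.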

Concrete Syntax in tree

I'd like to open up the discussion of what our various options are for preserving so called "Concrete Syntax" elements in the parse tree format.

What are Concrete Syntax elements?

A simplified definition of "concrete syntax" is anything that can appear in a program's source code but is discarded during parsing, because an AST (as it stands now) does not represent it.

This specifically means that anything reliably inferable from an AST's nodes is not concrete syntax. For example:

var a = 2,
    b = (a + 2) * 3;

Here, the ( ) around a + 2 is not represented in the AST, because the structure of the tree, combined with operator precedence rules, absolutely implies that it must exist, and moreover a + 2 * 3 would have been a different tree structure. Thus, the ( ) in the original program can be reconstructed from the AST reliably, so it need not be stored separately. It is not concrete syntax.

However:

var a = 2,
    b = a + (2 * 3);

This program includes a pair of ( ) that would already be implied by the tree structure and operator precedence, and thus would also not be stored.

But, critically, they also would not be regenerated when the tree was reconstituted. Why? Because it's impossible to know from the tree alone whether the ( ) were really there or just implied. Code generation takes the conservative path: it emits only the ( ) that the tree structure requires and doesn't make up ones it can't be sure were there, so these are left out.

Herein we see that the original ( ) are not abstract syntax, but concrete, and to preserve them is going to require some other solution.
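The round trip can be demonstrated with a tiny generator. This is a sketch limited to + - * /, not any real code generator: it parenthesizes a child only when dropping the ( ) would change the tree (lower precedence, or equal precedence on the right of a left-associative operator). The names `generate`, `wrap`, and the small node constructors are hypothetical.

```javascript
// Relative precedence of the supported operators; only the order matters.
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "Literal":
      return String(node.value);
    case "BinaryExpression": {
      const prec = PRECEDENCE[node.operator];
      return `${wrap(node.left, prec, false)} ${node.operator} ${wrap(node.right, prec, true)}`;
    }
    default:
      throw new TypeError(`unsupported node type: ${node.type}`);
  }
}

// Parenthesize a child only if dropping the ( ) would change the tree.
function wrap(child, parentPrec, isRight) {
  const code = generate(child);
  if (child.type !== "BinaryExpression") return code;
  const prec = PRECEDENCE[child.operator];
  const needed = prec < parentPrec || (isRight && prec === parentPrec);
  return needed ? `(${code})` : code;
}

const ident = (name) => ({ type: "Identifier", name });
const num = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

// (a + 2) * 3: the ( ) are required by the tree, so they come back.
const first = bin("*", bin("+", ident("a"), num(2)), num(3));
// a + (2 * 3): the ( ) are redundant, so they are lost.
const second = bin("+", ident("a"), bin("*", num(2), num(3)));
```

generate(first) yields "(a + 2) * 3", while generate(second) yields "a + 2 * 3": the author's redundant ( ) cannot be recovered from the tree.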

Examples of concrete syntax:

extraneous whitespace (the non-significant kind)
comments
extraneous ( ) (used primarily for readability more than functionality)
... other?

The Rabbit Hole

Just how deep does this rabbit hole go? In the following snippet, every /*x*/ comment marks a location where whitespace and/or comments can optionally appear as concrete syntax, in addition to the fact that some of those locations require significant whitespace, such as /*4*/:

/*1*/class /*2*/ Foo /*3*/ extends /*4*/ Bar /*5*/ {
   /*6*/constructor /*7*/ (/*8*/x /*9*/ = /*10*/ "hello" /*11*/) /*12*/ {
      // ..
   }

   /*13*/ static /*14*/ bam /*15*/ ( .. ) {
      // ..
   }
}

var /*16*/{ /*17*/ x /*18*/: /*19*/ { /*20*/ y /*21*/: /*22*/ z /*23*/ = /*24*/ 2 /*25*/ } } =
   /*26*/new /*27*/ Foo /*28*/ ( (/*29*/) /*30*/ => /*31*/ { .. } /*32*/ ) /*33*/;

Your imagination can probably take it from there. ES6 adds a whole slew of complex syntax forms, which means a deep rabbit hole of nooks and crannies where we need to be able to preserve information (in some way) that our normal approach to ASTs doesn't currently preserve.

Consider the tree structure for the arrow function expression, for example... how and where could we represent /*29*/ and /*30*/? /*31*/ and /*32*/ are a little clearer.

Why Concrete Syntax?

Why would we want to preserve all these pieces of concrete syntax? There are several use cases:

Any tool which performs localized transformations on a source code file, changing only a specific targeted thing rather than everything. For example, a tool that does nothing but rewrite all variable names to uppercase versions (for whatever silly reason).

The goal of this tool is not to affect anything else about the program, such as formatting and comments, as those might still be important to the author of the file.
Fully configurable automated code formatting: not just rule-based (like "always put a space between the ( ) on a call"), but more fine-grained, where a tool might parse a source program and produce a tree structure with very specific information in it about how the resulting code should be regenerated.

For example, it may automatically insert comments for each parameter in a function declaration with annotations about how and where the param is lexically used. Or a code-style painter may "repaint" a file with spaces instead of tabs for indentation, or with tabs for indentation and spaces for alignment, etc.
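For the simplest localized transformation, the uppercase-renaming tool above, concrete syntax can be sidestepped by splicing the original text instead of regenerating it. The sketch below does that. It assumes identifier nodes carry Acorn-style `start`/`end` offsets (Esprima exposes the same information as `range`), and `uppercaseIdentifiers` is a hypothetical helper. This workaround breaks down as soon as a transformation changes tree structure, which is where preserved concrete syntax becomes necessary.

```javascript
// Sketch: rewrite identifier names in place, leaving every other character
// (comments, whitespace, redundant parentheses) untouched.
// Assumes each node has `name`, `start` and `end` (offsets into `source`).
function uppercaseIdentifiers(source, identifiers) {
  // Apply edits from the end of the text backwards so that earlier offsets
  // remain valid after each splice.
  const sorted = [...identifiers].sort((a, b) => b.start - a.start);
  let result = source;
  for (const node of sorted) {
    result =
      result.slice(0, node.start) +
      node.name.toUpperCase() +
      result.slice(node.end);
  }
  return result;
}
```

Given "var foo = 1; /* keep */ foo++;" and the two foo identifiers, the result is "var FOO = 1; /* keep */ FOO++;" with the comment and spacing intact.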

I could go on speculating, but suffice it to say there are definitely cases for tools which want to preserve concrete syntax wherever it appears. Since concrete syntax by definition cannot be inferred, the parser and the data structures that come out of it must preserve it. Moreover, this information must be something a code generator can receive and use.

How?

Here's the part where all the bikeshedding is going to happen.

To date, I've been privy to many conversations around this topic, and there's never been any kind of consensus on how to approach solving it. I have my ideas, but I'm only going to suggest them as a possible starting-point proposal, not as the way it has to be.

CST-as-AST Proposal

I believe we should have one unified tree structure, which has optional -- what I call "extras" -- annotations (and in a few limited cases, nodes) in it which represent the necessary hooks for preserving these concrete syntax elements. In other words, I propose that there be no difference between an AST and a CST (concrete syntax tree), other than the absence or presence of CS elements in the tree.

Any tool which currently produces ASTs is a tool that's producing CSTs by default, but which just happens to not actually be keeping any of the CS elements yet. These tools can start keeping the CS elements, but still have the same style of tree they're producing, just with extra info in them.

Any tool which consumes ASTs is a tool that's already consuming CSTs by default, but which is just ignoring any CS elements which may be present. These tools can start using the CS elements they find.

It turns out that most of the places where we need to preserve CS elements can be added as additional properties (again, I call it "extras", with extra sub-names like "before", "inside", and "after" for positioning), which means that there would be zero impact to the existing tools that use such a tree format.

Tree-producing tools (parsers, transpilers, etc.) could simply not produce these annotations, and things would still work fine. Tree-consuming tools continue to consume trees as they currently do, and just ignore these extras.

I believe this has the most minimal impact to the existing tool ecosystem, and thus the easiest path to wider adoption by more tools.
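A hedged sketch of how "extras" might look on a single node, using the class name position from the rabbit-hole snippet (class /*2*/ Foo /*3*/ extends). The `before`/`after` sub-names come from the proposal; the entry shape ({ type, value }) and the helper names are hypothetical. The point is that an extras-unaware consumer reads the same node without any change.

```javascript
// Hypothetical extras entries: { type: "Comment" | "Whitespace", value }.
function printExtras(entries = []) {
  return entries
    .map((e) => (e.type === "Comment" ? `/*${e.value}*/` : e.value))
    .join("");
}

// An extras-aware generator emits before/after material when present.
function printIdentifier(node) {
  const extras = node.extras ?? {};
  return printExtras(extras.before) + node.name + printExtras(extras.after);
}

// An existing, extras-unaware consumer keeps working: it never reads extras.
function identifierName(node) {
  return node.name;
}

const plain = { type: "Identifier", name: "Foo" };
const annotated = {
  type: "Identifier",
  name: "Foo",
  extras: {
    before: [{ type: "Comment", value: "2" }, { type: "Whitespace", value: " " }],
    after: [{ type: "Whitespace", value: " " }, { type: "Comment", value: "3" }],
  },
};
```

printIdentifier(annotated) reproduces "/*2*/ Foo /*3*/", printIdentifier(plain) produces "Foo", and identifierName returns "Foo" for both. Whether a given piece of whitespace belongs to one node's after or the next node's before is a design decision this sketch does not settle.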

Downside

There will be some minor places where the node structure will have to be slightly different to accommodate some of the trickier cases of CS positioning.

For example, an anonymous function expression has an id of null, which means there is no object in that node to attach extras annotations to for the function/*1*/() position.

If an anonymous function's id were instead represented as an object such as id: { extras: .. }, we would have a hook to attach those extras to.

This does represent a slight breaking change to the format, but it's not a sweeping new tree format, and on the whole it will require only a little extra care in handling. The necessary breaking changes to node structure are minor and very few for the pre-ES6 tree structure (aka SpiderMonkey).
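The breaking change is easiest to see in a consumer that detects anonymous functions. The sketch assumes the hypothetical proposed shape id: { extras: ... } with no name; `isAnonymous` and `isAnonymousLegacy` are illustrative helpers.

```javascript
// Consumer written against today's format: anonymous means id is null.
function isAnonymousLegacy(fn) {
  return fn.id === null;
}

// Consumer that accepts both representations: null/undefined today, or
// (hypothetically) an extras-only object with no name under the proposal.
function isAnonymous(fn) {
  return fn.id == null || fn.id.name === undefined;
}
```

For a function expression with id: { extras: { inside: [] } }, isAnonymousLegacy returns false while isAnonymous returns true. That one-line difference is the kind of "extra handling care" existing tools would need.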

The new ES6 forms definitely add more places where we should consider tree structure from the beginning which are amenable to attaching these annotations. Since there's not already an established standardized ES6 tree format, I think it's not too late for us to consider these concerns as we specify the ES6 node forms.

What would an Incremental Parsing api look like?

Esprima just closed a 3-year-old ticket on incremental parsing:

Once we have a better idea how to implement this, we'll restart the topic

Meanwhile, in Shift Parser:

What about the interface? Do you want to change all the lists in the AST to generators?

What do people think such an API might look like? What would be an ideal outcome for an incremental AST?
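One consequence of the "lists as generators" question can be shown concretely. In this sketch a Program's body is produced lazily; `lazyProgram` is hypothetical and only stands in for an incremental parser that would produce statements on demand. Consumers that merely iterate keep working, while consumers that index into the list or read .length do not.

```javascript
// Hypothetical lazily produced Program: each read of `body` returns a fresh
// generator, so the statements can be iterated more than once.
function lazyProgram(statements) {
  return {
    type: "Program",
    get body() {
      return (function* () {
        for (const stmt of statements) yield stmt;
      })();
    },
  };
}

// Iteration-only consumer: works for arrays and generators alike.
function countStatements(program) {
  let n = 0;
  for (const _stmt of program.body) n++;
  return n;
}
```

On a lazy program, program.body[0] and program.body.length are both undefined, so any existing tool that indexes into node lists would need to change.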

singular .handler property?

Back in 2012 I suggested a backwards-incompatible tweak to the TryStatement API: a handler property instead of handlers. It looks like Esprima didn't make a matching change, so SpiderMonkey and Esprima are now incompatible for TryStatements.

The change would have made for a more reasonable API, but is it way too late for implementations to change to match it?

If we can't change implementations, then the spec should have handlers instead of handler and I'll file a SpiderMonkey bug to match that behavior.
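Until implementations agree, a consumer can normalize one shape into the other. The sketch converts the plural form to the singular one, assuming handlers is an array holding at most one CatchClause. Standard ECMAScript allows only one catch clause per try; SpiderMonkey's non-standard guarded catch clauses are the case that could produce more, and the sketch rejects them. `normalizeTryStatement` is a hypothetical helper.

```javascript
// Sketch: convert a TryStatement from the plural `handlers` form to the
// singular `handler` form, leaving already-singular nodes untouched.
function normalizeTryStatement(node) {
  if (node.type !== "TryStatement" || !("handlers" in node)) return node;
  if (node.handlers.length > 1) {
    throw new Error("multiple catch clauses cannot be expressed with `handler`");
  }
  const { handlers, ...rest } = node;
  return { ...rest, handler: handlers.length === 1 ? handlers[0] : null };
}
```

A try/finally with an empty handlers array becomes handler: null.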

Add flag on Program node to indicate module code?

It would be nice to be able to tell whether a given Program represents module code. The Shift AST actually has two different nodes, Script and Module, to represent the source type. A separate node is probably too big a change for ESTree, but what about a property on Program? Maybe sourceType, with the value "script" or "module"?
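One concrete use of such a flag is deciding whether a Program is strict mode code. Module code is always strict, while a script is strict only if its directive prologue contains a Use Strict Directive. The sketch below assumes prologue statements carry a `directive` property holding the raw directive text; `isStrictProgram` is a hypothetical helper.

```javascript
// Sketch: strictness of a Program using a sourceType flag.
function isStrictProgram(program) {
  if (program.sourceType === "module") return true;
  for (const stmt of program.body) {
    // The prologue ends at the first statement without a directive.
    if (typeof stmt.directive !== "string") break;
    if (stmt.directive === "use strict") return true;
  }
  return false;
}
```

Without sourceType, the first branch cannot be written at all: the same statement list is strict or sloppy depending on how it was loaded.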

JSON format

TLDR: are you interested in providing an official json format of the specification? I have made one, it lives here: https://github.com/kamicane/nodes/blob/master/spec.json

I recently started working again on my AST walking / validation / selection library, nodes. I needed to update the ES6 syntax, since I was using a really old version from the Mozilla website. I used to generate it with functions, but after a (short) discussion with @benjamn, I decided to turn the spec into an easily consumable JSON format, which I now use to generate the (fake) multiple inheritance at runtime.

It would be fantastic if I could require the spec.json file directly from estree.

The only real problem I found with the specification (when using it to automatically generate classes from AST objects) lies with AssignmentProperty. Since it inherits from Property but also has "Property" as its type, it is impossible to know from the AST node alone which one it is.
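Because both interfaces share the type string "Property", a class-generating walker has to use context. In the ES2015 spec, an ObjectPattern's properties are AssignmentProperty nodes while an ObjectExpression's are plain Property nodes. The sketch below classifies by parent; `classify` and `propertyKinds` are hypothetical helpers, and the test tree for var { a } = { b: 1 } is written by hand.

```javascript
// Sketch: pick the interface name for a node, using its parent to
// disambiguate AssignmentProperty from Property.
function classify(node, parent) {
  if (node.type === "Property" && parent && parent.type === "ObjectPattern") {
    return "AssignmentProperty";
  }
  return node.type;
}

// Walk a tree and record the interface chosen for every Property node.
function propertyKinds(node, parent = null, out = []) {
  if (Array.isArray(node)) {
    // Array elements share the parent of the node that owns the array.
    for (const child of node) propertyKinds(child, parent, out);
  } else if (node !== null && typeof node === "object") {
    if (node.type === "Property") out.push(classify(node, parent));
    // Only objects with a type (nodes) become the parent of their children.
    const nextParent = typeof node.type === "string" ? node : parent;
    for (const key of Object.keys(node)) propertyKinds(node[key], nextParent, out);
  }
  return out;
}
```

For var { a } = { b: 1 }, the walk reports ["AssignmentProperty", "Property"]. A JSON description of the spec cannot express this rule by itself, since it depends on where the node appears.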

1. `Class::body::body` -> `Class::body::methods`

2. `""` in `MethodDefinition::kind`

3. `constructor`