acornjs / acorn

A small, fast, JavaScript-based JavaScript parser

JavaScript 99.89% Shell 0.01% HTML 0.11%

acorn's Introduction

Acorn

A tiny, fast JavaScript parser, written completely in JavaScript.

Community

Acorn is open source software released under an MIT license.

You are welcome to report bugs or create pull requests on GitHub.

Packages

This repository holds three packages:

acorn: The main parser
acorn-loose: The error-tolerant parser
acorn-walk: The syntax tree walker

To build the content of the repository, run npm install.

git clone https://github.com/acornjs/acorn.git
cd acorn
npm install

Plugin development

Acorn is designed to support plugins which can, within reasonable bounds, redefine the way the parser works. Plugins can add new token types and new tokenizer contexts (if necessary), and extend methods in the parser object. This is not a clean, elegant API—using it requires an understanding of Acorn's internals, and plugins are likely to break whenever those internals are significantly changed. But still, it is possible, in this way, to create parsers for JavaScript dialects without forking all of Acorn. And in principle it is even possible to combine such plugins, so that if you have, for example, a plugin for parsing types and a plugin for parsing JSX-style XML literals, you could load them both and parse code with both JSX tags and types.

A plugin is a function from a parser class to an extended parser class. Plugins can be used by simply applying them to the Parser class (or a version of that already extended by another plugin). But because that gets a little awkward, syntactically, when you are using multiple plugins, the static method Parser.extend can be called with any number of plugin values as arguments to create a Parser class extended by all those plugins. You'll usually want to create such an extended class only once, and then repeatedly call parse on it, to avoid needlessly confusing the JavaScript engine's optimizer.

const {Parser} = require("acorn")

const MyParser = Parser.extend(
  require("acorn-jsx")(),
  require("acorn-bigint")
)
console.log(MyParser.parse("// Some bigint + JSX code"))
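
To make the composition concrete, here is a minimal sketch of how an extend-style static method could fold plugins over a class. ToyParser, addLength, and addUpper are hypothetical stand-ins invented for illustration, not Acorn's code; the only thing taken from the text above is that a plugin is a function from a parser class to an extended parser class.

```javascript
// Hypothetical stand-in for a parser class; not Acorn's real Parser.
class ToyParser {
  // Apply each plugin in turn, feeding the result of one into the next.
  static extend(...plugins) {
    let cls = this
    for (const plugin of plugins) cls = plugin(cls)
    return cls
  }
  static parse(input) {
    return new this().parse(input)
  }
  parse(input) {
    return {type: "Program", source: input}
  }
}

// Two illustrative plugins, each mapping a class to a subclass.
const addLength = Base => class extends Base {
  parse(input) {
    return {...super.parse(input), length: input.length}
  }
}
const addUpper = Base => class extends Base {
  parse(input) {
    return {...super.parse(input), upper: input.toUpperCase()}
  }
}

// Build the extended class once and reuse it for every parse call.
const ExtendedToyParser = ToyParser.extend(addLength, addUpper)
const result = ExtendedToyParser.parse("ab")
console.log(result)
```

In this sketch, ToyParser.extend(addLength, addUpper) produces the same class shape as addUpper(addLength(ToyParser)), which is the awkward nesting that an extend method is meant to avoid.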

Plugins override methods in their new parser class to implement additional functionality. It is recommended for a plugin package to export its plugin function as its default value or, if it takes configuration parameters, to export a constructor function that creates the plugin function.

This is what a trivial plugin, which adds a bit of code to the readToken method, might look like:

module.exports = function noisyReadToken(Parser) {
  return class extends Parser {
    readToken(code) {
      console.log("Reading a token!")
      super.readToken(code)
    }
  }
}
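
For a plugin that takes configuration parameters, the recommendation above amounts to exporting a function that returns the plugin function. The following is a sketch under assumptions: StubParser is a hypothetical stand-in base class so the example runs without Acorn installed, and the log option is invented for illustration.

```javascript
// Hypothetical stand-in base class; not Acorn's real Parser.
class StubParser {
  readToken(code) {
    return String.fromCharCode(code)
  }
}

// Factory: takes options and returns the actual plugin function.
function noisyReadToken(options = {}) {
  const log = options.log || console.log
  return function plugin(Parser) {
    return class extends Parser {
      readToken(code) {
        log("Reading a token!")
        return super.readToken(code)
      }
    }
  }
}

// Configure the plugin first, then apply it to a parser class.
const messages = []
const NoisyParser = noisyReadToken({log: msg => messages.push(msg)})(StubParser)
const token = new NoisyParser().readToken(97)
console.log(token, messages)
```

This mirrors the require("acorn-jsx")() call in the earlier example, where the package export is called once to obtain the plugin.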

acorn's People

Contributors

Stargazers

Watchers

Forkers

p01 pvdz mishoo kukawski koto abraidwood dineshkummarc jasonsanjose diogogmt arian mrcarlberg sandywu keeyipchan keeyip zsjforcn slavah benekastah eztierney hellcoderz tml cythrawll chiehwen loarabia b-studios sqs brenmar ryanackley wjx dangoor karlbohlmark c9 runsun chaosim monjaro yessky tonylukasavage nvdnkpr l0g1k apatil sanyaade-teachings seanocat btmills bellis dwardin justinkempton nangal saksoy technomage masrud samhou1988 csnw wing9405 alexmckenley chughes87 differentmatt catamorphic vivekkiran conradirwin incerpt nmaier basicer angleman netvarun cecchi spion spielmeistercom rreverser johannesherr rwaldron jormoderlaila jordemoderkonsultation nodejstw xiemaisi dabbler0 tcr mrennie ezuniga rich-harris watilde simudream fitzgen kossnocorp plng zucatti baishuiz ikarienator sporeventexplosion gratex jaredly sanderspies hzoo marcominetti geppy metaworx vbmaarten robertomalatesta happibum jamiebuilds jmm curtisz

acorn's Issues

Feature Request: add new visitors to a recursive walker

Could a slight modification to acorn.walk.make be made that allows for the creation of new walker visitors?

I have some code in a known structure I'm trying to parse and wanted to handle these parts of the AST specially.

In essence, I would like to be able to write

acorn.walk.recursive(ast, state, {
  ObjectExpression: function(n, st, c) {
    if (..some rule..) c(n.properties[0], st, 'SpecialPropertyIdent')
  },
  SpecialPropertyIdent: function(n, st, c) {
    ...
  }
});

Right now, I have to make a walker and then attach my new visitors onto it later, which seems a little inconvenient.
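
The requested behavior can be illustrated with a toy walker. This is a sketch of the idea only, not acorn.walk's implementation: the recursive function below is a hypothetical stand-in, and the hand-built AST is simplified.

```javascript
// Toy recursive walker: a visitor is looked up by `override` when given,
// otherwise by the node's own type. Not acorn.walk's implementation.
function recursive(node, state, visitors) {
  function c(node, st, override) {
    const visit = visitors[override || node.type]
    if (visit) visit(node, st, c)
  }
  c(node, state)
}

// Simplified, hand-built AST for an object literal with one property.
const ast = {
  type: "ObjectExpression",
  properties: [{type: "Property", key: {type: "Identifier", name: "special"}}]
}

const seen = []
recursive(ast, seen, {
  ObjectExpression(n, st, c) {
    // The third argument routes this child to a custom visitor name.
    c(n.properties[0], st, "SpecialPropertyIdent")
  },
  SpecialPropertyIdent(n, st) {
    st.push(n.key.name)
  }
})
console.log(seen)
```

Because the lookup is by name rather than by node type, a visitor such as SpecialPropertyIdent does not need to correspond to any real AST node type.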

Issue with range information and parenthesis

I am in the process of porting the PaperScript parser of Paper.js over to using Acorn.js, using the range information in the AST to directly modify the original source, instead of modifying the AST and converting it back to code. I decided to do so for two reasons: to preserve line numbers in syntax errors when evaluating the resulting code, and to keep total code size down, since I can then skip the inclusion of Escodegen.

Things are working pretty well already, but I have discovered that the range information is a little off in certain cases:

In the following statement, I am getting a wrong start offset in the range, excluding the inner opening parenthesis:

if ((1) === 1) {
}

If I log the substring of the whole BinaryExpression to the console, I get:

1) === 1

Instead of

(1) === 1

This seems to be a bug, right?
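
The source-rewriting approach described in this issue depends on accurate ranges. A minimal sketch of the technique, assuming edits are given as non-overlapping {start, end, text} objects; applyEdits and the equals replacement are hypothetical names used only for illustration:

```javascript
// Apply replacements to `source` by character range, without regenerating code.
// Each edit is {start, end, text}; ranges are assumed not to overlap.
function applyEdits(source, edits) {
  // Work from the end of the source so earlier offsets stay valid.
  const sorted = [...edits].sort((a, b) => b.start - a.start)
  let out = source
  for (const {start, end, text} of sorted) {
    out = out.slice(0, start) + text + out.slice(end)
  }
  return out
}

const source = "if ((1) === 1) {\n}"
// Offsets 4..13 cover "(1) === 1", the range the issue expects for the BinaryExpression.
const rewritten = applyEdits(source, [{start: 4, end: 13, text: "equals(1, 1)"}])
console.log(rewritten)
```

With a start offset that is off by one, the replacement would leave a stray opening parenthesis behind, which is why the reported range matters for this use case.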

strict mode?

Line 17: "strict mode"

Isn't it supposed to be "use strict" ?

:-)

parser fails on "if(1)/x/"

The regex is not parsed correctly in this case: "if(1)/x/"

master ~/work/acorn $ bin/acorn regr003-google.js 
{
  "type": "Program",
  "start": 0,
  "end": 8,
  "body": [
    {
      "type": "IfStatement",
      "start": 0,
      "end": 8,
      "test": {
        "type": "Literal",
        "start": 3,
        "end": 4,
        "value": 1,
        "raw": "1"
      },
      "consequent": {
        "type": "ExpressionStatement",
        "start": 6,
        "end": 8,
        "expression": {
          "type": "Literal",
          "start": 6,
          "end": 8,
          "value": {},
          "raw": "x/"        <<<< missing first slash
        }
      },
      "alternate": null
    }
  ]
}

master ~/work/acorn $ git describe --tags
v0.1-6-g9a55d60

master ~/work/acorn $ cat regr003-google.js 
if(1)/x/

master ~/work/acorn $ esparse --raw regr003-google.js 
{
    "type": "Program",
    "body": [
        {
            "type": "IfStatement",
            "test": {
                "type": "Literal",
                "value": 1,
                "raw": "1"
            },
            "consequent": {
                "type": "ExpressionStatement",
                "expression": {
                    "type": "Literal",
                    "value": {},
                    "raw": "/x/"   <<<< not missing slash
                }
            },
            "alternate": null
        }
    ]
}

Single line comments

Comments that use the // syntax don't seem to show up in the AST, even when trackComments is true.

Parse failure on MooTools

MooTools 1.4.1 does not get parsed correctly. It gives the following error:

"Unsyntactic break (2332:45)"

where line 2332 is

if (name == '*' && this.brokenStarGEBTN) break simpleSelectors;

Most likely simpleSelectors was not in the list of detected labels?

html comments are valid in JS

I've been having an interesting discussion over on literalizer issue #19 about whether HTML comments should be parsed as "valid JS".

If you check it, the following code is totally valid in most JS engines:

var a = 2;
<!-- a = 3;
--> a = 4;
console.log(a); // 2

More specific info from @mathiasbynens can be found on this StackOverflow question.

Briefly, <!-- and --> are treated as single-line comment markers. But, --> has to be the first non-whitespace, non-multi-line-terminator content on a line to be treated as such. This comes not from strict ES spec, but from the extended "Web ES" spec which JS engines in browsers adhere to.
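
As a rough illustration of the rule, the sketch below blanks out HTML-like comments line by line. It is deliberately naive and is not how any engine or parser implements this: stripHtmlLikeComments is a hypothetical helper that ignores string literals, regular expressions, and block comments, and it only accepts whitespace before a closing --> marker.

```javascript
// Naive sketch: treat "<!--" anywhere on a line, and "-->" at the start of a
// line (after optional whitespace), as the start of a single-line comment.
// Assumption: the input has no strings or regexes containing these markers.
function stripHtmlLikeComments(source) {
  return source
    .split("\n")
    .map(line => {
      if (/^\s*-->/.test(line)) return ""
      const open = line.indexOf("<!--")
      return open === -1 ? line : line.slice(0, open)
    })
    .join("\n")
}

const example = "var a = 2;\n<!-- a = 3;\n--> a = 4;\na"
const stripped = stripHtmlLikeComments(example)
console.log(JSON.stringify(stripped))
```

Under this reading, both assignments after the first line are comments, which is consistent with the value 2 logged in the example above.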

Even node.js handles these, since it uses V8.

However, it appears that most JS tools (acorn, esprima, traceur, etc) don't handle such things.

Why does this matter?

var a = 1, b = 1; a <!--b;

That code would be parsed differently by these tools than by the JS engines. That's a "bad thing"™.

So, should(n't) JS tools adhere to what the browser engines will do with JS rather than the pure academic ES spec?

Moreover, what happens if some tools allow these, and some don't? Is that a healthy thing, or will it hurt tool interop and thus we should all be unanimous one way or the other?

Infinite loop when parsing expression with parens and slashes

Open http://ternjs.net/doc/demo.html
Select all and paste in this code:
```
a!
/ (b//c)
```
Put the cursor after ) and hit control space

Result:
Infinite loop in Acorn's readWord1(). tokPos has become NaN somehow, so the loop never makes any more progress.

Support for some ES6 features

I would like to parse some JS with acorn, but it makes use of some ES6 features supported by the SpiderMonkey engine, specifically const and let. I made a quick-and-dirty patch to implement support for both and it seems to be working fine. I was wondering if there was any interest in merging such changes. If so, I can clean up my patch and submit a PR.

BlockStatement followed by RegExp starting with `=`

The following valid program fails to parse:

{}/=/

Discovered by esfuzz generative testing.

RangeError: Maximum call stack size exceeded on a seemingly simple file

Parsing a file with a single variable declaration initialized to a big string causes RangeError to be raised.

var acorn = require("./acorn.js");
var fs = require("fs");
fs.readFile("./test/jquery-string.js", function(err, data) {
    if (err) throw err;
    acorn.parse(data);
});

master ~/work/acorn $ node test1.js 

/home/denis/work/acorn/acorn.js:804
        return finishToken(_string, String.fromCharCode.apply(null, rs_str));
                                                        ^
RangeError: Maximum call stack size exceeded
    at readString (/home/denis/work/acorn/acorn.js:804:57)
    at getTokenFromCode (/home/denis/work/acorn/acorn.js:646:14)
    at readToken (/home/denis/work/acorn/acorn.js:692:15)
    at next (/home/denis/work/acorn/acorn.js:936:5)
    at eat (/home/denis/work/acorn/acorn.js:1013:7)
    at parseVar (/home/denis/work/acorn/acorn.js:1370:19)
    at parseStatement (/home/denis/work/acorn/acorn.js:1251:14)
    at parseTopLevel (/home/denis/work/acorn/acorn.js:1073:18)
    at Object.exports.parse (/home/denis/work/acorn/acorn.js:42:12)
    at /home/denis/work/acorn/test1.js:5:11

master ~/work/acorn $ node --version
v0.8.20

master ~/work/acorn $ git describe --tags 
v0.1-3-g782259b

Passing --stack-size=1000 to node seems to fix it, but esprima handles the file without any errors and without any additional options, so I'm not sure if that's expected behavior or a bug.
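
The stack trace points at String.fromCharCode.apply(null, rs_str), which passes every character code as a separate argument, and engines limit how many arguments a single call can take. One common workaround, sketched here under assumptions (codesToString and the chunkSize default are hypothetical, not the fix that was applied to acorn), is to convert the codes in fixed-size chunks:

```javascript
// Convert an array of char codes to a string without passing them all to a
// single apply() call, which can overflow the stack for very large arrays.
function codesToString(codes, chunkSize = 8192) {
  let out = ""
  for (let i = 0; i < codes.length; i += chunkSize) {
    out += String.fromCharCode.apply(null, codes.slice(i, i + chunkSize))
  }
  return out
}

// A string literal of 300,000 characters, represented as char codes.
const codes = new Array(300000).fill(97)
const text = codesToString(codes)
console.log(text.length)
```

The chunk size only needs to stay well below the engine's argument limit; the exact value is a tuning choice.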

npm doesn't have 0.0.2 yet

https://npmjs.org/package/acorn currently gives 0.0.1 rather than the current version.

Safe tokenizer API

Is there any hope to get a safe (as in, that won't reset the internal state in some cases)
tokenizer with Esprima-compatible output?

I would love to be able to compare Esprima's tokenizer() function that
I'm working on at https://github.com/espadrine/esprima

The basic API that I'm looking for is:

tokenizer(inputSource :: String, options :: Object)
:: Array of {type :: String, value :: String, (optional loc field for line information)}

That would output all the tokens in one shot.

Obviously, we can discuss how to deal with /.
Tim Disney seems to have found a good way to work on it at https://github.com/mozilla/sweet.js/wiki/design

Quick Question: Rebuilding the source

Hi Marijn,

I've done a quick search and couldn't find anything. Essentially, I'm looking for a walker which is the reverse of parse: fn(Object) -> String

Any clues 😄 ?

AST serialization

First, thank you for such a great project.
Is there any way to serialize resulting AST, e.g. after code instrumentation?

TryStatement followed by RegExp starting with `=`

The following valid programs fail to parse:

try{}catch(e){}/=/

try{}finally{}/=/

try{}catch(e){}finally{}/=/

Discovered by esfuzz generative testing.

Code generation?

Not an issue, just a question. Do you know of any projects that can do code generation based on an acorn AST?

Thanks,

Chris

Warning with npm 1.2.21 (node 0.10.7)

npm install acorn

npm http GET https://registry.npmjs.org/acorn
npm http 304 https://registry.npmjs.org/acorn
npm WARN package.json [email protected] No repository field.
npm WARN package.json [email protected] 'repositories' (plural) Not supported.
npm WARN package.json Please pick one as the 'repository' field

`locations` option doesn't work with `parse_dammit`

I can get it to work with parse, but not parse_dammit. By "doesn't work" I mean that the loc object is not included on any nodes.

I'd be happy to attempt to fix this myself, but I'm curious to know if this is currently supposed to work or not.

acorn_loose on really broken code parsing

Awesome lib, thanks so much for it.

I want to handle parsing of really broken code, and tried parsing the code below with parse_dammit.

getDummyLoc(dummy) throws an error, as there is no valid token and I have enabled options.locations.

I don't want to fiddle with this code as I'm not worthy, so won't even attempt a pull request.
In the meantime I modified my version to skip setting loc.end if loc.start == undefined

function

** =

function a()
{
}

Can not load acorn in a web worker

Using acorn in a web worker:

importScripts("thirdparty/acorn/acorn.js");

results in the following error: Uncaught ReferenceError: window is not defined

Tag version 0.3.2

0.3.2 appears to be the latest version. It'd be good to tag it on GitHub so it can be installed through Bower. Thanks!

Only half the exported tokTypes are underscore-prefixed.

See discussion at 6fe1239

fails with "Unsyntactic continue" when continue target is closer than closest loop

Smallest failing case

for(;;){ a: continue a; }

These related programs both parse fine

for(;;){ a: break a; }

a: for(;;){ continue a; }

Discovered by esfuzz generative testing.

prefix increment/decrement of dynamic/static member access of regexp

These valid programs all fail to parse:

++/a/[a]

--/a/[a]

++/a/.a

--/a/.a

Discovered by esfuzz generative testing.

Using range instead of start end

In the interest of standardized format, how about using range to denote the node location instead of start and end pair? I proposed this to Mozilla, see https://bugzilla.mozilla.org/show_bug.cgi?id=745678. Dave Herman seems to agree to that, see also http://calculist.org/blog/2012/07/03/tweaking-the-javascript-ast-api/.

`throw \n 1;` is a parse error

Currently acorn accepts the following program which should result in a parse error (throw is a restricted production)

throw
1

Feature request: parse multiple files into a single AST

This is useful in UglifyJS to compress multiple files and generate a proper source map.

Basically, the parser would receive a Program node and will append new statements to its body instead of creating a new Program. The tricky part is that besides start/end the "loc" property needs also to contain "source" (specified in the SpiderMonkey parser API).

member access to `in` member and division

Each of these valid programs fails to parse:

a.in / b

a.in /= b

Successful esprima parse: http://esprima.org/demo/parse.html?code=a.in%20%2F%20b%0Aa.in%20%2F%3D%20b

This is the first parser bug found by my WIP fuzzer, esfuzz. Hopefully more will be on the way.

guardedHandlers

Not sure if this is a bug or I'm missing something. According to https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API
TryStatement is always expected to have "guardedHandlers" property.

But on this link https://bugzilla.mozilla.org/show_bug.cgi?id=742612 "guardedHandlers" is marked as optional. The latter document could be outdated though.

So, esprima always adds an empty array for "guardedHandlers". Should acorn do this too?

Escodegen assumes this property is always present in the AST.

FunctionDeclaration followed by RegExp starting with `=`

The following valid program fails to parse:

function a(){}/=/

Discovered by esfuzz generative testing.

Reserved words are too restricted in ES5 Strict Mode

Reserved word restrictions are overzealous in Strict Mode, e.g. this should be valid:

'use strict';
object.static();

but produces an error:

SyntaxError: The keyword 'static' is reserved

The distinction is between IdentifierName and Identifier in the grammar. Strict mode forbids using reserved words wherever Identifier appears in the grammar, but otherwise reserved words should be allowed, e.g. property accessors.

Node types should be publicly visible

To operate on the syntax tree, it would be helpful to have an enum of node types. Currently these types are string literals.

acorn annotated source code page broken after adding <!-- support

http://marijnhaverbeke.nl/acorn/

That page breaks half-way through, in that the <!-- character in the code is interpreted as an actual HTML comment, thereby hiding the rest of the page's content.

See the view-source:

Bugs in the parser

Hello!

I am using acorn on a project I am working on, and in the process I found a few bugs. I also have quick fixes for some of them, so I am submitting them to you.

I am reporting all of them here, but if you prefer me to open separate issues, please tell me and I will be happy to do that.

However, the bugs I found are:

1. An if statement built like this: if(expression)throw"Error";else do_something_else() throws an "unexpected token" error.
   REASON: acorn "closes" the token after the closing quote, so the semicolon is interpreted as an EmptyStatement, which causes the IfStatement to close. Thus, the "else" token becomes "unexpected".
   FIX: Adding the line if(type === "ThrowStatement") eat(_semi) in the function finishNode(node, type) (line #924) fixed the issue for me.
2. The string "something\u0026bsomething else" throws "Bad character escape sequence".
   REASON: readInt(radix, len) with radix=16 (hexadecimal) stops reading only when the code is not a hex digit. So the next char ("b", which is a valid hex digit) is also added to the Unicode escape for the ampersand (\u0026), causing the number "0026b" to be an invalid Unicode char.
   FIX: If a len parameter is passed to readInt(radix, len) (line #674), read only "len" chars.
   What I did was add an if(len!=null) in that function.
   Then, in the if branch, I changed the for(;;) to for (var i = 0; i < len; i++).
   In the else branch, I left the for(;;) as it was before.
3. A statement labeled with the same name as another one throws "Label <label_name> is already declared".
   REASON: In JavaScript (at least in Google Chrome) it is allowed to have the same label name for different statements, if they are not nested. In acorn this seems to be forbidden.
   I am sorry but I don't have a fix for this yet.
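
The readInt fix proposed for the \u0026b case can be sketched as a standalone function. This is an illustration of the idea, not acorn's code: this readInt takes the input and position as explicit parameters and returns an object, both of which are assumptions made so the example is self-contained.

```javascript
// Read an integer in the given radix starting at `pos`.
// When `len` is given, read exactly `len` digits; otherwise read as many as possible.
// Returns {value, end}, or null when no valid number of the required length is found.
function readInt(input, pos, radix, len) {
  const limit = len == null ? Infinity : len
  let total = 0
  let i = 0
  for (; i < limit; i++) {
    const code = input.charCodeAt(pos + i)
    let val
    if (code >= 97) val = code - 97 + 10 // a-z
    else if (code >= 65) val = code - 65 + 10 // A-Z
    else if (code >= 48 && code <= 57) val = code - 48 // 0-9
    else val = Infinity // not a digit (or past the end of input)
    if (val >= radix) break
    total = total * radix + val
  }
  if (i === 0 || (len != null && i !== len)) return null
  return {value: total, end: pos + i}
}

// The text after "\u" in the failing string: four hex digits, then ordinary text.
const escapeBody = "0026bsomething else"
const fixed = readInt(escapeBody, 0, 16, 4)
const decoded = String.fromCharCode(fixed.value) + escapeBody.slice(fixed.end)
console.log(decoded)
```

Without the len limit, the same call consumes the trailing "b" as a fifth hex digit, which is the behavior the bug report describes.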

Hope this helps,
thank you for your work, I found it really useful!!

Cheers,
Luca

acorn.walk, descending into MemberExpression.property

The base walker code for MemberExpression is

 base.MemberExpression = function(node, st, c) {
    c(node.object, st, "Expression");
    if (node.computed) c(node.property, st, "Expression");
  };

Is there any reason to descend into node.property only when the node is marked as "computed" (the [...] accessor)?
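
One possible reading, offered as an assumption rather than the maintainer's stated rationale: in a non-computed access such as a.b, b is a property name and not an expression that gets evaluated, whereas in a[b] it is. The sketch below shows how that distinction affects a hypothetical collectReferences helper on hand-built, simplified AST nodes.

```javascript
// Collect identifiers that are evaluated as expressions.
// A non-computed property (a.b) is a name, so it is skipped;
// a computed property (a[b]) is an expression, so it is visited.
function collectReferences(node, out = []) {
  if (node.type === "Identifier") {
    out.push(node.name)
  } else if (node.type === "MemberExpression") {
    collectReferences(node.object, out)
    if (node.computed) collectReferences(node.property, out)
  }
  return out
}

const id = name => ({type: "Identifier", name})
const dotAccess = {type: "MemberExpression", computed: false, object: id("a"), property: id("b")}
const bracketAccess = {type: "MemberExpression", computed: true, object: id("a"), property: id("b")}

console.log(collectReferences(dotAccess))
console.log(collectReferences(bracketAccess))
```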

Comments are reported twice when using strict mode.

This code should log the comment "Comment" only once, as it does without the 'use strict' statement.

var content = "\n\
function plop() {\n\
    'use strict';\n\
    /* Comment */\n\
}";

acorn.parse(content, {
  onComment: function (block, text) {
    console.log(text);
  }
});

Tracking whether comments are same line or not

When 'un-parsing' the AST, it would be nice to know whether comments begin on a new line or not. Right now it is not possible to tell. The following code:

var acorn = require('acorn');
var code1 = 'funcall();//comment same line\n';
var code2 = 'funcall();\n//comment next line\n';
var ast1 = acorn.parse(code1,{trackComments:true});
var ast2 = acorn.parse(code2,{trackComments:true});
console.log(JSON.stringify(ast1));
console.log(JSON.stringify(ast2));

outputs:

{"type":"Program","start":0,"end":10,"body":[{"type":"ExpressionStatement","start":0,"end":10,"expression":{"type":"CallExpression","start":0,"callee":{"type":"Identifier","start":0,"end":7,"name":"funcall"},"arguments":[],"end":9,"commentsAfter":["//comment same line"]}}]}
{"type":"Program","start":0,"end":10,"body":[{"type":"ExpressionStatement","start":0,"end":10,"expression":{"type":"CallExpression","start":0,"callee":{"type":"Identifier","start":0,"end":7,"name":"funcall"},"arguments":[],"end":9,"commentsAfter":["//comment next line"]}}]}
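
If the start offset of a comment is known, the distinction can be recovered from the source text itself. A minimal sketch, assuming the offset is available from somewhere (the output above does not include it); commentStartsOnNewLine is a hypothetical helper:

```javascript
// True when only spaces or tabs separate the comment from the start of its line.
function commentStartsOnNewLine(source, commentStart) {
  for (let i = commentStart - 1; i >= 0; i--) {
    const ch = source[i]
    if (ch === "\n" || ch === "\r") return true
    if (ch !== " " && ch !== "\t") return false
  }
  return true // the comment is the first thing in the source
}

const sameLine = "funcall();//comment same line\n"
const nextLine = "funcall();\n//comment next line\n"
console.log(commentStartsOnNewLine(sameLine, sameLine.indexOf("//")))
console.log(commentStartsOnNewLine(nextLine, nextLine.indexOf("//")))
```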

Adding "strict" causes locations to be off by the number of newlines after the "strict"

This seems to be caused by the logic that re-reads tokens after strict for pedantic tests.

function setStrict(strct) {
  strict = strct;
  // not re-reading the tokens for pedantic tests fixes this problem
  // tokPos = lastEnd;
  // skipSpace();
  // readToken();
}

Question

Is it thread safe?
I mean is it safe to parse different pieces of code at the same time?

location data is affected by using the parser inside an `onComment` handler

Not sure if it affects the accuracy of the location data but it does cause some loc.end's to be null.

acorn_loose doesn't correctly parse while(a-->0){}

This happens because lastEnd isn't updated which breaks newline.test(input.slice(lastEnd, tokPos)) check at
https://github.com/marijnh/acorn/blob/b1623b10c13225ae64cf68315960fb8118463efa/acorn.js#L613

Problems with occurrence of 'self' in code.

In paper.js we are loading acorn.js through the vm.runInContext(source, context, uri); method. Executing it this way leads to the following error:

ReferenceError: self is not defined
    at ../../lib/acorn.js:26:7
    at ../../lib/acorn.js:27:3
    at vm.createContext.include (……/node_modules/paper/src/node/index.js:45:6)
    at core/PaperScript.js:19:7
    at vm.createContext.include (……/node_modules/paper/src/node/index.js:45:6)
    at new <anonymous> (paper.js:126:7)
    at paper.js:32:13
    at Context.vm.createContext.include (……/node_modules/paper/src/node/index.js:45:6)
    at Object.<anonymous> (……/node_modules/paper/src/node/index.js:51:9)
    at Module._compile (module.js:456:26)

Replacing 'self' with 'this' solves the issue, and should still work in browsers too. Would this make sense?

paperjs/paper.js#205

Doesn't detect invalid labelled break statements correctly

dance:{;}
while(false) {break dance;}

Produces an AST in Acorn, but in Chrome, Firefox and Esprima it complains of a syntax error that the label dance is not found.
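
The expected behavior follows from label scoping: a label is visible only inside the statement it labels. A sketch of such a check on a hand-built AST, under stated assumptions: checkLabels is a hypothetical helper, labels are plain strings here (real ESTree nodes use Identifier nodes), and function boundaries are ignored.

```javascript
// Check break labels against the labels currently in scope.
function checkLabels(node, labels = []) {
  if (node.type === "LabeledStatement") {
    if (labels.includes(node.label)) {
      throw new SyntaxError(`Label '${node.label}' is already declared`)
    }
    // The label is visible only inside the labeled statement's body.
    checkLabels(node.body, [...labels, node.label])
  } else if (node.type === "BreakStatement") {
    if (node.label && !labels.includes(node.label)) {
      throw new SyntaxError(`Undefined label '${node.label}'`)
    }
  } else if (Array.isArray(node.body)) {
    for (const child of node.body) checkLabels(child, labels)
  } else if (node.body) {
    checkLabels(node.body, labels)
  }
}

// Simplified AST for: dance:{;} while(false) {break dance;}
const program = {
  type: "Program",
  body: [
    {type: "LabeledStatement", label: "dance", body: {type: "BlockStatement", body: [{type: "EmptyStatement"}]}},
    {type: "WhileStatement", body: {type: "BlockStatement", body: [{type: "BreakStatement", label: "dance"}]}}
  ]
}

let labelError = null
try {
  checkLabels(program)
} catch (err) {
  labelError = err
}
console.log(labelError && labelError.message)
```

Because each label is dropped when its statement ends, the same scheme also accepts two sibling statements that reuse a label name and rejects only nested reuse.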

Parse `123..toString(16)` says "Invalid number"

Should parse equivalently to (123).toString(16).

Semicolons after variable declarations

Hi Marijn,

I noticed that for var x = 0;, the variable declaration's end location does not include the semicolon:

> acorn.parse("var x = 0;")
{ type: 'Program',
  start: 0,
  end: 10,
  body: 
   [ { type: 'VariableDeclaration',
       start: 0,
       end: 9,
       declarations: [Object],
       kind: 'var' } ] }

That has been an issue for me when using acorn to break JavaScript code up into top-level statements, because the statements in the output list lack semicolons after variable declarations even when they were present in the original code.

I started a pull request, but noticed that other tests in the suite treat dropping the semicolon as correct. For example, the test for var x /* comment */; wants the variable declaration to end at column 5.

Should I change the other tests to expect the semicolon?

Thanks.

Parse error: "\0" in "strict" mode

In strict mode it doesn't like "\0":

"use strict";
var x = "\0";

errors out with:

/.../acorn/acorn.js:158
    throw new SyntaxError(message);
          ^
SyntaxError: Octal literal in strict mode (2:9)

Invalid use of tokLineStart

In readToken_plus_min, you have the following line of code:

if (next == 45 && input.charCodeAt(tokPos + 2) == 62 && lastEnd < tokLineStart) {

The problem is tokLineStart has no meaning unless options.locations is on, and that check is not being done here. So either you have to always update tokLineStart irrespective of options.locations or find another way to make this test.
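
One way to make this test without tokLineStart is to look for a line break in the text between the previous token's end and the current position. The following is a sketch of that alternative, not necessarily the change that was adopted; isFirstTokenOnLine and its parameters are hypothetical names that mirror lastEnd and tokPos from the line quoted above.

```javascript
// Line terminators recognized by ECMAScript.
const lineBreak = /\r\n?|\n|\u2028|\u2029/

// True when the token at `tokPos` is the first token on its line, judged only
// from the text between the previous token's end and this token's start.
// Does not depend on any line-tracking state.
function isFirstTokenOnLine(input, lastEnd, tokPos) {
  return lastEnd === 0 || lineBreak.test(input.slice(lastEnd, tokPos))
}

const onNewLine = "a\n--> b"
const midLine = "a --> b"
// In both inputs the previous token "a" ends at offset 1.
console.log(isFirstTokenOnLine(onNewLine, 1, onNewLine.indexOf("-->")))
console.log(isFirstTokenOnLine(midLine, 1, midLine.indexOf("-->")))
```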

Track comments misses comments within an expression

The following example:

var acorn = require('acorn');
var code1 = 'funcall(1/*mid expression comment*/,2);';
var ast1 = acorn.parse(code1,{trackComments:true});
console.log(JSON.stringify(ast1));

outputs:

{"type":"Program","start":0,"end":39,"body":[{"type":"ExpressionStatement","start":0,"end":39,"expression":{"type":"CallExpression","start":0,"callee":{"type":"Identifier","start":0,"end":7,"name":"funcall"},"arguments":[{"type":"Literal","start":8,"end":9,"value":1,"raw":"1"},{"type":"Literal","start":36,"end":37,"value":2,"raw":"2"}],"end":38}}]}

Track comments duplicates a comment in commentsBefore and commentsAfter

If a comment shows up in commentsAfter on one node, it would be great if it did not show up in commentsBefore on another node. Duplicating comments makes 'unparsing' the ast with comments tricky, as you have to somehow track whether you emitted the comment yet or not. The following example:

var acorn = require('acorn');
var code1 = 'funcall();/*comment between statements*/funcall2();';
var ast1 = acorn.parse(code1,{trackComments:true});
console.log(JSON.stringify(ast1));

outputs:

{"type":"Program","start":0,"end":51,"body":[{"type":"ExpressionStatement","start":0,"end":10,"expression":{"type":"CallExpression","start":0,"callee":{"type":"Identifier","start":0,"end":7,"name":"funcall"},"arguments":[],"end":9,"commentsAfter":["comment between statements"]}},{"type":"ExpressionStatement","start":40,"end":51,"commentsBefore":["comment between statements"],"expression":{"type":"CallExpression","start":40,"callee":{"type":"Identifier","start":40,"end":48,"name":"funcall2"},"arguments":[],"end":50}}]}
