kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.

Home Page: https://nearley.js.org

License: MIT License


nearley's Introduction

nearley


nearley is a simple, fast and powerful parsing toolkit. It consists of:

  1. A powerful, modular DSL for describing languages
  2. An efficient, lightweight Earley parser
  3. Loads of tools, editor plug-ins, and other goodies!

nearley is a streaming parser with support for catching errors gracefully and providing all parsings for ambiguous grammars. It is compatible with a variety of lexers (we recommend moo). It comes with tools for creating tests, railroad diagrams and fuzzers from your grammars, and has support for a variety of editors and platforms. It works in both node and the browser.
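
For orientation, a minimal driver looks roughly like this (a sketch; Grammar.fromCompiled is the entry point in current releases, and ./grammar.js stands for whatever nearleyc produced):

const nearley = require("nearley");
const grammar = require("./grammar.js"); // output of nearleyc

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

// feed() may be called repeatedly with chunks, since the parser streams.
parser.feed("some inp");
parser.feed("ut");

// parser.results holds one entry per valid parse;
// more than one entry means the grammar is ambiguous.
console.log(parser.results);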

Unlike most other parser generators, nearley can handle any grammar you can define in BNF (and more!). In particular, while most existing JS parsers such as PEGjs and Jison choke on certain grammars (e.g. left recursive ones), nearley handles them easily and efficiently by using the Earley parsing algorithm.
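
For instance, a directly left-recursive rule like the following is handled without trouble (a minimal sketch):

# Left recursion: fine for an Earley parser, fatal for most PEG parsers.
sum -> sum "+" number | number
number -> [0-9]:+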

nearley is used by a wide variety of projects, and is an npm staff pick.

Documentation

Please visit our website https://nearley.js.org to get started! You will find a tutorial, detailed reference documents, and links to several real-world examples to get inspired.

Contributing

Please read this document before working on nearley. If you are interested in contributing but unsure where to start, take a look at the issues labeled "up for grabs" on the issue tracker, or message a maintainer (@kach or @tjvr on GitHub).

nearley is MIT licensed.

A big thanks to Nathan Dinsmore for teaching me how to Earley, Aria Stewart for helping structure nearley into a mature module, and Robin Windels for bootstrapping the grammar. Additionally, Jacob Edelman wrote an experimental JavaScript parser with nearley and contributed ideas for EBNF support. Joshua T. Corbin refactored the compiler to be much, much prettier. Bojidar Marinov implemented postprocessors-in-other-languages. Shachar Itzhaky fixed a subtle bug with nullables.

Citing nearley

If you are citing nearley in academic work, please use the following BibTeX entry.

@misc{nearley,
    author = {Kartik Chandra and Tim Radvan},
    title  = {{nearley}: a parsing toolkit for {JavaScript}},
    year   = {2014},
    doi    = {10.5281/zenodo.3897993},
    url    = {https://github.com/kach/nearley}
}

nearley's People

Contributors

airportyh avatar alexandertrefz avatar aliclark avatar alyssarosenzweig avatar aredridel avatar bandaloo avatar bates64 avatar bojidar-bg avatar cameronhunter avatar coolreader18 avatar corwin-of-amber avatar deltaidea avatar hardmath123 avatar heatherleaf avatar henry-alakazhang avatar jakesidsmith avatar jaukia avatar jcorbin avatar kach avatar kanef avatar kasbah avatar neunato avatar robroseknows avatar rwindelz avatar sandiegoscott avatar seiyria avatar simonhildebrandt avatar tjvr avatar vietlq avatar yafahedelman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

nearley's Issues

Terminal comments

Comments at the end of the file don't work, because the comment rule expects a trailing \n.
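
A possible fix (an untested sketch, not necessarily the eventual patch) is to make the trailing newline optional, so a comment may also end at EOF:

# ":?" makes the newline optional.
slcomment -> "//" [^\n]:* "\n":?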

2 parses when there should be 1

Thanks for your lightning-quick response on my last bug report, Hardmath123. Another suspected bug now:

The grammar below produces 1 parse for $a=1; but 2 parses for $a =1;

Unless I'm overlooking something in my grammar, I can't see why this would be the case.

program -> _ block {% function(d) { return d[1]; } %}
block -> (statement _):* {% function(d) { return ["block", d[0].map(function(s){return s[0];})]; } %}
statement -> expression _ ";" {% id %}
expression -> expression _ ("="|"=="|"!="|">"|"<"|"<="|">=") _ sum {% function(d) { return ["operation", d[0], d[2], d[4]]; } %} | sum {% id %}
sum -> sum ("*"|"/") product {% function(d) { return ["sum",d[0],d[1],d[2]]; } %} | product {% id %}
product -> product ("*"|"/") exp {% function(d) { return ["product",d[0],d[1],d[2]]; } %} | exp {% id %}
exp -> unaryoperation "^" exp {% function(d) { return ["exp",d[0],d[1],d[2]]; } %} | unaryoperation {% id %} # this is right associative!
unaryoperation -> unaryoperation _ ("++"|"--") {% function(d) { return ["unaryoperation",d[0],d[2]]; } %} | mapoperation {% id %}
mapoperation -> mapoperation _ "[" _ expression _ "]" {% function(d) { return ["map",d[0],d[4]]; } %} | element {% id %}
element -> variable {% id %} | number {% id %} | "(" _ expression _ ")" {% function(d) { return d[2]; } %} | "{" _ block "}" {% function(d) { return d[2]; } %} | "if" _ expression _ expression {% function(d) { return ["if",d[2],d[4]]; } %} | "while" _ expression _ expression {% function(d) { return ["for",d[2],d[4]]; } %}
variable -> "$":? [a-z]:+ {% function(d) { return ["variable", d[0], d[1].join("")]; } %}
number -> [0-9]:+ {% function(d) { return ["number", d[0].join("")]; } %}
_ -> ___:* {% id %}
__ -> ___:+ {% id %}
___ -> [\t \n] {% empty %} | mlcomment | slcomment
mlcomment -> "/*" mlcommentchars:+ .:? "*/" {% function(d) { return [];/*["comment", d[1].join("")+d[2]];*/ } %}
slcomment -> "//" [^\n]:* "\n" {% function(d) { return [];/*["comment", d[1].join("")];*/ } %}
mlcommentchars -> "*" [^/] {% function(d) { return d[0] + d[1]; } %} | [^*] . {% function(d) { return d[0] + d[1]; } %}

Precedence for ambiguous parsings?

Can't figure out how to change precedence for ambiguous parsings. My grammar is ambiguous because it includes emoji (a subset of Unicode) as well as all Unicode characters.

Because of this, an emoji can be parsed either as a single emoji or as its two constituent Unicode characters, but I can't figure out how to prefer parsings where emoji are parsed as emoji (and not as their constituent characters).
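
One escape hatch: nearley postprocessors take a third argument, reject, and returning it discards that particular parse. A sketch, where isEmoji is a hypothetical helper you would supply:

# Refuse to parse an emoji character as a plain character,
# so only the emoji rule can match it.
plainchar -> [\u0000-\uFFFF] {%
    function(d, location, reject) {
        return isEmoji(d[0]) ? reject : d[0];
    }
%}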

Missing /lib

When I downloaded v0.0.4 from npm, there was no /lib directory.

I recommend fixing it with a new version number.

Stuck at step 0

Alright, I've done the following:

  1. Installed nearley via npm.
  2. Created a parser.ne file.
  3. Copied and pasted an example from the examples folder here.
  4. Compiled that example to a grammar JS file successfully. (Although the javascript.ne file does not compile for me.)
  5. Ran valid input through the parser using the code example in the documentation.

Result:
Error at character 0

Since I'm using all the code provided here, I'm perplexed as to what the problem could be. Even a simple, one-line grammar file fails to actually parse anything. Sooo..... o_O ?

JS Grammar Returns Ambiguous Parsings

Some grammars will have ambiguous parsings due to constructs like
return [2];
which can be parsed both as returning the list [2] and as reading the element at index 2 of an array named return.
Nearley may need new features to circumvent this problem easily.

Lazily evaluate postprocessors

Profiling shows that we're running postprocessors too often. Also, we're creating too many this.data = []s, which bloats memory.

Limiting array depth (and other questions)

Next question (sorry for all the questions; I'm trying to evaluate this for a large project, and would love to talk to you about it).

When using the example javascript.js, or even when writing short grammars of my own, the parser seems to generate unnecessarily deep array structures. For instance, when parsing something like this:

(function() {
  var blah = 'blah';
});

I get a structure like this:

[screenshot: parse output showing deeply nested arrays]

Most of the arrays contain nothing but another array. Even from the text output, you can see it's creating arrays with no value.
[screenshot: text output full of empty and single-element arrays]

Since each array increases the number of JS objects in memory, is there a way to keep this structure a little flatter and eliminate empty arrays (or arrays with one value which is another array)? Seems like matching entities like variable names or values could be pushed as a single value. Does nearley support that?
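
One remedy is to return flat nodes from postprocessors instead of keeping nearley's default nested arrays. A small sketch (the rule names here are illustrative, not from javascript.ne):

# Collapse the match into a single flat object.
assignment -> "var" __ name _ "=" _ name _ ";" {%
    function(d) { return { type: "assign", name: d[2], value: d[6] }; }
%}
name -> [a-z]:+ {% function(d) { return d[0].join(""); } %}
_ -> [\s]:*
__ -> [\s]:+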

Limiting Parsings for Ambiguous Grammars (Bocages)

I know it's a long shot and not really something Earley was meant for, but are there any methods or optimizations we can implement to deal with ambiguous grammars that have exponentially many parsings? For instance:

num -> num "+" num | [0-9]:+

This seems simple, but nearley can't deal well with large statements of this type, because the number of possible parsings grows exponentially with the number of plus signs. Could we implement an option to limit the number of parsings?
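
One workaround is to restate the rule so that only one parse exists; keeping it left-recursive keeps it efficient (a sketch):

# Unambiguous, left-associative restatement of the same language.
num -> num "+" int | int
int -> [0-9]:+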

disclude `undefined` from data array, provide notreegen

Related to postprocessing and eliminating nodes: is there a value I can return from a matched token that will eliminate it from the output automatically?

For example, it would be nice if I could just drop optional whitespace, such that--

selector _ combinator _ element

-- would return an array of 3 elements instead of 5. I thought I could write:

_ -> null

-- but then that just returned an actual null value. I can see why I might want to post-process based on null, so I'm not sure what to suggest, maybe a special nearley token? Like: %null% or something?
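
In the meantime, the dropping can be done by hand in the parent rule's postprocessor (a sketch):

rule -> selector _ combinator _ element {%
    function(d) { return [d[0], d[2], d[4]]; } // drop the whitespace slots
%}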

On the GPU

Libraries do exist for interfacing with the GPU from JavaScript (such as https://github.com/timoxley/saltmine, among other, more mature alternatives). Parsing can sometimes benefit from running on the GPU; the biggest problem is that we'd have to port parts of the parser to the languages GPUs support (such as https://en.wikipedia.org/wiki/OpenGL_Shading_Language). If we lazily compute postprocessing, this could be quite useful when you have a specific expression that is large, complicated to check, or would otherwise benefit from the GPU.

Bug regarding percent symbol '%' in JS postproc code

When a % symbol appears inside a postprocessor block, it causes an error. The compiler doesn't ignore % symbols found in arbitrary JS code; it consistently complains that there are "no possible parsings" while pointing at whatever symbol (any symbol) follows the culprit %.

Example:

main -> number {% function(d) { return d[0] + '%' } %}

number -> [\d]:+ {%
    function(d) { return d[0].join(''); }
%}

Error:

Error: nearley: No possible parsings (@48: ''').
    at Parser.feed (C:\Users\Raymond\AppData\Roaming\npm\node_modules\nearley\lib\nearley.js:219:23)
    at StreamWrapper.write [as _write] (C:\Users\Raymond\AppData\Roaming\npm\node_modules\nearley\lib\stream.js:12:18)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at StreamWrapper.Writable.write (_stream_writable.js:217:11)
    at ReadStream.ondata (_stream_readable.js:540:20)
    at ReadStream.emit (events.js:107:17)
    at readableAddChunk (_stream_readable.js:163:16)
    at ReadStream.Readable.push (_stream_readable.js:126:10)
    at onread (fs.js:1679:12)

Tool to test if a grammar with a sample input file is ambiguous

It has been shown that automatically testing whether a grammar is ambiguous is impossible, or close to it. However, many ambiguities will show up in a sufficiently complex test file. I'd like a simple program that can be attached to a nearleyc build script to run the grammar against a given test file: if there are no parsings or a parse error, the script raises an error; if there are ambiguities, it also raises an error. If there is a single, unique parsing, the script returns 0 with no output.

Would make developing grammars a bit less painful, maybe.
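
Something like this sketch is what I have in mind (assuming a nearleyc-compiled ./grammar.js and the Grammar.fromCompiled API):

// ambiguity-check.js: exit 0 on exactly one parse, 1 otherwise.
const fs = require("fs");
const nearley = require("nearley");
const grammar = require("./grammar.js");

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
try {
    parser.feed(fs.readFileSync(process.argv[2], "utf8"));
} catch (e) {
    console.error("parse error: " + e.message);
    process.exit(1);
}
if (parser.results.length !== 1) {
    console.error(parser.results.length + " parses (want exactly 1)");
    process.exit(1);
}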

Prevent duplicate states from being added to tables

With the simple grammar from Aycock and Horspool:

S -> A A A A
A -> "a"
A -> E
E -> null

When run against the trivial input:

aa

This yields the following parse tables; note the duplicate states.

table 0
     { _start → ● S },0
     { S → ● A A A A },0
     { A → ● "a" },0
     { A → ● E },0
     { A → E ● },0
     { S → A ● A A A },0
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { _start → S ● },0
table 1
     { A → "a" ● },0
     { S → A ● A A A },0
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { A → ● "a" },1
     { A → ● E },1
     { _start → S ● },0
     { A → E ● },1
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { _start → S ● },0
     { _start → S ● },0
     { _start → S ● },0
table 2
     { A → "a" ● },1
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { A → ● "a" },2
     { A → ● E },2
     { _start → S ● },0
     { _start → S ● },0
     { _start → S ● },0
     { A → E ● },2
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { _start → S ● },0
     { _start → S ● },0
     { _start → S ● },0

It does parse correctly, however, yielding

[ [ [ [] ], [ [] ], [ 'a' ], [ 'a' ] ],
  [ [ [] ], [ 'a' ], [ [] ], [ 'a' ] ],
  [ [ 'a' ], [ [] ], [ [] ], [ 'a' ] ],
  [ [ [] ], [ 'a' ], [ 'a' ], [ [] ] ],
  [ [ 'a' ], [ [] ], [ 'a' ], [ [] ] ],
  [ [ 'a' ], [ 'a' ], [ [] ], [ [] ] ] ]

So this is just an efficiency concern.

Right now this is due to the lack of duplicate checking (or a Set-like data structure) in State.prototype.process, specifically table[location].push(x);

I'm actively working on this but haven't figured out a tidy way to solve it yet.
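
One direction (a sketch only; the field names rule.id, dot, and reference are assumptions about the internals, not the real API) is to key each state and consult a Set before pushing:

// Hypothetical dedup guard around table[location].push(x).
function pushUnique(column, seen, state) {
    // Assumes these three fields uniquely identify an Earley item.
    const key = state.rule.id + "|" + state.dot + "|" + state.reference;
    if (!seen.has(key)) {
        seen.add(key);
        column.push(state);
    }
}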

`@include` for file in same directory fails

Given grammar a.ne:

a -> "a"

and b.ne:

@import "a.ne"
# or @import "./a.ne"

b -> a:*

I expect the include to work but it throws an exception. The path it ends up trying to include is ./b.ne/a.ne.
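
Presumably the fix is to resolve the include against the including file's directory rather than against the file path itself; roughly (a sketch):

// Sketch: resolve relative to the parent grammar's directory.
const path = require("path");
function resolveInclude(parentFile, includePath) {
    return path.resolve(path.dirname(parentFile), includePath);
}
// resolveInclude("b.ne", "a.ne") resolves to "./a.ne", not "./b.ne/a.ne"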

Browser tests

nearley should work in the browser. Ideally, there'd be a demo page which compiles samples (like PEGjs/Jison).

Request: Add on-demand compilation like PEG.js' `buildParser` API

My workflow with PEG.js is to only ever use .pegjs files and to include them in node via PEG.js' buildParser API. I find this less error-prone than using generated JavaScript files (no wasted time debugging things and then realizing you forgot to regenerate after changing the grammar).
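
Until such an API exists, one workaround is to shell out to nearleyc at load time (a sketch, assuming nearleyc is on the PATH and supports the -o output flag):

// Sketch: compile a .ne file on demand, then require the result.
const { execSync } = require("child_process");

function buildParser(neFile, jsFile) {
    execSync("nearleyc " + neFile + " -o " + jsFile);
    return require(jsFile);
}

const grammar = buildParser("./parser.ne", "./parser.js");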

Failing to parse something: "No possible parsings"

The following grammar fails to parse $a; with No possible parsings (@2: ';').

A change which shouldn't make any difference lets it parse correctly: changing the definition of statement to reference product rather than sum works, even though sum -> product.

program -> _ block {% function(d) { return d[1]; } %}
block -> (statement _):* {% function(d) { return ["block", d[0].map(function(s){return s[0];})]; } %}
#statement -> expression _ ";" {% id %}
statement -> sum _ ";" {% id %}
#statement -> product _ ";" {% id %}
expression -> expression _ ("="|"=="|"!="|">"|"<"|"<="|">=") _ sum {% function(d) { return ["operation", d[0], d[2], d[4]]; } %} | sum
#sum -> sum ("*"|"/") product | product
sum -> product
product -> product ("*"|"/") exp | exp
exp -> unaryoperation "^" exp | unaryoperation # this is right associative!
unaryoperation -> unaryoperation _ ("++"|"--") {% function(d) { return ["unaryoperation",d[0],d[2]]; } %} | mapoperation
mapoperation -> mapoperation _ "[" _ expression _ "]" {% function(d) { return ["map",d[0],d[4]]; } %} | element
element -> variable {% id %} | number {% id %} | "(" _ expression _ ")" {% function(d) { return d[2]; } %} | "{" _ block "}" {% function(d) { return d[2]; } %} | "if" _ expression _ expression {% function(d) { return ["if",d[2],d[4]]; } %} | "while" _ expression _ expression {% function(d) { return ["for",d[2],d[4]]; } %}
variable -> "$":? [a-z]:+ {% function(d) { return ["variable", d[0], d[1].join("")]; } %}
number -> [0-9]:+ {% function(d) { return ["number", d[0].join("")]; } %}
_ -> ___:* {% id %}
__ -> ___:+ {% id %}
___ -> [\t \n] {% empty %} | mlcomment | slcomment
mlcomment -> "/*" mlcommentchars:+ .:? "*/" {% function(d) { return [];/*["comment", d[1].join("")+d[2]];*/ } %}
slcomment -> "//" [^\n]:* "\n" {% function(d) { return [];/*["comment", d[1].join("")];*/ } %}
mlcommentchars -> "*" [^/] {% function(d) { return d[0] + d[1]; } %} | [^*] . {% function(d) { return d[0] + d[1]; } %}

Zero-length assertions?

I'm loving nearley, thank you for building it! This is vastly better than LR/LL parsing and I'm astonished LR/LL still garners so much attention given the limitations.

A question: Is there a way to encode zero-length assertions or otherwise control ambiguity when one nonterminal is an abbreviation of another?

For instance, consider a language of 'a', 'b', and 'ab' tokens where 'ab' should be matched instead of 'a' followed by 'b'. Given the rule
tokens -> ("ab" | "a" | "b"):*

"aabab" would ideally parse to ["ab" | "a" | "b"]

Instead, you get 5 matches breaking up the "ab" tokens differently:
[[["a"],["ab"],["ab"]]]
[[["a"],["a"],["b"],["ab"]]]
[[["a"],["a"],["ba"],["b"]]]
[[["a"],["ab"],["a"],["b"]]]
[[["a"],["a"],["b"],["a"],["b"]]]

I can get the ideal output if I write a grammar where a standalone "b" is not allowed. But what if you need that? (e.g., if you're trying to parse "elseif", "else", and "if" distinctly)

I'd like to be able either to enforce priorities on some OR operators, or to explicitly rule out look-behind/look-ahead matches without capturing the ruled-out characters, à la:

tokens ->  ("ab" || "a" || "b")          or    tokens -> ("ab" | "a" !"b" | !"a" "b") 

The best hacks I can come up with so far are:

  1. Apply a regular expression in advance that inserts a boundary token at all word boundaries, then modify the grammar to match boundaries. But this makes the grammar much more complex.
  2. Detect and track word boundaries in advance (without changing the input string), then use a postprocessor to reject matches that don't start at a word boundary based on the l parameter. But this only works for leading-edge boundaries.

So far I'm thinking of building option 2, and I may be able to live without trailing-edge constraints.

Thanks for nearley, and for any ideas!
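
For what it's worth, a lexer such as moo can resolve this kind of token-level ambiguity before nearley ever sees the input, since moo tries its rules in the order they are defined (a sketch):

// moo picks "ab" over "a" + "b" because it is listed first.
const moo = require("moo");
const lexer = moo.compile({
    ab: "ab",
    a:  "a",
    b:  "b",
});

lexer.reset("aabab");
// token stream: a, ab, ab (a single, unambiguous segmentation)

The grammar can then declare @lexer lexer and match %ab, %a, and %b tokens instead of raw strings.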

Show locations of parse errors

Right now the "it didn't work" is hard to debug in a large grammar. Having positions of errors (and what would work at that point) would be awesome.

Readme: install

The command is npm install -g nearley, not npm install -g nearleyc.

Complexity of parsing indented comments

I commented out a block of lines for debugging purposes and ran into some strange behaviour: the complexity of parsing a parser definition seems to grow dramatically with the number and indentation level of subsequent indented comments.

$ node_modules/.bin/nearleyc --version
0.2.2

For example, with a series of files named indentn.ne, where a block of comments is indented n spaces, i.e.

$ cat indent0.ne
foo -> "bar"
#1
#2
#3
#4
#5
#6
$ cat indent2.ne
foo -> "bar"
  #1
  #2
  #3
  #4
  #5
  #6

etc., execution time shoots up rapidly:

$ time node_modules/.bin/nearleyc indent0.ne > /dev/null
real    0m0.116s
user    0m0.096s
sys     0m0.019s
$ time node_modules/.bin/nearleyc indent2.ne > /dev/null
real    0m0.763s
user    0m0.720s
sys     0m0.048s
$ time node_modules/.bin/nearleyc indent4.ne > /dev/null
real    0m8.349s
user    0m8.138s
sys     0m0.256s

time node_modules/.bin/nearleyc indent6.ne > /dev/null has not yet finished. 😉

Is full regex supported?

Is this supported? foo -> [0-9A-F?]{1,6}

I'm getting a "no possible parsings" error when generating the grammar.js file.
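
nearley's charclasses don't support counted repetition like {1,6}, so the bound has to be unrolled by hand. A sketch of an equivalent grammar:

# Equivalent of [0-9A-F?]{1,6}: one mandatory char plus up to five optional ones.
foo -> hex hex:? hex:? hex:? hex:? hex:?
hex -> [0-9A-F?]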

Unrolling regexes

@JacobEdelman claims we can take full regexes in a grammar and compile them down to nearley automatically. I believe him, so I hereby assign this to him.

Move nearley.js and generated grammars to strict mode

First I want to thank you for this library. It made me able to experiment freely with the syntax of my toy language. It saved me a lot of time.

Now to the point of this post: while I was making small changes here and there in the code, I noticed that the nearley.js file and the generated grammar JS file are not in strict mode. Adding the 'use strict' directive made the code ~2x faster when parsing my grammar on Node.js 4.0. This may not be the case for other grammars and other environments, but I think it is worth trying.

Add performance tests

To see how large grammars and large inputs perform, and make sure an update doesn't dramatically slow it down.

EBNF support

It would be nice to compile grouping and Kleene operators down to pure BNF.
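
Roughly, the compiler could desugar each EBNF construct into a fresh helper nonterminal, along these lines (a sketch; the generated names are illustrative):

# Source:         a -> b:*
# Desugared BNF:
a -> a$1
a$1 -> null
     | a$1 b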

Comments

Allow comments: anything after %% should be ignored on a line.

Use require architecture

When compiling for Node.js, the compiler should use require('nearley') instead of copying nearley.js out literally.

Free memory in completed states

Completed states in the table can be nulled out and left for the GC. This ought to free up memory (someone correct me if I'm wrong).

Named tokens

Provide a way to name a token, and bind these names by augmenting the data array into an object.

a -> name:string {% function(d) {return d.name;} %}

Allow full unicode literals in strings

From @beaugunderson

another aside: it seems like the only way to include unicode literals (like
\u2e03) is via a character class; is that intentional?

Note that for now, you can include the unicode literal, uh, "literally" like this:

a -> "cafรฉ"

Alternatively, charclasses:

a -> [\uxxxx]

Spurious parses stemming from nullable nonterminals

The following grammar:

@builtin "whitespace.ne"

d -> a

a -> b _ "&"
   | b

b -> letter
   | "(" _ d _ ")"

letter -> [a-z]

when run using nearley-test on (x), generates two (identical) parses [ [ [ '(', null, [ [ [ [ 'x' ] ] ] ], null, ')' ] ] ]. Since the above grammar is unambiguous, this is unexpected.

Notice that this does not occur if you omit the rule a -> b _ "&", which does not even appear in the derivation; that makes it even more unexpected. It has to do with the order of prediction.

I am preparing a pull request that suggests a fix.

feedback from a noob

hi, I am new to the world of language parsers, and I took some notes while wrapping my head around nearley, perhaps they can be useful:

nearley notes

from calculator: "main is the nonterminal that nearley tries to parse, so we define it first."

vs from readme:

"The first nonterminal you define is the one that the parser tries to parse."

glossary

  • nonterminal - basic parser constructions, made up of a name and expansions
  • name - the left side of the -> in a nonterminal
  • expansions - the stuff on the right side of the -> in a nonterminal. you can have many of these if they are | separated
  • postprocessor - defined inside {% %} blocks at the end of production rules
  • production rules - AKA 'meanings', the name for the overall expression including nonterminals and postprocessors
  • id postprocessor - built in postprocessor that is a shorthand for doing function(data) {data[0];}

JS preprocessors

I'm quite accustomed to CoffeeScript, so I like to use it here and there. It would be nice if I can use a custom preprocessor with nearley (e.g. Babel, CoffeeScript, 5to6, PromisedLand...).

Add Leo reductions for right-recursive grammars

Using the tweaks to the algorithm from Leo, Joop, "A general context-free parsing algorithm running in linear time on every LR(k) grammar without using lookahead", Theoretical Computer Science, Vol. 82 (1991).

Why is the generated JS so verbose?

This is not meant as a criticism; rather, I'm wondering why the generated JS is so "human-readable" and preserves all the named tokens. Is there any reason why the JS file generated from the grammar needs to be readable? Nearley doesn't need to know the names of the tokens, does it? Is that better for testing?

I was just taking a look at the parser file for javascript.js, and I think you could probably reduce the generated code to maybe 1/4 of its size. You could probably also reduce the memory footprint by prototyping objects (new Literal("w") vs { "literal": "w" }: objects that have the same "shape" can be optimized by the JIT compiler, whereas { "literal": "w" } and { "literal": "h" } won't necessarily be detected as having the same shape, IIRC).

(Also, the nearley parser could then perform different actions on symbols based on a matching type, rather than reading the property name "literal".)

.... I suppose you're going to make me write this, lol.
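
Concretely, the object-shape suggestion amounts to something like this sketch:

// One prototyped constructor gives every literal the same hidden class...
function Literal(value) { this.literal = value; }
const w = new Literal("w");
const h = new Literal("h");

// ...whereas ad-hoc object literals leave that to the engine's heuristics:
const w2 = { literal: "w" };
const h2 = { literal: "h" };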
