PEG.js: Parser generator for JavaScript

License: MIT License

JavaScript 81.08% CSS 13.46% HTML 5.46%

pegjs's Introduction

PEG.js is a simple parser generator for JavaScript that produces fast parsers with excellent error reporting. You can use it to process complex data or computer languages and build transformers, interpreters, compilers and other tools easily.

PEG.js is still very much a work in progress. There are no compatibility guarantees until version 1.0.

pegjs's Issues

Give predicates access to previously matched labeled expressions

It would be really nice if predicates could access previously matched labeled expressions.

For example, consider a simple HTML parser that is aware of self-closing tags. It would be nice to be able to do something like:

selfclosing_tag
= tag:tag_open "/>"
/ tag:tag_open &{ return tag.name == "br" } ">"

Would this be possible to implement? (My parsing-fu is weak.)

And, if so, would a patch to do this be accepted?

Parametrize the grammar by externally-supplied variables

I'll take the example of Jinja (I created a JS clone of it with PEG.js):

You can write {% block my_block %}{% endblock %}, which is the usual syntax.

It is however possible in Jinja to redefine {% and %} to other tokens. I would like to be able to have a rule like tag = tk_open "block" tk_close contents tk_open "endblock" tk_close {...}

Maybe with another syntax ? Like tag = $tk_open "block" $tk_close to indicate these are terminals in a variable ?

The variable could then be declared in the leading { ... } section. This is best used in conjunction with another ticket I opened: "Create an optional argument 'options' in parse()".

PEG.buildParser efficiency

What's the proper way to efficiently define a grammar that includes hundreds (if not thousands) of possible values for a given rule?

I'm using PEG.js to drive an autocomplete on my site and it was working great until the number of values grew into the couple hundred. The parser is built on the fly from a dynamically generated grammar, and it typically locks up the browser for a good 20-30 seconds while building the parser.

Here's an example gist you can try out to see the delay I'm talking about.

https://gist.github.com/1435954

Enhance documentation

Right now, the documentation is just a couple of markdown files with simple examples and quick overviews. We should add more documents:

tutorial/quick-start.md — walk the user through the basic use of PEG.js in Node.js and a browser
tutorial/walkthrough.md — a more detailed version of Tutorial/Quick Start that shows the capabilities of PEG.js on a gradually improved example grammar/parser
api/pegjs.md — complete API reference of PEG.js
api/parser.md — complete API for the generated parsers
spec/current.md — specification for the grammar used in the current release
spec/draft.md — an ongoing specification for future versions of the grammar
spec/ast/pegjs-$v.md — an AST that represents the PEG.js grammar for $v

The API references, AST docs and specs should be detailed enough so that anybody would be able to reimplement PEG.js just using these documents.

Return value to ignore output for the rule

In some grammars there are separators that are important to the grammar but uninteresting in the output from the parse tree. Whitespace is the most obvious example.

It'd be cool if there was a way, from within a rule action, to indicate that the output of the rule should not be included in the parse tree.

Since null is already taken to indicate a match/predicate failure, and false is conceivable as a desired return value, perhaps a return value of undefined could be used to cull the rule's output.
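
A minimal sketch of the proposal in plain JavaScript, not PEG.js internals. `collectSequence` is a hypothetical helper, and the assumption is that a sequence gathers its element results into an array and drops any element whose action returned undefined:

```javascript
// Hypothetical helper: build a sequence result, skipping elements whose
// action returned undefined (the proposed "cull this output" marker).
function collectSequence(results) {
  return results.filter(function (r) { return r !== undefined; });
}

// Simulated action results for the input "a b": literal, whitespace, literal.
var whitespaceAction = function () { return undefined; }; // culled
var parts = ["a", whitespaceAction(), "b"];

console.log(collectSequence(parts)); // [ 'a', 'b' ]
```

Values such as false stay in the result, which is the reason the post suggests undefined rather than false as the marker.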

New symbol for actions: ->

As mentioned on the mailing list, it is currently impossible to have unbalanced { or } in an action, other than by specifying them by Unicode value.

I propose the following syntax: a -> instead of a {, with the action ending whenever the indentation decreases, or at the end of the line if there was content right after the ->.

Example:

my_rule
    = c:contents -> return c + "blah";

my_other_rule
    = c:contents m:my_other_rule ->
          c = c + "blah";
          return c + m;

    / contents

I actually implemented it with a preprocessor (that also passes the { ... } value through Coco).

See example there : http://code.ravelsoft.com/node-jinjs/src/f02528b73914/src/expression.pegco

Note: as a preprocessor, it doesn't solve the whole '{' '}' ordeal, since in the end what I generate is regular pegjs. This would only work if it were done at the parsing level.

Is the Online page available for use locally?

I would like to be able to play with PEG.js while I'm traveling next week.

Once it has loaded in the browser, it appears that the http://pegjs.majda.cz/online page works fine without a network connection. Is there a way to load all this into the local file system so that I can debug my grammar without internet access? Thanks.

Ability to specify repetition count (like in regexps)

It would be helpful if the PEG.js grammar allowed something like range expressions of POSIX basic regular expressions to be used. E.g.:

```
"a"\{1,7\}
```
matches a, aa, ..., aaaaaaa
```
"a"\{0,1\}
```
matches the empty string and a
```
"a"\{,6\}
```
matches a string with up to (and including) six a's
```
"a"\{6,\}
```
matches a string of six or more a's
```
"a"\{3\}
```
matches only aaa, being equivalent to "a"\{3,3\}
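
The intended semantics can be sketched in plain JavaScript. `matchRepeat` is a hypothetical helper, not part of PEG.js, and it assumes a non-empty literal and greedy matching:

```javascript
// Hypothetical helper: match `literal` (assumed non-empty) between `min` and
// `max` times starting at `pos`. Returns the new position, or -1 on failure.
function matchRepeat(input, pos, literal, min, max) {
  var count = 0;
  // Greedy: take as many repetitions as the upper bound allows.
  while (count < max && input.substr(pos, literal.length) === literal) {
    pos += literal.length;
    count++;
  }
  return count >= min ? pos : -1;
}

console.log(matchRepeat("aaa", 0, "a", 1, 7));             // 3
console.log(matchRepeat("", 0, "a", 0, 1));                // 0
console.log(matchRepeat("aa", 0, "a", 3, 3));              // -1
console.log(matchRepeat("aaaaaaaa", 0, "a", 6, Infinity)); // 8
```

An omitted lower bound corresponds to `min = 0` and an omitted upper bound to `max = Infinity`.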

Make PEG.js's Git repository a valid npm package

PEG.js's Git repository should be a valid npm package, so that development versions can be installed more easily using just the following commands:

git clone git://github.com/dmajda/pegjs.git
npm install ./pegjs

Right now, an attempt to run these commands ends with an npm error:

npm ERR! Could not install: ./pegjs/
npm ERR! Error: Invalid version: @VERSION
npm ERR! Must be X.Y.Z, with an optional trailing tag.
npm ERR! See the section on 'version' in `npm help json`
npm ERR!     at /usr/local/lib/node_modules/npm/lib/utils/read-json.js:257:13
npm ERR!     at /usr/local/lib/node_modules/npm/lib/utils/read-json.js:132:32
npm ERR!     at P (/usr/local/lib/node_modules/npm/lib/utils/read-json.js:109:40)
npm ERR!     at cb (/usr/local/lib/node_modules/npm/lib/utils/graceful-fs.js:31:9)
npm ERR!     at [object Object].<anonymous> (fs.js:107:5)
npm ERR!     at [object Object].emit (events.js:61:17)
npm ERR!     at afterRead (fs.js:878:12)
npm ERR!     at wrapper (fs.js:245:17)
npm ERR! Report this *entire* log at:
npm ERR!     <http://github.com/isaacs/npm/issues>
npm ERR! or email it to:
npm ERR!     <[email protected]>
npm ERR! 
npm ERR! System Linux 2.6.38-10-generic
npm ERR! command "node" "/usr/local/bin/npm" "install" "./pegjs/"
npm ERR! 
npm ERR! Additional logging details can be found in:
npm ERR!     /home/dmajda/tmp/npm-debug.log
npm not ok

Two things will need to happen to resolve this issue:

The build system needs to avoid concatenating files into one. We should just use require in the Node.js environment. For the browser environment, the files will still need to be assembled into one, but this is probably doable using Browserify or some similar tool.
The @VERSION substitution should be eliminated, at least from the package.json file. This will lead to a small duplication of the current version value, but that is a lesser evil than making installation of PEG.js development versions harder.

Error message missing expected values after submatch

Given the following grammar

start
  = function

function
  = without_function / with_function

with_function
  = "with" args

without_function
  = "without" args

args
  = "()"

And given the incomplete input:

with

I'd expect the error message to be something like below since with could be completed as either with() or without():

Line 1, column 5: Expected "()" or "without" but end of input found.

Instead I get

Line 1, column 5: Expected "()" but end of input found.

I thought it had to do with an ordering error on my part, but I get the same message regardless of the way the functions are ordered. I'm using the expected values to prompt users through an autocomplete, and missing expectations make that a little tricky. ; )

Give it a go at http://pegjs.majda.cz/online

Is there something I'm doing wrong here?

null indicates failure - workaround for a grammar that should produce nulls?

I have a grammar for a data structure that ideally should include nulls. If the top-level result is null, no problem: the grammar can produce a special object. In that case the function calling the parser can detect the special object and return a null.

However for nulls deeper in the data structure, I'd hate to have to traverse that structure to replace all special objects with nulls. Any ideas?

One thought is to tell the parser what value to use as the failure indicator. Then I can tell the parser to fail when some unique object is returned.

Another thought is to indicate failure by throwing an exception. Or maybe there is an existing workaround I don't know of?

Thanks! PEG.js has saved me a ton of time.
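
The first idea (a configurable failure indicator) can be sketched in plain JavaScript. `FAILED`, `choice`, `parseNull` and `parseNumber` are hypothetical names used only for illustration; this is not how PEG.js is implemented:

```javascript
// Hypothetical failure marker: a unique object that no action returns by accident.
var FAILED = {};

// Try each alternative in order. An alternative fails only if it returns
// FAILED, so null is free to be an ordinary result.
function choice(alternatives, input) {
  for (var i = 0; i < alternatives.length; i++) {
    var result = alternatives[i](input);
    if (result !== FAILED) { return result; }
  }
  return FAILED;
}

function parseNull(input) {
  return input === "null" ? null : FAILED;
}

function parseNumber(input) {
  return /^[0-9]+$/.test(input) ? parseInt(input, 10) : FAILED;
}

console.log(choice([parseNull, parseNumber], "null"));         // null
console.log(choice([parseNull, parseNumber], "42"));           // 42
console.log(choice([parseNull, parseNumber], "x") === FAILED); // true
```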

Action Failures Don't Reset Parser Position

When an action returns null to indicate failure, the parser's position is not reset to where it was before the expression was run. This can be seen with an example grammar. With this grammar, the parsed output from any input is an empty array.

start
  = .* { return null; }
  / .*
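
The expected behaviour can be sketched with hand-written combinators. All names here (`matchAll`, `withAction`, `choice`) are hypothetical and only mimic the grammar above; null is used as the failure value, as in the issue:

```javascript
// Behaves like `.*`: consume the rest of the input and return its characters.
function matchAll(state) {
  var text = state.input.slice(state.pos);
  state.pos = state.input.length;
  return text.split("");
}

// Wrap an expression with an action. If the action returns null (failure),
// restore the position saved before the expression ran.
function withAction(expr, action) {
  return function (state) {
    var savedPos = state.pos;
    var result = expr(state);
    if (result !== null) { result = action(result); }
    if (result === null) { state.pos = savedPos; }
    return result;
  };
}

// Ordered choice: try `a`, and fall back to `b` if `a` fails.
function choice(a, b) {
  return function (state) {
    var result = a(state);
    return result !== null ? result : b(state);
  };
}

var start = choice(withAction(matchAll, function () { return null; }), matchAll);
console.log(start({ input: "abc", pos: 0 })); // [ 'a', 'b', 'c' ]
```

Without the `state.pos = savedPos` line, the second alternative would start at the end of the input and return an empty array, which matches the behaviour reported here.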

Implement parametrizable rules

It would be great to be able to parametrize rules with variables:

string
   = '\"' parse_contents '\"' ->
   / '\'' parse_contents('\'') '\'' ->
   / '+' parse_contents('+') '+' -> /* sure why not :) */

parse_contents(terminator='\"')
    = ('\\' terminator / !terminator .)+ -> return stuff

Running rake fails

(skai)(~/git/pegjs master )$ rake --trace
(in /home/leto/git/pegjs)
** Invoke default (first_time)
** Invoke build (first_time)
** Invoke lib/peg.js (first_time)
** Invoke src/emitter.js (first_time, not_needed)
** Invoke src/parser.js (first_time, not_needed)
** Invoke src/parser.pegjs (first_time, not_needed)
** Invoke src/compiler.js (first_time, not_needed)
** Invoke src/passes.js (first_time, not_needed)
** Invoke src/peg.js (first_time, not_needed)
** Invoke src/checks.js (first_time, not_needed)
** Invoke src/utils.js (first_time, not_needed)
** Execute lib/peg.js
rake aborted!
No such file or directory - lib/peg.js
/home/leto/git/pegjs/Rakefile:36:in `initialize'
/home/leto/git/pegjs/Rakefile:36:in `open'
/home/leto/git/pegjs/Rakefile:36
/usr/lib/ruby/1.8/rake.rb:636:in `call'
/usr/lib/ruby/1.8/rake.rb:636:in `execute'
/usr/lib/ruby/1.8/rake.rb:631:in `each'
/usr/lib/ruby/1.8/rake.rb:631:in `execute'
/usr/lib/ruby/1.8/rake.rb:597:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
/usr/lib/ruby/1.8/rake.rb:590:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/rake.rb:607:in `invoke_prerequisites'
/usr/lib/ruby/1.8/rake.rb:604:in `each'
/usr/lib/ruby/1.8/rake.rb:604:in `invoke_prerequisites'
/usr/lib/ruby/1.8/rake.rb:596:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
/usr/lib/ruby/1.8/rake.rb:590:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/rake.rb:607:in `invoke_prerequisites'
/usr/lib/ruby/1.8/rake.rb:604:in `each'
/usr/lib/ruby/1.8/rake.rb:604:in `invoke_prerequisites'
/usr/lib/ruby/1.8/rake.rb:596:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
/usr/lib/ruby/1.8/rake.rb:590:in `invoke_with_call_chain'
/usr/lib/ruby/1.8/rake.rb:583:in `invoke'
/usr/lib/ruby/1.8/rake.rb:2051:in `invoke_task'
/usr/lib/ruby/1.8/rake.rb:2029:in `top_level'
/usr/lib/ruby/1.8/rake.rb:2029:in `each'
/usr/lib/ruby/1.8/rake.rb:2029:in `top_level'
/usr/lib/ruby/1.8/rake.rb:2068:in `standard_exception_handling'
/usr/lib/ruby/1.8/rake.rb:2023:in `top_level'
/usr/lib/ruby/1.8/rake.rb:2001:in `run'
/usr/lib/ruby/1.8/rake.rb:2068:in `standard_exception_handling'
/usr/lib/ruby/1.8/rake.rb:1998:in `run'
/usr/bin/rake:28

bin/pegjs can't work if symlinked

solution:

-DIR=`dirname "$0"`
+DIR=`[ -L $0 ] && dirname "$(readlink $0)" || dirname "$0"`

Make generated parsers smaller

Generated parsers are too big:

dmajda@inuit:~/Programování/Projekty/PEG.js/pegjs$ bin/pegjs examples/arithmetics.pegjs 
dmajda@inuit:~/Programování/Projekty/PEG.js/pegjs$ bin/pegjs examples/css.pegjs 
dmajda@inuit:~/Programování/Projekty/PEG.js/pegjs$ bin/pegjs examples/javascript.pegjs 
dmajda@inuit:~/Programování/Projekty/PEG.js/pegjs$ bin/pegjs examples/json.pegjs 
dmajda@inuit:~/Programování/Projekty/PEG.js/pegjs$ wc -c examples/*.js
  13327 examples/arithmetics.js
 429010 examples/css.js
 610664 examples/javascript.js
  54518 examples/json.js
1107519 celkem
dmajda@inuit:~/Programování/Projekty/PEG.js/pegjs$ wc -c src/parser.js 
103838 src/parser.js

They should be much smaller, ideally without sacrificing performance.

Allow a wider range of rule names.

I have a use case which requires the colon character within rule names. I tweaked the metagrammar to allow for this, but the resulting parser threw a syntax error. My solution was simply to quote the keys for the "parse" functions inside the parser object. The fix is just two lines in compiler.js:

--- compiler.js
+++ Local Changes
@@ -476,7 +476,7 @@
       }

       return PEG.Compiler.formatCode(
-        "_parse_${name}: function(context) {",
+        "'_parse_${name}': function(context) {",
         "  var cacheKey = ${name|string} + '@' + this._pos;",
         "  var cachedResult = this._cache[cacheKey];",
         "  if (cachedResult) {",
@@ -756,7 +756,7 @@

     rule_ref: function(node, resultVar) {
       return PEG.Compiler.formatCode(
-        "var ${resultVar} = this.${ruleMethod}(context);",
+        "var ${resultVar} = this['${ruleMethod}'](context);",
         {
           ruleMethod: "_parse_" + node.name,
           resultVar:  resultVar

Semantic Predicates with preceding labels

Semantic predicates cannot access labels even if they precede. Code like this (just a stupid example) does not work:

id = id:( [a-z]+ ) !{ return id.length > 3 }

Parsing fails when returning unbalanced braces in a string

I believe I've run into a bug in PEG.js (at least the web version of
it):

Trying both of these rules will demonstrate:

foo = a:"a" { return "{" + a + "}"; }
foo = a:"a" { return "{" + a; }

The first one is fine, but the second one fails to parse the pegjs
grammar. It's not realizing the "{" is inside quotes and is trying to
match it.

Create an optional argument 'options' in parse()

It could be useful to pass variables to the parse() method to customize the behaviour of the parser on-demand.

My proposal is to have the signature parse(input, startRule, options), with the following behaviour:

if (options === undefined && startRule instanceof Object) {
    options = startRule; startRule = undefined;
}

Or something of the like.

Right now, I believe the only way to do so is to have var options = arguments[2] || {};, which is not very elegant. You also have to generate the module as follows to be able to call parse without the full parse(input, undefined, options):

grammar_source = "module.exports = #{parser.toSource()}; var _parse = module.exports.parse; module.exports.parse = function (input, startRule, options) {
        if (startRule instanceof Object) { options = startRule; startRule = undefined; }
        return _parse (input, startRule, options);
    };"
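
The argument shuffling proposed above can be sketched as a plain JavaScript wrapper. `withOptionalStartRule` is a hypothetical name, and the inner function only stands in for a generated parser's parse function:

```javascript
// Hypothetical wrapper around a parse(input, startRule, options) function.
function withOptionalStartRule(parse) {
  return function (input, startRule, options) {
    // Allow parse(input, options) as shorthand for parse(input, undefined, options).
    if (options === undefined && startRule !== null && typeof startRule === "object") {
      options = startRule;
      startRule = undefined;
    }
    return parse(input, startRule, options || {});
  };
}

// Stand-in parser that just echoes the arguments it received.
var parse = withOptionalStartRule(function (input, startRule, options) {
  return { input: input, startRule: startRule, options: options };
});

console.log(parse("x", { tkOpen: "<%" }));
console.log(parse("x", "tag", { tkOpen: "<%" }));
console.log(parse("x"));
```

Checking `typeof startRule === "object"` rather than `instanceof Object` is a design choice of this sketch: it also accepts objects created with `Object.create(null)`.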

Unused variables in generated parser code

Run bin/pegjs examples/arithmetics.pegjs.
Observe that in parse_additive function in the generated parser there are result3 and pos2 variables that are not used.

Most likely, stack depth calculation is out-of-sync with actual variable use in some case(s) (sequences? actions?).

Import/include other grammars

It could be extremely useful to have the ability to define grammars by importing rules from other grammars.

Several ideas ;

@include "expression.pegjs"
(or @from "expression.pegjs" import expression)

tag_if
    = "if" space? expression space? { ... }

@import "expression.pegjs" as expr

tag_if
    = "if" space? expr.expression space?

Ideally, this would not regenerate the whole code in every .pegjs file that includes another one; maybe we would have to modify the behaviour of parse() a little, to something like this:

Editing as per what you were saying in the options issue ;

parse(input, startRule)
->
parse(input, { startRule: "...", startPos : 9000 })

And at the end, if startPos != 0 && result !== null, we don't check if we went until input.length, but instead return the result as well as the endPos (don't really know how to do that elegantly - maybe simply modifying the options parameter ?).

It would allow reusability of grammars and modularisation of the code, which I think are two extremely important aspects of coding in general.

Ability to ignore certain productions

It would be nice to be able to tell the lexer/parser to ignore certain productions (i.e. whitespace and comment productions) so that it becomes unnecessary to litter all other productions with comment/whitespace allowances. This may not be possible though, due to the fact that lexing is builtin with parsing?

Thank you

Other useful debugging output

One thing that would be really awesome is some more debugging output. When I have a parser that doesn't work and is already producing structured output, I am left having to gut it to get back to something that lets me look at the output and see where the error in my grammar is.

So it would be nice if there were some command line options for the parser generator, which caused the parser to dump debugging info, such as which rule fired, and what characters were captured as part of the rule.

Incomplete documentation for Using the Generated Parser

The Using the Generated Parser documentation only describes how to invoke it, and the exception it throws when something goes wrong. What does it return on a successful parse? What does the API look like? How do I use the result of parsing input once I have it?

Use charAt instead of substr to match one-character regexes

If you have a rule that includes, say, [0-9], then the current code looks like

if (/^[0-9]/.test(input.substr(pos))) {

Where it could be way more efficient as such:

if (/^[0-9]$/.test(input.charAt(pos))) {

or something similar.
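
A small self-contained comparison of the two forms, written as a sketch. The function names are hypothetical, and whether substr actually copies the remaining input depends on the JavaScript engine:

```javascript
// Two ways to test whether the character at `pos` is a digit.
function isDigitSubstr(input, pos) {
  return /^[0-9]/.test(input.substr(pos));  // may create a string holding the rest of the input
}

function isDigitCharAt(input, pos) {
  return /^[0-9]$/.test(input.charAt(pos)); // looks at a single character only
}

// Both forms agree at every position, including one past the end.
var input = "ab7c";
for (var pos = 0; pos <= input.length; pos++) {
  console.log(pos, isDigitSubstr(input, pos), isDigitCharAt(input, pos));
}
```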

javascript.pegjs line 633...

          type:      "FuctionCall",

I think you meant

          type:      "FunctionCall",

Add ability to track node position

PEG.js-generated parsers currently don't track position (line and column). I'd like to add this feature since it would be quite useful.

One way is to add line and column properties to each object returned as a match result:

start = "a" b:"b" { return [b.line, b.column]; } // Returns [1, 2] on the input "ab".

Another way is just to make special line and column variables available inside actions/predicates, referring to the position of the beginning of the current rule:

start = "a" "b" { return [line, column]; } // Returns [1, 1] on the input "ab".

The first way is more flexible, but this flexibility might not be needed actually. I am not sure which way I'll implement yet.

Both ways would hurt performance. To prevent this in cases where position tracking is not required, the tracking should be enabled only if a trackPosition option with a truthy value is passed to PEG.buildParser when generating the parser.

Infinite loop

The following causes an infinite loop.

start = scheme
scheme = ALPHA ( ALPHA / DIGIT / "+" / "-" / "." )*
ALPHA = [a-zA-Z]*
DIGIT = [0-9]*

Error when running "jake clean" without previous "jake build"

When I run jake clean without previous jake build the following error appears:

jake aborted.
Error: ENOENT, No such file or directory './lib'
    at Object.readdirSync (fs.js:376:18)
(See full trace by running task with --trace)

BUG in error message

In the arithmetic demo, if you enter "2*(3+4++5)", it will return the following message:

Line 1, column 8: Expected "(", ")", "+" or integer but "+" found.

First it says that "+" is one of the possible values, but then it complains that "+" was provided.

Tested in the online version.

BTW, great parser generator!

Laurent Debacker.

benchmark/run currently fails

[11:14] 〠 ~/Temp/pegjs (master) $ node benchmark/run
Each test is run 10 times.

┌─────────────────────────────────────┬───────────┬────────────┬──────────────┐
│                Test                 │ Inp. size │ Avg. time  │  Avg. speed  │
├─────────────────────────────────────┴───────────┴────────────┴──────────────┤
│                                    JSON                                     │
├─────────────────────────────────────┬───────────┬────────────┬──────────────┤

undefined:1212
        i  f (result0 !== null) {
           ^
SyntaxError: Unexpected identifier
    at Object.compile (/Users/shamansir/Temp/pegjs/lib/peg.js:3835:23)
    at Object.buildParser (/Users/shamansir/Temp/pegjs/lib/peg.js:21:25)
    at eval at <anonymous> (/Users/shamansir/Temp/pegjs/benchmark/run:11:11)
    at Object.run (eval at <anonymous> (/Users/shamansir/Temp/pegjs/benchmark/run:11:11))
    at Timer.ontimeout (eval at <anonymous> (/Users/shamansir/Temp/pegjs/benchmark/run:11:11))

Node is v0.5.8, Jake 1.7.0

Errors don't indicate the failing rule(s)

Expected "(" or integer but "z" found.

It would be great to know the rule that reached the farthest spot in the text to be parsed (as Treetop does), or the spots in the expressions of the various rules where the next possible match is allowed.

Today I hacked together a parser for SVG path data, based on the BNF. (http://pastie.org/1036541) During development and testing of the grammar, I hit a situation where it said something like

Expected [ \n\r] but "H" found.

There were 38 spots in my grammar where whitespace was possible; it would have been really nice to see the rules traversed so far and the next expression(s) being looked for to get an idea of where the problem lay.

input filename with path not recognized

I think I found a bug. I'm calling pegjs and submitting the relative path and name of an input file which is a few directories deeper. pegjs does not recognize this as an input file and complains about "unknown option". See this:

$ pegjs node_modules/robocom-compiler/grammar/robocom.pegjs 
Unknown option: node_modules/robocom-compiler/grammar/robocom.pegjs.

Now I'm doing the exact same thing from the directory where the source file is located and it works:

$ cd node_modules/robocom-compiler/grammar/
$ pegjs robocom.pegjs 
$ ls -l
total 208
-rw-r--r--  1 dennis  staff  91166 12 Sep 19:39 robocom.js
-rw-r--r--@ 1 dennis  staff  11819 12 Sep 05:42 robocom.pegjs

I thought it might be related to the minus sign in the path and I tried to put the filename in double quotes, but it didn't work either.

Callback for actions in CoffeeScript / Coco

It would be interesting to have the ability to register a callback for action compilation.

Coffeescript (and Coco, its little brother, that I use) is a very nice language to work with, a tad nicer than javascript.

It could then be interesting to register a callback that receive the action's text in argument and that replies with compiled javascript text.

In command line, maybe add a switch like so :

pegjs -a, --action-command "coco -bpe '%'"

Where % would be replaced by the code.

"^"?"^" should match "^"

I may be wrong, but I believe:

start = "^" ? "^"

Should match the string "^", but currently does not.

I believe it gets confused because on seeing "^" it assumes the first portion is necessarily true and then fails to find a second "^". However, on this failure, it should then try to match without the "^"?, which would succeed.
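
For comparison, the sketch below models the behaviour described above with hand-written matchers. It assumes standard PEG semantics, in which an optional expression takes a match when one is available and a sequence never re-enters it. The helper names are hypothetical, and the last line shows a backtracking regular expression for contrast:

```javascript
// Hand-written matchers. Each returns the new position, or -1 on failure.
function literal(text) {
  return function (input, pos) {
    return input.substr(pos, text.length) === text ? pos + text.length : -1;
  };
}

// PEG-style optional: take the match if there is one, and never revisit that choice.
function optional(expr) {
  return function (input, pos) {
    var next = expr(input, pos);
    return next !== -1 ? next : pos;
  };
}

function sequence(exprs) {
  return function (input, pos) {
    for (var i = 0; i < exprs.length; i++) {
      pos = exprs[i](input, pos);
      if (pos === -1) { return -1; }
    }
    return pos;
  };
}

var start = sequence([optional(literal("^")), literal("^")]);
console.log(start("^", 0));       // -1: the optional part consumed the only "^"
console.log(start("^^", 0));      // 2
console.log(/^\^?\^$/.test("^")); // true: a backtracking regex does match
```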

Line and Column position not in error message (Node.js)

Using Node the position properties are lost in the error message.

I get an error object like the below:

{ stack : <getter>, arguments : undefined, type : undefined, message : 'Expected "}}" but "}" found.', name : 'SyntaxError' }

If I replace the error code with the below, then the position information is passed on:

 var e = new Error(buildErrorMessage());
 e.errorPosition = computeErrorPosition();
 throw e;

I don't understand why the SyntaxError object isn't working, but it isn't.

I am using Node v0.3.6-pre
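
The workaround can be wrapped in a small helper. This is only a sketch: `makeParseError` is a hypothetical name, and the line and column values in the usage are made up for illustration:

```javascript
// Hypothetical helper: attach position information to a plain Error so that
// it does not depend on how the runtime treats SyntaxError.
function makeParseError(message, line, column) {
  var e = new Error(message);
  e.name = "SyntaxError";
  e.errorPosition = { line: line, column: column };
  return e;
}

try {
  // Illustrative values only.
  throw makeParseError('Expected "}}" but "}" found.', 1, 12);
} catch (e) {
  console.log(e.name + ": " + e.message);
  console.log(e.errorPosition.line, e.errorPosition.column); // 1 12
}
```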

computeErrorPosition doesn't match buildErrorMessage logic

In buildErrorMessage the actual error position is calculated via:

        var actualPos = Math.max(pos, rightmostMatchFailuresPos);

But in computeErrorPosition only rightmostMatchFailuresPos is used. In my usage (where I'm just parsing a single line of input), this causes the computeErrorPosition results to be invalid on error (it always reports line 1, column 1). If I change computeErrorPosition to use the same method as buildErrorMessage, I get the correct results. To fix this, I think you need to update computeErrorPosition to look like:

      function computeErrorPosition() {
        /*
         * The first idea was to use |String.split| to break the input up to the
         * error position along newlines and derive the line and column from
         * there. However IE's |split| implementation is so broken that it was
         * enough to prevent it.
         */

        var line = 1;
        var column = 1;
        var seenCR = false;
        var actualPos = Math.max(pos, rightmostMatchFailuresPos);

        for (var i = 0; i <  actualPos; i++) {
          var ch = input.charAt(i);
          if (ch === '\n') {
            if (!seenCR) { line++; }
            column = 1;
            seenCR = false;
          } else if (ch === '\r' || ch === '\u2028' || ch === '\u2029') {
            line++;
            column = 1;
            seenCR = true;
          } else {
            column++;
            seenCR = false;
          }
        }

        return { line: line, column: column };
      }
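
A standalone adaptation of the function above that can be run on its own. In this sketch `input` and the position are passed as parameters (an assumption made for illustration; in the generated parser they come from the enclosing scope), and `computePosition` is a hypothetical name:

```javascript
// Compute the 1-based line and column of `actualPos` within `input`.
function computePosition(input, actualPos) {
  var line = 1;
  var column = 1;
  var seenCR = false;

  for (var i = 0; i < actualPos; i++) {
    var ch = input.charAt(i);
    if (ch === "\n") {
      if (!seenCR) { line++; } // "\r\n" counts as a single line break
      column = 1;
      seenCR = false;
    } else if (ch === "\r" || ch === "\u2028" || ch === "\u2029") {
      line++;
      column = 1;
      seenCR = true;
    } else {
      column++;
      seenCR = false;
    }
  }

  return { line: line, column: column };
}

// Same position choice as buildErrorMessage: the larger of the two values.
var pos = 0;
var rightmostMatchFailuresPos = 4;
console.log(computePosition("ab\ncd", Math.max(pos, rightmostMatchFailuresPos))); // { line: 2, column: 2 }
console.log(computePosition("a\r\nb", 3));                                        // { line: 2, column: 1 }
```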

Online generator issue

I find it really hard to type in the online generator.
It tries to parse as soon as I type and often goes into infinite loops, causing the browser to report "unresponsive script".
I am not able to finish writing and get the parser generated.

Variables from outer scope are inaccessible in actions

The following grammar now results in an error ("a is not defined") when given the input ab:

start = a:"a" (b:"b" { return a; })

The action should see variable a with value "a". Generally, actions should see all variables from outside scopes which already have values.

([bc]+ { return true; }) rule is not parsed

start = "a" [bc]+ – passes
start = "a" ([bc] { return true; }) – passes
start = "a" ([b-c] { return true; }) – passes
start = "a" ([bc]+ { return true; }) – fails

(the * and ? quantifiers work OK)

testing code:

var testComplexCharacterClasses = PEG.buildParser('start = "a" ([bc]+ { return true; }) { return "passed"; }');
parses(testComplexCharacterClasses, "ab", "passed");

result:

✖ classes

    Message:  Died on test #7: Unexpected identifier - {}

Node is v0.5.8, Jake is 1.0.7

This is a clean clone of https://github.com/dmajda/pegjs.git

Allow case insensitive matching of literals

Right now, matching literals case-insensitively is hard and ugly. For example, the only way to match "select" case-insensitively is:

select = [Ss][Ee][Ll][Ee][Cc][Tt]

Having one global flag for case-insensitivity would create problems when some parts of a language are case-sensitive and others case-insensitive. Also, combining languages (a feature I am thinking about for later) would be harder. A better way would be to signify case insensitivity for each literal separately, e.g. like this:

select = "select"i
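
What a per-literal flag would have to do at match time can be sketched in plain JavaScript. `matchLiteralIgnoreCase` is a hypothetical helper, and the sketch assumes ASCII literals (case conversion can change the length of some non-ASCII strings):

```javascript
// Hypothetical helper: match `literal` at `pos`, ignoring case.
// Returns the matched text as it appears in the input, or null on failure.
function matchLiteralIgnoreCase(input, pos, literal) {
  var candidate = input.substr(pos, literal.length);
  return candidate.toLowerCase() === literal.toLowerCase() ? candidate : null;
}

console.log(matchLiteralIgnoreCase("SeLeCt * FROM t", 0, "select")); // SeLeCt
console.log(matchLiteralIgnoreCase("insert", 0, "select"));          // null
```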

avoid generated parse.js for dist

Hi,

I'm wondering whether the parser could compile itself on initialization instead of shipping the cached parser.js version.

I noticed in the code that the parser compiles itself, but at some starting point it must have been written by hand. Also, I think a hand-coded version would be much smaller than the generated one (100 KB is too much to deploy in a web app).

I'm working on a project, and I noticed it's more efficient to include PEG.js and generate the parser on the fly than to generate the parser ahead of time and ship the resulting JS (PEG.js is 139 KB, while my generated parser takes 200 KB+).

I think you could do the same for PEG.js itself: a hand-coded parser for the PEG syntax would take far fewer bytes.

Another viable option I see is not to parse PEG.js syntax at all, and instead define a DSL in JavaScript to describe the language, so that no parser is needed, just a compiler.

For my project I defined a simple DSL in Coffee, but my DSL generates a code string for PEG: https://github.com/wilkerlucio/scripted_css/blob/master/lib/scripted_css/parser/css_parser.coffee This DSL is mostly for better code compression and nicer writing.

Thanks for your attention.

Array object clobbered in inline javascript

The following should return ['a', 'b', 'c'] but returns 3.

start = .+ {var arr = ['a', 'b']; return arr.push('c')}

Infinite loop with negative expression

These two rules should do the same thing, shouldn't they?

rule1 = 'a' (!'a')* 'a';

rule2 = 'a' [^a]* 'a';

But pegjs generates an infinite loop for rule1. rule2 works as I expect.

So it seems to be a bug.

In the parser.pegjs grammar file I found something like this:

rule3 = 'a' (!'a' .)* 'a';

In the online version, rule3 does what rule1 should do, but (!'a' .) looks illogical to me.
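
The difference between rule1 and rule3 can be sketched with hand-written matchers. This assumes the usual PEG meaning of `!`, a predicate that succeeds or fails without consuming input. All names are hypothetical, and the loop has an iteration cap so that the sketch terminates:

```javascript
// Each matcher returns the new position, or -1 on failure.
function notA(input, pos) {        // !'a' : succeeds without consuming input
  return input.charAt(pos) !== "a" ? pos : -1;
}

function any(input, pos) {         // .    : consumes exactly one character
  return pos < input.length ? pos + 1 : -1;
}

function notAThenAny(input, pos) { // !'a' .
  var p = notA(input, pos);
  return p === -1 ? -1 : any(input, p);
}

// Zero-or-more repetition with a cap on the number of iterations.
function star(expr, input, pos, maxIterations) {
  var iterations = 0;
  for (;;) {
    var next = expr(input, pos);
    if (next === -1) { break; }
    pos = next;
    if (++iterations >= maxIterations) { return { pos: pos, hitCap: true }; }
  }
  return { pos: pos, hitCap: false };
}

// Input "abca", starting just after the leading 'a'.
console.log(star(notA, "abca", 1, 1000));        // { pos: 1, hitCap: true }
console.log(star(notAThenAny, "abca", 1, 1000)); // { pos: 3, hitCap: false }
```

In the first call the predicate keeps succeeding at the same position, so only the cap stops the loop. In the second, the `.` advances one character per iteration until the next 'a'.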

Syntax error in parser.js

I've got a syntax error if I run pegjs from command line:

js: uncaught JavaScript runtime exception: SyntaxError: Invalid range in character class.

The error is caused by binary characters used at https://github.com/dmajda/pegjs/blob/2d38c5cab3e4b63654b9947422ef67728a011850/src/parser.js#L3407

Consider using escape sequences instead of binary characters (\x or \u).

Use charCodeAt instead of charAt when applicable

If we're checking against a literal character, instead of using charAt, we can just as well use charCodeAt, which is more efficient. (http://jsperf.com/charcodeat-vs-string-comparison/2)

Closing brace in initializer

{ s = "}" } does not compile. It should deliver an empty parser with an initializer.

Fixed width font for http://pegjs.majda.cz/online

Mind switching the font in http://pegjs.majda.cz/online to a fixed-width font?

Non-greedy operators for * , + , and ?

I have a language where there are repeated instances of the same pattern where I only care about the first symbol. For example:

          system       OBJECT IDENTIFIER ::= { mib-2 1 }
          interfaces   OBJECT IDENTIFIER ::= { mib-2 2 }
          at           OBJECT IDENTIFIER ::= { mib-2 3 }
          ip           OBJECT IDENTIFIER ::= { mib-2 4 }
          icmp         OBJECT IDENTIFIER ::= { mib-2 5 }
          tcp          OBJECT IDENTIFIER ::= { mib-2 6 }
          udp          OBJECT IDENTIFIER ::= { mib-2 7 }
          egp          OBJECT IDENTIFIER ::= { mib-2 8 }

This simple example could be matched by this pattern (where _ is whitespace):

identifier _ "OBJECT IDENTIFIER" _ "::=" _ "{" _ identifier _ number _ "}"

This isn't such a big deal in this case (I already typed the pattern :-) But the language has a set of other big hairy constructs that don't warrant the full parsing (I only want the initial identifier on each line to do the job I have in mind).

I would like to type something like this pattern:

identifier _ "OBJECT IDENTIFIER" .*? "}"

where the ".*?" is non-greedy - it only consumes to the first occurrence of the terminal. Could this be on the list for PEG.js? Many thanks.
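
The requested behaviour can be sketched in plain JavaScript. `skipThrough` is a hypothetical helper that mirrors the PEG idiom (!"}" .)* "}" (a negative predicate followed by any character, repeated, then the terminator). It is an illustration, not a PEG.js feature:

```javascript
// Hypothetical helper: consume characters up to and including the first
// occurrence of `terminator` (assumed non-empty).
// Returns the position just after the terminator, or -1 if it never occurs.
function skipThrough(input, pos, terminator) {
  while (pos < input.length) {
    if (input.substr(pos, terminator.length) === terminator) {
      return pos + terminator.length;
    }
    pos++; // the `.` step: consume one character that does not start the terminator
  }
  return -1;
}

var line = "system OBJECT IDENTIFIER ::= { mib-2 1 } trailing";
var end = skipThrough(line, 0, "}");
console.log(line.slice(0, end)); // system OBJECT IDENTIFIER ::= { mib-2 1 }
```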