philippesigaud / pegged
A Parsing Expression Grammar (PEG) module, using the D programming language.
I have a grammar where some input is parsed very slowly.
Consider a grammar like:
Expr < Plus / Minus / Term
Plus ...
Minus ...
Term < Mul / Div / Factor
Mul ...
Div ...
Factor < UnaryMinus / UnaryPlus / Function
UnaryMinus ...
UnaryPlus ...
Function < BinaryFunction / UnaryFunction / Primary
BinaryFunction < Identifier '(' Expr ',' Expr ')'
UnaryFunction < Identifier '(' Expr ')'
Assume you want to parse the expression min(0, max(2, 3)). Then each alternative is checked first. E.g., before "min" is parsed, 3 * 3 * 3 alternatives are tried. That is, it is especially costly if the last alternative will always succeed. I assume this is why PEGs can take exponential time to parse.
I think a packrat parser solves those issues by some memory trade-off. For some (probably most) grammars an LR parser may work even better in practice. Are there any plans to support more efficient parsing schemes? Are they possible to integrate with the current design?
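The re-parsing cost and the packrat trade-off described above can be sketched in plain D. This is a hand-written recognizer, not Pegged's generated code: the `Parser` struct, its `memo` table, and the reduced grammar (Expr <- Binary / Unary / Number) are all assumptions made for illustration. Binary and Unary share the prefix Identifier '(' Expr, so without a cache each nesting level parses its inner Expr twice.

```d
import std.ascii : isAlpha, isDigit;
import std.stdio : writeln;

/// Hypothetical hand-written recognizer; positions are byte offsets,
/// and a result of -1 means "no match".
struct Parser
{
    string input;
    bool useMemo;
    size_t calls;            // how many times the body of expr() ran
    ptrdiff_t[size_t] memo;  // start position -> end position (or -1)

    // Expr <- Binary / Unary / Number
    ptrdiff_t expr(size_t pos)
    {
        if (useMemo)
        {
            if (auto hit = pos in memo) return *hit;
        }
        ++calls;
        ptrdiff_t r = binary(pos);
        if (r < 0) r = unary(pos);
        if (r < 0) r = number(pos);
        if (useMemo) memo[pos] = r;
        return r;
    }

    // Binary <- Identifier '(' Expr ',' Expr ')'
    ptrdiff_t binary(size_t pos)
    {
        auto p = ident(pos);  if (p < 0) return -1;
        p = lit(p, '(');      if (p < 0) return -1;
        p = expr(p);          if (p < 0) return -1;
        p = lit(p, ',');      if (p < 0) return -1;
        p = expr(p);          if (p < 0) return -1;
        return lit(p, ')');
    }

    // Unary <- Identifier '(' Expr ')'
    ptrdiff_t unary(size_t pos)
    {
        auto p = ident(pos);  if (p < 0) return -1;
        p = lit(p, '(');      if (p < 0) return -1;
        p = expr(p);          if (p < 0) return -1;
        return lit(p, ')');
    }

    ptrdiff_t lit(size_t pos, char c)
    {
        return (pos < input.length && input[pos] == c)
            ? cast(ptrdiff_t)(pos + 1) : -1;
    }

    ptrdiff_t ident(size_t pos)
    {
        size_t p = pos;
        while (p < input.length && isAlpha(input[p])) ++p;
        return p > pos ? cast(ptrdiff_t) p : -1;
    }

    ptrdiff_t number(size_t pos)
    {
        size_t p = pos;
        while (p < input.length && isDigit(input[p])) ++p;
        return p > pos ? cast(ptrdiff_t) p : -1;
    }
}

void main()
{
    auto plain = Parser("f(g(h(1)))", false);
    auto memoized = Parser("f(g(h(1)))", true);
    plain.expr(0);
    memoized.expr(0);
    writeln("expr() bodies run without memo: ", plain.calls);
    writeln("expr() bodies run with memo:    ", memoized.calls);
}
```

Without the cache the work doubles with every level of nesting; with it, each start position is parsed once, at the cost of one table entry per position.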
I'm probably doing something wrong, but this simple example doesn't work:
module pegged.test;
import std.algorithm;
import std.conv;
import std.stdio;
import std.traits;
import std.typecons;
import std.typetuple;
import pegged.grammar;
void main()
{
mixin(grammar(
`Expr <- Num AddExpr*
AddExpr <- ('+'/'-') Num
Num <~ [0-9]+`
));
auto tree = Expr.parse(`1 + 2-3+4`);
writeln(tree);
}
During compilation (dmd t2.d) I get this error:
t2.d(15): Error: undefined identifier Num, did you mean struct No?
Any ideas?
Example of current output:
Arithmetic.Instantiate failure at pos [index: 15, line: 0, col: 15]
Example of a better, pretty-print output:
Arithmetic.Instantiate failure at pos [index: 15, line: 0, col: 15]
instantiate(A,B(int))
^
(i.e., show where the parsing error is with a "^", as in IronPython. That should be really easy to do.)
(and perhaps the current parse result and valid rules at that point, but I guess that's already there)
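A minimal sketch of the caret output requested above, using only the standard library. `caretLine` is a hypothetical helper, not part of Pegged; it assumes the failure index is a byte offset into the input and that the offending line is ASCII, so one byte corresponds to one column.

```d
import std.array : replicate;
import std.stdio : writeln;

/// Returns the line of `input` that contains byte offset `index`,
/// followed by a second line with a caret under that offset.
string caretLine(string input, size_t index)
{
    if (index > input.length) index = input.length;

    // Walk back to the start of the line containing `index`.
    size_t start = index;
    while (start > 0 && input[start - 1] != '\n') --start;

    // Walk forward to the end of that line.
    size_t end = index;
    while (end < input.length && input[end] != '\n') ++end;

    return input[start .. end] ~ "\n" ~ replicate(" ", index - start) ~ "^";
}

void main()
{
    // Index 15 is the position reported in the failure message above.
    writeln(caretLine("instantiate(A,B(int))", 15));
}
```

The same walk also yields a line and column for any index (count the newlines before `start`, and take `index - start` as the column).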
I'm a little confused as to what 'line' means in a ParseTree's begin position. For example when parsing via the JSON example:
import std.stdio;
import pegged.grammar;
import json;
enum example3 =
`{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": 1
}
}
}`;
void main()
{
auto pt = JSON.parse(example3);
foreach (child; pt.children)
{
foreach (sub1; child.children)
{
foreach (sub2; sub1.children)
{
foreach (sub3; sub2.children)
{
writefln("%s : '%s'", sub3.begin, sub3.capture);
}
}
}
}
}
This prints:
[index: 18, line: 1, col: 16] : '["title", "example glossary", "GlossDiv", "title", "S", "GlossList", "1"]'
I thought "title" would begin at line 4 or 5, but not 1. If I'm misunderstanding the meaning of this it could be a good idea to document it for newbies like myself. :)
It's currently not very practical to create comments in Pegged. One would have to insert them in every single rule in the program to allow what most programming languages let you do.
Just creating this as a sort of meta-bug. A couple of things that come to mind:
[as per discussion in issue 60]
Given a rule tree A -> B -> C, it would be great if Pegged could provide a rule operator (along the lines of ;, : and ^) that, when applied to B, would turn the tree into A -> C. In other words, the marked rule disappears and all of its children are attached to its parent at the point where the node was in the list of A's children.
That means
-having the basic docs just containing a special 'import thisExample' statement.
In your reference example (https://github.com/PhilippeSigaud/Pegged/wiki), the provided code doesn't compile:
enum parseTree1 = Expr.parse("1 + 2 - (3_x-5)_6");
instead this compiles:
enum parseTree1 = Arithmetic.parse("1 + 2 - (3_x-5)_6");
Also, as mentioned elsewhere, the tutorial should say not to put the definition of the grammar inside a function.
import pegged.examples.ddump;
enum dcode = q{void main() {}};
enum parseTree1 = Module.parse(dcode);
pragma(msg, parseTree1.capture);
// outputs "null"
I am trying to get my head around the grammar and PEG in general, but I don't understand why FunctionBody does not work here...
Code:
mixin(grammar("Binary <- '0' ('b' / 'B') [01]+"));
Am I doing it wrong or is this a bug?
The following mixed-in grammar compiled at commit 04e2b60 (a couple of days ago) but gives an out of memory error now when compiling with git head. I'll revert to the point it last worked at, but here is the grammar if you want to track down the issue:
mixin(grammar(`
Parse:
Line <- (Spaces (Keyword / Other / Number / String / Parens / Symbol) Spaces)*
Other <~ [a-zA-Z_]+
Number <~ digit+ / (digit+ '.' digit*) / (digit* '.' digit+)
String < FullString / PartialString
FullString <~ quote (!quote .)* quote
/ backquote (!backquote .)* backquote
/ doublequote (!doublequote .)* doublequote
PartialString <~ quote (!quote .)*
/ backquote (!backquote .)*
/ doublequote (!doublequote .)*
Symbol <- '~' / '!' / '@' / '#' / '$' / '%' / '^' / '&' / '*' / '/' /
'+' / '=' / '<' / '.' / '>' / ',' / ':' / ';' / backslash
Parens <- '(' / ')' / '{' / '}' / '[' / ']'
Spaces <~ (' ' / '\n' / '\t')*
Keyword <- "abstract" / "alias" / "align" / "asm" / "assert" / "auto" / "body" / "bool" / "break" / "byte"
/ "case" / "cast" / "catch" / "cdouble" / "cent" / "cfloat" / "char" / "class" / "const" / "continue" / "creal" / "dchar"
/ "debug" / "default" / "delegate" / "delete" / "deprecated" / "double" / "do" / "else" / "enum" / "export" / "extern"
/ "false" / "finally" / "final" / "float" / "foreach_reverse" / "foreach" / "for" / "function" / "goto" / "idouble" / "if"
/ "ifloat" / "immutable" / "import" / "inout" / "interface" / "invariant" / "int" / "in" / "ireal" / "is" / "lazy"
/ "long" / "macro" / "mixin" / "module" / "new" / "nothrow" / "null" / "out" / "override" / "package" / "pragma"
/ "private" / "protected" / "public" / "pure" / "real" / "ref" / "return" / "scope" / "shared" / "short" / "static"
/ "struct" / "super" / "switch" / "synchronized" / "template" / "this" / "throw" / "true" / "try" / "typedef" / "typeid"
/ "typeof" / "ubyte" / "ucent" / "uint" / "ulong" / "union" / "unittest" / "ushort" / "version" / "void" / "volatile"
/ "wchar" / "while" / "with" / "__FILE__" / "__LINE__" / "__gshared" / "__thread" / "__traits"
`));
I stumbled over strange behavior and I believe the cause is the built-in Identifier rule:
Identifier <~ Alpha Alphanum*
This rule ignores spacing. That's why "a something" is recognized as an Identifier, which is wrong. The rule should be
Identifier <- ~(Alpha Alphanum*)
I'm not completely confident that I'm right. I'm sorry, for not writing a test case and verifying it further.
It seems that a grammar name is required for Pegged to accept any grammar now. This isn't a problem, but it took some digging to figure out what was wrong. If this is intended behavior, the docs should probably be updated to reflect it.
I know it's a bit of a far-fetched argument, but I think that for orthogonality purposes, Pegged should support a <^ operator. It follows the principle of least surprise when/if a user tries to use it.
It seems that Pegged has moved to camelCase for rule identifiers, which is great. Documentation needs to be updated, though; some of it still points to stuff like Spacing.
Hi Philippe,
I've found some time to experiment with the speedup1 work that you've incorporated onto the master branch. I wrote some test code (a copy of your arithmetic.d example) like so:
module test.parser;
import pegged.grammar;
enum parser = grammar(`
TEST:
Term < Factor (Add / Sub)*
Add < "+" Factor
Sub < "-" Factor
Factor < Primary (Mul / Div)*
Mul < "*" Primary
Div < "/" Primary
Primary < Parens / Neg / Number
Parens < :"(" Term :")"
Neg < "-" Primary
Number < ~([0-9]+)
`);
pragma(msg, parser);
mixin(parser);
I compiled it with the following options:
dmd -c -ofsrc/parser.o -fPIC -O -inline -release -w -wi -I./src/ -I./Pegged/ src/parser.d
and I get the following errors during compilation (of the generated code):
src/parser.d(50): Error: cannot implicitly convert expression (tuple("Term",p.end)) of type Tuple!(string,ulong) to Tuple!(string,uint)
src/parser.d(55): Error: cannot implicitly convert expression (tuple("Term",p.end)) of type Tuple!(string,ulong) to Tuple!(string,uint)
src/parser.d(68): Error: cannot implicitly convert expression (tuple("Add",p.end)) of type Tuple!(string,ulong) to Tuple!(string,uint)
src/parser.d(73): Error: cannot implicitly convert expression (tuple("Add",p.end)) of type Tuple!(string,ulong) to Tuple!(string,uint)
// snip
But if I pass the -m32 option to the compiler it builds and links fine. I'm not sure what the problem is; you can find the generated code (before the -m32 flag) here: http://paste.ubuntu.com/1192434/
Let me know if there's anything else I can do to test. Cheers.
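The messages above are consistent with a position being stored as uint while the value assigned to it has type size_t, which is ulong on 64-bit targets and uint with -m32. The following is only a sketch of that mismatch in isolation; it is not the generated parser code, and the variable names are made up.

```d
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

void main()
{
    string input = "1+2";

    // Array lengths and indices are size_t:
    // uint when compiling with -m32, ulong when compiling for 64-bit.
    static assert(is(typeof(input.length) == size_t));

    // Portable: let the tuple carry size_t rather than a fixed-width type.
    Tuple!(string, size_t) ok = tuple("Term", input.length);
    writeln(ok);

    // Not portable: compiles with -m32 only, because on 64-bit targets
    // Tuple!(string, ulong) does not implicitly convert to Tuple!(string, uint).
    // Tuple!(string, uint) bad = tuple("Term", input.length);
}
```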
I've been playing around with the example markdown grammar[1] when I noticed some peculiarities. I've gotten it down to this test case:
#!/usr/bin/env rdmd
import std.stdio;
import pegged.grammar;
mixin( grammar( `
Test:
Inlines <- Inline+
Inline <- String / Spaces
String <~ NormalChar+
Spaces <~ Spacechar+
Spacechar <- " " / "\t"
NormalChar <- !( Spacechar ) .
`));
void main() {
auto tree = Test("foo bar baz ");
writeln( tree );
writeln( tree.matches );
}
I've noticed several issues:
First, the following output is observed for the given program:
Test [0, 12]["foo bar baz "]
+-Test.Inlines [0, 12]["foo bar baz "]
+-Test.Inline [0, 12]["foo bar baz "]
+-Test.String [0, 12]["foo bar baz "]
["foo bar baz "]
This seems very wrong. I would expect it to alternate between Strings of NormalChars which don't have spaces, then a space, then a String again etc. It seems that a NormalChar will match a space even though it shouldn't.
Second, if I change the input to " foo bar baz " (notice the starting space!), the program hangs.
Lastly, if I change the Spacechar rule to Spacechar <- " ", everything works. So why is the \t killing things?
[1] Which is just terrible BTW. I know, it's not your fault, the original peg-markdown grammar has the bugs, I checked. I'm improving it so that it's correct and uses the very nice Pegged extensions and I'll pull-request the new grammar once I'm done.
import std.stdio : writeln;
import pegged.grammar;
mixin(grammar(`
Test:
A <- B*
B <- .*
`));
void main () {
writeln(Test.A.parse("lol"));
}
This is the reduced testcase for the culprit in my grammar.
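The report does not say what goes wrong, but if the symptom is a hang (an assumption), the likely mechanism is general to PEGs: B <- .* succeeds even when it consumes nothing, so A <- B* can repeat it forever at the same position. The sketch below shows the loop and one possible guard in plain D. `Step` and `zeroOrMore` are hypothetical names, and this is not how Pegged implements repetition.

```d
import std.stdio : writeln;

/// A parsing step: given a start position, returns the end position,
/// or -1 on failure. (Hypothetical type for this sketch.)
alias Step = ptrdiff_t delegate(size_t pos);

/// Zero-or-more repetition with a progress check.
size_t zeroOrMore(Step inner, size_t pos)
{
    while (true)
    {
        auto next = inner(pos);
        if (next < 0) break;                  // inner failed: stop repeating
        if (cast(size_t) next == pos) break;  // matched without consuming:
                                              // repeating would never end
        pos = cast(size_t) next;
    }
    return pos;
}

void main()
{
    string input = "lol";

    // B <- .*   always succeeds and jumps to the end of the input,
    //           so a second application consumes nothing.
    Step b = (size_t pos) => cast(ptrdiff_t) input.length;

    // A <- B*
    writeln(zeroOrMore(b, 0));
}
```

Without the second `break`, the loop would keep calling `b` at the end of the input indefinitely. In a grammar, the usual fix is to make sure the repeated expression cannot match the empty string.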
Given a simple grammar for math expressions (e.g. the one defined on https://github.com/PhilippeSigaud/Pegged/wiki/Writing-Your-Own-Grammar) I do not see how to evaluate an expression given its ParseTree. The problem is that I do not know whether Add corresponds to the operator - or +. How does one handle this?
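One way to approach this, sketched under assumptions: keep the operator literal in the tree (or give + and - separate rules) and dispatch on it while walking the nodes. The `Node` struct below is a stand-in written for this example, not Pegged's ParseTree, and the tree shape (an Expr node whose first child is a Number and whose remaining children are AddExpr nodes with the operator as their first match) is assumed rather than taken from the wiki grammar.

```d
import std.conv : to;
import std.stdio : writeln;

/// Stand-in for a parse tree node (hypothetical; not Pegged's ParseTree).
struct Node
{
    string name;       // rule name
    string[] matches;  // matched text, operator literal included
    Node[] children;
}

double eval(Node n)
{
    switch (n.name)
    {
        case "Number":
        {
            return n.matches[0].to!double;
        }
        case "Expr":
        {
            // Assumed shape: Expr < Number AddExpr*
            double acc = eval(n.children[0]);
            foreach (c; n.children[1 .. $])
            {
                // Assumed shape: AddExpr < ('+' / '-') Number,
                // with the operator kept as the first match.
                double rhs = eval(c.children[0]);
                acc = (c.matches[0] == "+") ? acc + rhs : acc - rhs;
            }
            return acc;
        }
        default:
            throw new Exception("unexpected node: " ~ n.name);
    }
}

Node num(string s) { return Node("Number", [s]); }

void main()
{
    // Hand-built tree for "1 + 2 - 3".
    auto tree = Node("Expr", ["1", "+", "2", "-", "3"], [
        num("1"),
        Node("AddExpr", ["+", "2"], [num("2")]),
        Node("AddExpr", ["-", "3"], [num("3")]),
    ]);
    writeln(eval(tree));
}
```

Splitting the rule into separate Add and Sub rules would make the node name itself carry the operator, which avoids inspecting the matches at all.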
Compilation fails if I put '"'; using '\"' compiles fine.
Hi, I was working on a Ruby grammar some, but when I factored out some parts of a float literal, the grammar stopped compiling. (dmd 2.058) This works:
FloatLiteral <~ DecLiteral ('e' [+-]? [0-9] [0-9]*)?
DecLiteral <- [0-9] [0-9]*
But this doesn't:
FloatLiteral <~ DecLiteral ('e' [+-]? DecLiteral)?
DecLiteral <- [0-9] [0-9]*
Nor does this:
FloatLiteral <~ DecLiteral ('e' [+-]? ([0-9] [0-9]*))?
DecLiteral <- [0-9] [0-9]*
Shouldn't the latter two examples work?
I get this as a compile error on the line that calls FloatLiteral.parse():
ruby_parse.d(25): Error: undefined identifier FloatLiteral
So generating the parser fails silently and then I get an error when I try to use it? It is hard to track down the part of the grammar that Pegged doesn't like. It'd be nice to have an error message, but that would be a separate issue. I know you've mentioned the error handling needs to be improved.
There should be some docs on what exactly the memoization feature does, how it can affect a grammar/parser, when it's useful/not useful, etc.
With the current master I cannot generate the parser for the D grammar.
mixin(grammar(Dgrammar)); leads to an out-of-memory exception.
This may be related to another recent issue:
mixin(grammar(`
Parse:
Line < Keyword*
Keyword <- "one" / "two"
`));
void main()
{
string input = "one two";
auto res = Parse(input);
writeln(res);
}
This hangs indefinitely. Changing the grammar to:
mixin(grammar(`
Parse:
Line < Keyword Keyword
Keyword <- "one" / "two"
`));
Returns:
Parse [0, 4]["one", "one"]
+-Parse.Line [0, 4]["one", "one"]
+-Parse.Keyword [0, 3]["one"]
+-Parse.Keyword [0, 3]["one"]
Like it is not consuming the input or something.
I'm not really sure what the D programming language's limitations are for opening external files with CTFE.
I've seen this: http://www.dsource.org/projects/tutorials/wiki/ImportFile which suggests that it's possible to use the import statement to include the contents of a file at compile time.
Is there a way to use this to include a PEG grammar definition? (e.g.):
mixin(grammar(import("path/to/file.peg")));
// other code
dmd -version=select -w -Ivendor/pegged -ofvendor/pegged/pegged/peg.o -c vendor/pegged/pegged/peg.d
dmd -version=select -w -Ivendor/pegged -ofvendor/pegged/pegged/grammar.o -c vendor/pegged/pegged/grammar.d
vendor/pegged/pegged/grammar.d(2550): Error: cannot implicitly convert expression (diag.infiniteLoops.length()) of type ulong to int
vendor/pegged/pegged/grammar.d(2593): Error: cannot implicitly convert expression (cast(ulong)(breaker + 1) % diag.infiniteLoops.length()) of type ulong to int
vendor/pegged/pegged/grammar.d(2608): Error: cannot implicitly convert expression (cast(ulong)(breaker + 1) % diag.infiniteLoops.length()) of type ulong to int
rdmd -I. pegged/examples/xml
pegged/examples/xml.d(12): Error: struct pegged.peg.Output(TParseTree) if (isParseTree!(TParseTree)) is used as a type
pegged/examples/xml.d(12): Error: struct pegged.peg.Output(TParseTree) if (isParseTree!(TParseTree)) is used as a type
pegged/examples/xml.d(22): Error: struct pegged.peg.Output(TParseTree) if (isParseTree!(TParseTree)) is used as a type
pegged/examples/xml.d(22): Error: struct pegged.peg.Output(TParseTree) if (isParseTree!(TParseTree)) is used as a type
Failed: 'dmd' '-I.' '-v' '-o-' 'pegged/examples/xml.d' '-Ipegged/examples'
It would be nice to have some way to record line/column information for every capture. This is essential for producing good errors in a compiler.
On: https://github.com/PhilippeSigaud/Pegged/wiki/Extended-PEG-Syntax
Text <~ (!("/*"/"/*") .)*
Was probably meant to be:
Text <~ (!("/*"/"*/") .)*
?
I believe this is a silly question but at the moment, I'm just trying to understand what the best practices are for discarding comments across all rules. At the moment I have the following code and it appears to just pause during the CTFE pragma evaluation:
mixin(grammar(`
TEST:
Spacing <- (spacing / Comment)*
Comment <- ';' (!eol .*) eol
`));
version (unittest) {
pragma(msg, TEST(`
; a comment
`));
}
I'm not sure what I'm doing wrong? :/
enum string testGrammar = `
TestGrammar:
Root < 'a' '.'
Spacing <- blank*
`;
import pegged.grammar;
import std.stdio;
mixin(grammar(testGrammar));
void main()
{
stdout.writefln("%s", TestGrammar.Root("a."));
}
I would expect the above grammar to recognize the example. Instead, it never halts.
On the other hand, this works:
enum string testGrammar = `
TestGrammar:
Root < 'a' '.'
Spacing <- blank+
`;
import pegged.grammar;
import std.stdio;
mixin(grammar(testGrammar));
void main()
{
stdout.writefln("%s", TestGrammar.Root("a."));
}
/+
Prints:
TestGrammar.Root [0, 2]["a", "."]
+-literal!("a") [0, 1]["a"]
+-literal!(".") [1, 2]["."]
+/
This surprises me because the Spacing symbol explicitly requests at least 1 blank, yet the text it recognizes has zero blanks.
There's another issue that might be the same thing: I am unable to place the right-hand-side of rules on a different line than the lhs like I used to:
enum string testGrammar = `
TestGrammar:
Root <
'a'
'.'
Spacing <- blank+
`;
import pegged.grammar;
import std.stdio;
mixin(grammar(testGrammar));
void main()
{
stdout.writefln("%s", TestGrammar.Root("a."));
}
/+
During compilation:
test.d(14): Error: static assert "Pegged (failure)
+-Pegged.Grammar (failure)
+-Pegged.GrammarName [2, 13]["TestGrammar"]
+-Pegged.Identifier [2, 13]["TestGrammar"]
+-oneOrMore!(Pegged.Definition) (failure)
+-Pegged.Definition (failure)
| +-Pegged.LhsName [16, 20]["Root"]
| +-Pegged.Identifier [16, 20]["Root"]
| +-Pegged.Arrow (failure)
| +-literal!("< ") failure at line 3, col 5, after "ar:
Root" expected "< ", but got "<
'a'
'."
"
+/
This makes it difficult to align rules that are best represented vertically:
`
Branch < '^' ^identifier
/ '->' Node
/ '{' Node+ '}'
`
It might not be clear what's going on there, but there is no possible way for me to tab the alternations over to line up with the '^' terminal (nor would I want to: tabs are terrible/evil for alignment, but great for indentation). I'd rather just put the entire rhs into its own indentation level. That way my editor won't choke on spaces that don't match an indentation level. I'd like to be able to write it this way:
`
Branch <
'^' ^identifier
/ '->' Node
/ '{' Node+ '}'
`
Or, in explicit form:
`
Branch <
{tab}{space}{space}'^' ^identifier
{tab}/ '->' Node
{tab}/ '{' Node+ '}'
`
The Pegged grammar looks like it can handle this, but it doesn't.
Ideally I'd even be able to do something like this:
`
Branch < ThisSymbolNeverMatches
/ '^' ^identifier
/ '->' Node
/ '{' Node+ '}'
`
which would entirely eliminate the desire to have tabs adjacent to spaces (tabs for indentation, spaces for alignment).
`
Branch < []==
/ '^' ^identifier
/ '->' Node
/ '{' Node+ '}'
`
It's the hammer operator, because you can't touch this ;)
This is all from commit 0406fd1:
commit 0406fd19e6c6261f2adab213d57e91d608fcf8f9
Author: Philippe Sigaud <[email protected]>
Date: Wed Oct 17 21:15:21 2012 +0200
Testing callumenator out-of-memory error.
Why do we have all the following allowed:
A < B
A <~ B
A <- B
instead of just, say, A < B?
wouldn't it be simpler if we just had the latter?
The example from the readme.md does not work anymore. The parser just matches the "1".
Why don't you have unittests for those things?
import pegged.examples.json;
import pegged.examples.jsonExample;
import std.stdio;
void main()
{
// Parsing at compile-time:
enum parseTree1 = JSON.parse(example1);
pragma(msg, parseTree1.capture);
writeln(parseTree1);
}
When I try to build with win32 dmd I get:
"Error setting up build: Invalid UTF-8 sequence (at index 1)"
OK, so I know the whole idea behind a "keyword" may not make sense in PEG, but most computer languages do need to specify reserved keywords such that non-PEG tools have an easier time figuring out the language.
Could Pegged get a feature to specify certain reserved words?
When I try to create a grammar module with semantic actions, like
asModule("parser","
Number < [0-9]+ {doStuff} "
);
The generated grammar always gives:
static assert(false, `Bad grammar: ["PEGGED.Grammar failure at pos [index: 25, line: 1, col: 24]", "Pegged.EOI failure at pos [index: 25, line: 1, col: 24]"]`);
Are semantic actions currently implemented? What is the correct way to use them?
So I just got back to working on an old Pegged grammar and noticed that I used a > operator at the expression level. I cannot for the life of me recall what that does and it seems to be undocumented (possibly removed?).
@PhilippeSigaud can you shed some light on this?
Silly, I know. I probably shouldn't have used something undocumented to begin with...
Trying to compile Pegged master branch, I get the error:
..\Pegged\pegged\peg.d 374 Error: undefined identifier isRule, did you mean function Rule?
However I try to fix this (importing pegged\parser.d, etc) I get a forward reference error on GenericPegged.Pegged.
This is with an up-to-date master, but it has been happening for a couple of weeks now. This is simply when compiling Pegged, not trying to generate a parser or anything.
Hi,
I'm trying to parse a file that has standard Unix-style '#-to-EOL' comments; but it's not working, and I believe at this point it's a bug in Pegged (though I'm not 100% sure).
This is my grammar:
ENI:
Grammar <- Statement (EOL Statement)* EOL? EOI
Statement <- ( AutoStatement / HotplugStatement / EmptyLine / Comment )
Comment <: '#'
EmptyLine <: S*
AutoStatement <- 'auto' S ^Identifier
HotplugStatement <- 'allow-hotplug' S ^Identifier
When I give that the following file:
auto lo
(i.e., the first line is empty, the third line starts with #)
then it won't parse correctly:
Parse output: failure
named captures: []
position: [index: 0, line: 0, col: 0]
ENI.Grammar failure at pos [index: 9, line: 2, col: 0]
Pegged.EOI failure at pos [index: 9, line: 2, col: 0]
When taking out the #, it works.
Maybe I'm doing something wrong, but I believe this should just work, no?
It would be nice to have a utility function to get the location of a fatal failure during parsing, at least as a temporary measure until we work out a proper error handling mechanism.
(This might exist already and I just can't find it...)
This is a 'post-it' issue:
There are problems with the way parameterized rules are generated: string/TParseTree overloading does not work.
It took me two hours to track this bug down... Here's a test case:
#!/usr/bin/env rdmd
import std.stdio;
import pegged.grammar;
/*mixin( grammar( `*/
mixin( grammar!(Memoization.yes)( `
Test:
Div <- HtmlBlockTag( 'div' )
HtmlBlockTag( Tag ) <- HtmlTagOpen( Tag )
( HtmlBlockTag( Tag ) /
AllIfNot( HtmlTagClose( Tag ) ) )*
HtmlTagClose( Tag )
HtmlTag( Contents ) <- Lt Spnl ^Contents Spnl Gt
HtmlTagClose( Tag ) <- HtmlTag( ^slash ^Tag )
# The version of HtmlTagOpen that uses the HtmlTag rule fails under memoization;
# both versions work under no memoization
# HtmlTagOpen( Tag ) <- Lt Spnl ^Tag Spnl ( HtmlAttribute Spnl )* Spnl Gt
HtmlTagOpen( Tag ) <- HtmlTag( ^Tag Spnl HtmlAttribute* )
HtmlAttributeValue <~ (Quoted / (!"/" !">" Nonspacechar)+)
HtmlAttributeName <~ (AlphanumericAscii / "-")+
HtmlAttribute <- HtmlAttributeName Spnl (^"=" Spnl HtmlAttributeValue)? Spnl
Lt <- "<"
Gt <- ">"
Quoted <- ^doublequote FuseAllUntil(doublequote) ^doublequote
/ ^quote FuseAllUntil(quote) ^quote
BlankLine <~ Spaces Newline
AlphanumericAscii <~ [A-Za-z0-9]
Nonspacechar <~ !Spacechar !Newline .
Spacechar <~ " " / "\t"
Newline <~ "\n" / "\r" "\n"?
Spaces <~ Spacechar*
Spnl <~ Spaces (Newline Spaces)?
AllIfNot(Predicate) <- (!Predicate .)
AllUntil(Predicate) <- AllIfNot(Predicate)*
FuseAllUntil(Predicate) <~ AllUntil(Predicate)
`));
void main() {
auto tree = Test(
`<div id="bar">
foo <br/> bar
</div>
`);
writeln( tree );
writeln( tree.matches );
}
With memoization turned on, this test code fails to parse the input. With memoization off, parsing succeeds.
I also tracked it down to a single rule; look at the comments in the grammar code. The version of HtmlTagOpen that uses the HtmlTag rule fails under memoization; both versions work under no memoization. Switch between the two versions of HtmlTagOpen to see the behavior.
The performance of Pegged is currently not fantastic in some laboratory cases. For instance, given a simple D-like grammar, a file with 100,000 lines of public class A {} takes several minutes to parse. This generally isn't good enough for a compiler, and needs improvement.
I don't know if it's related at all, but if some of the slowness comes from Pegged needing to be CTFE-compatible, you could special-case the runtime path by using if (!__ctfe).
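A small sketch of the __ctfe special-casing suggested above. `countDigits` is a made-up function with nothing Pegged-specific in it; it only shows that one function can take a simple path during compile-time evaluation and a different path at run time.

```d
import std.algorithm.searching : count;
import std.ascii : isDigit;
import std.stdio : writeln;

/// Hypothetical example function: counts ASCII digits in `s`.
size_t countDigits(string s)
{
    if (__ctfe)
    {
        // Compile-time path: a plain loop that the CTFE interpreter
        // handles without trouble.
        size_t n;
        foreach (char c; s)
            if (isDigit(c)) ++n;
        return n;
    }
    else
    {
        // Run-time path: free to use a different implementation,
        // here a library algorithm.
        return s.count!(c => isDigit(c));
    }
}

// Evaluated by the compiler, so the __ctfe branch is taken.
enum atCompileTime = countDigits("a1b22");
static assert(atCompileTime == 3);

void main()
{
    // Evaluated at run time, so the else branch is taken.
    writeln(countDigits("a1b22"));
}
```

`__ctfe` is an ordinary run-time `if` condition, not a `static if`, so both branches must compile; only the branch actually taken has to be CTFE-compatible.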
I don't know if I am too stupid, but after a couple of months I wanted to give Pegged another try and failed miserably. Even the arithmetic example does not work for me in dmd 2.059 or current gdc based on v2.057.
It simply does not generate the grammar. dmd simply gives "out of memory" if I try to mix in the generated grammar, and gdc says:
Error 1 Error: template pegged.grammar.PEGGED!(ParseTree).PEGGED.parse(ParseLevel pl = ParseLevel.parsing) parse(ParseLevel pl = ParseLevel.parsing) matches more than one template declaration, ../../_extern/pegged/grammar.d(111):parse(ParseLevel pl = ParseLevel.parsing) and ../../_extern/pegged/grammar.d(126):parse(ParseLevel pl = ParseLevel.parsing) c:\Users\Dilly\Downloads_code\Projects_extern\pegged\grammar.d 128
I have cut it down to this:
[CODE]
enum grammarStr =
"Expr < Factor AddExpr*
AddExpr < ^('+'/'-') Factor
Factor < Primary MulExpr*
MulExpr < ^('*'/'/') Primary
Primary < Parens / Number / Variable / ^'-' Primary
Parens < '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier";
enum genGrammar = grammar(grammarStr);
pragma(msg,"grammar: " ~ genGrammar);
[/CODE]
Even without mixing it in, it does not work. The string coming out of grammar already seems to be wrong...
Just wanted to point this out, as it took me a while to track down the cause, so is good to be aware of.
Currently I use a generated parser to parse some text, then later run individual rules within the parser to analyze specific parts of different text, so I do this:
auto p = Glint.decimateTree(Glint.Type(ParseTree(``, false, [], s, 0, 0)));
where s is some new string I want to parse with a given sub-rule of the Glint grammar. The parser doesn't always pick up the new text, but recycles an old result from a previous (different) text input which was parsed using Glint(input_text). This is caused by memoization coupled with the fact that I am calling a sub-rule explicitly. The way I get around this at the moment is to do this:
Glint.memo = null
before the call to the explicit sub-rules. Currently memo is nulled before a new call to the main parser (when using a string as input). It would be cool if a similar facility existed to call sub-rules using just a string (or maybe this exists already?) which also nulls out the memo.
Thanks!
The example from the README file does not work.
#!/usr/bin/env rdmd
import std.stdio;
import pegged.grammar;
mixin( grammar( `
Arithmetic:
Expr < Factor AddExpr*
AddExpr < ^('+'/'-') Factor
Factor < Primary MulExpr*
MulExpr < ^('*'/'/') Primary
Primary < '(' Expr ')' / Number / Variable / ^'-' Primary
Number <~ [0-9]+
Variable <- identifier
` ) );
void main() {
auto tree = Arithmetic(" 0 + 123 - 456 ");
writeln( tree );
writeln( tree.matches ); // prints ["0"]
writeln( tree.matches == ["0", "+", "123", "-", "456"] ); // prints false
}
I recommend adding more unit tests to the project so this kind of breakage doesn't happen in the future. At the very least, the examples from the docs should always work. Seeing the main example break doesn't instill a lot of confidence in the project.
This is with dmd v2.060 on Mac OS 10.8.
Under DMD 2.059, using a Range expression triggers an internal compiler exception. I'd be happy to file off a bug report under the DMD project, but I'm having a hard time isolating the fault.
import pegged.grammar;
mixin(grammar(`Number < [0-9]*`));
enum result = Number.parse("123");
dmd: interpret.c:6642: bool isCtfeValueValid(Expression*): Assertion `((ArrayLiteralExp *)se->e1)->ownedByCtfe' failed.
I believe the fault is in the return expression of the Range class, in peg.d. The content of okfailMixin() is kind of beyond me at this point. I'd like to help get a bug filed on this one, as I'm eager to use Pegged in my future work in D.
Since EOL is a built-in rule, wouldn't it make sense to make EOI the same?
It would be nice if each generated tree node had an enum string field holding the name used in the grammar. This would be much more maintainable/clean when switching over node names (since you'd get errors when you rename a node).