picoe / eto.parse

Recursive descent LL(k) parser for .NET with Fluent API, BNF, EBNF and Gold Grammars

License: MIT License

Languages: C# 80.50%, HTML 19.50%

eto.parse's Introduction

Eto.Parse

A recursive descent LL(k) parser framework for .NET

Description

Eto.Parse is a highly optimized recursive descent parser framework that can be used to create parsers for context-free grammars that go beyond the capability of regular expressions.

You can use BNF, EBNF, or Gold Meta-Language grammars to define your parser, code them directly using a Fluent API, or use shorthand operators (or a mix of each).

Why not use RegEx?

Regular expressions work well when the syntax is simple, but they fall short with recursive syntax, especially syntax that uses brackets or other grouping constructs.

For example, a math parser built with RegEx cannot directly validate that brackets are matched, as in "((1+2)*3)" or "{ 'my': 'value', 'is' : {'recursive': true } }".
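A minimal C# sketch shows what is needed instead: a stack that remembers every unclosed bracket, to any depth. It is not part of Eto.Parse, and the `BracketChecker` name is made up for illustration. Classic regular expressions have no such memory. .NET's balancing groups are an extension that can emulate one, but they quickly become hard to read.

```csharp
using System.Collections.Generic;

// Illustrative helper, not part of Eto.Parse.
public static class BracketChecker
{
    // Returns true when every opening bracket has a matching closing bracket
    // of the same kind, properly nested. Quoted strings are not treated specially.
    public static bool IsBalanced(string input)
    {
        var expected = new Stack<char>();
        foreach (var c in input)
        {
            switch (c)
            {
                case '(': expected.Push(')'); break;
                case '[': expected.Push(']'); break;
                case '{': expected.Push('}'); break;
                case ')':
                case ']':
                case '}':
                    // A closer must match the most recent unclosed opener.
                    if (expected.Count == 0 || expected.Pop() != c)
                        return false;
                    break;
            }
        }
        // Anything left on the stack is an opener that was never closed.
        return expected.Count == 0;
    }
}
```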

Matching

The framework has been put together to get at the relevant values as easily as possible. Each parser can be named, which then builds a tree of named matches that represent the interesting sections of the parsed input. You can use events on the named sections to perform logic when they match, or just parse the match tree directly.

Left Recursion

One rather cumbersome issue to deal with using recursive descent parsers is left recursion. Eto.Parse automatically identifies left recursive grammars and transforms them into a repeating pattern.
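The transformation can be illustrated with a hand-written recursive descent evaluator. If a rule such as `expr = expr, '+', term | term` were coded literally, it would recurse forever, because `ParseExpr` would call itself before consuming any input. Rewriting it as `term, { '+', term }` turns the recursion into a loop and keeps left associativity. This is a minimal sketch of the idea, not Eto.Parse's internal implementation. The `Calc` class is hypothetical, and multiplication and division are omitted for brevity.

```csharp
using System;

// Hypothetical hand-written parser illustrating the left-recursion rewrite.
public sealed class Calc
{
    private readonly string _text;
    private int _pos;

    private Calc(string text) => _text = text;

    public static int Evaluate(string text)
    {
        var calc = new Calc(text);
        var value = calc.ParseExpr();
        if (calc._pos != text.Length)
            throw new FormatException($"Unexpected '{text[calc._pos]}' at index {calc._pos}");
        return value;
    }

    // Left-recursive form:  expr = expr, ('+' | '-'), term | term;
    // Repeating form:       expr = term, { ('+' | '-'), term };
    private int ParseExpr()
    {
        var value = ParseTerm();
        while (_pos < _text.Length && (_text[_pos] == '+' || _text[_pos] == '-'))
        {
            var op = _text[_pos++];
            var right = ParseTerm();
            // Folding each result into `value` keeps the left associativity of the original rule.
            value = op == '+' ? value + right : value - right;
        }
        return value;
    }

    // term = number | '(', expr, ')';
    private int ParseTerm()
    {
        if (_pos < _text.Length && _text[_pos] == '(')
        {
            _pos++;
            var inner = ParseExpr(); // recursion is safe here: '(' was consumed first
            if (_pos >= _text.Length || _text[_pos] != ')')
                throw new FormatException($"Expected ')' at index {_pos}");
            _pos++;
            return inner;
        }

        var start = _pos;
        var number = 0;
        while (_pos < _text.Length && _text[_pos] >= '0' && _text[_pos] <= '9')
            number = number * 10 + (_text[_pos++] - '0');
        if (_pos == start)
            throw new FormatException($"Expected a number at index {start}");
        return number;
    }
}
```

For instance, `Calc.Evaluate("10-3-2")` yields 5 rather than 9, because the loop folds from the left.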

Performance

Eto.Parse has been highly optimized for performance and memory usage. For example, here's a comparison parsing a large JSON string 1000 times (times in seconds):

Speed

Framework	Parsing (s)	Parsing vs. best	Warmup (s)	Warmup vs. best
Eto.Parse	2.327	1.00x	0.008	1.00x
Newtonsoft.Json	2.523	1.08x	0.068	8.08x
ServiceStack.Text	2.854	1.23x	0.066	7.78x
Irony	25.401	10.92x	0.188	22.28x
bsn.GoldParser	11.186	4.81x	0.013	1.49x
NFX.JSON	11.847	5.09x	0.187	22.10x
SpracheJSON	92.774	39.88x	0.189	22.37x

(Warmup is the time it takes to initialize the engine for the first time and perform the first parse of the json string).

Memory & Objects

Framework	Allocated	Allocated vs. best	# Objects	Objects vs. best
Eto.Parse	553.99 MB	1.00x	15,268,050	1.00x
Newtonsoft.Json	1,074.27 MB	1.94x	21,562,432	1.41x
ServiceStack.Text	2,540.91 MB	4.59x	15,738,493	1.03x
Irony	4,351.44 MB	7.85x	94,831,118	6.21x
bsn.GoldParser	2,012.16 MB	3.63x	74,387,176	4.87x

Example

For example, the following defines a simple hello world parser using the Fluent API:

using Eto.Parse;

// optional repeating whitespace
var ws = Terminals.WhiteSpace.Repeat(0);

// parse a value with or without brackets
var valueParser = Terminals.Set('(')
	.Then(Terminals.AnyChar.Repeat().Until(ws.Then(')')).Named("value"))
	.Then(Terminals.Set(')'))
	.SeparatedBy(ws)
	.Or(Terminals.WhiteSpace.Inverse().Repeat().Named("value"));

// our grammar
var grammar = new Grammar(
	ws
	.Then(valueParser.Named("first"))
	.Then(valueParser.Named("second"))
	.Then(Terminals.End)
	.SeparatedBy(ws)
);

Or using shorthand operators:

using Eto.Parse;

// optional repeating whitespace
var ws = -Terminals.WhiteSpace;

// parse a value with or without brackets
Parser valueParser = 
	('(' & ws & (+Terminals.AnyChar ^ (ws & ')')).Named("value") & ws & ')')
	| (+!Terminals.WhiteSpace).Named("value");

// our grammar
var grammar = new Grammar(
	ws & valueParser.Named("first") & 
	ws & valueParser.Named("second") & 
	ws & Terminals.End
);

Or, using EBNF:

var grammar = new EbnfGrammar().Build(@"
(* := is an extension to define a literal with no whitespace between repeats and sequences *)
ws := {? Terminals.WhiteSpace ?};

letter or digit := ? Terminals.LetterOrDigit ?;

simple value := letter or digit, {letter or digit};

bracket value = simple value, {simple value};

optional bracket = '(', bracket value, ')' | simple value;

first = optional bracket;

second = optional bracket;

grammar = ws, first, second, ws;
", "grammar");

These can parse the following text input:

var input = "  hello ( parsing world )  ";
var match = grammar.Match(input);

var firstValue = match["first"]["value"].Value;
var secondValue = match["second"]["value"].Value;

firstValue will equal "hello", and secondValue will equal "parsing world".

License

Licensed under MIT.

See LICENSE file for full license.

eto.parse's Issues

GrammarGenerators

Hi,
could you implement more grammars, like the EBNF and Gold ones?

Antlr Grammar support

I've just been testing a Gold grammar for XML on a large XML file, and the performance of the resulting grammar is absolutely terrible. However, parsing the equivalent JSON file (we have the same data in XML and JSON format) performs much better (~80x faster). Perhaps this is because the Gold grammar is an LALR grammar being translated into an LL(*) grammar.

Do you plan to support Antlr-style grammars? Antlr is probably one of the most popular formats (alongside yacc/lex and bison/flex) and, most importantly, a huge number of Antlr grammars have been published. Those grammars are optimised for LL(*) parsers, since that is what Antlr uses.

The syntax appears to be fairly straightforward.

Finally, if we decide to go ahead, I plan to add an MSBuild target that lets you include grammars directly in the project and have them produce the C# code (using your CodeParserWriter). I can make it use the Build Action on the file, so you could specify BNF, EBNF or Gold and have the equivalent .cs file generated automatically. Loading the grammar fluently starts up much faster than loading the grammar files directly, so a compile-time step would be really useful.

I mention this because I'm more than happy to provide the (un)install.ps1 scripts for your NuGet package. Users who don't set any grammar to build would see no impact at all, but the scripts would make it really easy to generate .cs files rather than load grammars at run time.

GOLD grammar - terminal with sequence

Again, a similar problem with a GOLD grammar terminal definition:
Identifier = {Letter}{Digit}+
This terminal is transformed into a SequenceParser with a non-null Separator, so the input "a 1" matches it.

In GoldGrammar.cs (line 218), a Separator is set for every sequence, including sequences inside terminal (regular expression) definitions:
var seq = new SequenceParser(parsers) { Separator = sep };

Thanks,
Tomas

EBNF parsing doesn't work

I can't get any EBNF parsing to work at all, neither with my own examples nor with the samples used in the unit tests.

Trouble with "or" parsing

I'm trying to write a parser that allows arguments to be given in arbitrary order, and I'm having some difficulty with this. Maybe I'm approaching this completely the wrong way; please enlighten me if so ;-)

I run into trouble if I want to "OR" together two parsers, one for each fragment.

Some code:

[Test]
public void TestEitherOrGrammar()
{
  var ws = -Terminals.WhiteSpace;
  var space = +Terminals.WhiteSpace;
  Parser eq = "=";

  var str = new StringParser { AllowNonQuoted = true };
  var @int = new NumberParser { AllowDecimal = false, AllowExponent = false, ValueType = typeof(int) };

  var width = "width" & eq & @int.Named("width");
  Grammar wGrammar = new Grammar(width);
  Assert.AreEqual(500, wGrammar.Match("width=500").Matches["width"].Value);

  var height = "height" & eq & @int.Named("height");
  Grammar hGrammar = new Grammar(height);
  Assert.AreEqual(600, hGrammar.Match("height=600").Matches["height"].Value);

  var wh = width & space & height;
  Grammar whGrammar = new Grammar(wh);
  Assert.AreEqual(700, whGrammar.Match("width=350 height=700").Matches["height"].Value);
  Assert.AreEqual(350, whGrammar.Match("width=350 height=700").Matches["width"].Value);

  var hw = height & space & width;
  Grammar hwGrammar = new Grammar(hw);
  Assert.AreEqual(700, hwGrammar.Match("height=700 width=350").Matches["height"].Value);
  Assert.AreEqual(350, hwGrammar.Match("height=700 width=350").Matches["width"].Value);

  // why is this not working
  var whhw = width | height;
  // var whhw = (wh|hw); //also not working

  var whhwGrammar = new Grammar(whhw);
  Assert.AreEqual(800, whhwGrammar.Match("width=400 height=800").Matches["height"].Value);
  Assert.AreEqual(400, whhwGrammar.Match("width=400 height=800").Matches["width"].Value);
  Assert.AreEqual(800, whhwGrammar.Match("height=800 width=400").Matches["height"].Value);
  Assert.AreEqual(400, whhwGrammar.Match("height=800 width=400").Matches["width"].Value);

}

[Req] More navigation

I have a string and a tree of matches.

I'd like to be able to get a match by its offset in the string (the problem is that several similar matches can exist at the same integer offset);
I'd like to be able to move from one match to an adjacent one (e.g., my file consists of sections and separators between them, and I want to get the nearest separator after the section I currently hold);
I don't understand how to move up the Matches tree (without traversing the whole tree again).
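Until the library offers this, all three requests can be approximated outside it. The sketch below uses a hypothetical `Node` type (name, index, length, children) rather than Eto.Parse's own `Match` class, whose members are not assumed here. The `NodeNavigator` class is also hypothetical. It builds a parent lookup once, so moving up is a single dictionary lookup. It also returns the innermost node covering an offset and finds the next sibling with a given name.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a match tree node; Eto.Parse's own Match type differs.
public sealed class Node
{
    public string Name { get; }
    public int Index { get; }
    public int Length { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, int index, int length, params Node[] children)
    {
        Name = name;
        Index = index;
        Length = length;
        Children.AddRange(children);
    }
}

public sealed class NodeNavigator
{
    private readonly Dictionary<Node, Node> _parents = new Dictionary<Node, Node>();
    private readonly Node _root;

    // One traversal records each node's parent, so moving up later needs no search.
    public NodeNavigator(Node root)
    {
        _root = root;
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            foreach (var child in node.Children)
            {
                _parents[child] = node;
                pending.Push(child);
            }
        }
    }

    // Returns the parent, or null for the root.
    public Node Parent(Node node) => _parents.TryGetValue(node, out var parent) ? parent : null;

    // Descends to the deepest node whose span contains the offset, so when several
    // nodes cover the same offset the innermost one wins. Returns null if out of range.
    public Node At(int offset)
    {
        if (offset < _root.Index || offset >= _root.Index + _root.Length)
            return null;
        var current = _root;
        while (true)
        {
            Node next = null;
            foreach (var child in current.Children)
            {
                if (offset >= child.Index && offset < child.Index + child.Length)
                {
                    next = child;
                    break;
                }
            }
            if (next == null)
                return current;
            current = next;
        }
    }

    // Returns the first later sibling with the given name, or null if there is none.
    public Node NextSibling(Node node, string name)
    {
        var parent = Parent(node);
        if (parent == null)
            return null;
        var siblings = parent.Children;
        for (var i = siblings.IndexOf(node) + 1; i < siblings.Count; i++)
        {
            if (siblings[i].Name == name)
                return siblings[i];
        }
        return null;
    }
}
```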

Add RegEx grammar support

Should implement a RegEx grammar parser, which can be used as a drop-in replacement for RegEx, and allow you to use all the pre-existing regular expressions as the basis for your grammar.

Gold Grammar parser fails if no new line at EOF

The parser for gold grammars fails if the last rule ends at the EOF without a newline (i.e. it insists on a newline at the end of rules)

Support for INDENT and DEDENT

Having optional support for INDENT/DEDENT tokens would make parsing languages that require whitespace to define blocks easier.

Example:

do_thing param1, param2
if thing
    do_even_more "Hello"
    another_call "World"
and_finally "!"

Here an INDENT token would be placed right before do_even_more, but not before another_call, as they're at the same indentation level. A DEDENT would be placed right before and_finally.

More information on how this is handled by the python parser can be found here:
http://www.secnetix.de/olli/Python/block_indentation.hawk
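The Python-style approach can be sketched as a pre-pass over the source that turns leading whitespace into explicit markers, which a grammar could then match like any other token. This is an illustrative C# sketch with a hypothetical `IndentTokenizer` class, not an Eto.Parse feature. It assumes indentation uses spaces only and skips blank lines.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pre-pass, not part of Eto.Parse.
public static class IndentTokenizer
{
    // Emits "INDENT"/"DEDENT" markers and "LINE:<text>" entries, Python style.
    // Assumes space-only indentation (tabs are not expanded) and skips blank lines.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        var levels = new Stack<int>();
        levels.Push(0);

        foreach (var rawLine in source.Split('\n'))
        {
            var line = rawLine.TrimEnd('\r');
            if (line.Trim().Length == 0)
                continue;

            var width = line.Length - line.TrimStart(' ').Length;
            if (width > levels.Peek())
            {
                // Deeper than the enclosing block: open a new block.
                levels.Push(width);
                tokens.Add("INDENT");
            }
            else
            {
                // Shallower: close blocks until we are back at a known level.
                while (width < levels.Peek())
                {
                    levels.Pop();
                    tokens.Add("DEDENT");
                }
                if (width != levels.Peek())
                    throw new FormatException($"Inconsistent dedent: \"{line}\"");
            }
            tokens.Add("LINE:" + line.Trim());
        }

        // Close any blocks still open at the end of the input.
        while (levels.Count > 1)
        {
            levels.Pop();
            tokens.Add("DEDENT");
        }
        return tokens;
    }
}
```

Applied to the example above, this yields an INDENT before `do_even_more`, nothing between `do_even_more` and `another_call`, and a DEDENT before `and_finally`.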

Eto.Parse.Match.StringValue doesn't display the full value (cuts the last character)

Some of the children in Matches don't display the full value of .StringValue (this might also be a problem with .Value).
Here's an example:

with the source file being:

EBNF grammar - W3C style syntax

Coming back to my original issue.

Would it be a lot of work to implement a grammar that parses the W3C EBNF syntax, described here?

I looked at the EbnfGrammar class, but there is a lot going on, seemingly much more than the bare minimum...

GOLD grammar - different Alternative notation

Hi all, could you please help me?
I have this GOLD Parser grammar:

const string testedGrammer = @"
""Case Sensitive"" = False
""Start Symbol"" =

Identifier = ({Letter}?|{Digit}+)

::=

::= Identifier
";

This input does not match the grammar:

const string expression03 = @"45646";

But if I change the notation for the Identifier to:

Identifier = ({Digit}+|{Letter}?)

everything works well.

Could you please help me?

Thanks,
Tomas

[Req] PL/SQL Parser

Hi! I'm developing a .NET SQL-99 database system ( http://github.com/deveel/deveeldb ). In the old version (pre-2.0) I used a compiler-compiler that I ported from Java (JavaCC to CSharpCC). It performed quite well but had a lot of maintenance and extensibility problems.
Because of these issues, I decided to switch to Irony, which was quite a discovery: it let me define the grammar of the SQL parser directly in code, with a lot of control over the analysis of the sources.

Someone interested in my project pointed me to Eto.Parse, especially highlighting the performance gains, and based on your reports this is justified.
So now I'm very interested in learning more about it, although I can see a learning curve that isn't easy to overcome.

Could you provide a sample of a complete PL/SQL parser that I can use as reference?

[Q] Is Grammar/Parser Threadsafe

Just looking for confirmation: is it possible to create a Grammar as a singleton and then have many threads concurrently parsing many strings? (Is this the purpose of ParseArgs?)

Create an AST builder

How can I build an abstract syntax tree with Eto.Parse?

Improve Markdown parser

Should improve the markdown grammar so that it can pass all tests.

Also, some parsers, such as the encodings (especially the fairly complicated LinkEncoding), could be implemented manually. However, a non-manual version of these parsers should also be kept to demonstrate and test the base parsers and their speed.

Getting error line/column

Hi. Is there an easy way to find out at which line/column parsing failed? Or do I have to calculate that manually by index?
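If only the failure index is available, converting it to a line and column takes a single pass over the text. A minimal sketch, using a hypothetical `TextPosition` helper that is not part of the library:

```csharp
using System;

// Hypothetical helper, not part of Eto.Parse.
public static class TextPosition
{
    // Converts a zero-based character index into a one-based (line, column) pair.
    // Only '\n' starts a new line, so "\r\n" endings also work: the '\r' simply
    // counts as the last column of the previous line.
    public static (int Line, int Column) FromIndex(string text, int index)
    {
        if (index < 0 || index > text.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        var line = 1;
        var lineStart = 0;
        for (var i = 0; i < index; i++)
        {
            if (text[i] == '\n')
            {
                line++;
                lineStart = i + 1;
            }
        }
        return (line, index - lineStart + 1);
    }
}
```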

[Req] navigating with XPath

XPath uses path expressions to select nodes or node-sets.

brief introduction:
http://www.w3schools.com/xsl/xpath_syntax.asp

EBNF-like grammar:
http://www.w3.org/TR/xquery-xpath-parsing/#id-grammar

The .NET class library contains a set of classes and interfaces that automate this (starting with System.Xml.XPath and IXPathNavigable).

I propose implementing this kind of access for the tree of Match objects returned by parsing.

CharSetTerminal CaseSensitive test inverted

The CharSetTerminal Test method has the result of the TestCaseSensitive check inverted, so it behaves in the opposite way to the rest of the Grammar.

[Req] Please add a method to check grammar completeness

I often forget to define some rules.

For example:

rule1 := rule2, rule3;
rule2 := "A";

rule3 is undefined in the grammar, but nothing tells me so.
So it would help to have a method that checks the grammar for completeness (or duplicate definitions), which I can call after constructing the grammar but before parsing.
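Once each rule's referenced names are known, the check itself is straightforward. The sketch below assumes the grammar has already been reduced to `(name, referenced names)` pairs. Collecting those pairs from a built Eto.Parse grammar is not shown, and `GrammarCheck` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical checker working on a simplified view of the grammar.
public static class GrammarCheck
{
    // Each rule is (name, names it references). Returns references that have no
    // definition and names defined more than once, both sorted for stable output.
    public static (List<string> Undefined, List<string> Duplicates) Check(
        IEnumerable<(string Name, string[] References)> rules)
    {
        var ruleList = rules.ToList();
        var defined = new HashSet<string>(ruleList.Select(r => r.Name));

        var duplicates = ruleList
            .GroupBy(r => r.Name)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();

        var undefined = ruleList
            .SelectMany(r => r.References)
            .Where(reference => !defined.Contains(reference))
            .Distinct()
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();

        return (undefined, duplicates);
    }
}
```

For the example above, the pairs `("rule1", ["rule2", "rule3"])` and `("rule2", [])` report `rule3` as undefined.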

Examples

Hi,
could you add more examples, such as TeX or something similar, that create an AST for a language?

Parser.GetValue() should be callable with either a ParseMatch, index/length, or perhaps a text value directly

Calling GetValue() from InnerParse is not easy, because you have to pass a Match to GetValue, and a Match is only returned after parsing has completed.

[Q] .NET standard support?

I have one question: this library is amazing, but does it support .NET Standard? I tried to install it, but NuGet restored it only for .NET Framework 4.6 :/

Case sensitivity error in CharSetTerminal

The following example fails; it should pass.

var sample = "A";

var parser = new CharSetTerminal('a','b','c');
var grammar = new Grammar(parser)
{
	CaseSensitive = false
};
var match = grammar.Match(sample);
Assert.IsTrue(match.Success,match.ErrorMessage);
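The behaviour expected in this report and in the earlier CharSetTerminal report can be written down as a small, library-independent sketch. When the set is case-insensitive, both the stored characters and the tested character are folded with `char.ToLowerInvariant` before comparing. `CharSet` is a hypothetical class, not Eto.Parse's `CharSetTerminal`. Simple per-character lowering also does not cover every Unicode case-folding rule.

```csharp
using System.Collections.Generic;

// Hypothetical character set illustrating case-insensitive membership.
public sealed class CharSet
{
    private readonly HashSet<char> _chars = new HashSet<char>();
    private readonly bool _caseSensitive;

    public CharSet(bool caseSensitive, params char[] chars)
    {
        _caseSensitive = caseSensitive;
        foreach (var c in chars)
        {
            // Fold the stored characters once, so each lookup only folds the input.
            _chars.Add(caseSensitive ? c : char.ToLowerInvariant(c));
        }
    }

    // When case-insensitive, 'A' matches a set built from 'a' and vice versa.
    public bool Test(char c) =>
        _chars.Contains(_caseSensitive ? c : char.ToLowerInvariant(c));
}
```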

Match Errors when a Parser is Optional

Hi, I'm trying to make a very simple example like this:

var value = new NumberParser().WithName("value");
var op = Terminals.Set('+', '-', '*', '/').WithName("op");
var expr = new SequenceParser().WithName("expr");
var opexpr = (op & expr).WithName("opexpr");
expr.Add(value, opexpr.Optional());
var g = new Grammar(expr);
var m = g.Match("1+3*2+4");

And m.ErrorMessage contains:

Index=7, Line=1, Context="1+3*2+4>>>"
Expected:
Char: '+','-','*','/'
(Char: '+','-','*','/', (value, Optional: opexpr))

I don't understand why the error happens, given that "opexpr" is optional.
I tried printing the parsers just to double check, and opexpr is Optional.

Grammar: expr
    expr
        value
        Optional: opexpr
            opexpr
                Char: '+','-','*','/'
                expr

So why does it output an error when "opexpr" fails given that its parent parser is optional?
Sorry if I'm doing something obviously wrong.

EBNF Sample code on homepage doesn't parse as expected (returns error)

The sample code on the homepage doesn't appear to work as expected:

// Copied from https://github.com/picoe/Eto.Parse
var grammar = new EbnfGrammar().Build(@"
(* := is an extension to define a literal with no whitespace between repeats and sequences *)
ws := {? Terminals.WhiteSpace ?};

letter or digit := ? Terminals.LetterOrDigit ?;

simple value := letter or digit, {letter or digit};

bracket value = simple value, {simple value};

optional bracket = '(', bracket value, ')' | simple value;

first = optional bracket;

second = optional bracket;

grammar = ws, first, second, ws;
", "grammar");

var input = "  hello ( parsing world )  ";
var match = grammar.Match(input);

var firstValue = match["first"]["value"].Value;
var secondValue = match["second"]["value"].Value;

Console.WriteLine("F: {0}", firstValue);
Console.WriteLine("S: {0}", secondValue);
Console.WriteLine(match.ErrorMessage);

When run, this outputs:

F:
S:
Index=24, Context="ing world >>>)  "
ChildIndex=27, Context=" world )  >>>"
Expected:
letter or digit: Char: Letter or Digit
simple value: Sequence

Matching surrogate pair characters

To fully replicate the property path grammar I partially described in issue #9, I need a parser that matches characters with high code points (above U+FFFF). Originally the rule contained a range

[#x10000-#xEFFFF]

Unfortunately, .NET doesn't allow char constants over 65535. Does Eto.Parse support matching such characters?
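.NET strings are UTF-16, so a character above U+FFFF is stored as two `char` values (a surrogate pair). A range such as `[#x10000-#xEFFFF]` therefore has to be tested on the combined code point. Below is a minimal sketch using `char.IsSurrogatePair` and `char.ConvertToUtf32`. The `CodePoints` class is hypothetical, and on .NET Core 3.0 or later the `System.Text.Rune` type offers similar functionality.

```csharp
using System;

// Hypothetical helpers, not part of Eto.Parse.
public static class CodePoints
{
    // Reads one Unicode scalar value at `index`, combining a UTF-16 surrogate
    // pair when present. Returns the number of chars consumed (0 on failure).
    public static int ReadCodePoint(string text, int index, out int codePoint)
    {
        codePoint = 0;
        if (index >= text.Length)
            return 0;
        if (char.IsSurrogatePair(text, index))
        {
            codePoint = char.ConvertToUtf32(text, index);
            return 2;
        }
        if (char.IsSurrogate(text[index]))
            return 0; // an unpaired surrogate is not a valid character
        codePoint = text[index];
        return 1;
    }

    // Matches a single character in the range [#x10000-#xEFFFF];
    // returns the number of chars consumed, or 0 when it does not match.
    public static int MatchSupplementary(string text, int index)
    {
        var consumed = ReadCodePoint(text, index, out var cp);
        return consumed > 0 && cp >= 0x10000 && cp <= 0xEFFFF ? consumed : 0;
    }
}
```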

AlternativeParser doesn't work in one case, but works if I switch the order

thisWayIsBroken = wsMandatory | ( wsOptional & new CharSetTerminal( ':', '=' ) & wsOptional );
butThisWayWorksOK = ( wsOptional & new CharSetTerminal( ':', '=' ) & wsOptional ) | wsMandatory;

This is for a name-value parser, where the name and value can be separated by either just spaces, or a colon or an equal sign. In case of colon or equal sign, spaces in between the name and colon/equalsign and value are optional.

// All these examples should be valid
List<string> examples = new List<string>() {
	" ",
	"   ",
	":",
	" :",
	":  ",
	"  :  ",
	"=",
	" =",
	"=  ",
	"  =  "
};

var wsMandatory = new RepeatCharTerminal( char.IsWhiteSpace, 1 );
var wsOptional = new RepeatCharTerminal( char.IsWhiteSpace, 0 );

var thisWayIsBroken = wsMandatory | ( wsOptional & new CharSetTerminal( ':', '=' ) & wsOptional );

var g = new Grammar();
g.Inner = thisWayIsBroken;
foreach( var e in examples )
{
	var match = g.Match( e );
	Console.WriteLine( "{0}   \"{1}\"", ( match.Success ? "ok  " : "FAIL" ), e );
}

var butThisWayWorksOK = ( wsOptional & new CharSetTerminal( ':', '=' ) & wsOptional ) | wsMandatory;

All of the examples should be passing, but I get:

ok     " "
ok     "   "
ok     ":"
FAIL   " :"
ok     ":  "
FAIL   "  :  "
ok     "="
FAIL   " ="
ok     "=  "
FAIL   "  =  "
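This result is consistent with alternatives being an ordered choice, as in PEG parsers. So is the earlier GOLD report, where `({Letter}?|{Digit}+)` fails on "45646" but `({Digit}+|{Letter}?)` works. In an ordered choice, the first alternative that succeeds is kept and is not revisited when whatever follows fails. Here `wsMandatory` succeeds on the leading space of " :", the choice is settled, and the ':' is left unmatched. The simplified PEG model below reproduces the same pattern. The `Peg`/`PegSketch` names are hypothetical, and this is not Eto.Parse's implementation.

```csharp
using System;
using System.Linq;

// A parser takes (input, position) and returns the new position, or -1 on failure.
public delegate int Peg(string input, int pos);

public static class PegSketch
{
    // Greedy run of matching characters; never gives characters back.
    public static Peg Chars(Func<char, bool> test, int min) => (s, p) =>
    {
        var start = p;
        while (p < s.Length && test(s[p])) p++;
        return p - start >= min ? p : -1;
    };

    public static Peg OneOf(params char[] chars) => (s, p) =>
        p < s.Length && chars.Contains(s[p]) ? p + 1 : -1;

    public static Peg Seq(params Peg[] parts) => (s, p) =>
    {
        foreach (var part in parts)
        {
            p = part(s, p);
            if (p < 0) return -1;
        }
        return p;
    };

    // Ordered choice: the first alternative that succeeds wins, and the choice is
    // never revisited even if whatever follows it fails afterwards.
    public static Peg Choice(params Peg[] alternatives) => (s, p) =>
    {
        foreach (var alt in alternatives)
        {
            var r = alt(s, p);
            if (r >= 0) return r;
        }
        return -1;
    };

    // Succeeds only at the end of the input.
    public static Peg End => (s, p) => p == s.Length ? p : -1;
}
```

In this model, `Seq(Choice(wsMandatory, colonPart), End)` fails on exactly the four inputs listed above, while swapping the two alternatives accepts all ten. Putting the alternative that can consume more input first, or making the alternatives mutually exclusive, avoids the problem.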

get all productions of ebnf

Hi,

how can I get all productions (the left-hand sides) of an EBNF grammar?

I want to make a Visual Studio language service for grammars.

Gold Grammar Sample not working

I tried the sample from http://goldparser.org/doc/grammars/, but this gives me a KeyNotFoundException:

            var grammar = new GoldGrammar().Build(@"
Id = {Letter}{AlphaNumeric}*


<Statement> ::= if Id then <Statement>
              | if Id then <Then Stm> else <Statement>
              | Id ':=' Id


<Then Stm>  ::= if Id then <Then Stm> else <Then Stm>
              | Id ':=' Id
");

Please evaluate tree visualizer

Is it suitable for syntax debugging?

https://github.com/jepst/Tree

Parsing in streaming-like way

Hi. As a way of solving #16, I thought I could listen to the Matched event of the line-break terminal and keep track of the current line and the position of the last line end in the stream. Unfortunately, I learned that events are fired only after the whole text has been parsed. Is there an easy way to monitor the parsing process as it goes?

I was also thinking about this because it would then be unnecessary to wait until all the text has been parsed before handling the matches. Or is that not such a good idea ❓

Sign assemblies

In .NET, signed assemblies cannot make use of unsigned assemblies. If the assemblies published to the public NuGet feed are unsigned, they cannot easily be used in signed projects, which significantly reduces their usefulness.

Signing is trivially easy and has no downsides (signed assemblies can be used in unsigned projects). You can create a key and sign the assembly from the project properties page. Any DLL that you push to NuGet should be signed.

The standard approach is to NOT upload the official key to GitHub, and instead place a note in the project to indicate to contributors that they should add their own signatures whilst building. This ensures you can validate the ownership of the NuGet.

Grammar's inner parser set to empty

Hi,

I have an error where the Inner parser of the Grammar I'm creating is transformed into an EmptyParser by this function during Match.

Here's a sample of the grammar I wrote

public class CalculatorGrammar : Grammar
{
    public CalculatorGrammar() : base("calc")
    {
        EnableMatchEvents = false;
        CaseSensitive = true;

        var ws = new RepeatCharTerminal(char.IsWhiteSpace);
        
        var ldu = LetterOrDigit | "_";
        var identifier = LetterOrDigit.Then(ldu.Repeat().Optional()).WithName("Identifier");
        
        var lbr = Set("{");
        var rbr = Set("}");

        var floatParser = new NumberParser{AllowSign = true, AllowDecimal = true, AllowExponent = true,ValueType = typeof(double), Name = "ConstantDouble", AddMatch = true, AddError = true};
        var intParser = new NumberParser{AllowSign = true, AllowDecimal = false, AllowExponent = false, ValueType = typeof(long), Name = "ConstantInteger", AddMatch = true, AddError = true};
        var boolParser = new BooleanTerminal{CaseSensitive = true, TrueValues = new string[]{"true"}, FalseValues = new string[]{"false"}, AddError = true, AddMatch = true, Name = "ConstantBoolean"};

        var constants = floatParser.Or(intParser).Or(boolParser).WithName("Constant"); 

        var primary_expr = identifier.Or(constants).WithName("PrimaryExpression");

        var assign = identifier.Then(Set("=")).Then(primary_expr).WithName("AssignExpression");
        assign.SeparateChildrenBy(ws);

        var parenthesis_expr = new SequenceParser();
        parenthesis_expr.Add(
            primary_expr.WithName("NoParens")
            .Or(
                Set("(").Then(parenthesis_expr).Then(")").WithName("Parens")
            )
        );
        parenthesis_expr.WithName("ParenthesisExpression");
        parenthesis_expr.SeparateChildrenBy(ws);

        var mul_expr = new SequenceParser();

        mul_expr.Add(
            parenthesis_expr.WithName("NoMult")
            .Or(
                mul_expr.Then("*").Then(mul_expr).WithName("Mult")
            )
        );
        mul_expr.WithName("MultExpression");
        mul_expr.SeparateChildrenBy(ws);
        
        
        Inner = mul_expr;

    }
}

I don't understand the source yet, but I'd be very happy to get any direction on solving the issue or contributing to this project.

Is there any reason why Name is not preserved when cloning a Parser

Hello,

I just discovered Eto.Parse and I really enjoy it. Great piece of work.

I would like use it to build a library of reusable parsers which can be combined in multiple small grammars for scraping items in documents.

The problem is that when a Parser is cloned, its name is lost, so the Matches property of the result is not filled properly.

The following test fails

			var p = Eto.Parse.Terminals.Literal("hello");
			var helloNamed = p.Named("greetings");
			Assert.AreEqual("greetings", helloNamed.Name);
			var helloClone = helloNamed.Clone();
			Assert.AreEqual("greetings", helloClone.Name); // FAILS helloClone.Name is null

I just checked the Parser copy constructor, and it does not set the name of the new instance to that of the source.

Is there any reason for this, or can I make a pull request?

Support a run-time start rule

What's the chance of allowing the start rule to be selected at run time? For example when using an XML grammar, you may want to start at 'document' or 'node', or in a language you might want to pick 'type' or 'expression', etc. Recompiling the grammar with different start rules seems highly inefficient?

This would also solve a problem with auto-code generation, at the moment BNF and EBNF do not specify a start rule as part of the syntax, being able to pick the start rule after the fact would allow these grammars to be converted to fluent code with the start rule being specified at runtime.

Antlr Grammar

Create an Antlr grammar, like the EBNF one.

Letter parser accepts all Unicode letters as a letter

Not sure if this is a bug, but the code below accepts all Unicode letters as letters - i.e. Hebrew, Cyrillic, etc.:

Eto.Parse.Parsers.LetterTerminal.Test, Char.IsLetter() test

My guess is that it should have been an explicit check for the ranges 'a'..'z' and 'A'..'Z' (ASCII letters).

If not, please close this issue - I just stumbled across the above and wanted to make sure it was intentional.

dnx support

I have used Eto.Parse in the past on a project with great success. I am starting a new project where I would like to leverage it again but the project will be using dnx and ASP vNext, which is currently not a supported platform for Eto.Parse. I have a local branch where I have started to add the basics to support the new project system in dnx. Currently, I have only updated the main project as the test projects are NUnit-based, which does not have any dnx-compatible libraries/runners yet. To do it right, I would probably need to port the tests to xunit. Is dnx something that you intend to support in the future? I could submit a PR if you would like.

Character ranges in EBNF grammar do not include the last character in each range

See the code in this gist (to run it, just add Eto.Parse 1.4.0.0 from NuGet).

It fails to recognize the string and prints:

a,b,c,d,e,A,B,C,D,E,0,1,2,3,4,5,6,7,8
ChildIndex=0, Context=">>>ffFF09"

F, f and 9 are missing from the character set.

In my grammar, I have replaced character ranges with character lists and it works correctly.

Greedy repeat parser problem

I'm trying to create a grammar to parse SPARQL property paths, as defined here. I only need part of that vocabulary. Unfortunately, W3C uses its own EBNF syntax, so instead of translating it to vanilla EBNF I decided to try rewriting the relevant rules using your shorthand syntax, which I find quite neat.

However, I've bumped into problems with rule PN_PREFIX.

PN_PREFIX = PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
PN_CHARS_BASE = (* basically any Unicode letter character *)
PN_CHARS = PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
PN_CHARS_U = PN_CHARS_BASE | '_'

In short, PN_PREFIX should match the prefix of a QName URI. For example, given the QName rdf:type it would match the rdf part. As per the rule, the first character must be a letter, and additional characters are then allowed.

I rewrote PN_PREFIX as

pn_chars_base & ~(-(pn_chars | '.') & pn_chars)

pn_chars_base matches the r, and then the RepeatParser matches df. That is unfortunate, because the final pn_chars then fails (it doesn't match the colon), so the entire optional pattern fails.

The intent is that pn_chars_base, the pn_chars inside the repeat, and the last pn_chars match r, d and f respectively, so that the entire pn_prefix matches rdf.

Any idea what's not right with my grammar?
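One way around the problem is to rewrite the rule so that no backtracking is needed. Any run of `(PN_CHARS | '.')` that ends in a PN_CHARS character can be split into blocks of `'.'*` each followed by a single PN_CHARS. The rule is therefore equivalent to `PN_CHARS_BASE ('.'* PN_CHARS)*`. If an iteration fails partway (trailing dots with no PN_CHARS after them), the loop stops before that iteration. The sketch below is hand-written with a hypothetical `PnPrefix` class and simplified ASCII character classes; the real PN_CHARS_BASE covers many Unicode ranges. Carrying the same rewrite over to a combinator grammar assumes a failed iteration does not consume input.

```csharp
using System;

// Hypothetical matcher with ASCII-only stand-ins for the SPARQL character classes.
public static class PnPrefix
{
    static bool IsBase(char c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    static bool IsChars(char c) => IsBase(c) || c == '_' || c == '-' || (c >= '0' && c <= '9');

    // PN_PREFIX = BASE ((CHARS | '.')* CHARS)?  rewritten as  BASE ('.'* CHARS)*
    // so a greedy loop never needs to give characters back.
    // Returns the length of the match at `start`, or 0 if there is none.
    public static int Match(string s, int start)
    {
        if (start >= s.Length || !IsBase(s[start]))
            return 0;
        var end = start + 1;
        while (true)
        {
            // One iteration: any number of dots followed by exactly one CHARS character.
            var p = end;
            while (p < s.Length && s[p] == '.') p++;
            if (p >= s.Length || !IsChars(s[p]))
                return end - start; // iteration failed: keep the position from before it
            end = p + 1;
        }
    }
}
```

With this formulation `rdf:type` yields a prefix of length 3 (`rdf`), and `ab..:x` stops before the trailing dots.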

add standard parsers

Add a string and a Number terminal

ebnf: ? Terminals.Number ?

[Question] How can I make a "recursive" parser using the C# API?

I have this EBNF example, and I'm trying to reproduce what postfix_expression does using the C# API:

constants = 
    integer_cst 
    | float_cst 
    | bool_cst;

identifier := letter, {"_" | letter_or_digit};

primary_expression := identifier | constants;

increment_op = "++" | "--";

(* This part is recursive *)
postfix_expression := 
    primary_expression
    | postfix_expression, "[", postfix_expression, "]"
    | postfix_expression, increment_op;

grammar = postfix_expression;

Memory leaks when parsing Chinese

var _grammar = new EbnfGrammar (EbnfStyle.W3c).Build ($"id ::= [a-zA-Z\u0100-\uffff_][0-9a-zA-Z\u0100-\uffff_]*", "id");
var _match = _grammar.Match ("张三李四");

Gold Grammar parser does not support comments

Gold grammars support single line comments where the comment starts with '!'.

e.g.
! This is a comment line
rule = something ! with a comment