goccmack / gogll

Generates generalised LL (GLL) and reduced size LR(1) parsers with matching lexers

License: Apache License 2.0

parser-generator context-free-grammars lexer-generator gll golang go rust compiler-construction compiler-frontend rustlang

gogll's Introduction

Note

This version does not support Rust. Please use v3.2.0 for Rust or log an issue if you need the features of this version in Rust.

GoGLL

Gogll generates a GLL or LR(1) parser and FSA-based lexer for any context-free grammar. The generated code is Go or Rust.

Click here for an introduction to GLL.

See the LR(1) documentation for generating LR(1) parsers.

The generated GLL parser is a clustered nonterminal parser (CNP) following [Scott et al 2019]. CNP is a version of generalised LL parsing (GLL) [Scott & Johnstone 2016]. GLL parsers can parse all context free (CF) languages.

The generated LR(1) parser is either a Pager PGM machine [Pager 1977] or Knuth's original LR(1) machine.

The generated lexer is a linear-time finite state automaton FSA [Grune et al 2012]. The lexer ignores whitespace.
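To make the idea of a linear-time FSA lexer that skips whitespace concrete, here is a minimal hand-written sketch. It is not the code gogll generates: the `Token` type, the states and the `lex` function are assumptions chosen purely for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token is a hypothetical token type; gogll's generated token package differs.
type Token struct {
	Kind string // "id", "num" or "error"
	Text string
}

type state int

const (
	start state = iota // no rune of the current token consumed yet
	inID               // letter { letter | digit }
	inNum              // digit { digit }
	dead               // no transition possible
)

// kindOf maps the state in which scanning stopped to a token kind.
var kindOf = map[state]string{start: "error", inID: "id", inNum: "num"}

// next is the transition function of the automaton.
func next(s state, r rune) state {
	switch s {
	case start:
		if unicode.IsLetter(r) {
			return inID
		}
		if unicode.IsDigit(r) {
			return inNum
		}
	case inID:
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return inID
		}
	case inNum:
		if unicode.IsDigit(r) {
			return inNum
		}
	}
	return dead
}

// lex scans the input once from left to right. Every rune is inspected a
// bounded number of times, so the running time is linear in len(input).
func lex(input []rune) []Token {
	var toks []Token
	for i := 0; i < len(input); {
		if unicode.IsSpace(input[i]) { // whitespace never produces a token
			i++
			continue
		}
		s, begin := start, i
		for i < len(input) {
			n := next(s, input[i])
			if n == dead {
				break
			}
			s, i = n, i+1
		}
		if s == start { // no rule matched: emit a one-rune error token
			i++
		}
		toks = append(toks, Token{kindOf[s], string(input[begin:i])})
	}
	return toks
}

func main() {
	for _, t := range lex([]rune("x1  42\n y")) {
		fmt.Printf("%s(%s) ", t.Kind, t.Text)
	}
	fmt.Println()
}
```

The automaton always takes the longest match, which is the usual convention for lexers of this kind.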

Gogll accepts grammars in markdown files, which is very useful for documenting the grammar. For example, see gogll's own grammar.

GLL has worst-case cubic time and space complexity but linear complexity for all LL productions [Scott & Johnstone 2016]. See here for space and CPU time measurements of an extremely ambiguous example. For the comparable grammars tested so far, gogll produces faster lexers and parsers than gocc (FSA/LR-1).

News

2022-10-11

SPPF extraction added to the generated code. See boolx example

2022-08-09

Gogll is used to build DAU DASL

2020-08-12

From v3.2.0 gogll supports tokens that can be suppressed by the lexer. This is useful, for example, to implement code comments. See example.

2020-06-28

Gogll now also generates LR(1) parsers. It supports Pager's Practical General Method (weak compatibility) as well as Knuth's original LR(1) machine for comparison. Pager's PGM generates LR(1) parser tables similar in size to LALR. The option to generate a Knuth LR(1) machine is provided for reference.
See LR(1) documentation for details.
Please note that the -t <target> option has been replaced by -go and -rust. See usage below.

2020-06-01

See for an introduction to GLL and a performance comparison of the generated Go and Rust code parsers.

2020-05-22

GoGLL v3.1 generates Rust as well as Go parsers with similar performance:

Target	Lexer	Parser	Build
Go	119 μs	1324 μs	0.124 s
Rust	71 μs	1297 μs	2.932 s

The duration was averaged over 1000 repetitions.
Build time was measured with the time command.
1. For Rust: time cargo build --release
2. For Go: time go build

See examples/rust for the Rust and Go programs used for this comparison.

Use gogll's target option to generate a Rust lexer/parser: -t rust (see usage below). Gogll generates Go code by default.

2020-04-24

GoGLL now generates a linear-time FSA lexer matching the CNP parser.
This version of GoGLL is faster than gocc. It compiles a sample grammar in
0.074 s, which GoCC compiles in 0.118 s. Gogll compiles itself in 0.041s.

Benefits and disadvantages of GLL and LR(1)

GLL is a parsing technique that can handle any context-free (CF) language. GLL has worst case cubic time and space complexity.

LR(1) handles a subset of the context-free languages that can be parsed bottom-up with one token look-ahead. LR(1) has linear time complexity and its table driven parser is very efficient. Pager's Practical General Method (PGM) combines compatible states as they are generated, keeping the state space small.

A GLL parser has more expensive bookkeeping than an LR(1) parser, making the LR(1) parser more efficient for parsing very large inputs.

When to use GLL

When the CF grammar that best expresses the problem is not LR(1).
When the LR(1) parser has more than a few conflicts that require additional language symbols or complex grammar refactorisation to resolve.
The inputs to be parsed are not too big. GLL works very well for DSLs or programming languages.

When to use LR(1)

When the language can be expressed as an LR(1) grammar. A grammar is LR(1) if gogll can generate a conflict-free parser for it.
When the input is very big, for example: log files containing tens of thousands of lines.

Motivation for a separate lexer

The following observations were made while using GoGLLv2 on a couple of projects.

Most of the ambiguity in grammars was generated by the lexical rules.
Handling token separation explicitly produces messy, hard to maintain grammars.
Most of a grammar input file is whitespace, which together with the additional ambiguity introduced by the lexical rules, causes most of the parse time in a scannerless parser.
Writing good markdown with the grammar produced slow compilations.

Input Symbols, Markdown Files

Gogll and lexers generated by gogll accept UTF-8 input strings, which may be in a markdown file or a plain text file.

If the input is a markdown file gogll and lexers generated by gogll treat all text outside markdown code blocks as whitespace. Markdown code blocks are delimited by triple backticks. See gogll.md for an example.

Gogll Grammar

Gogll v3 has a BNF grammar. See gogll.md

Installation

Install Go from https://golang.org
go install github.com/goccmack/gogll/v3@latest or
Clone this repository and run go install in the root of the directory where it is installed.

Usage

Enter gogll -h or gogll for the following help:

use: gogll -h
    for help, or

use: gogll -version
    to display the version of goggl, or

use: gogll [-a][-v] [-CPUProf] [-o <out dir>] [-go] [-rust] [-gll] [-pager] [-knuth] [-resolve_conflicts] <source file>
    to generate a lexer and parser.

    <source file>: Mandatory. Name of the source file to be processed. 
        If the file extension is ".md" the bnf is extracted from markdown code 
        segments enclosed in triple backticks.
    
    -a: Optional. Regenerate all files.
        WARNING: This may destroy user editing in the LR(1) AST.
        Default: false
         
    -v: Optional. Produce verbose output, including first and follow sets,
        LR(1) sets and lexer FSA sets.
    
    -o <out dir>: Optional. The directory to which code will be generated.
                  Default: the same directory as <source file>.
                  
    -go: Optional. Generate Go code.
          Default: true, but false if -rust is selected

    -rust: Optional. Generate Rust code.
           Default: false
           
    -gll: Optional. Generate a GLL parser.
          Default true. False if -knuth or -pager is selected.
                  
    -knuth: Optional. Generate a Knuth LR(1) parser
            Default false

    -pager: Optional. Generate a Pager PGM LR(1) parser.
            Default false

    -resolve_conflicts: Optional. Automatically resolve LR(1) conflicts.
            Default: false. Only used when generating LR(1) parsers.
    
    -bs: Optional. Print BSR statistics (GLL only).
    
    -CPUProf : Optional. Generate a CPU profile. Default false.
        The generated CPU profile is in <cpu.prof>. 
        Use "go tool pprof cpu.prof" to analyse the profile.

Using the generated lexer and parser

Create a lexer:
From an []rune:

	lexer.New(input []rune) *Lexer

or from a file. If the file extension is .md the lexer will treat all text outside the markdown code blocks as whitespace.

	lexer.NewFile(fname string) *Lexer

Parse the lexer:

	if err, errs := parser.Parse(lex); err != nil {...}

Check for ambiguities in the parse forest

	if bsr.IsAmbiguous() {
		fmt.Println("Error: Ambiguous parse forest")
		bsr.ReportAmbiguous()
		os.Exit(1)
	}

Ambiguous BSRs must be resolved by walking the parse forest and ignoring unwanted children of ambiguous NTs (see Complete Example).

Use the disambiguated parse tree for the further stages of compilation. For example, see gogll's AST builder.

Complete Example

The code for the following example can be found at examples/boolx. The example has the following grammar, boolx.md, which generates boolean expressions such as a | b & c | d & e:

package "github.com/goccmack/gogll/examples/boolx"

Expr :   var
     |   Expr Op Expr
     ;

var : letter ;

Op : "&" | "|" ;

The second alternate above, Expr : Expr Op Expr, is ambiguous and can produce an ambiguous parse forest. The grammar does not enforce operator precedence; this has to be done during semantic analysis.

The grammar is compiled by the following command:

gogll examples/boolx/boolx.md

The test file, boolx_test.go shows the steps required to parse an input string and produce a disambiguated abstract syntax tree:

const t1Src = `a | b & c | d & e`

func Test1(t *testing.T) {

Create a lexer from the input string and parse. Fail if there are parse errors.

	if err, errs := parser.Parse(lexer.New([]rune(t1Src))); err != nil {
		fail(errs)
	}

Build an abstract syntax tree for each root of the parse forest and print them.

	for i, r := range bsr.GetRoots() {
		fmt.Printf("%d: %s\n", i, buildExpr(r))
	}
}

The input string produces an ambiguous parse forest, which is partially disambiguated by applying operator precedence. We get the following output from this test:

> go test -v ./examples/boolx
=== RUN   Test1
0: (a | ((b & c) | (d & e)))
1: <nil>
2: <nil>
3: ((a | (b & c)) | (d & e))
--- PASS: Test1 (0.00s)
PASS

The output shows that the parse forest has 4 roots, 2 of which produce valid ASTs after disambiguation. The removed trees are syntactically valid but semantically invalid because they give | precedence over &. Both the remaining ASTs are syntactically and semantically valid. The AST encodes operator precedence as shown by the parentheses. The choice of which valid AST to use for further processing is application specific.
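To get a feel for how much ambiguity the rule Expr : Expr Op Expr introduces, the sketch below counts the distinct binary trees over a chain of operands. It is a back-of-the-envelope illustration, not part of gogll; `countTrees` is a hypothetical helper. The four possible top-level splits it prints are consistent with the four roots reported above, under the assumption that each root corresponds to one choice of top-level operator.

```go
package main

import "fmt"

// countTrees returns the number of distinct binary trees that the rule
//   Expr : var | Expr Op Expr
// can derive for a chain of n operands separated by n-1 operators.
func countTrees(n int) int {
	if n == 1 {
		return 1 // a single var
	}
	total := 0
	// k operands go to the left sub-expression and n-k to the right.
	for k := 1; k < n; k++ {
		total += countTrees(k) * countTrees(n-k)
	}
	return total
}

func main() {
	const operands = 5 // a | b & c | d & e
	for k := 1; k < operands; k++ {
		fmt.Printf("top-level split after operand %d: %d trees\n",
			k, countTrees(k)*countTrees(operands-k))
	}
	fmt.Println("total:", countTrees(operands))
}
```

The counts grow as the Catalan numbers, which is why disambiguation is applied while the AST is built rather than by enumerating every tree first.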

In this example disambiguation by operator precedence is applied during the AST build.

Our AST has only one type of node: Expr.

type ExprType int

const (
	Expr_Var ExprType = iota
	Expr_Expr
)

type Expr struct {
	Type  ExprType
	Var   *token.Token
	Op    *token.Token
	Left  *Expr
	Right *Expr
}

A node can represent a variable (Type = Expr_Var) or an expression (Type = Expr_Expr). If the node represents a variable the field Var contains the variable token. Otherwise Op contains the operator token and Left and Right contain the nodes of the sub-expressions.

The AST is constructed recursively from each BSR root by the function, buildExpr in boolx_test.go.

/*
Expr :   var
     |   Expr Op Expr
     ;
Op : "&" | "|" ;
*/
func buildExpr(b bsr.BSR) *Expr {
	/*** Expr :   var ***/
	if b.Alternate() == 0 {
		return &Expr{
			Type: Expr_Var,
			Var:  b.GetTChildI(0),
		}
	}

	/*** Expr : Expr Op Expr ***/
	op := b.GetNTChildI(1). // Op is symbol 1 of the Expr rule
				GetTChildI(0) // The operator token is symbol 0 for both alternates of the Op rule

	// Build the left subexpression Node. The subtree for it may be ambiguous.
	left := []*Expr{}
	// b.GetNTChildrenI(0) returns all the valid BSRs for symbol 0 of the body of the rule.
	for _, le := range b.GetNTChildrenI(0) {
		// Add subexpression if it is valid and has precedence over this expression
		if e := buildExpr(le); e != nil && hasPrecedence(e, op) {
			left = append(left, e)
		}
	}
	// No valid subexpressions therefore this whole expression is invalid
	if len(left) == 0 {
		return nil
	}
	// Belts and braces
	if len(left) > 1 {
		panic(fmt.Sprintf("%s has %d left children", b, len(left)))
	}
	// Do the same for the right subexpression
	right := []*Expr{}
	for _, le := range b.GetNTChildrenI(2) {
		if e := buildExpr(le); e != nil && hasPrecedence(e, op) {
			right = append(right, e)
		}
	}
	if len(right) == 0 {
		return nil
	}
	if len(right) > 1 {
		panic(fmt.Sprintf("%s has %d right children", b, len(right)))
	}

	// return an expression node
	return &Expr{
		Type:  Expr_Expr,
		Op:    op,
		Left:  left[0],
		Right: right[0],
	}
}
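buildExpr relies on a helper, hasPrecedence, that is not shown above. The following self-contained sketch shows one plausible definition. It is an assumption, not the code from boolx_test.go: it uses a simplified `Expr` with string operators instead of *token.Token, and it assumes that & binds tighter than |, so the only rejected combination is a | expression used directly as an operand of &.

```go
package main

import "fmt"

// Expr is a simplified stand-in for the AST node above: operators are plain
// strings and a nil Left marks a variable node.
type Expr struct {
	Var, Op     string
	Left, Right *Expr
}

// hasPrecedence reports whether sub-expression e may appear directly under
// operator op. Assumption: & binds tighter than |.
func hasPrecedence(e *Expr, op string) bool {
	if e.Left == nil { // a variable binds tighter than any operator
		return true
	}
	return !(e.Op == "|" && op == "&")
}

// valid applies hasPrecedence to every operator node of the tree.
func valid(e *Expr) bool {
	if e.Left == nil {
		return true
	}
	return hasPrecedence(e.Left, e.Op) && hasPrecedence(e.Right, e.Op) &&
		valid(e.Left) && valid(e.Right)
}

// String renders the tree fully parenthesised.
func (e *Expr) String() string {
	if e.Left == nil {
		return e.Var
	}
	return fmt.Sprintf("(%s %s %s)", e.Left, e.Op, e.Right)
}

// v and bin are small constructors used only by this example.
func v(name string) *Expr { return &Expr{Var: name} }

func bin(l *Expr, op string, r *Expr) *Expr {
	return &Expr{Op: op, Left: l, Right: r}
}

func main() {
	good := bin(v("a"), "|", bin(v("b"), "&", v("c"))) // & nested under |
	bad := bin(bin(v("a"), "|", v("b")), "&", v("c"))  // | nested under &
	fmt.Println(good, valid(good))
	fmt.Println(bad, valid(bad))
}
```

Because a | expression is still allowed under another |, both associativities of | survive. This matches the two remaining ASTs in the test output.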

Status

gogll v3 generates a matching lexer and parser. It generates GLL and LR(1) parsers. v3 compiles itself. v3 is used in a real-world project.
gogll v2 had the last vestiges of the bootstrap compiler grammar removed from its input grammar. v2 compiled itself.
gogll v1 was a GLL scannerless parser, which compiled scannerless GLL parsers. v1 compiled itself.
gogll v0 was a bootstrap compiler implemented by a gocc lexer and parser.

Features considered for future implementation

Tokens suppressed by the lexer, e.g.: code comments.
Better error reporting.
Better documentation, including how to traverse the binary subtree representation (BSR Scott et al 2019) of the parse forest as well as on disambiguating parse forests.
Letting the parser direct which tokens to scan Scott & Johnstone 2019

Documentation

At the moment this document and the gogll grammar are the only documentation. Have a look at gogll/examples/ambiguous for a simple example and also for simple disambiguation.

Alternatively look at gogll.md which is the input grammar and also the grammar from which the parser for this version of gogll was generated. gogll/da disambiguates the parse forest for an input string.

LR(1)

See the LR(1) documentation.

Changelog

see

Bibliography

[Pager 1977] David Pager
A Practical General Method for Constructing LR(k) Parsers
Acta Informatica 7, 1977

[Scott & Johnstone 2019] Elizabeth Scott and Adrian Johnstone
Multiple lexicalisation (a Java based study)
In: Proceedings of Software Language Engineering 2019. ACM, 2019. p. 71-82

[Scott et al 2019] Elizabeth Scott, Adrian Johnstone and L. Thomas van Binsbergen.
Derivation representation using binary subtree sets.
In: Science of Computer Programming (175) 2019

[Scott & Johnstone 2018] Elizabeth Scott and Adrian Johnstone.
GLL Syntax Analysers For EBNF Grammars.
In: Science of Computer Programming Volume 166, 15 November 2018

[Scott & Johnstone 2016] Elizabeth Scott and Adrian Johnstone.
Structuring the GLL parsing algorithm for performance.
In: Science of Computer Programming Volume 125, 1 September 2016

[Afroozeh et al 2013] Ali Afroozeh, Mark van den Brand, Adrian Johnstone, Elizabeth Scott, Jurgen Vinju.
Safe Specification of Operator Precedence Rules.
In: Erwig M., Paige R.F., Van Wyk E. (eds) Software Language Engineering. SLE 2013. Lecture Notes in Computer Science, vol 8225. Springer, Cham

[Grune et al 2012] Dick Grune, Kees van Reeuwijk, Henri E. Bal, Ceriel J.H. Jacobs and Koen Langendoen. Modern Compiler Design. Second Edition. Springer 2012

[Basten & Vinju 2012] Basten H.J.S., Vinju J.J. (2012) Parse Forest Diagnostics with Dr. Ambiguity. In: Sloane A., Aßmann U. (eds) Software Language Engineering. SLE 2011. Lecture Notes in Computer Science, vol 6940. Springer, Berlin, Heidelberg

gogll's Issues

Question: What's the recommended way of dealing with comments?

I just experimented with gogll, and it's pretty awesome. Thanks for working on this!

As stated in the title, I was wondering what the suggested way is of dealing with code comments. As actual tokens in the grammar? Or would it make sense to allow the unicode.IsSpace() call in lexer.New() to be replaced with a custom version that skips over comments?

Incorrect installation instructions

Because 'go get' is no longer supported outside a module, go get github.com/goccmack/gogll/v3 (as shown in the readme) no longer works (and implies that the code you're pulling in is a library).

go install github.com/goccmack/gogll/v3@latest is the comparable command, reflected in this change, which properly installs the latest version of gogll to $GOPATH/bin .

Please support the go module correctly.

I followed the documentation and ran go get github.com/goccmack/gogll and it installed v1.0.4 instead of the latest version v3.2.2.

This is because the major version suffix is not given to the module name, even though the major version of gogll is 2 or higher.

For more details, please refer to the following document.
https://golang.org/ref/mod#module-path
https://github.com/golang/go/wiki/Modules#semantic-import-versioning

To fix this, change the module name declared in the go.mod file as follows

module github.com/goccmack/gogll/v3

In addition, make the same change to all import statements.

Is there a char_set?

The Readme.md mentions a char_lit and char_set. But char_set seems undefined. How do I specify 'a'-'z' (a to z)?

Error supporting \n as part of the grammar

In BASIC there is no end-of-statement marker (like ; in C). Instead, the end-of-line is the marker. Even if I remove \n from the !whitespace characters, I'm still not allowed to compile the grammar. Here is a simplified example:

\n is part of the syntax. See, it is not part of !whitespace

package "BUG_REPPORT"

File
	: DeclList                             							
;

DeclList
	: Stmt
	| DeclList   Stmt                   						
	;

Stmt
	: "Print" string_lit "\n"									 
	;


!line_comment
	: ('R''e''m' | ';') {not "\n" } "\n";

!whitespace : <' ' | '\t' | '\r'  > ;
string_lit 	: (quote {  quote quote | not_quote } quote) ;

This generates the following errors:

Parse Errors:
Parse Error: LexZeroOrMore : ∙{ LexAlternates }  I[33]=string_lit (303,307) "\n" at line 21 col 37
Expected one of: ['[,<,>,any,|,),.,lowcase,not,{,},;,[,],number,tokid,(,char_lit,letter,upcase]
Parse Error: LexAlternates : RegExp ∙| LexAlternates  I[32]=} (301,302) } at line 21 col 35
Expected one of: [|]
Parse Error: RegExp : LexSymbol ∙RegExp  I[32]=} (301,302) } at line 21 col 35
Expected one of: [char_lit,.,<,[,lowcase,not,tokid,{,'[,letter,number,any,upcase,(]
Parse Error: RegExp : ∙LexSymbol  I[29]={ (291,292) { at line 21 col 25
Expected one of: [|,},),;,>,]]
Parse Error: RegExp : LexSymbol ∙RegExp  I[28]=) (289,290) ) at line 21 col 23
Expected one of: [(,upcase,char_lit,[,lowcase,not,tokid,{,'[,.,<,any,letter,number]
Parse Error: LexAlternates : RegExp ∙| LexAlternates  I[28]=) (289,290) ) at line 21 col 23
Expected one of: [|]
Parse Error: LexAlternates : ∙RegExp  I[26]=| (284,285) | at line 21 col 18
Expected one of: [),>,],}]
Parse Error: RegExp : LexSymbol ∙RegExp  I[26]=| (284,285) | at line 21 col 18
Expected one of: [(,upcase,char_lit,lowcase,not,tokid,{,'[,.,<,[,any,letter,number]
Parse Error: RegExp : ∙LexSymbol  I[25]=char_lit (280,283) 'm' at line 21 col 14
Expected one of: [|,},),;,>,]]
Parse Error: RegExp : ∙LexSymbol  I[24]=char_lit (277,280) 'e' at line 21 col 11
Expected one of: [),;,>,],|,}]

The expected result would be a valid grammar.

backslash used as operator (integer division)

In BASIC, the backslash character (\) is used for integer division on float numbers. When I try to use it, I get an error. Here is the code:

package "BUG_REPPORT"

Expr
	: number "\\" number;

And here is the message :

Parse Errors:
Parse Error: SyntaxRule : nt : ∙SyntaxAlternates ;  I[4]=number (36,42) number at line 6 col 7
Expected one of: [empty,nt,string_lit,tokid]

I also tried:

package "BUG_REPPORT"

Expr
	: number intDivOp number;      	    								

intDivOp: '\\';

This gives me the same error as above.

Support For Named Unicode Character Classes

I'm fiddling with gogll and I like it. Much cleaner than goyacc. I especially like the ability to embed the grammar in a CommonMark/Markdown file. A lovely way to document the grammar.

It would be nice/useful if gogll supported

Unicode named character classes
at least some Unicode character properties (e.g., ID_Start and ID_Continue)
arbitrary Go regular expressions as a means of defining terminals.

Currently, the only predefined character classes you support are

letter: Unicode character class L|Letter, which comprises the following Unicode character classes:
- Lu | Uppercase Letter
- Ll | Lowercase Letter
- Lt| Titlecase Letter
- Lm | Modifier Letter
- Lo | Other Letter
upcase: Unspecified, but I suspect this is the Unicode character class Lu| Uppercase Letter.
lowcase: Unspecified, but I suspect this is the Unicode character class Ll|Lowercase Letter.
number: Unicode character class N|Number, which comprises the following Unicode character classes
- Nd|Decimal Digit Number
- Nl|Letter Number
- No|Other Number

This makes writing parsers for some languages difficult (and the resulting parsers brittle should Unicode add new characters).

For instance, the Javascript/Ecmascript specification for an identifier is defined in terms of those Unicode characters having the ID_Start and ID_Continue properties:

And the Go programming language defines the identifier production as

identifier = letter { letter | unicode_digit } .

letter        = unicode_letter | "_" .

unicode_letter = /* a Unicode code point categorized as "Letter" */ .
unicode_digit  = /* a Unicode code point categorized as "Number, decimal digit" */ .

Not having support for Unicode named character classes or properties makes it difficult to write a parser for such languages. The Unicode Number, decimal digit class comprises some 650 [discontiguous] code points. I have no idea how many code points are in the ID_Start category (a lot), and ID_Continue adds to it. From Unicode® Standard Annex #31: Unicode Identifier and Pattern Syntax:

As you can see, adding support for this sort of stuff would be useful.
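For reference, Go's standard unicode package already exposes these general categories, so the Go identifier rule quoted above can be checked in a few lines. The sketch below is illustrative only and says nothing about how gogll implements its character classes; `isIdentifier` is a hypothetical helper, and it checks only the lexical shape of an identifier (keywords are not excluded).

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentifier reports whether s has the shape
//   identifier = letter { letter | unicode_digit } .
// where letter is a Unicode letter (general category L) or "_" and
// unicode_digit is a decimal digit (general category Nd).
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if unicode.IsLetter(r) || r == '_' {
			continue
		}
		// Digits are allowed anywhere except in the first position.
		if i > 0 && unicode.IsDigit(r) {
			continue
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"x1", "_tmp", "größe", "1x", "a-b", ""} {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```

The tables behind unicode.IsLetter and unicode.IsDigit are updated with each Unicode version that Go adopts, which avoids the brittleness of enumerating code points by hand.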

Examples fail to parse

After struggling to get gogll to parse a fairly basic grammar I reverted to the examples, and found that neither the GoGLL grammar nor the Json grammar currently parse; the only grammar I could get to parse was boolx?

> curl https://raw.githubusercontent.com/goccmack/gogll/master/examples/json/json.md -o json.md
> gogll json.md
ParseError: Error: Parse Failed right extent=380, m=1047
 Parse Error: CharLiteral : \' ∙\\ anyof("nrt\\'\"") \'  cI=364 I[cI]=" at line 15 col 11
 Parse Error: Sep : SepChar ∙Sep  cI=363 I[cI]=' at line 15 col 10
 Parse Error: Sep : SepChar ∙Sep  cI=361 I[cI]=: at line 15 col 8
 Parse Error: NTChars : NTChar ∙NTChars  cI=360 I[cI]=space at line 15 col 7
 Parse Error: Sep : SepChar ∙Sep  cI=354 I[cI]=s at line 15 col 1
 Parse Error: NTChars : NTChar ∙NTChars  cI=326 I[cI]=; at line 11 col 13
 Parse Error: Symbols : Symbol ∙Sep Symbols  cI=326 I[cI]=; at line 11 col 13
 Parse Error: Alternates : Alternate ∙SepE | SepE Alternates  cI=326 I[cI]=; at line 11 col 13
 Parse Error: Sep : SepChar ∙Sep  cI=321 I[cI]=V at line 11 col 8
 Parse Error: NTChars : NTChar ∙NTChars  cI=319 I[cI]=: at line 11 col 6
 Parse Error: Sep : SepChar ∙Sep  cI=314 I[cI]=G at line 11 col 1
 Parse Error: Sep : SepChar ∙Sep  cI=271 I[cI]=" at line 9 col 9
 Parse Error: Sep : SepChar ∙Sep  cI=263 I[cI]=p at line 9 col 1
Error in BSR: 0 parse trees exist for start symbol GoGLL

empty.md:

ParseError: Error: Parse Failed right extent=33, m=116
 Parse Error: NTChars : NTChar ∙NTChars  cI=32 I[cI]=space at line 3 col 14
 Parse Error: Terminal : l ∙o w c a s e  cI=27 I[cI]=e at line 3 col 9
 Parse Error: Sep : SepChar ∙Sep  cI=26 I[cI]=l at line 3 col 8
 Parse Error: Sep : SepChar ∙Sep  cI=24 I[cI]=: at line 3 col 6
 Parse Error: NTChars : NTChar ∙NTChars  cI=23 I[cI]=space at line 3 col 5
 Parse Error: Sep : SepChar ∙Sep  cI=19 I[cI]=n at line 3 col 1
 Parse Error: Sep : SepChar ∙Sep  cI=8 I[cI]=" at line 1 col 9
Error in BSR: 0 parse trees exist for start symbol GoGLL

and my dumbed down test:

package "test"

GoGLL : Package Options ;

identifier : letter ;

Package: "package" identifier ;

Options : "option" identifier
  | "option" identifier ',' Options
  ;

produces:

Semantic Error: Rule GoGLL is not used at line 3 col 1

Fix plain BNF input

The current version of gogll handles markdown grammar files correctly, but plain BNF files don't work, for example, all x.bnf files in the test directory.

The work-around for now is to use only markdown grammar files.

generate a parser for ASN1 grammar

I would like to know if I can use gogll to generate a parser from ASN.1 grammar from something like this:
ASN.1 Grammar example

What changes will be needed?

how to get comments ?

The lexer's tokens do not contain comments.

Generated files are set as executable

As stated in the title, and at least on linux, files generated using gogll are marked as executable despite being just plain source code. Is there a particular reason behind this?

This isn't a huge problem, but it does stand out. For reference, here is a screenshot of the file tree. The green files have all been generated by the tool and haven't been tampered with.

Bug with using "\""

I'm using gogll version:

> gogll -version
gogll v3.2.0

If I run the following simple program:

package "hello"

Aa: "\"";

It fails with:

> gogll jsony.md
panic: runtime error: index out of range [2] with length 1

goroutine 1 [running]:
github.com/goccmack/gogll/ast.(*CharLiteral).Char(0xc00000c5c0, 0x1)
        /home/agus/go/src/github.com/goccmack/gogll/ast/lex.go:171 +0x1d9
github.com/goccmack/gogll/lex/items/event.Subset(0x68cc20, 0xc00000c5c0, 0x68cc20, 0xc00000c5c0, 0xc00000c5c0)
        /home/agus/go/src/github.com/goccmack/gogll/lex/items/event/event.go:105 +0x2a5
github.com/goccmack/gogll/lex/items.(*Set).nextSets(0xc00011e1c0, 0x0, 0x0, 0x0)
        /home/agus/go/src/github.com/goccmack/gogll/lex/items/items.go:179 +0x119
github.com/goccmack/gogll/lex/items.New(0xc00005c280, 0xc00000c5a0)
        /home/agus/go/src/github.com/goccmack/gogll/lex/items/items.go:65 +0x24f
main.main()
        /home/agus/go/src/github.com/goccmack/gogll/main.go:84 +0x21c

If this isn't a bug but a conceptual problem, it should probably fail with a more informative error.

Thanks for the project, it's very nice to play with BNF!

Cannot build grammar containing backtick as token

It seems I'm unable to define a token to be the backtick `.

E.g. consider the g1.md grammar:

package "g1"

Exp : Exp Op Exp
    | id
    ;

Op : "&" | "|" ;

id : letter <letter | number> ;

If I were to extend id like so:

id : letter <letter | number | '-'> ;

all would be handled just fine, but if I were to extend it like so:

id : letter <letter | number | '`'> ;

then gogll would no longer be able to build a parser from it:

Parse Errors:
Parse Error: LexAlternates : RegExp | ∙LexAlternates  I[24]=Error (388,391) '`  at line 26 col 32
Expected one of: [.,[,letter,{,(,any,char_lit,lowcase,not,number,upcase,<]
Parse Error: RegExp : LexSymbol ∙RegExp  I[23]=| (386,387) | at line 26 col 30
Expected one of: [any,char_lit,lowcase,number,{,not,upcase,(,.,<,[,letter]
Parse Error: LexAlternates : ∙RegExp  I[23]=| (386,387) | at line 26 col 30
Expected one of: [),>,],}]
Parse Error: RegExp : LexSymbol ∙RegExp  I[21]=| (377,378) | at line 26 col 21
Expected one of: [not,upcase,(,.,<,[,letter,any,char_lit,lowcase,number,{]
Parse Error: LexAlternates : ∙RegExp  I[21]=| (377,378) | at line 26 col 21
Expected one of: [),>,],}]
Parse Error: RegExp : ∙LexSymbol  I[19]=< (369,370) < at line 26 col 13
Expected one of: [>,],|,},),;]
Parse Error: Rules : ∙Rule  I[16]=tokid (357,359) id at line 26 col 1
Expected one of: [EOF]

Add a v3.2.2 tag

Hi Marius, thank you for merging in PR #10.

Would you be able to add a v3.2.2 tag to the repository so we can pin our build to that version of GoGLL?
