goccmack / gogll
Generates generalised LL (GLL) and reduced size LR(1) parsers with matching lexers
License: Apache License 2.0
As stated in the title, and at least on Linux, files generated using gogll are marked as executable despite being just plain source code. Is there a particular reason behind this?
This isn't a huge problem, but it does stand out. For reference, here is a screenshot of the file tree. The green files have all been generated by the tool and haven't been tampered with.
I'm using gogll version:
> gogll -version
gogll v3.2.0
If I run the following simple program:
package "hello"
Aa: "\"";
It fails with:
> gogll jsony.md
panic: runtime error: index out of range [2] with length 1
goroutine 1 [running]:
github.com/goccmack/gogll/ast.(*CharLiteral).Char(0xc00000c5c0, 0x1)
/home/agus/go/src/github.com/goccmack/gogll/ast/lex.go:171 +0x1d9
github.com/goccmack/gogll/lex/items/event.Subset(0x68cc20, 0xc00000c5c0, 0x68cc20, 0xc00000c5c0, 0xc00000c5c0)
/home/agus/go/src/github.com/goccmack/gogll/lex/items/event/event.go:105 +0x2a5
github.com/goccmack/gogll/lex/items.(*Set).nextSets(0xc00011e1c0, 0x0, 0x0, 0x0)
/home/agus/go/src/github.com/goccmack/gogll/lex/items/items.go:179 +0x119
github.com/goccmack/gogll/lex/items.New(0xc00005c280, 0xc00000c5a0)
/home/agus/go/src/github.com/goccmack/gogll/lex/items/items.go:65 +0x24f
main.main()
/home/agus/go/src/github.com/goccmack/gogll/main.go:84 +0x21c
If this isn't a bug but a conceptual problem, it should probably fail with a more informative error.
Thanks for the project, it's very nice to play with BNF!
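As a sketch of what "a more informative error" could look like, the following Go function decodes a quoted character literal with explicit bounds checks instead of indexing blindly. It is not gogll's `ast.(*CharLiteral).Char`; `charValue` is a hypothetical name, and the accepted escapes (`n`, `r`, `t`, `\`, `'`, `"`) are an assumption made for this example.

```go
package main

import "fmt"

// charValue is a hypothetical helper, not gogll's implementation.
// It decodes a quoted character literal such as 'a' or '\n' and returns an
// error instead of indexing past the end of a malformed literal.
func charValue(lit []rune) (rune, error) {
	if len(lit) < 3 || lit[0] != '\'' || lit[len(lit)-1] != '\'' {
		return 0, fmt.Errorf("malformed character literal %q", string(lit))
	}
	body := lit[1 : len(lit)-1]
	if body[0] != '\\' {
		if len(body) != 1 {
			return 0, fmt.Errorf("character literal %q holds more than one character", string(lit))
		}
		return body[0], nil
	}
	if len(body) != 2 {
		return 0, fmt.Errorf("incomplete escape in character literal %q", string(lit))
	}
	// Escape set assumed for this sketch.
	switch body[1] {
	case 'n':
		return '\n', nil
	case 'r':
		return '\r', nil
	case 't':
		return '\t', nil
	case '\\', '\'', '"':
		return body[1], nil
	}
	return 0, fmt.Errorf("unknown escape in character literal %q", string(lit))
}

func main() {
	for _, s := range []string{`'a'`, `'\n'`, `'\'`, `"`} {
		r, err := charValue([]rune(s))
		fmt.Printf("%s -> %q, err=%v\n", s, r, err)
	}
}
```

With checks like these, a literal the decoder cannot handle is reported as an error that names the offending text rather than as an index-out-of-range panic.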
It seems I'm unable to define a token to be the backtick `.
E.g. consider the g1.md grammar:
package "g1"
Exp : Exp Op Exp
| id
;
Op : "&" | "|" ;
id : letter <letter | number> ;
If I were to extend id like so:
id : letter <letter | number | '-'> ;
all would be handled just fine, but if I were to extend it like so:
id : letter <letter | number | '`'> ;
then gogll would no longer be able to build a parser from it:
Parse Errors:
Parse Error: LexAlternates : RegExp | ∙LexAlternates I[24]=Error (388,391) '` at line 26 col 32
Expected one of: [.,[,letter,{,(,any,char_lit,lowcase,not,number,upcase,<]
Parse Error: RegExp : LexSymbol ∙RegExp I[23]=| (386,387) | at line 26 col 30
Expected one of: [any,char_lit,lowcase,number,{,not,upcase,(,.,<,[,letter]
Parse Error: LexAlternates : ∙RegExp I[23]=| (386,387) | at line 26 col 30
Expected one of: [),>,],}]
Parse Error: RegExp : LexSymbol ∙RegExp I[21]=| (377,378) | at line 26 col 21
Expected one of: [not,upcase,(,.,<,[,letter,any,char_lit,lowcase,number,{]
Parse Error: LexAlternates : ∙RegExp I[21]=| (377,378) | at line 26 col 21
Expected one of: [),>,],}]
Parse Error: RegExp : ∙LexSymbol I[19]=< (369,370) < at line 26 col 13
Expected one of: [>,],|,},),;]
Parse Error: Rules : ∙Rule I[16]=tokid (357,359) id at line 26 col 1
Expected one of: [EOF]
I would like to know if I can use gogll to generate a parser from ASN.1 grammar from something like this:
ASN.1 Grammar example
What changes will be needed?
In BASIC there is no end-of-statement marker (like ; in C). Instead, the end-of-line is the marker. Even if I remove \n from the !whitespace characters, I'm still not able to compile the grammar. Here is a simplified example:
\n is part of the syntax. Note that it is not part of !whitespace.
package "BUG_REPPORT"
File
: DeclList
;
DeclList
: Stmt
| DeclList Stmt
;
Stmt
: "Print" string_lit "\n"
;
!line_comment
: ('R''e''m' | ';') {not "\n" } "\n";
!whitespace : <' ' | '\t' | '\r' > ;
string_lit : (quote { quote quote | not_quote } quote) ;
This generates the following errors:
Parse Errors:
Parse Error: LexZeroOrMore : ∙{ LexAlternates } I[33]=string_lit (303,307) "\n" at line 21 col 37
Expected one of: ['[,<,>,any,|,),.,lowcase,not,{,},;,[,],number,tokid,(,char_lit,letter,upcase]
Parse Error: LexAlternates : RegExp ∙| LexAlternates I[32]=} (301,302) } at line 21 col 35
Expected one of: [|]
Parse Error: RegExp : LexSymbol ∙RegExp I[32]=} (301,302) } at line 21 col 35
Expected one of: [char_lit,.,<,[,lowcase,not,tokid,{,'[,letter,number,any,upcase,(]
Parse Error: RegExp : ∙LexSymbol I[29]={ (291,292) { at line 21 col 25
Expected one of: [|,},),;,>,]]
Parse Error: RegExp : LexSymbol ∙RegExp I[28]=) (289,290) ) at line 21 col 23
Expected one of: [(,upcase,char_lit,[,lowcase,not,tokid,{,'[,.,<,any,letter,number]
Parse Error: LexAlternates : RegExp ∙| LexAlternates I[28]=) (289,290) ) at line 21 col 23
Expected one of: [|]
Parse Error: LexAlternates : ∙RegExp I[26]=| (284,285) | at line 21 col 18
Expected one of: [),>,],}]
Parse Error: RegExp : LexSymbol ∙RegExp I[26]=| (284,285) | at line 21 col 18
Expected one of: [(,upcase,char_lit,lowcase,not,tokid,{,'[,.,<,[,any,letter,number]
Parse Error: RegExp : ∙LexSymbol I[25]=char_lit (280,283) 'm' at line 21 col 14
Expected one of: [|,},),;,>,]]
Parse Error: RegExp : ∙LexSymbol I[24]=char_lit (277,280) 'e' at line 21 col 11
Expected one of: [),;,>,],|,}]
The expected result would be a valid grammar.
In BASIC, the backslash character (\) is used for integer division on float numbers. When I try to use it, I get an error. Here is the code:
package "BUG_REPPORT"
Expr
: number "\\" number;
And here is the message:
Parse Errors:
Parse Error: SyntaxRule : nt : ∙SyntaxAlternates ; I[4]=number (36,42) number at line 6 col 7
Expected one of: [empty,nt,string_lit,tokid]
I also tried:
package "BUG_REPPORT"
Expr
: number intDivOp number;
intDivOp: '\\';
This gives me the same error as above.
The Readme.md mentions a char_lit and char_set. But char_set seems undefined. How do I specify 'a'-'z' (a to z)?
The lexer's Tokens do not contain comments.
The current version of gogll handles markdown grammar files correctly, but plain BNF files don't work, for example all x.bnf files in the test directory.
The work-around for now is to use only markdown grammar files.
I just experimented with gogll, and it's pretty awesome. Thanks for working on this!
As stated in the title, I was wondering what the suggested way is of dealing with code comments. As actual tokens in the grammar? Or would it make sense to allow the unicode.IsSpace() call in lexer.New() to be replaced with a custom version that skips over comments?
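The second option asked about here can be sketched in Go as a skip routine that treats both whitespace and comments as ignorable input. This is not gogll's generated lexer; `skipIgnorable` is a hypothetical name, and the `//` line-comment syntax is an assumption chosen for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

// skipIgnorable is a hypothetical helper, not part of gogll's generated lexer.
// Starting at pos, it skips Unicode whitespace and "//" line comments and
// returns the index of the first rune that should start a token.
func skipIgnorable(input []rune, pos int) int {
	for pos < len(input) {
		switch {
		case unicode.IsSpace(input[pos]):
			pos++
		case input[pos] == '/' && pos+1 < len(input) && input[pos+1] == '/':
			// Consume up to, but not including, the newline; the
			// whitespace case above consumes the newline itself.
			for pos < len(input) && input[pos] != '\n' {
				pos++
			}
		default:
			return pos
		}
	}
	return pos
}

func main() {
	src := []rune("  // a comment\n  x = 1")
	start := skipIgnorable(src, 0)
	fmt.Printf("%q\n", string(src[start:])) // prints "x = 1"
}
```

The trade-off of skipping comments in the lexer is that they never reach the parser, so tools that need them (formatters, documentation generators) would have to treat them as grammar tokens instead.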
Hi Marius, thank you for merging in PR #10.
Would you be able to add a v3.2.2 tag to the repository so we can pin our build to that version of GoGLL?
After struggling to get gogll to parse a fairly basic grammar I reverted to the examples, and found that neither the GoGLL grammar nor the Json grammar currently parse; the only grammar I could get to parse was boolx?
> curl https://raw.githubusercontent.com/goccmack/gogll/master/examples/json/json.md -o json.md
> gogll json.md
ParseError: Error: Parse Failed right extent=380, m=1047
Parse Error: CharLiteral : \' ∙\\ anyof("nrt\\'\"") \' cI=364 I[cI]=" at line 15 col 11
Parse Error: Sep : SepChar ∙Sep cI=363 I[cI]=' at line 15 col 10
Parse Error: Sep : SepChar ∙Sep cI=361 I[cI]=: at line 15 col 8
Parse Error: NTChars : NTChar ∙NTChars cI=360 I[cI]=space at line 15 col 7
Parse Error: Sep : SepChar ∙Sep cI=354 I[cI]=s at line 15 col 1
Parse Error: NTChars : NTChar ∙NTChars cI=326 I[cI]=; at line 11 col 13
Parse Error: Symbols : Symbol ∙Sep Symbols cI=326 I[cI]=; at line 11 col 13
Parse Error: Alternates : Alternate ∙SepE | SepE Alternates cI=326 I[cI]=; at line 11 col 13
Parse Error: Sep : SepChar ∙Sep cI=321 I[cI]=V at line 11 col 8
Parse Error: NTChars : NTChar ∙NTChars cI=319 I[cI]=: at line 11 col 6
Parse Error: Sep : SepChar ∙Sep cI=314 I[cI]=G at line 11 col 1
Parse Error: Sep : SepChar ∙Sep cI=271 I[cI]=" at line 9 col 9
Parse Error: Sep : SepChar ∙Sep cI=263 I[cI]=p at line 9 col 1
Error in BSR: 0 parse trees exist for start symbol GoGLL
empty.md:
ParseError: Error: Parse Failed right extent=33, m=116
Parse Error: NTChars : NTChar ∙NTChars cI=32 I[cI]=space at line 3 col 14
Parse Error: Terminal : l ∙o w c a s e cI=27 I[cI]=e at line 3 col 9
Parse Error: Sep : SepChar ∙Sep cI=26 I[cI]=l at line 3 col 8
Parse Error: Sep : SepChar ∙Sep cI=24 I[cI]=: at line 3 col 6
Parse Error: NTChars : NTChar ∙NTChars cI=23 I[cI]=space at line 3 col 5
Parse Error: Sep : SepChar ∙Sep cI=19 I[cI]=n at line 3 col 1
Parse Error: Sep : SepChar ∙Sep cI=8 I[cI]=" at line 1 col 9
Error in BSR: 0 parse trees exist for start symbol GoGLL
and my dumbed down test:
package "test"
GoGLL : Package Options ;
identifier : letter ;
Package: "package" identifier ;
Options : "option" identifier
| "option" identifier ',' Options
;
produces:
Semantic Error: Rule GoGLL is not used at line 3 col 1
I'm fiddling with gogll and I like it. Much cleaner than goyacc. I especially like the ability to embed the grammar in a CommonMark/Markdown file. A lovely way to document the grammar.
It would be nice/useful if gogll supported Unicode properties (such as ID_Start and ID_Continue). Currently, the only predefined character classes you support are:
letter: Unicode character class L (Letter), which comprises the following Unicode character classes: Lu (Uppercase Letter), Ll (Lowercase Letter), Lt (Titlecase Letter), Lm (Modifier Letter), and Lo (Other Letter).
upcase: Unspecified, but I suspect this is the Unicode character class Lu (Uppercase Letter).
lowcase: Unspecified, but I suspect this is the Unicode character class Ll (Lowercase Letter).
number: Unicode character class N (Number), which comprises the following Unicode character classes: Nd (Decimal Digit Number), Nl (Letter Number), and No (Other Number).
This makes writing parsers for some languages difficult (and the resulting parsers brittle should Unicode add new characters).
For instance, the JavaScript/ECMAScript specification for an identifier is defined in terms of those Unicode characters having the ID_Start and ID_Continue properties:
And the Go programming language defines the identifier production as
identifier = letter { letter | unicode_digit } .
letter = unicode_letter | "_" .
unicode_letter = /* a Unicode code point categorized as "Letter" */ .
unicode_digit = /* a Unicode code point categorized as "Number, decimal digit" */ .
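To make the production above concrete, here is a small Go sketch that checks a string against it using the standard `unicode` package. `isLetter` and `isIdentifier` are names made up for this example; the sketch only shows how compactly the rule reads when a category such as Nd is available by name, and it says nothing about how gogll itself would implement such classes.

```go
package main

import (
	"fmt"
	"unicode"
)

// isLetter follows the quoted production: letter = unicode_letter | "_".
func isLetter(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

// isIdentifier follows: identifier = letter { letter | unicode_digit }.
// unicode.IsDigit reports membership in category Nd (Number, decimal digit).
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if isLetter(r) {
			continue
		}
		// i is a byte offset; i > 0 means r is not the first rune.
		if i > 0 && unicode.IsDigit(r) {
			continue
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"x1", "_tmp", "größe", "1x", "a-b"} {
		fmt.Printf("%-6s %v\n", s, isIdentifier(s))
	}
}
```

Because the category tables come from the `unicode` package, the check follows whatever Unicode version the Go toolchain ships, without the grammar having to enumerate code points.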
Not having support for Unicode named character classes or properties makes it difficult to write a parser for such languages. The Unicode Number, decimal digit class comprises some 650 [discontiguous] code points. I have no idea how many code points are in the ID_Start category (a lot), and ID_Continue adds to it. From Unicode® Standard Annex #31: Unicode Identifier and Pattern Syntax:
As you can see, adding support for this sort of stuff would be useful.
Because 'go get' is no longer supported outside a module, go get github.com/goccmack/gogll/v3 (as shown in the readme) no longer works (and implies that the code you're pulling in is a library).
go install github.com/goccmack/gogll/v3@latest is the comparable command, reflected in this change, which properly installs the latest version of gogll to $GOPATH/bin.
I followed the documentation and ran go get github.com/goccmack/gogll and it installed v1.0.4 instead of the latest version v3.2.2.
This is because the major version suffix is not given to the module name, even though the major version of gogll is 2 or higher.
For more details, please refer to the following document.
https://golang.org/ref/mod#module-path
https://github.com/golang/go/wiki/Modules#semantic-import-versioning
To fix this, change the module name declared in the go.mod file as follows:
module github.com/goccmack/gogll/v3
In addition, make the same change to all import statements.