palle-k / covfefe Goto Github PK

A parser for nondeterministic context free languages

Home Page: https://palle-k.github.io/Covfefe/

License: MIT License

Swift 99.74% Shell 0.07% Ruby 0.19%

mathematical-expressions earley-parser cyk-parser grammar syntax-tree context-free-grammar context-free-language cyk earley earley-algorithm

covfefe's People

Contributors

Stargazers

Watchers

Forkers

between40and2 drjoeavg mikolasstuchlik mlajtos astrotuna201 schnaser gerzer contentstooldev

covfefe's Issues

Result Builders for Grammar

Hello,

I am going to use your library for rather large grammar of Relax NG Compact (https://www.oasis-open.org/committees/relax-ng/compact-20020607.html).

After some attempts to rewrite the original notation to the Covfefe notation as defined in https://github.com/palle-k/Covfefe/blob/master/BNF.md, I've came to the conclusion, it would be much better for me, to write a DSL.

I've completed a simple proof of concept implementation which supports all expressions with exception for ranges. The DSL I've designed looks like this:

    Grammar.build {
        %"person"   → %"name" .. %"space" .. (%"name" .. %"space")-? .. %"surname"
        %"name"     → ^"Mikolas"
                    | ^"Jan"
                    | ^"Lukas"
        %"surname"  → ^"Stuchlik"
                    | ^"Dvorak"
        %"space"    → ^" "
    }

translating to

name = ('Mikolas' | 'Jan' | 'Lukas');
space = ' ';
surname = ('Stuchlik' | 'Dvorak');
person = name, space, [name, space], surname;

I've had to deal with Swift's limitations on which characters are allowed as an operator.

I'm open to suggestions.

I see two main advantages in using Function Builders

some degree of static analysis is provided by the Swift compiler
the user of the library is able to write some additional checks, for example, whether there are any unused non-terminals or undefined non-terminals, warnings when a non-terminal has the same name as a terminal string etc.

Would you be interested in having a Builder as a part of the main library?

Have a nice day, Mikoláš

Swift 5.5 and Linux Unit tests

Recently, it has come to my attention, that swift test no longer needs, nor allows the LinuxMain.swift file in the Tests/ directory.

futuredapp/FTAPIKit@3feab35

Using #if swift(>=5.5) && !os(Linux) statement in the LinuxMain.swift won't work, the compilation (or linking) will fail as long as the LinuxMain.swift file is present.

I suppose this is a bug in the SPM itself, since the SPM should ignore all files that are outside of the target structure.

This bug is relevant on Swift 5.5. Since the minimal compiler version of this package is Swift 5.4, I suggest taking no action at this time, since deleting the LinuxMain.swift file will break the tests on Swift 5.4.

Example usage

If I have a simple arithmetic grammar, how should I go on and traverse the syntax tree and implement the eval?

import Covfefe

let grammarString = """
expression       = binary-operation | brackets | unary-operation | number;
brackets         = '(', expression, ')';
binary-operation = expression, binary-operator, expression;
binary-operator  = '+' | '-' | '*' | '/';
unary-operation  = unary-operator, expression;
unary-operator   = '+' | '-';
number           = {digit};
digit            = '0' ... '9';
"""

let grammar = try Grammar(ebnf: grammarString, start: "expression")
let parser = EarleyParser(grammar: grammar)

let syntaxTree = try parser.syntaxTree(for: "(23+47)*(-3)")

// syntaxTree.map(transform: { node in ???})

I am coming from Ohm/JS world and I feel lost on how to proceed from here. From the source I gathered that I have to guard for terminal vs. non-terminal and desctructure accordingly, but I can't really put that into Swift code. Can you please provide some snippet to get me going? I think README would benefit from it too.

dot output label is not human friendly

I see that you are moving forward with swift 4.1 but is there any chance you could point me in the right direction to clean up the dot output as illustrated with your parser.syntaxTree(for: "(a + b)*(-c)") example?

	node29 [label="Index(_compoundOffset: 40, _cache: Swift.String.Index._Cache.utf16)..<Index(_compoundOffset: 44, _cache: Swift.String.Index._Cache.utf16)" shape=box]

Thanks.

Ambiguous Grammar

Hello,

I am using EarleyParser with following EBNF grammar:

    start           = [expression];
    expression      = atom | operation;
    operation       = expression, atom, atom;
    atom            = brackets | number | symbol;
    brackets        = '(', expression, ')';
    number          = {digit};
    symbol          = letter | "+" | "-";
    letter          = 'a' ... 'z';
    digit           = '0' ... '9';

(Yes, no operator precedence – '1+2-3+4' should be parsed as ((1+2)-3)+4.)

String '121' is ambiguous – sometimes parsed as number '121' and sometimes as infix operation '1 2 1' where '2' is operator and ones are operands. If I use method allSyntaxTrees(for:) I get both, which is okay, but method syntaxTree(for:) returns non-deterministicaly both trees.

Is there any way to:

always return the same tree for ambiguous grammar, or
make the grammar unambiguous, so '121' is parsed as number, or
do something else which I don't know?

I have an Ohm/JS (Packrat parser) background and there isn't such problem, so I am a bit confused right now.

BNF.md says unicode scalars are supported, but they aren't

I understand if BNF.md is supposed to be a general reference, rather than a reference of the syntax of this specific parser, but then it shouldn't be implied that it's this parser's grammar in the readme:

Covfefe/README.md

Line 64 in 8c69c03

 Grammars can be specified in a superset of EBNF or a superset of BNF, which adopts some features of EBNF (documented [here](/BNF.md)). 

so I assumed that this documents the version of (E)BNF this parser uses.

I see in BNF.md that Unicode scalars are supported:

Covfefe/BNF.md

Lines 182 to 184 in 8c69c03

 unicode-scalar = "\\", "u", "{", unicode-scalar-digits, "}"; 

 unicode-scalar-digits = [digit], [digit], [digit], [digit], [digit], [digit], [digit], digit; 

 digit = '0' ... '9' | 'A' ... 'F' | 'a' ... 'f';

so I tried to implement an "any non-control character goes" rule like this:

comment_char = \u{000020} ... \u{1FFFFF} ;

but I get this error message:

Error: Unmatched pattern at L1:16: '\', expected: digit | delimiting-rule-name-char | whitespace | literal | expression-optional | comment | expression-repetition | single-char-literal | delimiting-rule-name-char | expression-group

Performance issues related to memory

Hello,

I have been playing with the package recently and I have experienced some issues.

The program seems to struggle with larger texts. At first I suspected, there might be an unreasonable amount of copying involved. After some debugging, I've came to conclusion, that this is in fact a problem with stack related to usage of recursion.

Reproduction example:

        let grammar = Grammar(start: "initial") {
            "initial"       --> n("initial") <+> t("a")
                            <|> t()
        }

        let parser = EarleyParser(grammar: grammar)
        let text = String.init(repeating: "a", count: 2500)
        let syntaxTree = try parser.syntaxTree(for: text )

I'm trying to work with files aroung 10K characters. At this time, I am able to work with 2000 characters long texts using this simple grammar.

For the time being, I will try to find a workaroud. I might have time to fix the issue in the future.

Re-creating the image in the readme

Hello,

I have a problem with grammar I'm working on (stack overflows and other crashes).

The graph in the README looks like an output of graphviz. Do you have a program that translates the syntax tree into the DOT language?

If so, would you share it with me?

Otherwise, I would attempt to implement such a program.

-Mikolas

Consider merging develop branch to master?

ABNF support

I would like to be able to parse Dhall, which provides an ABNF grammar. As far as I understand, ABNF is more standardized, so I wonder how hard would it be for Covfefe to support it?

palle-k / covfefe Goto Github PK

covfefe's People

Contributors

Stargazers

Watchers

Forkers

covfefe's Issues

Result Builders for Grammar

Swift 5.5 and Linux Unit tests

Example usage

dot output label is not human friendly

Ambiguous Grammar

BNF.md says unicode scalars are supported, but they aren't

Performance issues related to memory

Re-creating the image in the readme

Consider merging develop branch to master?

ABNF support

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs

	unicode-scalar = "\\", "u", "{", unicode-scalar-digits, "}";
	unicode-scalar-digits = [digit], [digit], [digit], [digit], [digit], [digit], [digit], digit;
	digit = '0' ... '9' \| 'A' ... 'F' \| 'a' ... 'f';