hadron67 / tscc-compiler

An LALR(1) parser generator written in Typescript

Home Page: https://hadroncfy.com/tscc-compiler/web-demo/

License: MIT License

Makefile 0.05% JavaScript 40.62% TypeScript 56.52% Yacc 2.81%
compiler-compiler compiler-construction lexical-analyzer yacc

tscc-compiler

An LALR(1) compiler generator written in TypeScript. It generates both the tokenizer and the parser. It can currently generate parsers in JavaScript and TypeScript; more target languages will be supported in future releases.

Check out the wiki for tscc-compiler!

Installation

To use it in node, you can install tscc-compiler via npm:

npm install tscc-compiler -g

Or, to use it in browsers, download either tscc.js or tscc.min.js and reference it with a script tag. The latter is a compressed version of the former.

Usage

From command line interface (CLI)

For example, to generate a parser from the grammar specified in test.y, you could use the following command:

tscc-compiler test.y

The output files depend on the target language. For js and ts, test.js or test.ts is generated respectively, plus a report file test.output, which contains the lexical DFA tables and the LALR parse table.
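
As a rough illustration of the naming described above, the following sketch derives the expected file names from the grammar file name. The helper expectedOutputs is hypothetical (it is not part of tscc-compiler), and it assumes default naming with no -o option given.

```typescript
// Hypothetical helper, not part of tscc-compiler: derives the default
// output names described above from the grammar file name.
type Target = "js" | "ts";

function expectedOutputs(
  grammarFile: string,
  target: Target
): { parser: string; report: string } {
  // Strip the final extension (e.g. ".y"); keep the name as-is if there is none.
  const dot = grammarFile.lastIndexOf(".");
  const base = dot > 0 ? grammarFile.slice(0, dot) : grammarFile;
  return { parser: `${base}.${target}`, report: `${base}.output` };
}

console.log(expectedOutputs("test.y", "ts"));
// { parser: 'test.ts', report: 'test.output' }
```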

Options for CLI

Option | Description | Argument
-o, --output | Specify the output file to print the DFA and parse table to | Name of the output file
-t, --test | Parse the given input string (without lexical analysis) and print the parsing process. See below for an explanation. | The string to be parsed
-d, --detail-time | Print a detailed list of the time costs of the different generation phases | None
-h, --help | Print the help message and quit | None

From module

This project uses the module bundler rollup to create a single source file, tscc.js, that contains the entire source code of tscc-compiler. You may import it as a module with var tscc = require('tscc-compiler'); or include tscc.js with a script tag in browsers. A simple way to invoke tscc-compiler is to call tscc.main with an object containing the options. It returns 0 if no error occurs; otherwise it returns -1. The options are listed below:

Option | Required | Type | Description
inputFile | Yes | string | Name of the input file
input | Yes | string | Content of the input file
outputFile | No | string | Name of output file (not to be confused with output parser). If not specified, the output file won't be generated.
stdout | Yes | tscc.io.OutputStream | An interface object to output all the messages. This object must contain write(s) and writeln(s).
writeFile | Yes | (path: string, content: string) => any | A callback to write files.
testInput | No | string | Test input. If specified, the result will be printed. See below for explanation.
printDetailedTime | Yes | boolean | Whether to print the detailed time cost list.
printDFA | No | boolean | Whether to print lexical DFA tables in the output file.
showlah | No | boolean | Whether to show look-ahead tokens of items when printing parse table.
showFullItemsets | No | boolean | Whether to show full item sets when printing parse table. If not specified or set to false, only kernel items will be printed.

Types are given in TypeScript notation.
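
The option table can also be written down as TypeScript types, which makes the required and optional fields easy to check. This is only a sketch: the names OutputStreamLike, TsccOptionsSketch and missingRequired are made up for illustration, and treating the argument of writeln as optional is an assumption; only the field names and types come from the table.

```typescript
// Sketch of the option table as TypeScript types. The interface and function
// names are illustrative; only the fields come from the table above.
interface OutputStreamLike {
  write(s: string): void;
  writeln(s?: string): void; // assumption: the argument may be omitted
}

interface TsccOptionsSketch {
  inputFile: string;
  input: string;
  outputFile?: string;
  stdout: OutputStreamLike;
  writeFile: (path: string, content: string) => any;
  testInput?: string;
  printDetailedTime: boolean;
  printDFA?: boolean;
  showlah?: boolean;
  showFullItemsets?: boolean;
}

// Fields marked "Yes" in the Required column.
const REQUIRED = ["inputFile", "input", "stdout", "writeFile", "printDetailedTime"] as const;

// Returns the names of the required options that are missing from opts.
function missingRequired(opts: Partial<TsccOptionsSketch>): string[] {
  return REQUIRED.filter((key) => opts[key] === undefined);
}

console.log(missingRequired({ inputFile: "example.y", input: "" }));
// [ 'stdout', 'writeFile', 'printDetailedTime' ]
```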

Here's a simple example:

var fs = require('fs');
var tscc = require('tscc-compiler').main;
tscc({
    inputFile: 'example.y',
    input: fs.readFileSync('example.y', 'utf-8'),
    outputFile: 'example.output',
    stdout: {
        write: function(s){ process.stdout.write(s); },
        writeln: function(s){ console.log(s || ''); }
    },
    writeFile: function(path, content){
        fs.writeFileSync(path, content);
    },
    printDetailedTime: true
});
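
In a browser or in a test there is no file system or process.stdout, so the stdout and writeFile callbacks can collect their output in memory instead. The sketch below shows one way to do that; createMemoryIO is a hypothetical helper, and the commented-out call to main only restates the options documented above.

```typescript
// Hypothetical helper: builds the I/O callbacks that tscc.main expects,
// but keeps all messages and written files in memory.
function createMemoryIO() {
  let log = "";
  const files: Record<string, string> = {};
  return {
    stdout: {
      write(s: string): void { log += s; },
      writeln(s?: string): void { log += (s || "") + "\n"; },
    },
    writeFile(path: string, content: string): void { files[path] = content; },
    getLog: (): string => log,
    files,
  };
}

const io = createMemoryIO();
// With tscc-compiler installed, the callbacks would be passed along like this:
//   const status = require('tscc-compiler').main({
//     inputFile: 'example.y', input: grammarText,
//     stdout: io.stdout, writeFile: io.writeFile, printDetailedTime: false,
//   });
//   // per the text above, status is 0 on success and -1 on error

// Stand-in calls so the sketch runs without tscc-compiler:
io.stdout.writeln("hello");
io.writeFile("example.js", "/* generated parser */");
console.log(io.getLog(), Object.keys(io.files));
```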

The module also provides a more flexible way to use it.

Test input

You can give tscc-compiler a test input string to check whether the grammar works. The input string consists of the following two kinds of elements, separated by spaces:

  • An identifier enclosed in <> is a token, referenced by its name.
  • A raw string is also a token, but referenced by its alias.
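
The two kinds of elements can be told apart mechanically. The following sketch classifies the space-separated parts of a test input string; parseTestInput and TestToken are illustrative names, and the code mirrors the description above rather than tscc's own implementation (for instance, it does not handle aliases that contain spaces).

```typescript
// Illustrative only: classifies the space-separated elements of a test
// input string. This mirrors the description above, not tscc's own code.
type TestToken = { kind: "name" | "alias"; value: string };

function parseTestInput(input: string): TestToken[] {
  return input
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .map((part): TestToken => {
      // <NAME> refers to a token by name; anything else is taken as an alias.
      const m = /^<(.+)>$/.exec(part);
      return m ? { kind: "name", value: m[1] } : { kind: "alias", value: part };
    });
}

console.log(parseTestInput("<CONST> + <CONST>"));
```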

For example, to test the calculator grammar (see examples/calculator/):

tscc-compiler caculator.y -t "<CONST> + <CONST> * <CONST>"

The output should be:

preparing for test
| <CONST> "+" <CONST> "*" <CONST> 
<CONST> | "+" <CONST> "*" <CONST> 
expr | "+" <CONST> "*" <CONST> 
expr "+" | <CONST> "*" <CONST> 
expr "+" <CONST> | "*" <CONST> 
expr "+" expr | "*" <CONST> 
expr "+" expr "*" | <CONST> 
expr "+" expr "*" <CONST> | 
expr "+" expr "*" expr | 
expr "+" expr | 
expr | 
start | 
accepted!
compilation done in 0.071s
0 warning(s), 0 error(s)
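
Each line of the trace appears to show the parser stack to the left of | and the remaining input to the right: a shift moves one token across the bar, and a reduce replaces symbols at the top of the stack with a non-terminal. The toy loop below reproduces the same sequence of steps for an assumed calculator grammar (start -> expr; expr -> expr "+" expr | expr "*" expr | <CONST>, with "*" binding tighter than "+"). It is a hand-written, precedence-driven sketch for reading the trace, not the table-driven LALR(1) parser that tscc generates, and the real grammar in examples/calculator/ may differ.

```typescript
// Toy shift-reduce loop for an ASSUMED grammar:
//   start -> expr ;  expr -> expr "+" expr | expr "*" expr | <CONST>
// with "*" binding tighter than "+", both left-associative.
// tscc itself drives the parse from generated LALR(1) tables instead.
const PREC: Record<string, number> = { '"+"': 1, '"*"': 2 };

function traceParse(input: string[]): string[] {
  const stack: string[] = [];
  const rest = input.slice();
  const lines: string[] = [];
  const snapshot = () => lines.push([...stack, "|", ...rest].join(" "));
  snapshot();
  for (;;) {
    const n = stack.length;
    const top = stack[n - 1];
    const la = rest[0]; // lookahead; undefined at end of input
    if (top === "<CONST>") {
      stack[n - 1] = "expr"; // reduce expr -> <CONST>
    } else if (
      n >= 3 && top === "expr" && stack[n - 3] === "expr" && stack[n - 2] in PREC &&
      (la === undefined || !(la in PREC) || PREC[la] <= PREC[stack[n - 2]])
    ) {
      stack.splice(n - 3, 3, "expr"); // reduce expr -> expr op expr
    } else if (la !== undefined) {
      stack.push(rest.shift() as string); // shift the lookahead
    } else if (n === 1 && top === "expr") {
      stack[0] = "start"; // reduce start -> expr
    } else {
      lines.push(n === 1 && top === "start" ? "accepted!" : "syntax error");
      return lines;
    }
    snapshot();
  }
}

console.log(traceParse(["<CONST>", '"+"', "<CONST>", '"*"', "<CONST>"]).join("\n"));
```

When the stack holds expr "+" expr and the lookahead is "*", the loop shifts instead of reducing because "*" has the higher precedence; this is the step where the trace above goes from expr "+" expr | "*" <CONST> to expr "+" expr "*" | <CONST>.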

Grammar file

The syntax of the grammar file used by tscc is similar to that of yacc. Check out the wiki for tscc-compiler for a specification of the grammar file, and examples/ for concrete usage.

Syntax highlighting

A CodeMirror syntax highlighting mode for grammar files can be found at web-demo/lib/tscc-highlight-codemirror.js. Feel free to check it out and use it.

Demo

An online demo for tscc-compiler can be found at https://hadroncfy.com/tscc-compiler/web-demo/.

License

MIT.

tscc-compiler's Issues

Grammar railroad diagram

I extracted an EBNF that https://www.bottlecaps.de/rr/ui understands from https://github.com/Hadron67/tscc-compiler/blob/master/src/parser/parser.y in order to produce a navigable railroad diagram. Ideally tscc would have an option to output it.

Copy and paste the EBNF shown below into the Edit Grammar tab at https://www.bottlecaps.de/rr/ui, then click the View Diagram tab to see a navigable railroad diagram.

start ::= options "%%" body "%%" epilogue

options ::= options option
options ::=
option ::= "%lex" states_ "{" lexBody "}"
option ::= associativeDir assocTokens
option ::= "%option" "{" optionBody "}"
option ::= "%header" block
option ::= "%extra_arg" block
option ::= "%type" block
option ::= "%init" block block
option ::= "%output" STRING
option ::= "%token" tokenDefs
option ::= "%token_hook" "(" NAME ")" block
option ::= "%touch" touchTokenList

tokenDefs ::= tokenDefs "<" NAME ">"
tokenDefs ::= "<" NAME ">"
touchTokenList ::= touchTokenList tokenRef
touchTokenList ::= tokenRef
epilogue ::=
epilogue ::= nonEmptyEpilogue
nonEmptyEpilogue ::= nonEmptyEpilogue ANY_CODE
nonEmptyEpilogue ::= ANY_CODE
associativeDir ::= "%left"
associativeDir ::= "%right"
associativeDir ::= "%nonassoc"
assocTokens ::= assocTokens assocToken
assocTokens ::= assocToken
assocToken ::= tokenRef
assocToken ::= NAME
optionBody ::= optionBody NAME "=" STRING
optionBody ::=
states_ ::= "<" states ">"
states_ ::=
states ::= NAME
states ::= states "," NAME
lexBody ::= lexBody lexBodyItem
lexBody ::=
lexBodyItem ::= NAME "=" "<" regexp ">"
lexBodyItem ::= newState "<" regexp ">" lexAction_
lexBodyItem ::= newState "<" NAME ":" regexp ">" lexAction_

newState ::=
lexAction_ ::= ":" lexAction
lexAction_ ::=
lexAction ::= "[" lexActions "]"
lexAction ::= actionBlock

lexActions ::= lexActions "," lexActionItem
lexActions ::= lexActionItem
lexActionItem ::= "+" NAME
lexActionItem ::= "-"
lexActionItem ::= "=>" NAME
lexActionItem ::= "=" STRING
lexActionItem ::= actionBlock
regexp ::= innerRegexp
regexp ::= "%least" innerRegexp
innerRegexp ::= union

union ::= union "|" simpleRE
union ::= simpleRE
simpleRE ::= simpleRE basicRE
simpleRE ::= basicRE
basicRE ::= primitiveRE rePostfix

rePostfix ::= "+"
rePostfix ::= "?"
rePostfix ::= "*"
rePostfix ::=
primitiveRE ::= "(" innerRegexp ")"
primitiveRE ::= "[" inverse_ setRE_ "]"
primitiveRE ::= "<" NAME ">"
primitiveRE ::= "%import" "(" STRING ")"
primitiveRE ::= STRING
inverse_ ::= "^"
inverse_ ::=
setRE_ ::= setRE
setRE_ ::=
setRE ::= setRE "," setREItem
setRE ::= setREItem
setREItem ::= STRING
setREItem ::= STRING "-" STRING
body ::= body bodyItem
body ::= bodyItem
bodyItem ::= compoundRule
compoundRule ::= NAME arrow rules ";"

arrow ::= ":"
arrow ::= "=>"
rules ::= rules "|" rule
rules ::= rule
rule ::= ruleHead ruleBody ruleTrailer

ruleHead ::= "%use" "(" varUseList ")"
ruleHead ::=
varUseList ::= varUseList "," NAME
varUseList ::= NAME
ruleBody ::= ruleItems
ruleBody ::= "%empty"
ruleItems ::= ruleItems ruleItem
ruleItems ::=
itemName ::= NAME "="
itemName ::=
ruleItem ::= NAME
ruleItem ::= NAME "=" NAME
ruleItem ::= itemName tokenRef
ruleItem ::= itemName lexAction

tokenRef ::= "<" NAME ">"
tokenRef ::= STRING
ruleTrailer ::=
ruleTrailer ::= rulePrec
ruleTrailer ::= rulePrec lexAction
rulePrec ::= "%prec" NAME
rulePrec ::= "%prec" tokenRef
block ::= "{" innerBlock "}"

innerBlock ::= innerBlock innerBlockItem
innerBlock ::=
innerBlockItem ::= codeList
innerBlockItem ::= "{" innerBlock "}"

actionBlock ::= always "{" innerActionBlock "}"

always ::= "%always"
always ::=
innerActionBlock ::= innerActionBlock innerActionBlockItem
innerActionBlock ::=
innerActionBlockItem ::= codeList
innerActionBlockItem ::= "$$"
innerActionBlockItem ::= "$token"
innerActionBlockItem ::= "$matched"
innerActionBlockItem ::= EMIT_TOKEN
innerActionBlockItem ::= "{" innerActionBlock "}"

codeList ::= codeList ANY_CODE
codeList ::= ANY_CODE

The above EBNF was extracted by adding this code to tscc:

export class Rule{
...
    public toEbnfRR(){
        if(this.lhs.sym.charAt(0) == "@") return "";
        var ret = this.lhs.sym + ' ::=';
        for(var i = 0;i < this.rhs.length;i++){
            var r = this.rhs[i];
            if(r >= 0){
                var tok = this.g.tokens[r];
                // ret += ' <' + this.g.tokens[r].sym + '>';
                ret += ' ' + (tok.alias === null ? tok.sym : `"${tok.alias}"`);
            }
            else {
                var sym = this.g.nts[-r - 1].sym;
                if(sym.charAt(0) != "@") ret += ' ' + sym;
            }
        }
        return ret;
    }
}
...
export class Grammar implements TokenEntry{
...
    toEbnfRR(){
        var ret = '';
        this.forEachRule((lhs, rule) => {
            var s = rule.toEbnfRR();
            ret += s + "\n";
        });
        return ret;
    }
...
}
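
The extracted EBNF has one line per production, so a non-terminal with several alternatives shows up as several rules. As a possible post-processing step, the sketch below merges productions that share a left-hand side into a single rule with | alternatives, and turns an empty alternative into an optional group. mergeAlternatives is a hypothetical helper, not part of tscc, and it assumes one production per line in the lhs ::= rhs form shown above.

```typescript
// Illustrative post-processing step, not part of tscc: merges productions
// that share a left-hand side into one rule with "|" alternatives.
function mergeAlternatives(ebnf: string): string {
  const order: string[] = [];
  const alts: Record<string, string[]> = Object.create(null);
  for (const line of ebnf.split("\n")) {
    const idx = line.indexOf("::=");
    if (idx < 0) continue; // skip blank or unrelated lines
    const lhs = line.slice(0, idx).trim();
    const rhs = line.slice(idx + 3).trim();
    if (!(lhs in alts)) {
      alts[lhs] = [];
      order.push(lhs); // remember first-seen order of non-terminals
    }
    alts[lhs].push(rhs);
  }
  return order
    .map((lhs) => {
      const nonEmpty = alts[lhs].filter((r) => r !== "");
      const hasEmpty = nonEmpty.length < alts[lhs].length;
      const body = nonEmpty.join(" | ");
      // An empty alternative makes the whole body optional.
      if (hasEmpty && nonEmpty.length > 0) return `${lhs} ::= ( ${body} )?`;
      return `${lhs} ::= ${body}`.trim();
    })
    .join("\n");
}

console.log(mergeAlternatives('arrow ::= ":"\narrow ::= "=>"'));
// arrow ::= ":" | "=>"
```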

How to show line/col on rule conflicts ?

I am building an alternative online playground for tscc at https://meimporta.eu/TsccYaccLex/playground.html. I want to be able to click on a conflict message and have the editor jump to its line/col. I already have something close, but every line/col reported is that of the head (lhs) of the rule instead of the sub rule.

To see it, go to https://meimporta.eu/TsccYaccLex/playground.html, select JZend using the select box at the top middle of the screen, then comment out this line:

//%left '*' '/' '%'

Now click the Parse button (upper left corner) to get these messages:

Warning: state 75, shift/reduce conflict:
    token: "*"
    used rule: [ 112: expr_without_var => expr . "*" expr ]* (line 600, column 0)
    discarded rule: [ 118: expr_without_var => "+" expr . ]*  (line 600, column 0)
Warning: state 75, shift/reduce conflict:
    token: "/"
    used rule: [ 113: expr_without_var => expr . "/" expr ]* (line 600, column 0)
    discarded rule: [ 118: expr_without_var => "+" expr . ]*  (line 600, column 0)
Warning: state 75, shift/reduce conflict:
    token: "%"
    used rule: [ 114: expr_without_var => expr . "%" expr ]* (line 600, column 0)
    discarded rule: [ 118: expr_without_var => "+" expr . ]*  (line 600, column 0)
...

I would like the line/col of each used/discarded rule to point to that rule.

Could you give any help with this?

This is based on https://github.com/yhirose/cpp-peglib playground here https://yhirose.github.io/cpp-peglib/
and also https://github.com/ChrisHixon/chpeg playground here https://chrishixon.github.io/chpeg/playground/
and https://github.com/mingodad/CocoR-Typescript playground here https://mingodad.github.io/CocoR-Typescript/playground

Cheers !

Lexer tables are big for any non trivial grammar

It seems that the lexer generator doesn't do a great job of reducing/minimizing the generated tables, so for any non-trivial grammar the lexer tables are big. For SQL, for example, the final parser is around 800KB without handling case-insensitive tokens. If we make it case-insensitive by adding rules like <CREATE: <['C','c']['R','r']['E','e']['A','a']['T','t']['E','e']> > then it quickly becomes even bigger.
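
Writing such per-letter character sets by hand is tedious, so they could be generated. The sketch below expands a keyword into the sequence of two-element sets used in the rule above; caseInsensitive is a hypothetical helper, and it assumes the keyword contains no quote or backslash characters. It only automates writing the pattern and does nothing about the table size itself.

```typescript
// Hypothetical helper: expands a keyword into the per-letter character
// sets used in the rule above, so the pattern need not be typed by hand.
function caseInsensitive(keyword: string): string {
  let out = "";
  for (let i = 0; i < keyword.length; i++) {
    const ch = keyword.charAt(i);
    const upper = ch.toUpperCase();
    const lower = ch.toLowerCase();
    // Letters get a two-element set; anything else is emitted as a plain string.
    out += upper !== lower ? `['${upper}','${lower}']` : `'${ch}'`;
  }
  return out;
}

console.log(caseInsensitive("create"));
// ['C','c']['R','r']['E','e']['A','a']['T','t']['E','e']
```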

Access to an invalid object

While testing tscc with https://github.com/Hadron67/tscc-compiler/blob/master/examples/zend/jzend_parser.y, I intentionally commented out this line:

//%left UNARY

Running it on https://hadroncfy.com/tscc-compiler/web-demo/ then throws:

Uncaught TypeError: Cannot read properties of undefined (reading 'pr')
    at Object.defineRulePr (tscc.js:2963:28)
    at jjdoReduction (tscc.js:5873:24)
    at jjtryReduce (tscc.js:6060:13)
    at jjacceptToken (tscc.js:6042:21)
    at jjdoLexAction (tscc.js:5300:36)
    at jjacceptChar (tscc.js:5388:21)
    at nextToken (tscc.js:4861:17)
    at Object.parse (tscc.js:4900:17)
    at yyparse (tscc.js:6237:12)
    at Object.compile (tscc.js:8480:17)

With this change it seems to work:

        else {
            var pt = _pseudoTokens[token.val];
            if(!pt){
                singlePosErr(`pseudo token "${token}" is not defined`, token);
            }
            else _top().pr = pt.pr; ///<<< adding `else` before 
        }
