BNF Converter

Home Page: http://bnfc.digitalgrammars.com/

bnfc's Introduction

The BNF Converter

What is the BNF Converter?

The BNF Converter (bnfc) is a compiler construction tool generating a compiler front-end from a Labelled BNF (LBNF) grammar. It is currently able to generate Haskell, Agda, C, C++, Java, and OCaml, as well as XML representations.

Given an LBNF grammar the tool produces:

- an abstract syntax implementation
- a case skeleton for the abstract syntax in the same language
- an Alex, Ocamllex, JLex, or Flex lexer generator file
- a Happy, Ocamlyacc, Menhir, ANTLR, CUP, or Bison parser generator file
- a pretty-printer as a Haskell/Agda/C/C++/Java/OCaml module
- a LaTeX file containing a readable specification of the language

More information: http://bnfc.digitalgrammars.com/

Installation

Some binaries are available at https://github.com/BNFC/bnfc/releases. Installation from the Haskell sources is possible via stack or cabal.

Installation via stack (recommended)

You need a running installation of stack. To install and run the latest version of bnfc from stackage, enter at the command line:

  stack install BNFC
  bnfc --help

Installation via cabal

You need a running installation of a recent version of GHC and Cabal, most easily available via GHCup. To install bnfc from hackage, enter at the command line:

  cabal install BNFC
  bnfc --help

Installing the development version

To install the development version of bnfc with the latest bugfixes (and regressions ;-)):

  git clone https://github.com/BNFC/bnfc.git
  cd bnfc/source

and then either

  cabal install

or

  stack install --stack-yaml stack-9.4.yaml

(replace 9.4 with your GHC major version, and if you want to build with your installed GHC then add flag --system-ghc).

Mini tutorial

Build a first parser in 5 min (Haskell backend):
1. In a fresh directory, prepare a grammar file Sum.cf with the following content:
```
EInt.  Exp ::= Integer;
EPlus. Exp ::= Exp "+" Integer;
```
2. Build a parser (in Haskell) with bnfc:
```
bnfc -d -m Sum.cf  &&  make
```
  The make step needs the Haskell compiler GHC, the lexer generator alex and the parser generator happy (all included in the GHC installation).
3. Inspect the generated files in directory Sum.
4. Test the parser.
```
echo "1 + 2 + 3" | Sum/Test
```
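
To make the shape of the result concrete, here is a hand-written Python sketch of the tree that the `Sum.cf` grammar describes for this input. It is not bnfc output: the class names only mirror the labels `EInt` and `EPlus`, and `parse_sum` is a hypothetical stand-in for the generated parser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EInt:
    """EInt.  Exp ::= Integer;"""
    value: int


@dataclass(frozen=True)
class EPlus:
    """EPlus. Exp ::= Exp "+" Integer;"""
    left: "Exp"
    right: int


Exp = Union[EInt, EPlus]


def parse_sum(text: str) -> Exp:
    """Parse `Integer ("+" Integer)*` into a left-nested tree (illustration only)."""
    parts = [p.strip() for p in text.split("+")]
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"not a sum of integers: {text!r}")
    tree: Exp = EInt(int(parts[0]))
    for p in parts[1:]:
        # The grammar is left-recursive, so each "+" wraps the tree built so far.
        tree = EPlus(tree, int(p))
    return tree


print(parse_sum("1 + 2 + 3"))
# EPlus(left=EPlus(left=EInt(value=1), right=2), right=3)
```

The left nesting follows from the rule `Exp ::= Exp "+" Integer`: the right operand of `+` is always a plain integer.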

Try the C-family backends. (The prerequisites, GNU C(++) compiler (gcc / g++), lexer generator flex and parser generator bison, are usually present):

bnfc --c   -m -o sum-c   Sum.cf  &&  make -C sum-c    &&  echo "1 + 2 + 3" | sum-c/TestSum
bnfc --cpp -m -o sum-cpp Sum.cf  &&  make -C sum-cpp  &&  echo "1 + 2 + 3" | sum-cpp/TestSum

Try the other backends:

| Option | Backend |
| --- | --- |
| `--java` | Requires Java, JLex or JFlex, and CUP. |
| `--java-antlr` | Requires ANTLR. |
| `--ocaml` | Requires OCaml, `ocamllex` and `ocamlyacc`. |
| `--ocaml-menhir` | Uses menhir instead of `ocamlyacc`. |
| `--agda` | Produces Agda bindings to the parser generated for Haskell. |
| `--pygments` | Produces a lexer definition for the Python highlighting suite Pygments. |

Documentation

https://bnfc.readthedocs.org/en/latest/

Support

You can discuss issues around bnfc with us on our mailing list [email protected].

For current limitations of bnfc, or to report a new bug, please consult our issue tracker.

Contribute

Issue Tracker: https://github.com/BNFC/bnfc/issues
Source Code: https://github.com/BNFC/bnfc
Haskell coding style guide: https://github.com/andreasabel/haskell-style-guide/
Some pull request etiquette:
- Document, document, document! (See style guide)
- Include test cases that cover your feature.
- Include changelog entry.
- More etiquette: E.g. https://gist.github.com/mikepea/863f63d6e37281e329f8

License

The project is licensed under the BSD 3-clause license.

BNFC versions up to 2.8.4 were released under the GNU General Public License.

Example uses of the BNF Converter

In research:

NASA's OGMA tool uses LBNF for its grammars, e.g. for a subset of C 99.

In teaching:

Course Programming Language Technology at Chalmers / Gothenburg University.

bnfc's Issues

Add support for in-language defined operators

I'm imagining something like Haskell's operators, where fixity and precedence are defined in the file.

I recognize that this is "wishlist" severity at best.

Makefile still building ps doc by default

The Java makefile still builds the doc by default.

It should be a separate target so that, if you are not interested in the pdf doc, you can still run make without having to edit the makefile.
It should make use of the Common.Makefile module for that.

Get conflicts/ambiguities informations in bnfc directly

Cabal support for bnfc

Add support in cabal to build bnf files using bnfc (like what is done with alex and happy). This would, among other things, make bnfc's own compilation easier.

Fix the alfa grammar (or delete?)

In one of the test cases, the language called alfa, the grammar doesn't look like it matches the example file. The case has been removed from the testsuite but it would be good to fix it or decide to drop it completely.

Options to choose makefile filename

Currently the only option relating to generating a makefile is -m, which generates a makefile with filename Makefile.

Could an option be added to allow the user to also optionally specify the name of the makefile, e.g.

bnfc -m --make-filename=MyMakefile --haskell Foo.cf

Use white-space as separator

A frequent question is how to use white space as a separator. Doing this:

separator A " ";

doesn't work; the answer is to use an empty string:

separator A "";

Maybe bnfc could accept " " as a separator and treat it like "".
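
A minimal sketch of the proposed behaviour, written in Python for brevity. `normalize_separator` is a hypothetical helper, not a function that exists in bnfc; it only illustrates the rule "a whitespace-only separator means the same as the empty separator".

```python
def normalize_separator(sep: str) -> str:
    """Treat a whitespace-only separator like the empty separator ""."""
    return "" if sep.strip() == "" else sep


print(repr(normalize_separator(" ")))  # ''
print(repr(normalize_separator(";")))  # ';'
```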

Access to AST nodes' positions in the input file

In order to be able to provide better error messages to the users, we would like to say

Error E occurred on line 25, col 55

However, currently the positions of AST nodes in the input file are not accessible from the AST. How could we access the nodes' positions? How do AST node positions relate to token positions, e.g.,

position token PIdent (letter (letter|digit|'_'|'\'')*) ;
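
To illustrate what is being asked for, here is a small Python sketch of identifier tokens that carry their own line and column, which is enough to produce the message above. It is not bnfc-generated code: `PIdent`, `position_tokens`, and `report` are names invented for this example, and the regular expression restricts `letter` to ASCII for simplicity.

```python
import re
from typing import Iterator, NamedTuple


class PIdent(NamedTuple):
    line: int  # 1-based line number
    col: int   # 1-based column number
    text: str  # the identifier itself


# letter (letter | digit | '_' | '\'')*, with letter limited to ASCII here
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_']*")


def position_tokens(source: str) -> Iterator[PIdent]:
    """Yield every identifier in `source` together with its position."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for m in IDENT.finditer(line):
            yield PIdent(line_no, m.start() + 1, m.group())


def report(err: str, tok: PIdent) -> str:
    """Format an error message that points at a token's position."""
    return f"Error {err} occurred on line {tok.line}, col {tok.col}"


tokens = list(position_tokens("foo bar\n  baz'"))
print(report("E", tokens[2]))  # Error E occurred on line 2, col 3
```

An AST node that stores such a token can report the token's position; nodes without any position token have nothing to point at, which is the gap this issue describes.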

Code cleanup

Backend clean up
baseline for backends
things to deprecate in 2.5/2.6

merging back bnfc meta

Take advantage of DataKinds

It seems that generated code could, instead of using a bunch of empty data constructors for tags, have an enumeration which is lifted to the kind layer. For example, instead of

data Stmt_
data Expr_
type Stmt = Tree Stmt_
type Expr = Tree Expr_

data Tree :: * -> * where
[...]

have

data Tag = Stmt_ | Expr_
type Stmt = Tree Stmt_
type Expr = Tree Expr_

data Tree :: Tag -> * where
[...]

Adding compliant code examples to verbatim blocks in LaTeX

Feature request.

It would be great to be able to include in the LaTeX file a number of compliant code examples, to give a flavour of the language specified by the grammar. Could there be a special notation that could be used in a .cf file that would be ignored by bnfc with the exception of LaTeX generation, which would add each code example in to a verbatim block (if all rules are accepted by bnfc) ?

`entrypoint` directive does not work in Java

Using BNFC 2.5 with the -java option and the following katter.cf as input:

Klas. Foo ::= "KlasKatt";
Hobbes. Foo ::= "Hobbes";
KrazyKat. Bar ::= "KrazyKat";
Findus. Bar ::= "Findus";

entrypoints Foo,Bar;

BNFC ignores the entrypoint Bar. Comments in the generated katter/Test.java mention the usage of the method pBar in the katter.parser class. This method does not exist.

In katter.parser :

  public katter.Absyn.Foo pFoo() throws Exception
  {
    java_cup.runtime.Symbol res = parse();
    return (katter.Absyn.Foo) res.value;
  }

exists.

Adding

  public katter.Absyn.Bar pBar() throws Exception
  {
    java_cup.runtime.Symbol res = parse();
    return (katter.Absyn.Bar) res.value;
  }

and modifying Test.java accordingly does not help. The parser recognizes only the first of the categories listed in entrypoints. In fact, replacing the last line of katter.cf with entrypoints Bar, Foo gives the mirror-image problem.

Problems with some cf file names when using the C backend

If the cf file is named c.cf, for instance, flex generates a file that conflicts with some functions in the standard library (like calloc).

See this (old) discussion: http://lists.gnu.org/archive/html/help-flex/2003-03/msg00008.html

Grammar links on website http://bnfc.digitalgrammars.com/ are dead

I do not see the corresponding grammars in https://github.com/BNFC/bnfc/tree/master/examples either.

Investigate problems in Haskell GADT backend

System tests using the GADT backend exhibit some problems (this is why they are disabled by default).

To enable and run the disabled tests do

cabal configure --enable-tests -fhaskell-gadt-tests && cabal build && cabal test

Compilation error in Java when a production uses more than one user-defined token

Consider the following multiple_token.cf grammar:

Label. Category ::= FIRST SECOND;
token FIRST 'a';
token SECOND 'b';

Issuing bnfc -m -java multiple_token.cf and then make results in compilation errors.

Reason: Fields in multiple_token/Absyn/Label.java mismatch those in other generated files. Excerpts follow:

package multiple_token.Absyn;

public class Label extends Category {
  public final String first_1, second_2;

and in multiple_token/ComposVisitor.java:

public Category visit(multiple_token.Absyn.Label p, A arg)
{
  String first_ = p.first_;
  String second_ = p.second_;

[ocaml] The Makefile shouldn't build the latex doc by default

`Lexing.from_channel` is deprecated

According to the doc, we should use Lexing.from_input instead.
http://batteries.forge.ocamlcore.org/doc.preview:batteries-beta1/html/api/Lexing.html#VALfrom_channel

Problem with multiple `rules` declarations with a common prefix

I am using the latest version of BNFC (2.5, March 2013) to generate Java code.

Problem: a BNFC grammar cannot always have multiple rules macros over the same category.

Consider the following grammar:

rules Function ::= "a" "v" | "b" | "c";
rules Function ::= "a" "u" | "e" | "f";

running bnfc -java yields

ERROR: names not unique: Function_1

The automated label generation creates a clash.

Yet, when there is no ambiguity to be solved, it works:

rules Function ::= "v" | "b" | "c";
rules Function ::= "u" | "e" | "f";

If BNFC should accept any number of rules macros, then there should be a global counter for each category appearing in a rules macro (now it appears to be reset for each rules macro).

Otherwise BNFC should prevent multiple rules macros on the same category in one grammar.
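
The difference between the two counter strategies can be sketched in Python as follows. This is a simplified model, not bnfc's label generator: it numbers every alternative as `Category_N`, whereas the real naming scheme is evidently more involved (the second grammar above is accepted). The function names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RulesMacro = Tuple[str, int]  # (category, number of alternatives)


def labels_reset_per_macro(macros: List[RulesMacro]) -> List[str]:
    """Counter restarts at 1 for every rules macro (the reported behaviour)."""
    labels = []
    for category, alternatives in macros:
        for i in range(1, alternatives + 1):
            labels.append(f"{category}_{i}")
    return labels


def labels_counter_per_category(macros: List[RulesMacro]) -> List[str]:
    """One counter per category, shared by all rules macros (the proposal)."""
    counters: Dict[str, int] = defaultdict(int)
    labels = []
    for category, alternatives in macros:
        for _ in range(alternatives):
            counters[category] += 1
            labels.append(f"{category}_{counters[category]}")
    return labels


macros = [("Function", 3), ("Function", 3)]
print(labels_reset_per_macro(macros))       # Function_1 appears twice
print(labels_counter_per_category(macros))  # Function_1 .. Function_6
```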

Layout resolver

What is the status of the layout resolver? It seems it has not been updated for three years. It also seems to be limited to special keywords that open nested blocks. In our language, we don't have an explicit keyword that marks the beginning of a block. Nesting is simply allowed after each declaration. For example,

abstract Person
   name : string
   spouse -> Person ?
       year : integer
   child -> Person *

Is resolving such layout possible?

new lexer backends (parallelizable)

symbol "\n" displayed as a newline (or not displayed) instead of \n in the doc files

Hello,

I have the symbol "\n" in my BNF grammar and it seems it is printed as a newline (in the symbols section) or not displayed at all (in the grammar section) in the tex, ps, … reports.

I can provide an example if needed but I guess it is quite clear as it is :)

C makefile still builds latex doc by default

Java: use BigInteger and BigDecimal (?) for Integer and Double built-in tokens

BNFC 2.5, with the -java option.

Consider the following grammar:

Label. Category ::= Integer;

The generated parser accepts the string

1111111111

but fails on

11111111112

Reason: the parser parses the string into a java.lang.Integer object.

Workaround: Use tokens, e.g.

token MY_DIGITS digit+;

and process the string afterwards.

Possible solution: Use a java.math.BigInteger field instead of java.lang.Integer, so that the information that the field is numeric (and integer) is preserved without bounds on its precision.

A similar problem arises with the grammar

Label. Category ::= Double;

That case is subtler, because a number that overflows the IEEE 64-bit double format is converted into a java.lang.Double with the value Infinity. Parsing therefore does not fail, but the parsed string is implicitly given the semantics of IEEE 64-bit floating-point numbers.

A solution coherent with the one for integers is to use java.math.BigDecimal instead of java.lang.Double.
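
The two effects can be reproduced outside Java. The Python sketch below is only an analogy: `fits_int32` is a hypothetical helper that checks a literal against the 32-bit range of java.lang.Integer, Python's `float` stands in for java.lang.Double (both are IEEE 64-bit doubles), and `decimal.Decimal` stands in for java.math.BigDecimal.

```python
from decimal import Decimal

# Range of a 32-bit signed integer, as used by java.lang.Integer
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_int32(literal: str) -> bool:
    """Check whether an integer literal fits into 32 bits."""
    return INT32_MIN <= int(literal) <= INT32_MAX


print(fits_int32("1111111111"))   # True:  below 2147483647
print(fits_int32("11111111112"))  # False: would overflow

# A literal beyond the double range silently becomes infinity ...
print(float("1e400"))    # inf
# ... while an arbitrary-precision decimal keeps the value.
print(Decimal("1e400"))  # 1E+400
```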

Option handling: should fail when given an erroneous option

When given an option that does not exist, like bnfc -zzz, bnfc should exit with an error message. Right now it defaults to generating a parser using the Haskell backend.
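
A sketch of the requested behaviour in Python, under the assumption that options start with `-` and everything else is a grammar file. `parse_args`, `UnknownOption`, and the (deliberately incomplete) set of backend flags are illustrative only and do not reflect bnfc's actual option parser.

```python
from typing import List, Tuple

# Illustrative subset of backend flags; not the complete list.
KNOWN_BACKENDS = {"--haskell", "--java", "--c", "--cpp", "--ocaml"}


class UnknownOption(Exception):
    """Raised for any option that is not recognized."""


def parse_args(argv: List[str]) -> Tuple[str, List[str]]:
    """Return (backend, grammar files); reject unknown options."""
    backend = "--haskell"  # default applies only when no backend flag is given
    files = []
    for arg in argv:
        if arg in KNOWN_BACKENDS:
            backend = arg
        elif arg.startswith("-"):
            # Fail loudly instead of silently falling back to the default.
            raise UnknownOption(f"unrecognized option: {arg}")
        else:
            files.append(arg)
    return backend, files


print(parse_args(["--java", "Foo.cf"]))  # ('--java', ['Foo.cf'])
```

With this structure, `parse_args(["-zzz", "Foo.cf"])` raises `UnknownOption` rather than returning the Haskell default.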

Factorizing the common part of the makefile

All formats should use the helpers in Common.Makefile which contains the latest version of the latex commands

Allow non-string user tokens

I would like to use the AST generated by BNFC throughout my language project. I see from the example of Java 1.1 that tokens which are not built into BNFC are always assumed to be strings. I would like to be able to parse a string like "123L" as a value 123 but I would like to be responsible for generating my own data value. I understand that this would make my tool backend-specific (although it's not hard to imagine writing one piece of code for each backend I want), but carrying an unparsed String value around everywhere that I'm using this AST type is far from ideal. Any thoughts?

Thanks! This tool is great; I hope I'll be able to use it.

Book testsuite

Create a test suite from all examples in the book to make sure that, if we were to break compatibility with the book, we'd know it.

Add support for bnf syntax in Pygments

Pygments is used in many places to do syntax coloring, so it would be nice to add support for bnfc.

Allow custom character ranges in token definition

From the ML:

I would appreciate if own character ranges would be supported and not only the
predefined digit, letter, upper, lower character ranges or simple unions...
a use case could be accented characters for example...

OCaml Backend Does Not Insert Regexes

Token rules are omitted in generating the ocamllex files, and therefore cause lex failures at runtime.

Python backend

Scala back end?

Is there any interest in generating Scala? Obviously, this would simply involve modifying the existing Java back end to generate case classes with the intention of using the generated code in a way similar to using Haskell or OCaml generated code.

How to use whitespace as separator?

I want to write a grammar for a language where you can use both newlines and semi-colons to separate expressions:

# this is valid:
a()
b()
# so is this:
a();b()
# so is this:
a(
);b(
)

Any thoughts on what the labelled BNF grammar would look like?

Create a bnfc web service

Hackage warning: Exposed modules use unallocated top-level names

What top-level name should the CNF runtime use?

Convert latex & txt2tag to independent backends

It seems that these generated files are not used very often. As independent backends, they would only be generated when they are actually needed.

Java PrettyPrinter.print() method fails when terms exceed 5750 productions

Consider the following grammar

Atom. Category ::= "h";
NonAtom. Category ::= "o" Category;

I create a file to parse using the following python script

import sys

# Number of productions to use: N - 1 applications of NonAtom, then one Atom.
productions = int(sys.argv[1])

with open('input.txt', 'w') as f:
    f.write("o " * (productions - 1))
    f.write("h")

where the argument tells how many productions are used for generating the string.
Then issuing on the shell

$ bnfc -java -m bug.cf && make && python inputgen.py 5750 && java bug.Test input.txt

works, but from that value on

$  bnfc -java -m bug.cf && make && python inputgen.py 5751 && java bug.Test input.txt

fails in delivering the linearized tree using the PrettyPrinter.print(Category) method:

[Linearized Tree]

At line 1, near "o o o o" :
     null

This does not occur in Haskell so I think it must be related with the Java specific facilities BNFC uses.

The actual length in characters does not matter -- what matters is the number of applications of the nonterminal rule (the NonAtom one).

The last accepted number of rule applications is 5750.
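
The report does not state the cause. One plausible explanation, offered here only as an assumption, is that a printer which recurses once per NonAtom node runs out of stack on deep terms. The Python sketch below models the grammar with two hypothetical classes and contrasts a recursive printer with an iterative one whose stack use does not grow with the depth of the term.

```python
class Atom:
    """Atom. Category ::= "h";"""


class NonAtom:
    """NonAtom. Category ::= "o" Category;"""

    def __init__(self, child):
        self.child = child


def build(productions: int):
    """Build a term using `productions` rule applications in total."""
    term = Atom()
    for _ in range(productions - 1):
        term = NonAtom(term)
    return term


def print_recursive(term) -> str:
    """One call frame per NonAtom; deep terms exhaust the call stack."""
    if isinstance(term, Atom):
        return "h"
    return "o " + print_recursive(term.child)


def print_iterative(term) -> str:
    """Walks the chain in a loop, so depth is limited only by memory."""
    parts = []
    while isinstance(term, NonAtom):
        parts.append("o")
        term = term.child
    parts.append("h")
    return " ".join(parts)


print(print_recursive(build(3)))             # o o h
print(len(print_iterative(build(6000))))     # 11999
```

Whether the generated Java printer fails for this reason would need to be checked against the generated code.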

Bug in the layout mechanism

In the generated layout processor, the cases for encountering a new line and a layout-starting keyword are mutually exclusive.

I.e., given the following grammar:

layout "mutual" ;
Def.       Def ::= Ident ;
DefMutual. Def ::= "mutual" "{" [Def] "}" ;
separator Def ";" ;

the following is not accepted although it should be:

mutual
  foo
  mutual

Note that the following 2 examples are correctly accepted:

mutual
  foo
  bar

mutual
  foo ;
  mutual

(reported by Anders, Simon, Cyril on the bnfc mailing list)

happy -gca Calc/Par.y
unused terminals: 1
alex -g Calc/Lex.x
ghc --make Calc/Test.hs -o Calc/Test

Calc/Abs.hs:1:16: Warning:
    -fglasgow-exts is deprecated: Use individual extensions instead

Calc/ComposOp.hs:1:16: Warning:
    -fglasgow-exts is deprecated: Use individual extensions instead

Calc/Print.hs:1:16: Warning:
    -fglasgow-exts is deprecated: Use individual extensions instead
line-map.c: file "<command-line>" left but not entered
line-map.c: file "<command-line>" left but not entered
[3 of 8] Compiling Calc.Abs         ( Calc/Abs.hs, Calc/Abs.o )

Calc/Abs.hs:14:1:
    Illegal generalised algebraic data declaration for `Tree'
      (Use -XGADTs to allow GADTs)
    In the data declaration for `Tree'
make: *** [all] Error 1

CU. CompilationUnit ::= [PackageDecl] {ImportDeclaration} {TypeDeclaration}

giving rise to

data CompilationUnit = CU [PackageDecl] (Maybe ImportDeclaration) (Maybe TypeDeclaration)