The BNF Converter

What is the BNF Converter?

The BNF Converter (bnfc) is a compiler construction tool generating a compiler front-end from a Labelled BNF (LBNF) grammar. It is currently able to generate Haskell, Agda, C, C++, Java, and OCaml, as well as XML representations.

Given an LBNF grammar, the tool produces:

  • an abstract syntax implementation
  • a case skeleton for the abstract syntax in the same language
  • an Alex, ocamllex, JLex, or Flex lexer generator file
  • a Happy, ocamlyacc, Menhir, ANTLR, CUP, or Bison parser generator file
  • a pretty-printer as a Haskell/Agda/C/C++/Java/OCaml module
  • a LaTeX file containing a readable specification of the language

More information: http://bnfc.digitalgrammars.com/

Installation

Some binaries are available at https://github.com/BNFC/bnfc/releases. Installation from the Haskell sources is possible via stack or cabal.

Installation via stack (recommended)

You need a working installation of stack. To install and run the latest version of bnfc from Stackage, enter at the command line:

  stack install BNFC
  bnfc --help

Installation via cabal

You need a working installation of a recent version of GHC and Cabal, most easily available via GHCup. To install bnfc from Hackage, enter at the command line:

  cabal install BNFC
  bnfc --help

Installing the development version

To install the development version of bnfc with the latest bugfixes (and regressions ;-)):

  git clone https://github.com/BNFC/bnfc.git
  cd bnfc/source

and then either

  cabal install

or

  stack install --stack-yaml stack-9.4.yaml

(Replace 9.4 with your GHC major version. To build with your installed GHC, add the flag --system-ghc.)

Mini tutorial

  • Build a first parser in 5 min (Haskell backend):

    1. In a fresh directory, prepare a grammar file Sum.cf with the following content:

      EInt.  Exp ::= Integer;
      EPlus. Exp ::= Exp "+" Integer;
      
    2. Build a parser (in Haskell) with bnfc:

      bnfc -d -m Sum.cf  &&  make
      

      The make step needs the Haskell compiler GHC, the lexer generator alex and the parser generator happy (all included in the GHC installation).

    3. Inspect the generated files in directory Sum.

    4. Test the parser.

      echo "1 + 2 + 3" | Sum/Test
      
  • Try the C-family backends. (The prerequisites, GNU C(++) compiler (gcc / g++), lexer generator flex and parser generator bison, are usually present):

    bnfc --c   -m -o sum-c   Sum.cf  &&  make -C sum-c    &&  echo "1 + 2 + 3" | sum-c/TestSum
    bnfc --cpp -m -o sum-cpp Sum.cf  &&  make -C sum-cpp  &&  echo "1 + 2 + 3" | sum-cpp/TestSum
    
  • Try the other backends:

    Option          Backend
    --java          Requires Java, JLex or JFlex, and CUP.
    --java-antlr    Requires ANTLR.
    --ocaml         Requires OCaml, ocamllex and ocamlyacc.
    --ocaml-menhir  Uses menhir instead of ocamlyacc.
    --agda          Produces Agda bindings to the parser generated for Haskell.
    --pygments      Produces a lexer definition for the Python highlighting suite Pygments.
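
As a rough illustration of what the two rules in Sum.cf describe, the following Python sketch hand-builds the left-nested tree that `1 + 2 + 3` corresponds to. It is not code generated by bnfc: the class names merely mirror the rule labels EInt and EPlus, and `parse_sum` and `evaluate` are hypothetical helpers standing in for a generated parser and a user-written interpreter.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EInt:
    value: int


@dataclass(frozen=True)
class EPlus:
    left: "Exp"
    right: int  # the rule's right operand is a plain Integer, not an Exp


Exp = Union[EInt, EPlus]


def parse_sum(text: str) -> Exp:
    """Hypothetical stand-in for a generated parser: Integer ("+" Integer)*."""
    parts = [p.strip() for p in text.split("+")]
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"syntax error in {text!r}")
    tree: Exp = EInt(int(parts[0]))
    for p in parts[1:]:
        tree = EPlus(tree, int(p))  # left recursion gives a left-nested tree
    return tree


def evaluate(e: Exp) -> int:
    """Hypothetical interpreter over the tree: adds up all the integers."""
    if isinstance(e, EInt):
        return e.value
    return evaluate(e.left) + e.right


print(parse_sum("1 + 2 + 3"))
print(evaluate(parse_sum("1 + 2 + 3")))
```

Because EPlus is left-recursive, the outermost node holds the last operand, so the input groups as (1 + 2) + 3.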

Documentation

https://bnfc.readthedocs.org/en/latest/

Support

You can discuss issues around bnfc with us on our mailing list [email protected].

For current limitations of bnfc, or to report a new bug, please consult our issue tracker.

License

The project is licensed under the BSD 3-clause license.

BNFC versions up to 2.8.4 were released under the GNU General Public License.

bnfc's Issues

The Java makefile still builds the doc by default

It should be a separate target so that, if you are not interested in the pdf doc, you can still run make without having to edit the makefile.
It should make use of the Common.Makefile module for that.

Cabal support for bnfc

Add support in cabal to build bnf files using bnfc (like what is done with alex and happy). This would, among other things, make bnfc's own compilation easier.

Fix the alfa grammar (or delete?)

In one of the test cases, for the language called alfa, the grammar doesn't seem to match the example file. The case has been removed from the test suite, but it would be good to either fix it or drop it completely.

Options to choose makefile filename

Currently the only option relating to generating a makefile is -m, which generates a makefile with filename Makefile.

Could an option be added to allow the user to also optionally specify the name of the makefile, e.g.

bnfc -m --make-filename=MyMakefile --haskell Foo.cf

Use white-space as separator

A frequent question is how to use white-space as a separator. Doing this:

separator A " ";

doesn't work and the answer is to use an empty string:

separator A "";

Maybe bnfc could accept " " as a separator and treat it like "".

Access to AST nodes' positions in the input file

In order to be able to provide better error messages to the users, we would like to say

Error E occurred on line 25, col 55

However, the positions of AST nodes in the input file are currently not accessible from the AST. How could we access the nodes' positions? How do AST node positions relate to token positions, e.g.,

position token PIdent (letter (letter|digit|'_'|'\'')*) ;
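
To make the request concrete, here is a minimal Python sketch of a lexer that attaches a 1-based (line, column) pair to every identifier, which is the information the error message above needs. It is only an illustration and not bnfc's implementation: the names `PIdent`, `pidents` and `error_at` are hypothetical, and the regular expression approximates the token definition with ASCII letters only.

```python
import re
from typing import Iterator, NamedTuple


class PIdent(NamedTuple):
    line: int  # 1-based line number
    col: int   # 1-based column number
    text: str


# Approximates: letter (letter | digit | '_' | '\'')*   (ASCII letters only)
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_']*")


def pidents(source: str) -> Iterator[PIdent]:
    """Yield every identifier together with the position where it starts."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for m in IDENT.finditer(line):
            yield PIdent(line_no, m.start() + 1, m.group())


def error_at(name: str, tok: PIdent) -> str:
    """Format an error message that points at a token's position."""
    return f"Error {name} occurred on line {tok.line}, col {tok.col}"


tokens = list(pidents("x1 = foo'\n  bar"))
print(tokens)
print(error_at("E", tokens[-1]))
```

An AST node that keeps such a token as a field can report the position of its first token, which is often enough for error messages.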

Code cleanup

  • Backend clean up
  • baseline for backends
  • things to deprecate in 2.5/2.6

Take advantage of DataKinds

It seems that generated code could, instead of using a bunch of empty data constructors for tags, have an enumeration which is lifted to the kind layer. For example, instead of

data Stmt_
data Expr_
type Stmt = Tree Stmt_
type Expr = Tree Expr_

data Tree :: * -> * where
[...]

have

data Tag = Stmt_ | Expr_
type Stmt = Tree Stmt_
type Expr = Tree Expr_

data Tree :: Tag -> * where
[...]

Adding compliant code examples to verbatim blocks in LaTeX

Feature request.

It would be great to be able to include in the LaTeX file a number of compliant code examples, to give a flavour of the language specified by the grammar. Could there be a special notation in a .cf file that bnfc ignores except during LaTeX generation, where it would add each code example into a verbatim block (provided all rules are accepted by bnfc)?

`entrypoint` directive does not work in Java

Using BNFC 2.5.
With the following katter.cf as input and the -java option:

Klas. Foo ::= "KlasKatt";
Hobbes. Foo ::= "Hobbes";
KrazyKat. Bar ::= "KrazyKat";
Findus. Bar ::= "Findus";

entrypoints Foo,Bar;

BNFC ignores the entrypoint Bar. Comments in the generated katter/Test.java mention the usage of the method pBar in the katter.parser class. This method does not exist.

In katter.parser :

  public katter.Absyn.Foo pFoo() throws Exception
  {
    java_cup.runtime.Symbol res = parse();
    return (katter.Absyn.Foo) res.value;
  }

exists.

Adding

  public katter.Absyn.Bar pBar() throws Exception
  {
    java_cup.runtime.Symbol res = parse();
    return (katter.Absyn.Bar) res.value;
  }

and modifying Test.java accordingly does not help. The parser recognizes only the first of the categories listed in entrypoints. In fact, replacing the last line of katter.cf with entrypoints Bar, Foo; gives the mirror-image problem.

Investigate problems in Haskell GADT backend

System tests using the GADT backend exhibit some problems (this is why they are disabled by default).

To enable and run the disabled tests do

cabal configure --enable-tests -fhaskell-gadt-tests && cabal build && cabal test

Compilation error in Java when a production uses more than one user-defined token

Consider the following multiple_token.cf grammar:

Label. Category ::= FIRST SECOND;
token FIRST 'a';
token SECOND 'b';

Issuing bnfc -m -java multiple_token.cf and then make results in compilation errors.

Reason: the field names in multiple_token/Absyn/Label.java do not match those used in other generated files. Excerpts follow:

package multiple_token.Absyn;

public class Label extends Category {
  public final String first_1, second_2;

and in multiple_token/ComposVisitor.java:

public Category visit(multiple_token.Absyn.Label p, A arg)
{
  String first_ = p.first_;
  String second_ = p.second_;

Problem with multiple `rules` declarations with a common prefix

I am using the latest version of BNFC (2.5, March 2013) to generate Java code.

Problem: a BNFC grammar cannot always have multiple rules macros over the same category.

Consider the following grammar:

rules Function ::= "a" "v" | "b" | "c";
rules Function ::= "a" "u" | "e" | "f";

Running bnfc -java yields

ERROR: names not unique: Function_1

The automated label generation creates a clash.

Yet, when there is no ambiguity to be solved, it works:

rules Function ::= "v" | "b" | "c";
rules Function ::= "u" | "e" | "f";

If BNFC should accept any number of rules macros, then there should be a global counter for each category appearing in a rules macro (at present it appears to be reset for each rules macro).

Otherwise BNFC should prevent multiple rules macros on the same category in one grammar.
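
The first option can be sketched in Python as follows. This is a simplified model of the labelling behaviour as the report describes it, not bnfc's actual code: it assumes that an alternative consisting of a single identifier-like keyword is labelled after that keyword and that every other alternative gets a numeric suffix. The function `make_labels` and its `per_category_counter` switch are hypothetical.

```python
from collections import defaultdict
from itertools import count


def make_labels(macros, per_category_counter):
    """macros: list of (category, alternatives); an alternative is a list of terminals."""
    counters = defaultdict(lambda: count(1))  # one running counter per category
    labels = []
    for category, alternatives in macros:
        local = count(1)  # reported behaviour: numbering restarts for each macro
        for items in alternatives:
            if len(items) == 1 and items[0].isidentifier():
                # Assumption: a lone keyword is reused as the label suffix.
                labels.append(f"{category}_{items[0]}")
            else:
                counter = counters[category] if per_category_counter else local
                labels.append(f"{category}_{next(counter)}")
    return labels


grammar = [
    ("Function", [["a", "v"], ["b"], ["c"]]),
    ("Function", [["a", "u"], ["e"], ["f"]]),
]

print(make_labels(grammar, per_category_counter=False))  # Function_1 appears twice
print(make_labels(grammar, per_category_counter=True))   # all labels distinct
```

Keeping the counter per category rather than per macro is enough to avoid the clash in this model, while leaving the keyword-derived labels unchanged.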

Layout resolver

What is the status of the layout resolver? It seems not to have been updated for three years, and it appears to be limited to special keywords that open nested blocks. In our language, there is no explicit keyword that marks the beginning of a block; nesting is simply allowed after each declaration. For example,

abstract Person
   name : string
   spouse -> Person ?
       year : integer
   child -> Person *

Is resolving such layout possible?
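
For reference, keyword-free layout of this kind is commonly resolved with a stack of indentation widths. The Python sketch below shows the idea as a stand-alone preprocessor; it is not bnfc's layout resolver. The function `resolve_layout`, the choice of `{`, `}` and `;` as explicit tokens, and the restriction to space-only indentation are all assumptions made for the example.

```python
def resolve_layout(text, open_tok="{", close_tok="}", sep_tok=";"):
    """Turn indentation into explicit block and separator tokens."""
    out = []          # processed lines
    stack = []        # indentation widths of the blocks currently open
    prev_indent = 0
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no layout information
        indent = len(raw) - len(raw.lstrip(" "))
        if out:
            if indent > prev_indent:
                # Deeper than the previous line: that line opens a block.
                out[-1] += " " + open_tok
                stack.append(indent)
            else:
                # Same level or shallower: close finished blocks, then separate.
                closes = 0
                while stack and stack[-1] > indent:
                    stack.pop()
                    closes += 1
                out[-1] += (" " + close_tok) * closes + " " + sep_tok
        out.append(raw.rstrip())
        prev_indent = indent
    if out:
        out[-1] += (" " + close_tok) * len(stack)  # close what is still open
    return "\n".join(out)


EXAMPLE = """\
abstract Person
   name : string
   spouse -> Person ?
       year : integer
   child -> Person *
"""

print(resolve_layout(EXAMPLE))
```

The output can then be parsed with an ordinary grammar in which every declaration may be followed by an optional braced, semicolon-separated list of nested declarations.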

Java: use BigInteger and BigDecimal (?) for Integer and Double built-in tokens

BNFC 2.5, with the -java option.

Consider the following grammar:

Label. Category ::= Integer;

The generated parser accepts the string

1111111111

but fails in

11111111112

Reason: the parser parses the string into a java.lang.Integer object.

Workaround: Use tokens, e.g.

token MY_DIGITS digit+;

and process the string afterwards.

Possible solution: Use a java.math.BigInteger field instead of java.lang.Integer, so that the information that the field is numeric (and integer) is preserved without bounds on its precision.

A similar problem arises with the grammar

Label. Category ::= Double;

In that case the problem is subtler: a number that overflows the IEEE 64-bit double format is converted into a java.lang.Double with the value Infinity. Parsing therefore does not fail, but the parsed string is implicitly given the semantics of IEEE 64-bit floating-point numbers.

A solution consistent with the one for integers is to use java.math.BigDecimal instead of java.lang.Double.
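
The two effects can be reproduced outside Java. In the Python sketch below, Python's unbounded `int` and `decimal.Decimal` play roughly the roles proposed for java.math.BigInteger and java.math.BigDecimal, while Python's `float` is an IEEE 64-bit double like java.lang.Double. The helper `fits_int32` is hypothetical and only models the 32-bit range of a Java int.

```python
from decimal import Decimal

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def fits_int32(literal: str) -> bool:
    """True if the decimal literal fits into a 32-bit signed integer."""
    return -INT32_MAX - 1 <= int(literal) <= INT32_MAX


print(fits_int32("1111111111"))   # True: this literal is accepted
print(fits_int32("11111111112"))  # False: this literal overflows

# An arbitrary-precision integer keeps the value exactly.
print(int("11111111112"))

# A 64-bit double silently overflows to infinity ...
print(float("1e400"))
# ... whereas an arbitrary-precision decimal keeps the value.
print(Decimal("1e400"))
```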

Allow non-string user tokens

I would like to use the AST generated by BNFC throughout my language project. I see from the example of Java 1.1 that tokens which are not built into BNFC are always assumed to be strings. I would like to be able to parse a string like "123L" as a value 123 but I would like to be responsible for generating my own data value. I understand that this would make my tool backend-specific (although it's not hard to imagine writing one piece of code for each backend I want), but carrying an unparsed String value around everywhere that I'm using this AST type is far from ideal. Any thoughts?

Thanks! This tool is great; I hope I'll be able to use it.

Book testsuite

Create a test suite from all examples in the book to make sure that, if we were to break compatibility with the book, we'd know it.

Allow custom character ranges in token definition

From the mailing list:

I would appreciate if own character ranges would be supported and not only the
predefined digit, letter, upper, lower character ranges or simple unions...
a use case could be accented characters for example...

Scala back end?

Is there any interest in generating Scala? Obviously, this would simply involve modifying the existing Java back end to generate case classes with the intention of using the generated code in a way similar to using Haskell or OCaml generated code.

How to use whitespace as separator?

I want to write a grammar for a language where you can use both newlines and semi-colons to separate expressions:

# this is valid:
a()
b()
# so is this:
a();b()
# so is this:
a(
);b(
)

Any thoughts on what the labelled BNF grammar would look like?

Java PrettyPrinter.print() method fails when terms exceed 5750 productions

Consider the following grammar

Atom. Category ::= "h";
NonAtom. Category ::= "o" Category;

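I create a file to parse using the following Python script

import sys

# Number of productions used to generate the string.
count = int(sys.argv[1])

# (count - 1) applications of NonAtom followed by one Atom.
with open('input.txt', 'w') as f:
    f.write("o " * (count - 1))
    f.write("h")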

where the argument tells how many productions are used for generating the string.
Then issuing on the shell

$ bnfc -java -m bug.cf && make && python inputgen.py 5750 && java bug.Test input.txt 

works, but from that value on

$  bnfc -java -m bug.cf && make && python inputgen.py 5751 && java bug.Test input.txt

fails in delivering the linearized tree using the PrettyPrinter.print(Category) method:

[Linearized Tree]

At line 1, near "o o o o" :
     null

This does not occur in Haskell, so I think it must be related to the Java-specific facilities BNFC uses.

The actual length in characters does not matter; what matters is the number of applications of the nonterminal rule (the NonAtom one).

The last accepted number of rule applications is 5750.
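
The report does not identify the cause. One plausible explanation, which is an assumption and not something the report confirms, is that a printer recursing once per nested node runs out of call stack on deep trees. The Python sketch below reproduces that general effect on the same tree shape and contrasts it with an iterative printer; `build`, `print_recursive` and `print_iterative` are hypothetical names, and the depth at which Python fails depends on its recursion limit, not on the 5750 observed in Java.

```python
def build(n):
    """Tree for an input with n productions: n - 1 NonAtom nodes around one Atom."""
    tree = ("Atom",)
    for _ in range(n - 1):
        tree = ("NonAtom", tree)
    return tree


def print_recursive(tree):
    """One call frame per NonAtom node."""
    if tree[0] == "Atom":
        return "h"
    return "o " + print_recursive(tree[1])


def print_iterative(tree):
    """Walks down the tree in a loop, so depth does not consume call stack."""
    parts = []
    while tree[0] == "NonAtom":
        parts.append("o")
        tree = tree[1]
    parts.append("h")
    return " ".join(parts)


small = build(4)
print(print_recursive(small))  # o o o h
print(print_iterative(small))  # o o o h

deep = build(100_000)
try:
    print_recursive(deep)
except RecursionError:
    print("recursive printer ran out of stack")
print(len(print_iterative(deep)))  # 199999
```

If the Java failure has the same cause, raising the JVM thread stack size would move the limit, while an iterative printer would remove it.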

Bug in the layout mechanism

In the generated layout processor, the cases for encountering a new line and a layout starting keyword are mutually exclusive.

That is, given the following grammar:

layout "mutual" ;
Def.       Def ::= Ident ;
DefMutual. Def ::= "mutual" "{" [Def] "}" ;
separator Def ";" ;

the following is not accepted although it should be:

mutual
  foo
  mutual

Note that the following 2 examples are correctly accepted:

mutual
  foo
  bar

and

mutual
  foo ;
  mutual

(reported by Anders, Simon, Cyril on the bnfc mailing list)

With the -d option, the XML module is not generated inside the directory

The following function does not generate Modname/XML.hs; instead it generates XMLModname.hs. This causes a build error.

makeXML :: FilePath -> Coding -> CF -> IO ()
makeXML name typ cf = do
  writeFileRep (name ++ ".dtd") $ cf2DTD typ name cf
  let absmod = "XML" ++ name
  writeFileRep (absmod ++ ".hs") $ cf2XMLPrinter typ name absmod cf

Glasgow extensions no longer sufficient

Running bnfc -m -gadt -d Calc.cf generates files with the pragma {-# OPTIONS_GHC -fglasgow-exts #-}. Running make (without the document commands) produces the following output and fails:

happy -gca Calc/Par.y
unused terminals: 1
alex -g Calc/Lex.x
ghc --make Calc/Test.hs -o Calc/Test

Calc/Abs.hs:1:16: Warning:
    -fglasgow-exts is deprecated: Use individual extensions instead

Calc/ComposOp.hs:1:16: Warning:
    -fglasgow-exts is deprecated: Use individual extensions instead

Calc/Print.hs:1:16: Warning:
    -fglasgow-exts is deprecated: Use individual extensions instead
line-map.c: file "<command-line>" left but not entered
line-map.c: file "<command-line>" left but not entered
[3 of 8] Compiling Calc.Abs         ( Calc/Abs.hs, Calc/Abs.o )

Calc/Abs.hs:14:1:
    Illegal generalised algebraic data declaration for `Tree'
      (Use -XGADTs to allow GADTs)
    In the data declaration for `Tree'
make: *** [all] Error 1

Support for Alex 3.x?

I would very much like to use BNFC with alex version 3.x. Is it hard to add support for the latest version of that tool?

Polymorphic Maybe [Feature Request]

To make something like this possible:

CU. CompilationUnit ::= [PackageDecl] {ImportDeclaration} {TypeDeclaration}

giving rise to

data CompilationUnit = CU [PackageDecl] (Maybe ImportDeclaration) (Maybe TypeDeclaration)
