joelburget / lvca-hs

language verification, construction, and automation

License: BSD 3-Clause "New" or "Revised" License

Haskell 99.86% Makefile 0.14%
language programming-language parsing pretty-printing symbolic-execution

lvca-hs's Introduction

LVCA (pronounced "luca")

Introduction

LVCA is a tool for building programming languages. It has an intentionally small core. You create a language by specifying (1) its syntax, (2) its statics (i.e., its typechecking rules), and (3) its dynamics (i.e., how it evaluates).

LVCA then provides tools:

  • parser
  • pretty-printer
  • interpreter
  • debugger

Things that don't yet exist but can and should:

  • An automatic typechecker
    • I'd like to have at least two versions of this. Ideally, all of your typechecking rules are specified in a bidirectional style, which gives us an algorithm for typechecking. Failing that, rules might be specified in an ad-hoc way and solved via an SMT solver.
  • Automatic serialization (to JSON, CBOR, or some other format)
  • Relatedly, content-identification
  • Language-server protocol implementation

Example

First we define the abstract syntax of the language. This simple language only has booleans and functions.

tm :=
  // a term can be a simple boolean literal
  | true()
  | false()

  // ... or a type-annotated term (holding a tm and a ty)
  | annot(tm; ty)

  // ... an if-then-else
  | ite(tm; tm; tm)

  // ... or a function or function application. Note the `tm. tm` syntax means
  // that we bind a `tm` in the body of the function (also a `tm`). Contrast
  // with `tm; tm` which means there are two `tm` children.
  | lam(tm. tm)
  | app(tm; tm)

ty :=
  // a type is either a bool
  | bool()
  // or an arrow between two types
  | arr(ty; ty)

The syntax we're using here comes from Robert Harper's Practical Foundations for Programming Languages.
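To make the binding notation concrete, here is a minimal sketch of how the two sorts above might look as ordinary Haskell data types. The constructor names (`TyBool`, `TmTrue`, and so on) and the explicit `Var` constructor are assumptions made for illustration; they are not taken from the LVCA codebase. Note how `lam(tm. tm)` becomes a constructor holding a bound name and a body, while `app(tm; tm)` simply holds two children.

```haskell
-- Hypothetical Haskell rendering of the `ty` sort.
data Ty
  = TyBool          -- bool()
  | TyArr Ty Ty     -- arr(ty; ty)
  deriving (Eq, Show)

-- Hypothetical Haskell rendering of the `tm` sort.
data Tm
  = TmTrue          -- true()
  | TmFalse         -- false()
  | Annot Tm Ty     -- annot(tm; ty)
  | Ite Tm Tm Tm    -- ite(tm; tm; tm)
  | Lam String Tm   -- lam(tm. tm): the String is the bound variable's name
  | App Tm Tm       -- app(tm; tm)
  | Var String      -- a use of a bound variable (assumed; implicit in the grammar)
  deriving (Eq, Show)

-- annot(lam(x. x); arr(bool(); bool()))
identityOnBool :: Tm
identityOnBool = Annot (Lam "x" (Var "x")) (TyArr TyBool TyBool)
```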

Next we define the typechecking rules for this language. We support expressing typing rules in a bidirectional style. I'd like to add support for unification in the future.


----------------------- (bool intro 1)
ctx |- true() => bool()

------------------------ (bool intro 2)
ctx |- false() => bool()

      ctx |- tm <= ty
-------------------------- (annot)
ctx |- annot(tm; ty) => ty

ctx |- t1 <= bool()  ctx |- t2 <= ty  ctx |- t3 <= ty
----------------------------------------------------- (bool elim)
           ctx |- ite(t1; t2; t3) <= ty

    ctx, x : ty1 |- tm <= ty2
---------------------------------- (lam intro)
ctx |- lam(x. tm) <= arr(ty1; ty2)

ctx |- tm1 => arr(ty1; ty2)  ctx |- tm2 <= ty1
---------------------------------------------- (lam elim)
        ctx |- app(tm1; tm2) => ty2

// important: this rule must go last or else it will subsume all others
ctx |- tm => ty
--------------- (switch)
ctx |- tm <= ty
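The rules above read fairly directly as a pair of mutually recursive functions: one for synthesis (`=>`) and one for checking (`<=`). The following is a hand-written sketch of that reading, not LVCA's generated checker. The data types, the association-list context, and the variable-lookup case are assumptions added for illustration (the rules above do not spell out a variable rule).

```haskell
data Ty = TyBool | TyArr Ty Ty
  deriving (Eq, Show)

data Tm
  = TmTrue
  | TmFalse
  | Annot Tm Ty
  | Ite Tm Tm Tm
  | Lam String Tm
  | App Tm Tm
  | Var String
  deriving (Eq, Show)

-- Typing context: variable names paired with their types (assumed representation).
type Ctx = [(String, Ty)]

-- Synthesis mode: ctx |- tm => ty
infer :: Ctx -> Tm -> Maybe Ty
infer _   TmTrue        = Just TyBool                -- (bool intro 1)
infer _   TmFalse       = Just TyBool                -- (bool intro 2)
infer ctx (Annot tm ty) = ty <$ check ctx tm ty      -- (annot)
infer ctx (App tm1 tm2) = do                         -- (lam elim)
  funTy <- infer ctx tm1
  case funTy of
    TyArr ty1 ty2 -> ty2 <$ check ctx tm2 ty1
    TyBool        -> Nothing
infer ctx (Var x)       = lookup x ctx               -- assumed variable rule
infer _   _             = Nothing                    -- no synthesis rule applies

-- Checking mode: ctx |- tm <= ty
check :: Ctx -> Tm -> Ty -> Maybe ()
check ctx (Ite t1 t2 t3) ty = do                     -- (bool elim)
  check ctx t1 TyBool
  check ctx t2 ty
  check ctx t3 ty
check ctx (Lam x tm) (TyArr ty1 ty2) =               -- (lam intro)
  check ((x, ty1) : ctx) tm ty2
check ctx tm ty = do                                 -- (switch), tried last
  ty' <- infer ctx tm
  if ty' == ty then Just () else Nothing
```

As in the rule listing, the (switch) clause comes last in `check`, so it only fires when no more specific checking rule matches.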

Lastly, we define the denotational semantics of the language.

[[ true()          ]] = true()
[[ false()         ]] = false()
[[ annot(tm; ty)   ]] = [[ tm ]]
[[ ite(t1; t2; t3) ]] = case([[ t1 ]]; true() -> [[ t2 ]]; false() -> [[ t3 ]])
[[ lam(x. body)    ]] = lam(x. [[ body ]])
[[ app(fun; arg)   ]] = app([[ fun ]]; [[ arg ]])

Given all of these pieces, we can automatically produce an interpreter that typechecks and evaluates expressions.
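As a rough illustration of what such an interpreter does, here is a hand-written evaluator that follows the six equations above. One liberty is taken: the equations translate into a small core language (`lam`, `app`, `case`), whereas this sketch interprets terms directly as Haskell values. The `Value` type, the environment representation, and the `Var` case are all assumptions made for the example.

```haskell
data Ty = TyBool | TyArr Ty Ty

data Tm
  = TmTrue
  | TmFalse
  | Annot Tm Ty
  | Ite Tm Tm Tm
  | Lam String Tm
  | App Tm Tm
  | Var String

-- Semantic domain (assumed): booleans and partial functions on values.
data Value
  = VBool Bool
  | VFun (Value -> Maybe Value)

type Env = [(String, Value)]

-- [[ tm ]] in a given environment; Nothing signals a stuck term.
denote :: Env -> Tm -> Maybe Value
denote _   TmTrue         = Just (VBool True)
denote _   TmFalse        = Just (VBool False)
denote env (Annot tm _)   = denote env tm            -- annotations are erased
denote env (Ite t1 t2 t3) = do
  scrutinee <- denote env t1
  case scrutinee of
    VBool True  -> denote env t2
    VBool False -> denote env t3
    VFun _      -> Nothing
denote env (Lam x body)   =
  Just (VFun (\arg -> denote ((x, arg) : env) body))
denote env (App fun arg)  = do
  funV <- denote env fun
  argV <- denote env arg
  case funV of
    VFun f  -> f argV
    VBool _ -> Nothing
denote env (Var x)        = lookup x env
```

Well-typed terms should never hit the `Nothing` branches; they are there so the sketch stays total on ill-typed input.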

Meaning of the name

  1. LVCA is an acronym for Language Verification, Construction, and Automation

  2. In biology, LUCA stands for Last Universal Common Ancestor -- the most recent common ancestor of all life on Earth. LVCA occupies a somewhat analogous position, as it can be used to implement any programming language.

lvca-hs's People

Contributors

joelburget

lvca-hs's Issues

parsing + pretty-printing literature

Literature review on declaring a parser and a pretty-printer at the same time.

Declarative Specification of Indentation Rules (2018)

Example 1

  • source code

    do stm1
       do stm2
          stm3
    
  • layout constraints

    Exp.Do = "do" stmts:Stmt+
      {layout(align-list stmts)}
    

Example 2

  • source code

    do e1
       + e2
    
  • layout constraints

    Stmt.OffsideExp = exp:Exp
      {layout(offside exp)}
    

This paper basically handles concerns like alignment and the Haskell offside rule using constraints on named subtrees. For pretty-printing it uses the Box language (they cite Declarative Specification of Template-Based Textual Editors, but I believe this is originally from Syn) with vertical, horizontal, and z-composition.
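To make the idea of "constraints on named subtrees" concrete, here is a tiny sketch of how the `align-list` constraint from Example 1 might be checked once each statement's starting position is known. The `Pos` type and the `alignList` function are hypothetical names for this note; the paper's actual formalization is richer than this.

```haskell
-- Position of a subtree's first token, 1-based (assumed representation).
data Pos = Pos { posLine :: Int, posCol :: Int }
  deriving (Eq, Show)

-- align-list: every element of the list starts in the same column
-- as the first element.
alignList :: [Pos] -> Bool
alignList []       = True
alignList (p : ps) = all (\q -> posCol q == posCol p) ps

-- Example 1, outer `do`: `stm1` and the nested `do` both start in column 4.
outerStmts :: [Pos]
outerStmts = [Pos 1 4, Pos 2 4]

-- Example 1, inner `do`: `stm2` and `stm3` both start in column 7.
innerStmts :: [Pos]
innerStmts = [Pos 2 7, Pos 3 7]
```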

The paper mentions SDF3 as a system worth looking into.

Transforming this box into a string and parsing that string results in a syntax error, since the statements inside the do-expression do not start at the same column. To ensure correct use of layout in the pretty-printed string, we apply a layout fixer that traverses the boxes and fixes the indentation where necessary.

Is this layout fixer left unspecified?

Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing (2010)

This is all built on partial isomorphisms. What are partial isomorphisms?

A partial isomorphism between a and b is represented as a pair of functions f :: a -> Maybe b and g :: b -> Maybe a so that if f a returns Just b, g b returns Just a and the other way around.
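A minimal sketch of that definition in Haskell follows. The record name `PartialIso` and the helpers `inverse`, `compose`, and `cons` are chosen for this note (the paper's own names may differ), and nothing here enforces the round-trip property; it is a convention the author of each isomorphism has to respect.

```haskell
import Control.Monad ((>=>))

-- A pair of partial functions intended to invert each other
-- wherever they are defined.
data PartialIso a b = PartialIso
  { apply   :: a -> Maybe b
  , unapply :: b -> Maybe a
  }

-- Swapping the two directions gives the inverse isomorphism.
inverse :: PartialIso a b -> PartialIso b a
inverse (PartialIso f g) = PartialIso g f

-- Composition chains the forward maps one way and the backward maps the other.
compose :: PartialIso b c -> PartialIso a b -> PartialIso a c
compose (PartialIso f1 g1) (PartialIso f2 g2) =
  PartialIso (f2 >=> f1) (g1 >=> g2)

-- The list constructor (:) as a partial isomorphism: building a list
-- always succeeds, taking one apart fails on the empty list.
cons :: PartialIso (a, [a]) [a]
cons = PartialIso
  (\(x, xs) -> Just (x : xs))
  (\ys -> case ys of
      []     -> Nothing
      x : xs -> Just (x, xs))
```

Read left to right, `cons` is what a parser uses to assemble a result; read right to left, it is what a printer uses to take a value apart.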

A more modern interpretation is found in the semi-iso package's SemiIso:

type SemiIso s t a b
  = forall p f.
    (Exposed (Either String) p, Traversable f)
  => p a (f b) -> p s (f t)

Laws:

apply i   >=> unapply i >=> apply i   = apply i
unapply i >=> apply i   >=> unapply i = unapply i

Every Prism is a SemiIso. Every Iso is a Prism.

What's the relationship between this and an adjunction?

discussion

Examples:

many :: Syntax d => d a -> d [a]
many p
  = nil <$> pure ()
  <|> cons <$> p <*> many p

integer :: Syntax d => d Integer
integer = Iso read' show' <$> many digit where
  read' s = case [x | (x, "") <- reads s] of
    []    -> Nothing
    (x:_) -> Just x

  show' x = Just (show x)

ifzero = keyword "ifzero"
   *> optSpace *> parens expression
  <*> optSpace *> parens expression
  <*> optSpace *> keyword "else"
   *> optSpace *> parens expression

There's an implementation in Haskell which is apparently no longer maintained (pavelchristof/syntax#3)

FliPpr: A Prettier Invertible Printing System (2013)

In this paper the user provides a pretty-printer ("slightly annotated with some additional information for parsing"), from which the parser is derived.

To annotate pretty-printers with information about non-pretty layouts, we introduce the choice operator <+. In pretty-printing the operator behaves as e1 <+ e2 = e1, ignoring the non-pretty alternative e2; in parser derivation the operator is interpreted as a nondeterministic choice, which accepts both branches.
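The two readings of `<+` can be illustrated with a toy document type. This is only a sketch of the idea, not FliPpr's implementation: the `Doc` type, the constructor spellings `:<>:` and `:<+:`, and the naive backtracking recognizer are all assumptions made for this note.

```haskell
import Data.List (stripPrefix)

-- A toy document language with concatenation and biased choice.
data Doc
  = Text String
  | Doc :<>: Doc   -- concatenation
  | Doc :<+: Doc   -- pretty alternative on the left, non-pretty on the right

infixr 5 :<>:
infixr 4 :<+:

-- Pretty-printing reading: the choice always takes its left branch.
pretty :: Doc -> String
pretty (Text s)   = s
pretty (a :<>: b) = pretty a ++ pretty b
pretty (a :<+: _) = pretty a

-- Parsing reading: the choice is nondeterministic, so both branches are tried.
-- Returns every possible remaining input after matching the document.
accepts :: Doc -> String -> [String]
accepts (Text s)   input = [rest | Just rest <- [stripPrefix s input]]
accepts (a :<>: b) input = concatMap (accepts b) (accepts a input)
accepts (a :<+: b) input = accepts a input ++ accepts b input

-- A document is recognized when some alternative consumes the whole input.
recognizes :: Doc -> String -> Bool
recognizes doc input = any null (accepts doc input)

-- Print no space, but tolerate a single space when parsing.
optionalSpace :: Doc
optionalSpace = Text "" :<+: Text " "

parenOne :: Doc
parenOne = Text "(" :<>: optionalSpace :<>: Text "1" :<>: optionalSpace :<>: Text ")"
```

Here `pretty parenOne` produces the compact form, while `recognizes parenOne` accepts both the compact form and the variants with spaces inside the parentheses.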

Example:

data E = One | Sub E E

nil = text "" <+ space -- zero or more whitespaces in parsing
space = (text " " <+ text "\n") <> nil -- one or more whitespaces in parsing

ppr x = ppr_ x <+ text "(" <> nil <> ppr x <> nil <> text ")"

ppr_ One = text "1"
ppr_ (Sub e1 e2) = group (ppr e1 <> nest 2 (line' <> text "-" <> space' <> pprP e2))

pprP x = pprP_ x <+ text "(" <> nil <> pprP x <> nil <> text ")"

pprP_ One = text "1"
pprP_ (Sub e1 e2) = text "(" <> nil <> group (ppr e1 <> nest 2 (line' <> text "-" <> space' <> pprP e2)) <> nil <> text ")"

space' = space <+ text ""
line' = line <+ text ""

This paper also points at the functional lens literature, in particular quotient lenses

Embedded parser generators (2011)

Both embedded (parser generators) and (embedded parser) generators.

We use a variant of BNF called Labeled BNF (LBNF). In LBNF each production rule represents a single choice (there is no vertical bar operator) and each production rule carries a descriptive label.

BNF

Foo ::= "Foo!" Foo
    | Bar

Bar ::= "Bar."

Labeled BNF

FooCons. Foo ::= "Foo!" Foo;
FooNill. Foo ::= Bar;
BarDot.  Bar ::= "Bar.";

The primary advantage of LBNF is that a system of algebraic data types can be extracted from the rules by using categories as types and labels as constructors.
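For the Foo/Bar grammar above, that extraction might look like the following Haskell sketch: each category (`Foo`, `Bar`) becomes a type and each label (`FooCons`, `FooNill`, `BarDot`) becomes a constructor whose fields are the non-terminals on the right-hand side. The printer functions are an addition for illustration, and separating tokens with a single space is an assumption.

```haskell
-- Category Foo: one constructor per labeled production.
data Foo
  = FooCons Foo   -- FooCons. Foo ::= "Foo!" Foo;
  | FooNill Bar   -- FooNill. Foo ::= Bar;
  deriving (Eq, Show)

-- Category Bar.
data Bar
  = BarDot        -- BarDot.  Bar ::= "Bar.";
  deriving (Eq, Show)

-- Terminals do not appear in the data types, so a printer has to
-- reintroduce them from the production each constructor came from.
printFoo :: Foo -> String
printFoo (FooCons foo) = "Foo! " ++ printFoo foo
printFoo (FooNill bar) = printBar bar

printBar :: Bar -> String
printBar BarDot = "Bar."
```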

This paper basically ignores pretty-printing; it only discusses parsing.

Other papers

Correct-by-construction pretty-printing

Zephyr (1997)

ASDL represents a simple and powerful subset of ASN.1

stm = Compound(stm head, stm next)
    | Assign(identifier id, exp exp)
    | Print(exp* args)
exp = Id(identifier id)
    | Num(int v)
    | Op(exp lval, binop bop, exp rval)
binop = Plus | Minus | Times | Div

The paper doesn't discuss parsing or pretty-printing, but it does include a nice AST browser.

Syn (Richard J Boulton) (1996)

Kind checking

Add support for kind checking. Something like this should be rejected, as f has an inconsistent kind (it is used both as a type on its own and as a type constructor applied to integer):

funny f := funny(f; f integer)

Token parsing

I'd like to provide a tokenizing parser as an alternative to the current character-level parser. I'm picturing something like using happy and alex together.
