GithubHelp home page GithubHelp logo

assignment1's Introduction

CS294-141: Building your first compiler

For the next few weeks, we will be working through a project to build a basic compiler. In doing this, we will gain experience with several useful tools for practical compiler development (namely, the Python AST library for embedded front-end development, and LLVM for back-end code generation). Along the way we will also encounter common data structures, patterns, and idioms in building compilers and DSLs.

What are we going to build?

We will build a simple compiler from a subset of Python to optimized machine code. If you are familiar with Numba, you can think of this assignment as building our own toy Numba JIT. With that, we will be able to write code like the following:

@Compile
def square(x : int) -> int:
    return x*x

Using the @Compile decorator will invoke our compiler on the square function, parsing, analyzing, JIT compiling, and ultimately replacing it with native machine code.

Background

If you are not very familiar with Python, the @Compile syntax may look magic or unclear. Don't worry, like most things in Python, it's actually very simple: @Compile is just a "decorator" we will define to package up our functionality. Decorators are just syntactic sugar for applying a higher-order function (a function that takes and returns other functions, in this case Compile) to another function (in this case, square) immediately after its definition, and replacing that definition with the result of the higher-order function applied to it. In other words, this is just syntactic sugar for:

# define square just like any other Python function:
def square(x : int) -> int:
    return x*x

# replace the definition of square with a compiled version of itself:
square = Compile(square)

If you are familiar with Python and that code still looks strange to you, it's probably because of the type annotations. Yes, this is real, vanilla Python, it's just using a very new addition to the syntax called "type hints" (we will be using Python 3.6, which is the first version where type hints are officially standard).

PART 1: THE FRONT-END
Or, parsing Python into our own IR

To begin, we need a way to turn actual Python code into something we can work with. This will be our job for the first assignment.

Generally, the front-end of a compiler is responsible for mapping from raw input into an Abstract Syntax Tree (AST), an Intermediate Representation (IR) which corresponds more cleanly to the logical level at which we want to think about user code. (The "abstract" name distinguishes an AST from a literal parse tree. An AST is generally simplified and normalized beyond the raw output of a parser in external languages, or from operator traces or other information in embedded languages.)

Python is a nice platform for building language extensions because the standard library includes rich tools for parsing, representing, and manipulating the complete Python syntax within the language itself. Using this, Python programs can relatively represent and manipulate their own ASTs.

For our compiler, we are only going to worry about a simplified subset of Python. Specifically, we're only going to compile constructs which trivially map into the following simple IR:

Expr = BinOp(Bop op, Expr left, Expr right)
     | CmpOp(Cop op, Expr left, Expr right)
     | UnOp(Uop op, Expr e)
     | Ref(Str name, Expr? index)
     | FloatConst(float val)
     | IntConst(int val)

Uop = Neg | Not
Bop = Add | Sub | Mul | Div | Mod | And | Or
Cop = EQ | NE | LT | GT | LE | GE

Stmt = Assign(Ref ref, Expr val)
     | Block(Stmt* body)
     | If(Expr cond, Stmt body, Stmt? elseBody)
     | For(Str var, Expr min, Expr max, Stmt body)
     | Return(Expr val)
	 | FuncDef(Str name, Str* args, Stmt body)

This is a description of our intermediate representation as an algebraic data type. You should read this as "An Expr[ession] can be either a Bin[ary]Op, containing an op and left and right child Exprs, or a CmpOp, containing..., or an Int[eger]Const[ant] containing an int value." The ? suffix means that a field is optional (so a Ref[erence] has an optional index expression for when it is referencing an element of an array rather than a scalar variable), while a * suffix means that it is a list of 0 or more items of that type. (You will see similar notation, written in an actual data structure generation DSL called ASDL, in the official documentation of the Python AST.) ADTs are an especially useful notation when describing tree-structured data, where each tree node can be one of many types, as is usually the case in compiler IRs.

This IR is similar in structure to many imperative languages like C or Python: it has separate notions of "expressions" (trees of basic math operations, reads from variables, and constants) and "statements" (the top-level, sequentially-ordered operations which are delimited by semicolons in C or separate lines in Python). A statement can Assign an expression to a variable reference, Return a value, perform an If/else branch or a For loop over more statements, or encapsulate a list of potentially many statements in a Block (like the contents of a pair of braces { } in C).

The corresponding data structures are implemented in the code as a set of classes based on the Python base ast.AST node type. We use this not because we're going to intermingle our IR nodes with the original Python AST nodes, but because ast.AST embodies a nice, simple design pattern for representing these kinds of ADTs by simply declaring the list of fields we want each class to have, as well as utility libraries for recursively traversing the AST using the visitor pattern, namely the ast.NodeVisitor. They also integrate naturally with the astor AST utility library, which includes useful utilities for pretty-printing Python ASTs (astor.dump), and re-generating Python code from an AST.

Here's a quick overview of what you can do with IR nodes based on ast.AST:

>>> import ast, astor
>>> class MyIRNode(ast.AST):
...   _fields = ('foo', 'bar')
...   

# construct with no fields assigned yet
# fields become attributes of the node object
>>> a = MyIRNode()
>>> a.foo = 'hi'
>>> a.bar = 'bye'

# use astor to pretty-print the IR
>>> astor.dump(a)
"MyIRNode(foo='hi', bar='bye')"

>>> b = MyIRNode('left', 'right') # construct fields by positional order
>>> c = MyIRNode(foo=b, bar=a)    # construct fields by name

# c has a and b as child nodes:
>>> astor.dump(c)
"MyIRNode(foo=MyIRNode(foo='left', bar='right'), bar=MyIRNode(foo='hi', bar='bye'))"

# Build a custom IR visitor.
# For each node they visit, NodeVisitors try to dispatch to a method named
# visit_[NodeClassName]. If it doesn't exist, they fall back to generic_visit.
>>> class MyIRVisitor(ast.NodeVisitor):
...   def visit_MyIRNode(self, node):
...     return str.format("MyIRNode(foo={}, bar={})",
...                       self.visit(node.foo),
...                       self.visit(node.bar))
...   def generic_visit(self, node):
...     return str(node)
...     
>>> MyIRVisitor().visit(c)
'MyIRNode(foo=MyIRNode(foo=left, bar=right), bar=MyIRNode(foo=hi, bar=bye))'

Setup

  • Install Python 3.6. I highly recommend Anaconda, on which I will build a distribution for later parts of this project, but for part 1 most versions of Python 3 should work fine.
  • Install astor: pip install astor.
  • Fork and clone your own copy of this repository.
  • If you're rather new to Python and aren’t already familiar with it, I highly recommend IPython: conda install ipython or pip install ipython to be sure you have it. It includes both the now-popular Jupyter Notebook interactive web interface, and just a much better REPL for interactively writing and testing Python on the command line. Run ipython in your shell. Experience the wonders of tab completion, syntax highlighting, shell history, etc. Run ipython notebook to get the web notebook interface.

Step 1: Converting Python ASTs into our IR

Our first task is to translate Python code into our simplified IR. We will do this by building an ast.NodeVisitor to recursively walk over a piece of Python AST and construct the corresponding simplified IR. The skeleton for this is defined in the PythonToSimple class in the starter code.

Running python compiler.py executes this on a trivial function using the simple test code at the bottom. The starter code implements just the bare minimum necessary to translate the function:

def trivial():
    return 5

Your first job is to complete the PythonToSimple visitor to translate any reasonably expressible Python.

I haven't yet provided any additional tests, but you can extend this arbitrarily to write and test your own richer examples.

To understand the Python AST, and how to build ast.NodeVisitors, I highly recommend not only the official module documentation, but also the GreenTreeSnakes unofficial documentation.

Bonus: add explicit NotImplemented exceptions when attempting to translate non-translatable Python IR.

Step 2: Executing our IR (PENDING)

To test this, and to exercise our skills working with tree-structured IRs, we will also build a simple interpreter for our IR, which we can execute and compare with the original.

Submission (PENDING)

You should submit your code by forking this repository on GitHub. Precise instructions will follow.

assignment1's People

Contributors

jrk avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.