
pycoolc's People

Contributors

aalhour

pycoolc's Issues

Implement Code Optimisation Stage, Part 1 - Local Optimisation

Code Optimization

Four main types in Compiler Research:

  1. Local Optimization: Basic Blocks level.
  2. Regional Optimization - won't be implemented.
  3. Global Optimization.
  4. Inter-Procedural Optimization - won't be supported.

Local Optimisation:

  • Algebraic Expressions Simplification:

     x = x + 0    =>   x = x
     x = x * 1    =>   x = x
     x = x * 0    =>   x = 0
     x = x * 8    =>   x = x << 3
     x = x ** 2   =>   x = x * x
    
  • Constant Folding:

      a = 2 * 3   =>   a = 6
      b = 1 + 2   =>   b = 3
    
  • Dead Code Elimination. Dead code is code that doesn't contribute to the program's result.

  • Peephole Optimisations.

  • SSA-based Optimisations:

    • Ensure registers (or variables in IR) are assigned once, in all computations.
    • Common Subexpression Elimination:
     a = x * x
     ...
     d = x * x    =>   d = a
    
    • Copy Propagation:

      b = x       =>    gets eliminated and all references changed to "x"
      c = b * 2   =>    c = x * 2
      
    • Constant Propagation:

      a = 2       =>   *eliminated*
      b = a + 2   =>   b = 2 + 2   =>    b = 4
      c = b * 3   =>   c = 4 * 3   =>    c = 12       
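
A minimal sketch of how constant folding, algebraic simplification, and copy/constant propagation could be combined in one forward pass over a basic block. The Assign instruction class and the function names are hypothetical and are not pycoolc's actual IR. The block is assumed to be in SSA form (every variable assigned once), and the multiply-by-8 rewrite assumes integer operands.

```python
from dataclasses import dataclass
from typing import Optional, Union

Operand = Union[int, str]  # int = constant, str = variable name


@dataclass(frozen=True)
class Assign:
    """Hypothetical three-address instruction: target = left op right.

    A plain copy or constant load has op=None and right=None.
    """
    target: str
    left: Operand
    op: Optional[str] = None
    right: Optional[Operand] = None


def simplify(ins: Assign) -> Assign:
    """Apply constant folding and a few algebraic identities to one instruction."""
    l, op, r = ins.left, ins.op, ins.right
    if op is None:
        return ins
    # Constant folding: both operands are known at compile time.
    if isinstance(l, int) and isinstance(r, int):
        folded = {"+": l + r, "*": l * r}.get(op)
        if folded is not None:
            return Assign(ins.target, folded)
    # Algebraic identities (checked on the right operand only, for brevity).
    if op == "+" and r == 0:
        return Assign(ins.target, l)
    if op == "*" and r == 1:
        return Assign(ins.target, l)
    if op == "*" and r == 0:
        return Assign(ins.target, 0)
    if op == "*" and r == 8:
        return Assign(ins.target, l, "<<", 3)
    return ins


def optimise_block(block):
    """Propagate known constants and copies through an SSA basic block."""
    known = {}  # variable -> constant, or the variable it is a copy of
    out = []
    for ins in block:
        left = known.get(ins.left, ins.left)
        right = None if ins.right is None else known.get(ins.right, ins.right)
        new = simplify(Assign(ins.target, left, ins.op, right))
        if new.op is None:
            known[new.target] = new.left  # remember the constant or copy
        out.append(new)
    return out


block = [Assign("a", 2), Assign("b", "a", "+", 2), Assign("c", "b", "*", 3)]
for ins in optimise_block(block):
    print(ins)
```

The sketch keeps the now-redundant assignments (such as a = 2) in the output. Removing them is left to dead code elimination, which needs to know whether the target is still used later.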
      

Implement Parser using PLY.

Parser Implementation Notes:

  • Design the Parser as an isolated component.
  • Bind the Parser component to the Lexer and use Lexer output as input for syntactic analysis.
  • Implement the Abstract Syntax Tree. Design the nodes as Classes.
  • Implement Syntax Analysis using PLY for LALR parser generation.
  • Parser will generate a complete Abstract Syntax Tree representation of the input COOL program.
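
As an illustration of "nodes as classes", here is a small sketch of what a few AST node classes might look like. All class and field names are assumptions made for this example and are not taken from pycoolc. The final function mimics the shape of a PLY grammar action (the docstring holds the production, p[0] receives the result), but it is called here with a plain list so the sketch runs without PLY installed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class AST:
    """Base class for all (hypothetical) AST nodes."""


@dataclass
class Integer(AST):
    value: int


@dataclass
class Identifier(AST):
    name: str


@dataclass
class Addition(AST):
    left: AST
    right: AST


@dataclass
class Method(AST):
    name: str
    return_type: str
    body: AST


@dataclass
class ClassNode(AST):
    name: str
    parent: Optional[str] = None
    features: List[AST] = field(default_factory=list)


@dataclass
class Program(AST):
    classes: List[ClassNode] = field(default_factory=list)


def p_expression_add(p):
    """expression : expression PLUS expression"""
    # In PLY, p[1] and p[3] hold the values of the two sub-expressions.
    p[0] = Addition(p[1], p[3])


# Simulate what the parser would pass for the input "1 + x".
production = [None, Integer(1), "+", Identifier("x")]
p_expression_add(production)
print(production[0])
```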

Unable to compile

I followed the readme but I cannot compile the hello_world.cl file. I am using Python 3.8 in a virtual env (conda). When I run the command to compile and generate the asm file, nothing happens.
(Screenshot attached: Screen Shot 2020-09-22 at 11 35 49 AM)

Implement Lexer using PLY.

Lexer Implementation Notes:

  • Design the Lexer as an isolated component.
  • Implement Lexical analysis using PLY to tokenize COOL programs source code.
  • Return list of tokens for every input program source code.

Lexer Improvements

  • Convert single character tokens from regular expressions to a list of literals; PLY already provides the functionality of lexing literals: http://www.dabeaz.com/ply/ply.html#ply_nn11.
  • Refactor token() to save the last returned token in the instance, and then return it rather than acting as a transparent proxy wrapper around the self.lexer.token() method.
  • Implement a customized LexError class that prints a meaningful message when a lexing error happens, including the illegal or error-causing token, its line and its lex position.
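
A rough sketch of the last two improvements. The LexerWrapper and StubLexer names are hypothetical; the stub stands in for the PLY lexer object so the example runs on its own. It relies only on the PLY conventions that token() returns None at end of input and that the token passed to the error handler carries value, lineno and lexpos.

```python
from types import SimpleNamespace


class LexError(Exception):
    """Raised when the lexer meets a character it cannot tokenize."""

    def __init__(self, value, lineno, lexpos):
        super().__init__(
            f"Illegal token {value!r} at line {lineno}, position {lexpos}"
        )
        self.value = value
        self.lineno = lineno
        self.lexpos = lexpos


class LexerWrapper:
    """Hypothetical wrapper that remembers the last token it returned."""

    def __init__(self, inner_lexer):
        self.lexer = inner_lexer
        self.last_token = None

    def token(self):
        # Save the token on the instance before handing it to the caller.
        self.last_token = self.lexer.token()
        return self.last_token

    def t_error(self, t):
        # PLY stores the remaining input in t.value; report its first character.
        raise LexError(t.value[0], t.lineno, t.lexpos)


class StubLexer:
    """Stand-in for the real PLY lexer, used only for this demonstration."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def token(self):
        return next(self._tokens, None)


lexer = LexerWrapper(StubLexer(["CLASS", "TYPE"]))
lexer.token()
print(lexer.last_token)

try:
    lexer.t_error(SimpleNamespace(value="#rest of input", lineno=3, lexpos=17))
except LexError as error:
    print(error)
```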

Implement Semantic Analysis Stage

Semantic Analysis

Checks

  1. All identifiers are declared.
  2. Types.
  3. Inheritance relationships.
  4. Classes defined only once.
  5. Methods in a class defined only once.
  6. Reserved identifiers are not misused.

Scope

Identifier Bindings:

Cool Identifier Bindings are introduced by:

  • Class declarations (introduce class names)
  • Method definitions (introduce method names)
  • Let expressions (introduce object id’s)
  • Formal parameters (introduce object id’s)
  • Attribute definitions (introduce object id’s)
  • Case expressions (introduce object id’s)

Class Definitions:

  • Cannot be nested.
  • Are globally visible throughout the program.
  • Class names can be used before they are defined.

Class Attributes:

  • Attribute names are global within the class in which they are defined

Class Methods:

  • Method names have complex rules.
  • A method need not be defined in the class in which it is used; it may be defined in some parent class.
  • Methods may also be redefined (overridden).

Type System

Type Operations:

  • Type Checking. The process of verifying fully typed programs
  • Type Inference. The process of filling in missing type information

Types in Cool:

  1. Class names: Builtins (Int; String; Bool; Object; IO) and User Defined.
  2. SELF_TYPE.

Sub-Typing:

  • Types can be thought of as sets of attributes and operations defined on these sets.
  • All types are subtypes of the Object type.
  • Types can inherit from other types other than the Object type.
  • No type is allowed to inherit from the following types: Int, Bool, String and SELF_TYPE.
  • All type relations can be thought of as a tree where Object is at the root and all other types branching down from it, this is also called the inheritance tree.
  • A least upper bound (lub) relation of two types is their least common ancestor in the inheritance tree.
  • Subclasses only add attributes or methods.
  • Methods can be redefined, but only with the same type signature.
  • All operations that can be used on type C can also be used on type C', where C' <= C, meaning C' is a subtype of C.
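
The subtype and least-upper-bound relations above can be sketched over a parent map. The classes A, B and C below are made up for the example; only the built-in names come from the notes.

```python
# Inheritance tree as child -> parent; Object is the root.
PARENT = {
    "Object": None,
    "IO": "Object",
    "Int": "Object",
    "String": "Object",
    "Bool": "Object",
    "A": "Object",  # hypothetical user-defined classes
    "B": "A",
    "C": "A",
}


def ancestors(type_name):
    """Return the chain from type_name up to Object, inclusive."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def conforms(sub, sup):
    """True if sub <= sup, i.e. sub is sup or one of its descendants."""
    return sup in ancestors(sub)


def lub(a, b):
    """Least common ancestor of a and b in the inheritance tree."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("types do not share a root")  # unreachable in a tree


print(lub("B", "C"))
print(lub("B", "Int"))
print(conforms("B", "A"))
```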

Typing Methods:

  • Method and Object identifiers live in different name spaces.
    • A method foo and an object foo can coexist in the same scope.
  • Logically, Cool Type Checking needs the following 2 Type Environments:
    • O: a function mapping Object Identifiers to their types.
    • M: a function mapping a class name and a Method Name to that method's signature (formal parameter types and return type).
  • Due to SELF_TYPE, we need to know the class name at all points of Type Checking methods.
    • C: a function providing the name of the current class (Type).
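
One possible in-memory shape for the three environments, offered as a sketch. Using ChainMap for O and a dict keyed by (class, method) for M is an assumption of this example, as are the Main class and the foo names.

```python
from collections import ChainMap

# O: object identifiers -> declared types. Inner scopes shadow outer ones.
O = ChainMap({"foo": "Int", "self": "SELF_TYPE"})

# M: (class name, method name) -> (formal parameter types, return type).
M = {("Main", "foo"): (("Int",), "String")}

# C: the class currently being type checked.
C = "Main"

# An object named foo and a method named foo coexist without clashing,
# because they are looked up in different environments.
print(O["foo"])
print(M[(C, "foo")])

# Entering `let foo : Bool in ...` pushes a new scope onto O only.
inner = O.new_child({"foo": "Bool"})
print(inner["foo"], O["foo"])
```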

SELF_TYPE:

SELF_TYPE is not a Dynamic Type, it is a Static Type.

SELF_TYPE is the type of the self parameter in an instance. In a method dispatch, SELF_TYPE might be a subtype of the class in which the subject method appears.

Usage:

  • SELF_TYPE can be used with new T expressions.
  • SELF_TYPE can be used as the return type of class methods.
  • SELF_TYPE can be used as the declared type of expressions (e.g. let expressions: let x : T in expr).
  • SELF_TYPE can be used as the type of the actual arguments in a method dispatch.
  • SELF_TYPE can be used as the declared type of class attributes (the COOL Reference Manual permits this).
  • SELF_TYPE can not be used with Static Dispatch (i.e. T in m@T(expr1,...,exprN)).
  • SELF_TYPE can not be used as the type of Formal Parameters.

Least-Upper Bound Relations:

  • lub(SELF_TYPE.c, SELF_TYPE.c) = SELF_TYPE.c.
  • lub(SELF_TYPE.c, T) = lub(C, T).
  • lub(T, SELF_TYPE.c) = lub(C, T).
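
The three rules translate almost directly into code. This sketch passes the enclosing class as an explicit current_class argument; the parent map and the class names A, B and C are hypothetical.

```python
PARENT = {"Object": None, "A": "Object", "B": "A", "C": "A"}  # child -> parent
SELF_TYPE = "SELF_TYPE"


def _ancestors(type_name):
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def lub(a, b, current_class):
    """Least upper bound of a and b, where SELF_TYPE refers to current_class."""
    if a == SELF_TYPE and b == SELF_TYPE:
        return SELF_TYPE
    # Otherwise SELF_TYPE is replaced by the enclosing class C.
    a = current_class if a == SELF_TYPE else a
    b = current_class if b == SELF_TYPE else b
    seen = set(_ancestors(a))
    for candidate in _ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("types do not share a root")


print(lub(SELF_TYPE, SELF_TYPE, "B"))
print(lub(SELF_TYPE, "C", "B"))
```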

Semantic Analysis Passes

[incomplete]

  1. Gather all class names.
  2. Gather all identifier names.
  3. Ensure no undeclared identifier is referenced.
  4. Ensure no undeclared class is referenced.
  5. Ensure all Scope Rules are satisfied (see: above).
  6. Compute Types in a bottom-up pass over the AST.
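
A sketch of the first class-level passes only: gather all class names, then validate the inheritance declarations. The input shape (a list of (name, parent) pairs) and the error wording are assumptions of this example. Gathering names before checking parents is what allows class names to be used before they are defined.

```python
BUILTINS = {"Object": None, "IO": "Object", "Int": "Object",
            "String": "Object", "Bool": "Object"}
NOT_INHERITABLE = {"Int", "String", "Bool", "SELF_TYPE"}


def check_classes(classes):
    """classes: list of (name, parent_or_None) pairs. Returns error messages."""
    errors = []
    parents = dict(BUILTINS)

    # Pass 1: gather all class names and reject duplicates.
    for name, parent in classes:
        if name in parents:
            errors.append(f"class {name} is defined more than once")
            continue
        parents[name] = parent or "Object"

    # Pass 2: every parent must be declared and inheritable.
    for name, parent in classes:
        parent = parent or "Object"
        if parent in NOT_INHERITABLE:
            errors.append(f"class {name} cannot inherit from {parent}")
        elif parent not in parents:
            errors.append(f"class {name} inherits from undeclared class {parent}")
    return errors


print(check_classes([("B", "A"), ("A", None)]))
print(check_classes([("B", "Int"), ("C", "Z")]))
```

Detecting inheritance cycles and the identifier-level checks (passes 2, 3, 5 and 6) are not covered here.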

Error Recovery

Two solutions:

  1. Assign the type Object to ill-typed expressions.
  2. Introduce a new type called No_Type for use with ill-typed expressions.

Solution 1 is easy to implement and will enforce the type inheritance and class hierarchy tree structures.

Solution 2 will introduce further adjustments. First, every operation will be treated as defined for No_Type. Second, the inheritance tree and class hierarchy will change from being Trees to Graphs. The reason for that is that expressions will ultimately either be of type Object or No_Type, which will make the whole representation look like a graph with two roots.
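
A small sketch of Solution 2, to show why it avoids cascading error messages. The function names, the error wording and the tiny parent map are hypothetical. The key point is that No_Type conforms to every type, so an expression that has already been reported does not trigger further errors in its parents.

```python
PARENT = {"Object": None, "Int": "Object", "String": "Object"}  # child -> parent
NO_TYPE = "No_Type"


def conforms(sub, sup):
    """sub <= sup, with No_Type treated as a subtype of everything."""
    if sub == NO_TYPE:
        return True
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False


def type_of_identifier(name, env, errors):
    """Look up an identifier; undeclared names are reported once and get No_Type."""
    if name not in env:
        errors.append(f"undeclared identifier {name}")
        return NO_TYPE
    return env[name]


def check_add(left_type, right_type, errors):
    """Type check e1 + e2: both operands must conform to Int."""
    for operand_type in (left_type, right_type):
        if not conforms(operand_type, "Int"):
            errors.append(f"non-Int operand of type {operand_type} in addition")
    return "Int"


errors = []
y_type = type_of_identifier("y", {"x": "Int"}, errors)  # y is undeclared
check_add(y_type, "Int", errors)                        # no second error
print(errors)
```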

Resolve shift/reduce conflicts

Parsing the /examples/graph.cl program leads to shift/reduce conflicts in the parser.

Parser output sample:

Generating LALR tables
WARNING: 9 shift/reduce conflicts
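
One common source of shift/reduce conflicts in expression grammars is missing operator precedence, which PLY lets a parser module declare through a precedence tuple listed from lowest to highest. The sketch below follows the operator ordering given in the COOL Reference Manual; the token names are hypothetical, and whether this removes the 9 conflicts reported above has not been verified.

```python
# Lowest precedence first, as PLY expects. Token names are assumptions.
precedence = (
    ("right", "ASSIGN"),               # <-
    ("right", "NOT"),                  # not
    ("nonassoc", "LTEQ", "LT", "EQ"),  # <=  <  =  (do not associate)
    ("left", "PLUS", "MINUS"),         # +  -
    ("left", "TIMES", "DIVIDE"),       # *  /
    ("right", "ISVOID"),               # isvoid
    ("right", "INT_COMP"),             # ~
    ("left", "AT"),                    # @
    ("left", "DOT"),                   # .
)
```

PLY writes the details of each remaining conflict to the parser.out debugging file it generates, which is a reasonable place to look next.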

Implement Code Optimisation Stage, Part 2 - Global Optimisation

Global Optimisation

Assumes information exists on all points of the program execution, which is typically gathered through Data Flow Analysis.

Data Flow Analysis

Data Flow Analysis relates information between adjacent program points in two ways: transferring and pushing. Transferring moves information about expressions from the output of predecessor expressions to the input of their successors. Pushing moves information through the body of a single expression, from the input of that expression to its output. In other words, transferring is an analysis of information running from one expression to the other, as adjacent points in the program, whereas pushing is an analysis of the flow of information within the scope of each expression.

Typically, Data Flow Analysis tags every SSA-expression in the execution of the program with one of 3 tags: C for constant, T for top and B for bottom. Constant means that the value of the expression at the given point of execution is a known constant, so the expression can be replaced with that constant. Top means that at the given point of execution we don't know the value of the expression. Bottom means that the given point of execution is never reached, so the subject expression never executes and can therefore be eliminated.

The ordering of these tags is as follows: B < C < T.

The Least-Upper Bound logical operation calculates the upper bound in the previous ordering. For example:

  • lub(B, 1) = 1, where 1 is a constant and therefore tagged with C.
  • lub(B, C) = C.
  • lub(1, 2) = T. Constants are incomparable.
  • lub(T, B) = T.
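
The ordering and the four examples can be captured in a few lines. Representing T and B as strings and constants as plain Python integers is an assumption of this sketch.

```python
TOP, BOTTOM = "T", "B"


def lub(a, b):
    """Least upper bound in the ordering B < (any constant) < T."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == b:
        return a   # same constant, or both TOP
    return TOP     # two different constants, or a constant and TOP


print(lub(BOTTOM, 1))
print(lub(1, 2))
print(lub(TOP, BOTTOM))
```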

Global Constant Propagation

Given an ordered set of predecessor statements (SSA-expressions) called ps, and a successor statement s that we are analysing, we can define the following Constant Propagation rules.

We begin by defining the rule that relates information between adjacent points in the program:

  • C(s, x, in) = lub { C(p, x, out) | p is a predecessor of s }; where:
    • in is the input point of s.

    • out is the output point of every predecessor p that belongs to the set ps.

    • x is an assignment statement target propagated until the execution point of s, for example:

      x := 1
      y := x * 2
      ...
      s := ... 
      
    • This rule relates the out of one statement to the in of the next statement.

Next, we define the following rules which relate the in of a statement to the out of the same statement:

  • C(s, x, out) = B if C(s, x, in) = B.
  • C(x := c, x, out) = c if c is a constant (C).
  • C(x := f(...), x, out) = T. We don't evaluate the complex inner statement.
  • C(y := ..., x, out) = C(y := ..., x, in) if x <> y; where:
    • y := ... means that it is a statement that neither reads nor updates x.

Algorithm:

  1. For every entry s to the program, set C(s, x, in) = T.

  2. Set C(s, x, in) = C(s, x, out) = B everywhere else.

  3. Repeat until all points satisfy the above rules (1-5):

    3.1. Pick s, where s is not satisfying rules 1-5 and update it using the appropriate rule.

Global Constant Propagation is a forward-propagation analysis. This analysis runs from earlier points in the program to later ones.
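
A sketch of the algorithm for a single variable x on a tiny control-flow graph. Several things here are assumptions of the example: statements are (target, value) pairs where value is an integer constant or None for a non-constant right-hand side, the graph is given as a predecessor map, and the loop sweeps all statements repeatedly instead of picking one violating statement at a time (both reach the same fixed point).

```python
TOP, BOTTOM = "T", "B"


def lub(a, b):
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def transfer(stmt, x, c_in):
    """Relate the in of a statement to its out for variable x."""
    target, value = stmt
    if c_in == BOTTOM:
        return BOTTOM            # unreachable stays unreachable
    if target == x:
        return value if isinstance(value, int) else TOP
    return c_in                  # statement does not assign x


def constant_propagation(stmts, preds, entry, x):
    c_in = {s: BOTTOM for s in stmts}
    c_out = {s: BOTTOM for s in stmts}
    c_in[entry] = TOP
    changed = True
    while changed:
        changed = False
        for s, stmt in stmts.items():
            new_in = TOP if s == entry else BOTTOM
            for p in preds.get(s, ()):
                new_in = lub(new_in, c_out[p])
            new_out = transfer(stmt, x, new_in)
            if (new_in, new_out) != (c_in[s], c_out[s]):
                c_in[s], c_out[s] = new_in, new_out
                changed = True
    return c_in, c_out


# Diamond: 1 branches to 2 and 3, which join at 4.
stmts = {1: ("x", 3), 2: ("y", None), 3: ("x", 4), 4: ("z", None)}
preds = {2: [1], 3: [1], 4: [2, 3]}
c_in, c_out = constant_propagation(stmts, preds, entry=1, x="x")
print(c_in)
```

At statement 2 the value of x is the constant 3, while at the join point 4 the two incoming constants 3 and 4 disagree, so x is tagged T there.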

Global Liveness Analysis

Also known as Live Variables Analysis.

A variable x is live at statement s if:

  • There exists a statement s' that uses x.
  • There is a path from s to s'.
  • That path has no intervening assignment to x.

A statement x := ... is dead code if x is dead after the assignment.

We can express Liveness in terms of information transferred between adjacent statements, just as in copy/constant propagation, except that it is much simpler. Liveness is a boolean: True or False. A variable is either live or not.

Given a predecessor statement p and a set of successor statements ss, we can define the following Liveness Analysis rules:

  • L(p, x, out) = v { L(s, x, in) | s a successor of p }; where v is the logical OR operator.
  • L(s, x, in) = True if s refers to x on the rhs.
    • Example:

      ...          => x is live!
      ... := f(x)
      ...          => x is live!
      
  • L(x := e, x, in) = False if e does not refer to x.
  • L(s, x, in) = L(s, x, out) if s does not refer to x.

Algorithm:

  1. Let all L(...) = False initially.

  2. Repeat until all statements s satisfy rules 1-4:

    2.1. Pick s where one of 1-4 does not hold and update using the appropriate rule

Global Liveness Analysis is a backward-propagation analysis. This analysis runs backwards from later points in the program to earlier ones.
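
A sketch of the liveness algorithm for one variable x. The statement encoding (a (target, uses) pair with uses as a set of variable names) and the successor map are assumptions of this example, and the loop sweeps every statement until nothing changes instead of picking a single violating statement.

```python
def liveness(stmts, succs, x):
    """Return (live_in, live_out) for variable x at every statement."""
    live_in = {s: False for s in stmts}
    live_out = {s: False for s in stmts}
    changed = True
    while changed:
        changed = False
        for s, (target, uses) in stmts.items():
            # x is live out of s if it is live into any successor.
            new_out = any(live_in[t] for t in succs.get(s, ()))
            if x in uses:
                new_in = True        # s reads x on the rhs
            elif target == x:
                new_in = False       # s overwrites x without reading it
            else:
                new_in = new_out     # s does not refer to x
            if (new_in, new_out) != (live_in[s], live_out[s]):
                live_in[s], live_out[s] = new_in, new_out
                changed = True
    return live_in, live_out


# 1: x := 1    2: y := x + 1    3: x := 5    4: return y
stmts = {
    1: ("x", set()),
    2: ("y", {"x"}),
    3: ("x", set()),
    4: (None, {"y"}),
}
succs = {1: [2], 2: [3], 3: [4]}
live_in, live_out = liveness(stmts, succs, "x")

# An assignment to x is dead code when x is not live after it.
dead = [s for s, (target, _) in stmts.items() if target == "x" and not live_out[s]]
print(dead)
```

In this example statement 3 is reported as dead, because nothing reads x after it, while statement 1 is kept because statement 2 uses its value.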

Parser Improvements

  • Improve the error handling.
  • Improve error reporting via Custom Error (Exception) classes. Report the parsing problem, the token(s) that caused it in addition to the line number.
  • Resolve all possible shift/reduce and reduce/reduce conflicts. More testing is needed.

Design the Formal Language Grammar of COOL in BNF.

  • Consult the COOL Reference Manual.
  • Formulate COOL Grammar as a Context-Free Grammar (CFG) in Backus–Naur Form (BNF).
  • Write the CFG in a markdown document.
  • Write the CFG to be parsed by an LALR parser.

Compiler doesn't produce outfile

Bug. The compiler pycoolc doesn't generate outfile when using the --outfile option.

Description. I installed pycoolc on my Linux Ubuntu system, and it installed correctly. Then, I tried to compile a simple "hello world" program in COOL using pycoolc, and it compiled. But when I used pycoolc to generate an outfile so that I could run it using spim, it didn't generate any.

Terminal interaction. This is exactly what happened when I tried using your compiler on my hello.cl file:

root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# # Finding all files staring with 'h'
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# find h*
hello.cl
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# # hello.cl exists; now compiling using pycoolc
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# pycoolc hello.cl
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# # compiled; now generating outfile
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# pycoolc hello.cl --outfile helloAsm.s
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# # Let's see if this file exists
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# find h*
hello.cl
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# find helloAsm.s
find: ‘helloAsm.s’: No such file or directory
root@rafier:/media/rafi007akhtar/Stuff/UEM/Sem 7 Stuff/COOL# # Clearly, compiler doesn't generate outfile

In case you're interested, here is source code of hello.cl.

Looking forward to seeing the issue resolved!
