chocolateloverraj / ezc

Ez coding, easy installing

JavaScript 0.09% TypeScript 96.29% Shell 3.63%
typescript llvm llvm-bindings javascript code compiled-language llvm-ir

ezc's Introduction

ezc

🚧 In Progress 🚧

This project is not ready to use (and probably not ready for contributions) yet. If you have any ideas, create an issue or discussion, or comment on an existing one.

What

Ezc is a converter that turns LLVM IR extended with custom plugins into normal LLVM IR.

Why

Ezc is supposed to be fun and easy coding that isn't interpreted. My favorite coding language is TypeScript. Python is another common coding language. Both TypeScript and Python are interpreted.

Why not use interpreted code?

  • Not usable on all computers. You can't use JavaScript on a Raspberry Pi Pico
  • Slower
  • With compiled code, you can make libraries that are compatible with basically every other coding language
  • I feel like using compiled code

Why not use C?

  • There are some things you can't do in C. You can't have bound functions.
  • It's very hard to modularize C code. You end up with files which are 1000 lines long!
  • C is not for me

Why not use C++?

  • Hard to modularize
  • Some things you can't do

Why not use Java?

  • Very long compile times
  • I don't like Java
  • Java is still interpreted (its bytecode runs on a virtual machine)

Plugins

Plugins can:

  • Have custom tokens and key words
  • Have custom nodes
  • Do custom type checking
  • Transform their custom features into normal LLVM IR code

ezc's Issues

Rename?

  • Update description on GitHub
  • Explain things in README.md
  • Maybe rename it to just 'ez' or something else (originally, ezc was going to output C code, which would be used with a C compiler. That plan got changed to output LLVM IR instead)

Easier / less explicit function calling

  • No need to say call
  • No need to specify return type and input types

Before:

@0 = private unnamed_addr constant [13 x i8] c"Hello World!\00"

declare i32 @puts(ptr)

define i1 @main() {
EntryBlock:
  %0 = call i32 @puts(ptr @0)
  ret i1 0
}

After:

@0 = private unnamed_addr constant [13 x i8] c"Hello World!\00"

declare i32 @puts(ptr)

define i1 @main() {
EntryBlock:
  %0 = @puts(@0)
  ret i1 0
}

Better comparisons

  • <
  • >
  • ==
  • <=
  • >=
  • !=

Before:

%boolean = icmp eq i8 %a, %b

After:

%boolean = %a == %b

Less than and greater than need to know whether the integer is signed, so this more or less depends on #13.
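
A minimal TypeScript sketch of how these operators could map to `icmp` predicates, assuming the plugin already knows whether the operands are signed (for example through #13). The names `ComparisonOperator`, `icmpPredicate`, and `lowerComparison` are hypothetical and not part of ezc.

```typescript
// Hypothetical helpers: names are illustrative only, not ezc's API.
type ComparisonOperator = '<' | '>' | '==' | '<=' | '>=' | '!='

// Returns the LLVM `icmp` predicate for a comparison operator.
// `==` and `!=` ignore signedness; the ordering operators do not.
function icmpPredicate (operator: ComparisonOperator, signed: boolean): string {
  switch (operator) {
    case '==': return 'eq'
    case '!=': return 'ne'
    case '<': return signed ? 'slt' : 'ult'
    case '>': return signed ? 'sgt' : 'ugt'
    case '<=': return signed ? 'sle' : 'ule'
    case '>=': return signed ? 'sge' : 'uge'
  }
}

// Builds the instruction text, e.g. `%boolean = icmp eq i8 %a, %b`.
function lowerComparison (
  result: string,
  left: string,
  operator: ComparisonOperator,
  right: string,
  type: string,
  signed: boolean
): string {
  return `${result} = icmp ${icmpPredicate(operator, signed)} ${type} ${left}, ${right}`
}
```

With these assumptions, `lowerComparison('%boolean', '%a', '==', '%b', 'i8', false)` produces the "Before" line shown above.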

Parse open / close as tokens instead of a special token

This is about (, ), {, }, [, ]

Pros

  • Less code when parsing tokens
  • Less code when parsing nodes which require opening and closing tokens
  • Nothing currently takes advantage of having the opening and closing tokens grouped together.

Cons

If there are any, put them here.

Named struct elements with dot element syntax

Syntax like C or TypeScript or something else; not decided for sure.

One problem is that %variable.property conflicts with the ability to have . in variable names in LLVM. We can't tell whether %variable.property is a variable named 'variable.property' or a variable named 'variable' with a property named 'property'.

Possible solutions to the problem:

Parenthesis syntax: (%variable).property

Pros

  • Doesn't break current syntax

Cons

  • Annoying to type and read
  • Should elements also be prefixed with %? It would be inconsistent to not be prefixed with % but typing % would be annoying.

Don't allow . in variable names

Pros

  • Cleaner looking syntax
  • Fewer characters
  • More similar to other programming languages

Cons

  • Breaking change
  • Could look like it's just one variable

Before:

%Struct = type { i32, i32 }

define i1 @main() {
  EntryBlock:
    %ptr = alloca %Struct
    store %Struct { i32 25, i32 1234 }, ptr %ptr
    %data = load %Struct, ptr %ptr
    
    %a = extractvalue %Struct %data, 0
    %b = extractvalue %Struct %data, 1
    
    ret i1 0
}

After (exact syntax not decided yet):

%Struct = type { 
  i32 a, 
  i32 b
}

define i1 @main() {
  EntryBlock:
    %ptr = alloca %Struct
    store %Struct { 
      a = 25, 
      b = 1234
    }, ptr %ptr
    %data = load %Struct, ptr %ptr
    
    %a = (%data).a
    %b = (%data).b
    
    ret i1 0
}

Signed integers

Why

It makes sense that LLVM doesn't have signed / unsigned integers, but for the programmer, it's nice to have signed / unsigned integers. Then when you multiply or divide, you don't need to specify signed or unsigned.

Without Plugin

%signedInt = load i32, ptr %ptr
%unsignedInt = load i32, ptr %ptr

Possible syntaxes

u / s

%signedInt = load si32, ptr %ptr
%unsignedInt = load ui32, ptr %ptr

signed / unsigned

%signedInt = load signed i32, ptr %ptr
%unsignedInt = load unsigned i32, ptr %ptr

Default

It would be confusing to have the default be signed or unsigned, so the plugin would require you to specify signed or unsigned.

If / else

Before:

define void @someFunction() {
  EntryBlock:
    br i1 %cond, label %IfEqual, label %IfUnequal
  IfEqual:
    ; Do something
    br label %After
  IfUnequal:
    ; Do something
    br label %After
  After:
    ; Do something
}

After:

define void @someFunction() {
  EntryBlock:
    if (%cond) {
      IfEqualBlock:
        ; Do something
    } else {
      IfUnequalBlock:
        ; Do something
    }
    ; Do something
}

Array bracket syntax (`myArray[0]`)

Before:

@msg = private unnamed_addr constant [13 x i8] c"Element: %i\0A\00"

declare i32 @printf(ptr, ...)

@array = private unnamed_addr constant [3 x i32] [i32 111, i32 222, i32 333]

define void @getElementInLocalMemory () {
  EntryBlock:
    %array = load [3 x i32], ptr @array
    %element = extractvalue [3 x i32] %array, 2
    call i32 @printf(ptr @msg, i32 %element)
    ret void
}

define void @getElementWithPtr () {
  EntryBlock:
    %elementPtr = getelementptr [3 x i32], ptr @array, i64 0, i64 2
    %element = load i32, ptr %elementPtr
    call i32 @printf(ptr @msg, i32 %element)
    ret void
}

define i1 @main () {
  EntryBlock:
    call void @getElementInLocalMemory()
    call void @getElementWithPtr()
    ret i1 0
}

After:

@msg = private unnamed_addr constant [13 x i8] c"Element: %i\0A\00"

declare i32 @printf(ptr, ...)

@array = private unnamed_addr constant [3 x i32] [i32 111, i32 222, i32 333]

define void @getElementInLocalMemory () {
  EntryBlock:
    %array = load [3 x i32], ptr @array
    %element = %array[2]
    call i32 @printf(ptr @msg, i32 %element)
    ret void
}

define void @getElementWithPtr () {
  EntryBlock:
    %elementPtr = @array[2]
    %element = load i32, ptr %elementPtr
    call i32 @printf(ptr @msg, i32 %element)
    ret void
}

define i1 @main () {
  EntryBlock:
    call void @getElementInLocalMemory()
    call void @getElementWithPtr()
    ret i1 0
}

Note:

Parsing identifiers

In LLVM there are 3 types of identifiers:

  • %variable, referenced as %variable
  • @variable, referenced as @variable
  • NameOfBlock:, referenced as %NameOfBlock

These need to be parsed into tokens and nodes.

Parsing Options

All identifiers are part of 1 token

%variable -> Token that is a % type with the name 'variable'
@variable -> Token that is a @ type with the name 'variable'
'EntryBlock:' -> Token that is a blockName type with the name 'EntryBlock'

Pros

Parsing node is basically just reading 1 token and keeping the same data

Cons

Incompatible with #10

Parse the name of the identifier as its own token

%variable -> ['%', 'variable']

Pros

Compatible with #10

Cons

Literally any keyword like alloca would be counted as a 'name' instead of a keyword. parseName would have to be tried last after all other token parsers.

Example if parseName was before parseKeyword:
alloca -> name called 'alloca'

Example if parseName was before parseNumberLiteral
77 -> name called '77'

Example if parseNumberLiteral was before parseName
@0 -> keyword @, number literal 0

This problem has a messy solution: Treat all keywords and number literals as names, and then when it's time to parse tokens into nodes convert names to keywords or number literals.
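
A rough TypeScript sketch of that idea: every word is first tokenized as a name, and only when parsing nodes is it reclassified as a keyword or number literal. The token shapes, the short keyword list, and the `afterSigil` rule (a name right after `%` or `@` always stays a name) are assumptions made for illustration, not ezc's actual design.

```typescript
// Hypothetical token shapes; ezc's real token types may differ.
type ClassifiedToken =
  | { kind: 'keyword', text: string }
  | { kind: 'numberLiteral', value: number }
  | { kind: 'name', text: string }

// A small, incomplete keyword list used only for this illustration.
const exampleKeywords = new Set(['alloca', 'call', 'declare', 'define', 'load', 'ret', 'store'])

// Reclassifies a generic name token while parsing nodes.
// `afterSigil` is true when the name directly follows `%` or `@`,
// in which case it is always an identifier name (`%alloca`, `@0`).
function classifyName (text: string, afterSigil: boolean): ClassifiedToken {
  if (afterSigil) return { kind: 'name', text }
  if (/^[0-9]+$/.test(text)) return { kind: 'numberLiteral', value: Number(text) }
  if (exampleKeywords.has(text)) return { kind: 'keyword', text }
  return { kind: 'name', text }
}
```

Under these assumptions, `alloca` on its own becomes a keyword, `77` becomes a number literal, and the `0` in `@0` stays a name.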

No % in variable names

Before:

%product = mul i32 %a, %b

After:

product = mul i32 a, b

Pros:

  • Less typing
  • Smaller files
  • More like other programming languages
  • Can use % as a remainder operator. For example:
; This could work without this plugin, with a different plugin:
%remainder = i32 7 % 4 ; 3
; This wouldn't work without a space:
%remainder = i32 7%4

; This can work with this plugin:
remainder = i32 7 % 4
remainder = i32 7%4

Cons:

  • Conflicts with keywords. For example:
    This works:
%alloca = mul i32 %a, %b

But without the %, it would be confusing for the programmer and the parser:

alloca = mul i32 a, b
  • There would have to be variable names that are not allowed
  • Using %variable you can easily tell that it's a variable. Without the %, it might be harder to understand the code

Typed pointers

Examples

Add 2 numbers given the pointers to the numbers

Before:

define i32 @addWithPtrs (ptr %aPtr, ptr %bPtr) {
  EntryBlock:
    %a = load i32, ptr %aPtr
    %b = load i32, ptr %bPtr
    %result = add i32 %a, %b
    ret i32 %result
}

After:

define i32 @addWithPtrs (i32* %aPtr, i32* %bPtr) {
  EntryBlock:
    %a = load %aPtr
    %b = load %bPtr
    %result = add i32 %a, %b
    ret i32 %result
}

Malloc

In C, void* is used, but void* doesn't make sense because void is like having no type, whereas we need a pointer that can point to any type. So any* can be used instead to be clearer.
Before:

declare ptr @malloc (i64)

After

declare any* @malloc (i64)

Pros

  • You can't accidentally pass a pointer to the wrong type. For example, passing an i8* pointer where i32* is expected would be an error.
  • Less repetition: the type for load can be inferred automatically from the pointer type
  • More like other programming languages

Cons

  • Need to explicitly cast pointer type

Type checking

Examples of type checking:

define i1 @main () {
  EntryBlock:
    ; Should not be allowed to return i32
    ret i32 0
}
define void @someFn () {
  EntryBlock:
    ; Need some sort of exiting instruction, like ret or br
}
@var = private unnamed_addr constant i1 [] ; Array is not i1

How to implement:

Checking if a value matches a type

  • Any integer can be used for integer types
  • Any number can be used for float types (actually, LLVM requires a . in the number, but ezc doesn't store that)
  • Strings match [length x i8] type
  • Struct literals match struct types
  • Array literals match array types
  • undef (not implemented yet) can be used for any type
  • @identifier always has ptr type

Nodes which check types

  • Global variable checks that type matches value
  • Block checks that block is exited
  • Function / block checks that returned value's type matches return type
  • Return instruction checks that type matches value
  • Assignment instruction checks that the type matches the assignable's value
  • Call assignable doesn't check that the call's types match the defined or declared function's type!

Checking for duplicate identifiers

declare void @someFn ()
declare void @someFn () ; Duplicate `@someFn`
declare i1 @fn ()

define i1 @main () {
  EntryBlock:
    %var = call i1 @fn()
    %var = call i1 @fn() ; Duplicate `%var`
    ret i1 0
}
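
A small TypeScript sketch of the duplicate check. It assumes the caller has already collected the declared names for one scope (all `@` names in a file, or all `%` names in one function); `findDuplicateIdentifiers` is a hypothetical helper name.

```typescript
// Returns every name that is declared more than once, each listed a single time.
function findDuplicateIdentifiers (names: string[]): string[] {
  const seen = new Set<string>()
  const duplicates = new Set<string>()
  for (const name of names) {
    if (seen.has(name)) duplicates.add(name)
    else seen.add(name)
  }
  return Array.from(duplicates)
}
```

For the file above, the global scope `['@someFn', '@someFn', '@fn', '@main']` yields `['@someFn']`, and the scope of `@main` `['%var', '%var']` yields `['%var']`.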

Checking for references to undefined variables

define i1 @main () {
  EntryBlock:
    call void @coolFn() ; No `@coolFn`
    ret i1 0
}
define i1 @main () {
  EntryBlock:
    ret i1 %result ; No `%result`
}

Checking types and getting error locations

Checking each node

Then certain nodes would actually check types. I don't know yet how the incorrect return type would be checked. Each node will only check sub-nodes if needed (e.g. to check if a type and value match). This is how some things would work:

Check function:

  • Check if returned type from a return instruction matches function return type

Check return instruction:

  • Check if the returned value matches the type

Check file:

  • Make sure there are no duplicate identifiers, like two functions with the same name

Check identifier:

  • Make sure that the identifier is defined, and it is being used as a type / value correctly

Check call assignable:

  • In each input, check that the type matches the value

Error figuring out types

Example:

define ptr @fn () {
  EntryBlock:
    ret ptr @something
}

In this file, @something is not defined. This would lead to an error when checking the return instruction. To be specific, the error is part of just the identifier (@something) and the return instruction just depends on knowing what @something is to do its checking. It would be very annoying to have errors lead to confusing behavior, or to have one error result in 20 errors. For example, we don't want this:

  • Error: @something is undefined (from identifier node)
  • Error checking if type and value match (from return instruction)
  • Error in an instruction (from block)
  • Error in a block (from function)
  • Error in a sub-node (from file)

Instead, we just want this:

  • Error: @something is undefined (from identifier node)

This is part of the reason why we just return sub-nodes in a separate function instead of doing recursive type checking for each node.
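
One possible way to get a single root-cause error, sketched in TypeScript. This is not how ezc is implemented; `CheckError`, `resolveIdentifier`, and `checkReturnInstruction` are hypothetical names, and the map of defined globals is assumed to be built beforehand.

```typescript
// Hypothetical shapes: `definedGlobals` maps a global name to its type.
interface CheckError { message: string }

// Returns the identifier's type, or undefined after reporting exactly one error.
function resolveIdentifier (
  name: string,
  definedGlobals: Map<string, string>,
  errors: CheckError[]
): string | undefined {
  const type = definedGlobals.get(name)
  if (type === undefined) errors.push({ message: `${name} is undefined` })
  return type
}

// Checks `ret <type> <identifier>`. If the identifier could not be resolved,
// the root cause was already reported, so no follow-up error is added.
function checkReturnInstruction (
  returnType: string,
  identifier: string,
  definedGlobals: Map<string, string>,
  errors: CheckError[]
): void {
  const valueType = resolveIdentifier(identifier, definedGlobals, errors)
  if (valueType === undefined) return
  if (valueType !== returnType) {
    errors.push({ message: `Expected ${returnType} but ${identifier} is ${valueType}` })
  }
}
```

Checking `ret ptr @something` against an empty map leaves exactly one entry in `errors`: `@something is undefined`.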

No "c" needed for strings

Before:

@0 = private unnamed_addr constant [13 x i8] c"Hello World!\00"

After:

@0 = private unnamed_addr constant [13 x i8] "Hello World!\00"

Nicer looking math (+, -, *, /)

Before:

%result = add i32 %a, %b
%result = sub i32 %a, %b
%result = mul i32 %a, %b
%result = udiv i32 %a, %b
%result = sdiv i32 %a, %b

After:

%result = %a + %b
%result = %a - %b
%result = %a * %b
%result = %a u/ %b
%result = %a s/ %b

Dealing with signed / unsigned

+, -, and * all work with both signed and unsigned numbers. / has two separate operations depending on if the numbers are signed or unsigned.

u/ and s/

u/ for unsigned division and s/ for signed division. This looks ugly, but I like it better than the normal LLVM way.

Automatic

Combined with #13, you can just do

%result = %a / %b

and it will be transformed into udiv or sdiv based on the type of %result.
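
A minimal TypeScript sketch of the automatic variant, assuming the signedness of the result type is already known from #13. `MathOperator` and `mathInstruction` are hypothetical names.

```typescript
// Hypothetical helper: picks the LLVM instruction for a math operator.
type MathOperator = '+' | '-' | '*' | '/'

function mathInstruction (operator: MathOperator, signed: boolean): string {
  switch (operator) {
    case '+': return 'add'
    case '-': return 'sub'
    case '*': return 'mul'
    // Only division depends on signedness.
    case '/': return signed ? 'sdiv' : 'udiv'
  }
}
```

With a signed result type, `%result = %a / %b` would then be written out as `%result = sdiv i32 %a, %b`.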

Create a separate node for inputs

Right now, in something like this:

declare void @someFn(ptr nofree %pointer)

And this:

define void @someFn(ptr nofree %pointer) {
  EntryBlock:
    ret void
}

Both have a similar input (ptr nofree %pointer). It would be more convenient and less messy to have input be a separate node with type, flags, and an optional name (even in functions, names can be optional).
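
A sketch of what such a node could look like in TypeScript. The `InputNode` shape and `inputToString` are hypothetical; ezc's actual node types may be structured differently (for example, the type would probably be a node rather than a string).

```typescript
// Hypothetical node shape; field names are illustrative, not ezc's actual API.
interface InputNode {
  type: string // e.g. 'ptr'
  flags: string[] // e.g. ['nofree']
  name?: string // without the leading %, optional
}

// Converts the node back into LLVM IR text.
function inputToString (input: InputNode): string {
  const parts = [input.type, ...input.flags]
  if (input.name !== undefined) parts.push(`%${input.name}`)
  return parts.join(' ')
}
```

Both `declare` and `define` could then share this node: `inputToString({ type: 'ptr', flags: ['nofree'], name: 'pointer' })` gives `ptr nofree %pointer`.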

CLI

  • Input files
  • Output files
  • Parses input files, including plugins
  • Plugins transform custom nodes into core nodes
  • Plugins check types and stuff and report error messages
  • Core type checking and error messages
  • Transform nodes into output file

Something like `size_t`

The most annoying thing in LLVM IR is that there is no size_t, so you have to type i64 for size_t (if size_t is an i64). The plugin can either introduce a keyword called size_t, sizeT or something else, or have an import which can import size_t.

Before:

declare ptr @malloc(i64)

After

declare ptr @malloc(size_t)

Automatic string types

Before

@0 = private unnamed_addr constant [13 x i8] c"Hello World!\00"

After

@0 = private unnamed_addr constant {"auto" or something or blank I didn't decide yet} c"Hello World!\00"
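
To fill in the type automatically, the plugin would need to count the bytes in the string. A TypeScript sketch, assuming ASCII text plus LLVM's `\xx` hex escapes and `\\`; `llvmStringByteLength` and `inferStringType` are hypothetical names.

```typescript
// Counts the bytes in the body of an LLVM string literal (the text between the quotes).
// Assumption: the text is ASCII, with `\xx` hex escapes and `\\` as the only escapes.
function llvmStringByteLength (body: string): number {
  let byteCount = 0
  let index = 0
  while (index < body.length) {
    if (body[index] !== '\\') {
      index += 1 // plain character
    } else if (body[index + 1] === '\\') {
      index += 2 // escaped backslash
    } else if (/^[0-9a-fA-F]{2}$/.test(body.slice(index + 1, index + 3))) {
      index += 3 // hex escape such as \00 or \0A
    } else {
      throw new Error(`Invalid escape at position ${index}`)
    }
    byteCount += 1
  }
  return byteCount
}

// Builds the array type that the plugin would insert.
function inferStringType (body: string): string {
  return `[${llvmStringByteLength(body)} x i8]`
}
```

For the body `Hello World!\00` this gives `[13 x i8]`, matching the "Before" example.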

Create a separate node for input flags

  • Separate logic for parsing keywords into input flags and converting input flags into keywords
  • Input flags could be used for other things like attributes

Enums

  • Enums have an integer type
  • The bit size is chosen automatically so that every item gets a unique number without overflowing, using Math.ceil(Math.log(numberOfItems) / Math.log(2))
  • For example, if there are 5 items, i3 is used because 2^2 = 4 is too few and 2^3 = 8 is enough

Example:

%Dessert = enum {
  COOKIE,
  DONUT,
  CAKE,
  BROWNIE,
  ICE_CREAM,
}

define %Dessert @main() {
  EntryBlock:
    ret %Dessert.DONUT
} 

Turns into

%Dessert = type i3
@Dessert.COOKIE = private unnamed_addr constant %Dessert 0
@Dessert.DONUT = private unnamed_addr constant %Dessert 1
@Dessert.CAKE = private unnamed_addr constant %Dessert 2
@Dessert.BROWNIE = private unnamed_addr constant %Dessert 3
@Dessert.ICE_CREAM = private unnamed_addr constant %Dessert 4


define %Dessert @main() {
  EntryBlock:
    %donut = load %Dessert, ptr @Dessert.DONUT
    ret %Dessert %donut
} 
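
The lowering above could be sketched in TypeScript like this. As an assumption of this sketch, the bit width is computed with `Math.clz32` rather than the `Math.log` formula, to avoid floating-point rounding, and a single-item enum is given `i1` because LLVM has no `i0`. `enumBitWidth` and `lowerEnum` are hypothetical names.

```typescript
// Smallest bit width that gives `itemCount` items distinct values.
// Supports 1 to 2^32 items; the minimum width is 1 because LLVM has no i0.
function enumBitWidth (itemCount: number): number {
  if (!Number.isInteger(itemCount) || itemCount < 1 || itemCount > 4294967296) {
    throw new RangeError('itemCount must be an integer between 1 and 2^32')
  }
  // 32 - clz32(n - 1) is the number of bits needed to represent n - 1.
  return Math.max(1, 32 - Math.clz32(itemCount - 1))
}

// Produces the type alias and one constant per item.
function lowerEnum (name: string, items: string[]): string[] {
  const lines = [`%${name} = type i${enumBitWidth(items.length)}`]
  items.forEach((item, value) => {
    lines.push(`@${name}.${item} = private unnamed_addr constant %${name} ${value}`)
  })
  return lines
}
```

Calling `lowerEnum('Dessert', ['COOKIE', 'DONUT', 'CAKE', 'BROWNIE', 'ICE_CREAM'])` returns the six top-level lines of the "Turns into" example, starting with `%Dessert = type i3`.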

References to spot in source code for error reporting

Some errors are detected when parsing tokens, like when there is no known token:

return
^ Unknown token

Some errors are detected when parsing nodes:

define @main () {
             ^ No return type
  ...
}

Some errors are detected when checking nodes:

define i32 @fn () {
  EntryBlock:
    ret i1 0
        ^ Wrong type returned

The location of the error should be shown.
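
A TypeScript sketch of how a location could be shown, assuming each token or node remembers its 0-based character offset in the source. `SourceLocation`, `locationOf`, and `renderError` are hypothetical names.

```typescript
// Both line and column are 1-based, like file:2:9 in typical compiler output.
interface SourceLocation { line: number, column: number }

// Converts a 0-based character offset into a 1-based line and column.
function locationOf (source: string, offset: number): SourceLocation {
  const before = source.slice(0, offset)
  const line = before.split('\n').length
  const column = offset - (before.lastIndexOf('\n') + 1) + 1
  return { line, column }
}

// Renders the offending line with a caret under the error column.
function renderError (source: string, offset: number, message: string): string {
  const { line, column } = locationOf(source, offset)
  const text = source.split('\n')[line - 1] ?? ''
  return `${text}\n${' '.repeat(column - 1)}^ ${message}`
}
```

Given the offset of `i1` in the last example, `renderError` returns the `ret i1 0` line followed by a caret line like the one shown above.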

Progress:

  • Parse token error
  • Parse node error
  • Check node error

Options for getting parse node error locations

Node parsers return token index and message

@0 = private unnamed_addr constant [13 x i8] c"Hello World!\00"

declare @puts(ptr) ; No return type!

define i1 @main() {
  0:
    call i32 @puts(ptr @0)
    ret i1 0
}

parseFile would return something like this:

{
  error: true,
  result: {
    type: File, // Would be EnumItem
    index: 12, // Couldn't parse node
    message: "Couldn't parse any sub-node",
    subAttempts: [
      {
        type: GlobalVariable,
        index: 0,
        message: 'No @identifier'
      },
      {
        type: Declare,
        index: 1,
        message: 'Expected return type',
        subAttempts: [
          {
            type: IntegerType,
            index: 0,
            message: 'Expected integer type'
          },
          {
            type: ArrayType,
            index: 0,
            message: "Expected '['"
          },
          // ... - There are a lot of type parsers
        ]
      },
      {
        type: Function,
        index: 0,
        message: 'Expected define'
      }
    ]
  }
}

Then the CLI would say something like this:

Error parsing file: Couldn't parse any sub-node at file:2:1
- If you meant it to be global variable: No @identifier at file:2:1
- If you meant it to be declare: Expected return type at file:2:9
  - If you meant it to be integer type: Expected integer type at file:2:9
  - If you meant it to be array type: Expected '[' at file:2:9
  - ...
- If you meant it to be function: Expected define at file:2:1

This message could be really long! Some additional logic would have to be implemented to avoid a super long message where you have to scroll a lot in the terminal.
