sunjay / brain

A high level programming language that compiles into the brainfuck esoteric programming language

License: MIT License

Rust 100.00%
brainfuck-interpreter brain rust brainfuck-programs brainfuck esoteric-programming-language language programming-language

brain's Introduction

brain


brain is a strongly-typed, high-level programming language that compiles into brainfuck. Its syntax is based on the Rust programming language (which it is also implemented in). Though many Rust concepts will work in brain, it deviates when necessary in order to better suit the needs of brainfuck programming.

brainfuck is an esoteric programming language with only 8 single-byte instructions: +, -, >, <, ,, ., [, ]. These limited instructions make brainfuck code extremely verbose and difficult to write. It can take a long time to figure out what a brainfuck program is trying to do. brain makes it easier to create brainfuck programs by allowing you to write in a more readable and understandable language.

The type system makes it possible to detect a variety of logical errors at compile time, instead of waiting until runtime. This is an extra layer of convenience that brainfuck does not have. The compiler takes care of generating all the necessary brainfuck code to work with the raw bytes in the brainfuck Turing machine.

The generated brainfuck code can be run by a brainfuck interpreter. brain targets only that interpreter, which means its generated programs are only guaranteed to work when run with it. The interpreter implements a brainfuck specification designed and written specifically for the brain programming language project.
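For readers who have never looked inside a brainfuck interpreter, the sketch below (in Rust) shows roughly how the eight instructions are executed. It is only an illustration; it is not the interpreter that brain targets, and it glosses over that interpreter's specification (tape size, EOF behaviour, and so on).

use std::io::{self, Read, Write};

// Run a brainfuck program on a fixed-size tape of wrapping byte cells.
// Illustrative only; not the interpreter that brain actually targets.
fn run(program: &[u8]) -> io::Result<()> {
    let mut tape = vec![0u8; 30_000];
    let mut ptr = 0usize; // data pointer (current cell)
    let mut pc = 0usize;  // program counter (current instruction)

    while pc < program.len() {
        match program[pc] {
            b'+' => tape[ptr] = tape[ptr].wrapping_add(1),
            b'-' => tape[ptr] = tape[ptr].wrapping_sub(1),
            b'>' => ptr += 1,
            b'<' => ptr -= 1,
            b'.' => io::stdout().write_all(&tape[ptr..ptr + 1])?,
            b',' => {
                let mut byte = [0u8; 1];
                io::stdin().read_exact(&mut byte)?;
                tape[ptr] = byte[0];
            }
            // `[`: if the current cell is zero, jump forward past the matching `]`
            b'[' if tape[ptr] == 0 => {
                let mut depth = 1;
                while depth > 0 {
                    pc += 1;
                    match program[pc] {
                        b'[' => depth += 1,
                        b']' => depth -= 1,
                        _ => {}
                    }
                }
            }
            // `]`: if the current cell is non-zero, jump back to the matching `[`
            b']' if tape[ptr] != 0 => {
                let mut depth = 1;
                while depth > 0 {
                    pc -= 1;
                    match program[pc] {
                        b']' => depth += 1,
                        b'[' => depth -= 1,
                        _ => {}
                    }
                }
            }
            _ => {} // any other byte is a comment
        }
        pc += 1;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Echo a single byte from stdin to stdout.
    run(b",.")
}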

Optimization Goals

The brain compiler is designed to optimize the generated brainfuck code as much as possible.

  1. Generate small brainfuck files (use as few instructions as possible)
  2. Generate memory efficient code (use as few brainfuck cells as possible)

Optimization is an ongoing effort. As the project matures, these goals will be reflected more and more in the compiled output.

brain syntax

For full examples, please see the examples/ directory. Some examples aren't fully implemented yet in the compiler.

cat program (examples/cat.brn)

// cat program
let mut ch: [u8; 1];

while true {
  // stdin.read_exact() panics if EOF is reached
  stdin.read_exact(ch);
  stdout.print(ch);
}

Compile this with brain examples/cat.brn.

Run this with brainfuck cat.bf < someinputfile.txt.

Reading Input (examples/input.brn)

// input requires explicit sizing
// always reads exactly this many characters or panics if EOF is reached before then
// if this many characters aren't available yet, it waits for you to send that many
let mut b: [u8; 5];
stdin.read_exact(b);
stdout.print(b"b = ", b, b"\n");

let mut c: [u8; 1];
stdin.read_exact(c);
stdout.print(b"c = ", c, b"\n");

// You can reuse allocated space again
stdin.read_exact(b);
stdout.print(b"b = ", b, b"\n");

Compile this with brain examples/input.brn.

This compiles into the following brainfuck:

,>,>,>,>,>+++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++.--------------------
-------------------------------------------
---.+++++++++++++++++++++++++++++.---------
--------------------.----------------------
----------<<<<<.>.>.>.>.>++++++++++.-------
---,>++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++.------------------------
-------------------------------------------
.+++++++++++++++++++++++++++++.------------
-----------------.-------------------------
-------<.>++++++++++.----------<<<<<<,>,>,>
,>,>>++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++.-------------------------
-----------------------------------------.+
++++++++++++++++++++++++++++.--------------
---------------.---------------------------
-----<<<<<<.>.>.>.>.>>++++++++++.----------

Run this after compiling with brainfuck input.bf < someinputfile.txt.

Installation

For people just looking to use brain, the easiest way to get brain right now is to first install the Cargo package manager for the Rust programming language.

NOTE: Until this version is released, these instructions will NOT work. Please see the Usage instructions below for how to manually install the compiler from the source code.

Then in your terminal run:

cargo install brain
cargo install brain-brainfuck

If you are upgrading from a previous version, run:

cargo install brain --force
cargo install brain-brainfuck --force

Usage

For anyone just looking to compile with the compiler:

  1. Follow the installation instructions above
  2. Run brain yourfile.brn to compile your brain code
  3. Run brainfuck yourfile.bf to run a brainfuck interpreter which will run your generated brainfuck code

You can also specify an output filename. Run brain --help for more information.

For anyone looking to build the source code:

This project contains both the brain compiler and a basic brainfuck interpreter.

Make sure you have Rust and cargo (comes with Rust) installed.

brain compiler

To compile a brain (.brn) file into brainfuck (.bf)

cargo run filename.brn

where filename.brn is the brain program you want to compile

Use --help to see further options and additional information

cargo run -- --help

If the brain compiler seems to be taking too long or "hanging", try running cargo build first to check whether it is actually the Rust compiler that is taking a long time to build the project.

You can also install the compiler from the source code using this command in the repository's root directory:

cargo install --path .

Examples

There are various brain examples in the examples/ directory which you can compile into brainfuck using the usage instructions above.

Thanks

This project would not be possible without the brilliant work of the many authors of the Esolang Brainfuck Algorithms page. The entire wiki has been invaluable. That page in particular is the basis for a lot of the code generation in this compiler. I have contributed many novel brainfuck algorithms to that page as I come up with them for use in this compiler.

brain's People

Contributors

sunjay

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

brain's Issues

Comparison Operators

  • == operator
  • != operator
  • >= operator
  • <= operator
  • > operator
  • < operator
  • Operators only work on types that can actually implement them

Test Runner

Much like the test runner in Rhino, we should have something that runs the example brain files.

We could structure it like this:

examples/
    brain/
        test1.brain
        test1.in
        test1.out

The .in file is optional and can be used to provide input to the test .brn file. The test runner would live in tests/example_runner.rs or somewhere similar. It would run each test with its input (if available) and compare the result against the expected output to ensure that our examples keep working.

  • Be able to test errors as well like syntax errors, etc.
  • Make sure all .brn examples at least compile (add this as a test)
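A rough sketch of what such a runner could look like is below. The file location, the brain and brainfuck binary names, the assumption that the compiler writes foo.bf next to foo.brn, and the .brn extension are all assumptions for illustration, not the project's actual test code.

// tests/example_runner.rs (hypothetical location)
use std::fs;
use std::io::Write;
use std::process::{Command, Stdio};

#[test]
fn examples_compile_and_run() {
    for entry in fs::read_dir("examples").expect("examples/ directory") {
        let path = entry.expect("directory entry").path();
        if path.extension().and_then(|e| e.to_str()) != Some("brn") {
            continue;
        }

        // 1. Every example must at least compile.
        let status = Command::new("brain")
            .arg(&path)
            .status()
            .expect("failed to run the brain compiler");
        assert!(status.success(), "{} failed to compile", path.display());

        // 2. If there is an expected .out file, run the generated brainfuck
        //    with the optional .in file as stdin and compare the output.
        let expected = path.with_extension("out");
        if !expected.exists() {
            continue;
        }

        let mut child = Command::new("brainfuck")
            .arg(path.with_extension("bf"))
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .spawn()
            .expect("failed to run the brainfuck interpreter");

        let input = path.with_extension("in");
        if input.exists() {
            let data = fs::read(&input).expect("read .in file");
            child.stdin.as_mut().unwrap().write_all(&data).expect("write test input");
        }

        let output = child.wait_with_output().expect("interpreter did not finish");
        let expected_bytes = fs::read(&expected).expect("read .out file");
        assert_eq!(output.stdout, expected_bytes, "{} produced unexpected output", path.display());
    }
}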

break and continue

To implement break and continue, add a step to the desugaring process which replaces break and continue statements with the appropriate dummy variables and if statements.

From the Brainfuck algorithms wiki:

break and continue

To implement break and continue statements in loops, consider that the following two pieces of pseudocode are functionally equivalent:

while (foo) {
 if (bar == foo) {
  if (x > 2) {
   break;
  }
  else {
   // do stuff
  }
  // do stuff
 }
 // update foo for the next iteration
}

// Equivalent without break statement:
while (foo) {
 shouldBreak = false
 if (bar == foo) {
  if (x > 2) {
   shouldBreak = true
  }
  else {
   // do stuff
  }
  
  // don't evaluate any more code in the loop after breaking
  if (!shouldBreak) {
   // do stuff
  }
 }
 if (shouldBreak) {
  // so that the loop stops
  foo = 0
 }
 else {
  // update foo for the next iteration
 }
}

Notice that we need to guard all code after the break statement in the loop to prevent it from running. We don't need to guard in the else statement immediately after the break statement because that will never run after the break statement has run.

This approach allows us to implement break and continue statements in brainfuck despite the lack of sophisticated jump instructions. All we're doing is combining the concept of an if statement (defined below) with the while loop we just defined and applying it here.

Implementing a continue statement is the same thing except you never guard the loop updating code:

while (foo) {
 if (bar == foo) {
  if (x > 2) {
   continue;
  }
  else {
   // do stuff
  }
  // do stuff
 }
 // update foo for the next iteration
}

// Equivalent without continue statement:
while (foo) {
 shouldContinue = false
 if (bar == foo) {
  if (x > 2) {
   shouldContinue = true
  }
  else {
   // do stuff
  }
  
  // don't evaluate any more code in the loop after continuing
  if (!shouldContinue) {
   // do stuff
  }
 }
 
 // This code stays the same after a continue because we still want to move on to the next iteration of the loop
 // update foo for the next iteration
}

To implement both break and continue, you can compose the concepts here and make any combination you want. You can consider break and continue statements to be "sugar" that needs to be "desugared" in your brainfuck code.

Tuples and Tuple Structs

Extension of #39 with support for arbitrary tuples.

Tuples can have heterogeneous types.

Tuples:

let foo: (u8, u16, u32) = (1, 10000, 1000000);

Tuple struct:

struct Foo(u8);

struct Foo(u8, u16);

Accessing tuple fields will use special number syntax like Rust.

// get
foo.0;
// set
foo.0 = 2;
  • Add numeric field access to grammar

High Level Libraries

Brainfuck doesn't really have anything other than basic IO. It would be cool if we could design some libraries in another language that could interface with a brainfuck program.

Example: a graphics program written in some language that interprets output produced by the brainfuck executable to perform graphics commands. With something like this, we could potentially create graphics or drawings or animations using a brain program.

brain program -> brainfuck output -> brainfuck interpreter -> graphics commands -> graphics program

Tagged Unions / Enums

Needs #39 and optionally #40 before implementation.

Support tagged unions / enums with arbitrary nesting.

enum Foo {}

enum Bar {
    Foo(Foo),
    NotFoo,
    DefinitelyNotFoo {a: u8, b: u8},
}

enum Direction {
     North,
     East,
     South,
     West,
}
  • Size is that of the largest variant plus a tag that is used to determine the variant
  • Accessing variants using paths Bar::NotFoo
  • Bring variants into the current namespace use Bar::*

32-bit integers

It would be really cool to support 32-bit integers in Brain. That means four 8-bit cells.

It should support the following syntax:

// Integers are 32-bit and are declared without sizing brackets
int num;
// inputs on integer variables are automatically coerced
stdin.read(int num2);
  • positive and negative numeric literals
  • Basic operations like addition, subtraction, multiplication, integer division, etc.
    • divide by zero?
  • Any expression with a type int can be used for indexing/slices
  • for loops with range and counter values that you can print
    int a = 10000;
    for i of 0..a {
        stdout.write(a, "\n");
    }
    
  • Implement FizzBuzz and add it to examples
  • If possible, generalize this to any integer multiple of 8 and implement u8, u16, u32, u64, i8, i16, i32, and i64, usize, isize
  • Methods for min/max limits: https://doc.rust-lang.org/std/primitive.u32.html#method.min_value
  • Support augmented arithmetic operators like +=, -=, etc.
  • panic!() if operations overflow (panic is defined as an infinite loop "+[]") - Brainfuck interpreter should know to exit if panic is encountered

The brainfuck algorithms wiki page is extremely useful for some of these operators.

Rust integer implementations: https://doc.rust-lang.org/beta/src/core/num/mod.rs.html#181
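As a rough model of what "four 8-bit cells" means in practice, the sketch below (plain Rust, not brain or generated brainfuck) increments a 32-bit value stored as four byte cells with manual carry propagation. The little-endian cell order and the overflow handling are assumptions for illustration.

// A 32-bit value modelled as four 8-bit cells, least significant cell first.
fn increment(cells: &mut [u8; 4]) -> bool {
    for cell in cells.iter_mut() {
        let (value, overflowed) = cell.overflowing_add(1);
        *cell = value;
        if !overflowed {
            return false; // no carry left to propagate
        }
        // otherwise carry into the next (more significant) cell
    }
    true // carried past the last cell: 32-bit overflow (a panic!() in brain, per above)
}

fn to_u32(cells: &[u8; 4]) -> u32 {
    u32::from_le_bytes(*cells)
}

fn main() {
    let mut n = [255, 0, 0, 0]; // 255
    assert!(!increment(&mut n));
    assert_eq!(to_u32(&n), 256); // the carry rippled into the second cell

    let mut max = [255, 255, 255, 255]; // u32::MAX
    assert!(increment(&mut max)); // overflow detected
}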

Future:

Casting Between Types

Example:

32u8 as u32

This requires allocating a new buffer and copying or possibly truncating the current buffer.

Help Wanted: Examples

We need some examples of writing brain code. These will end up in the examples/ directory in the project.

You can draw inspiration from things like:

  • your own programming
  • online programming questions/challenges

If you find that you need a feature that has not been implemented yet, please add a 👍 reaction to the issue for that feature to indicate that you want it. It also helps if you add a comment on that issue describing your use case and the problem you were trying to solve, so that when we implement the feature we know we did it right. If such an issue does not exist, you can create one requesting what you need and why you need it.

Add your example as a comment in this thread or open up a pull request adding it to the examples directory (make sure you use the .brn file extension).

Feel free to add a line at the top of the file with your GitHub profile (or whatever) and even a license if you want. (Try to keep the license permissive.) If you don't provide an explicit license, your example will be added to the repo as is and the license of the repo (MIT) will apply.

Package Manager and Build Tool

Something similar to the awesome Cargo

Ideas:

  • The brain package manager is called "nerve"
  • Nerve.toml for configuration (same as Cargo.toml)
  • Should resolve the dependency tree and build dependencies in the correct order
  • Anything that can be done in parallel should probably be done in parallel
  • Reject circular dependencies until we have a solution for that

Book / Tutorials

Nothing helps people use something more than documentation. It would be nice to provide something in addition to the examples which sequentially walks through the syntax and features of the brain programming language.

This could even start out as a simple cheat sheet: a single long example that demonstrates every concept and feature. We could then expand it out into multiple sections as needed.

https://www.gitbook.com/

  • Cover using the compiler to compile into brainfuck and then using a brainfuck interpreter to run the code

Functions / Closures

Functions can only be defined at the top level of modules and have access to global variables and their function arguments.

Closures are anonymous functions that capture values of variables from their outer scope(s).

Without a more advanced jump instruction, we really have no choice but to inline all functions. Recursive functions cannot be inlined, so we will have to implement some sort of tail recursion or something for those (or disallow recursion completely).

Low priority until MVP.

  • Disallow code outside of a function, make main() the entry point
  • Automatically declaring stdin/stdout may not be viable anymore because that runs code and allocates memory so we may have to define some std functions to create them just like in Rust
  • Function type definitions in grammar
  • Store literals in a temporary memory location when passing to non-built-in functions
  • Store functions in an IR that is independent of the position of each of the function's arguments
  • Use TypeIds to make sure the types used in that function cannot be accidentally overridden by later declarations
    • If Foo is declared in the same module as a function and then another Foo is declared in another module, the first Foo should still be used in the function no matter what context the function is called in
  • Make sure functions can be declared in variables
  • Error when wrong number of arguments is applied to a method
  • Functions can be assigned to variables with type fn(foo) -> ReturnType like Rust

Unicode Compatible String

The reason brain has been using u8 instead of a type called char is that we have been reserving char for when we finally have time to implement Unicode support. The problem is that the concept of a "character" extends far beyond ASCII, so if we ever want this to be useful we should support Unicode from the get-go.

See Rust's implementation of a unicode char: https://doc.rust-lang.org/beta/std/primitive.char.html

The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.

Possibly useful regarding the memory layout: https://www.youtube.com/watch?v=kPR8h4-qZdk

  • char type
  • str type

Reserved keywords

Reserved keywords should result in a syntax error. This saves us from problems with migrations later on.

Could use cond_with_error!() or something else to implement

  • brainstorm which keywords should be reserved based on future requirements
  • Ban keywords in identifiers
    • byte, int, in, out, for, while, do, move, if, else, break, continue, loop, abstract, struct, as, become, const, let, import, use, export, do, enum, extern, pub, true, false, final, impl, class, mod, match, mut, ref, return, self, static, super, trait, unsafe, yield, typeof, type, where, of, raw, eval, clear, read

Most of these keywords are just reserved in case we need them for future language features.
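Purely as an illustration of the idea (not the actual nom-based parser code), the check could be a simple membership test before an identifier is accepted:

// A few of the reserved words from the list above; the real list would be longer.
const RESERVED: &[&str] = &["let", "mut", "if", "else", "while", "for", "in", "loop", "struct", "enum"];

// Reject reserved keywords with a (simplified) syntax error.
fn check_identifier(name: &str) -> Result<&str, String> {
    if RESERVED.contains(&name) {
        Err(format!("syntax error: `{}` is a reserved keyword", name))
    } else {
        Ok(name)
    }
}

fn main() {
    assert!(check_identifier("counter").is_ok());
    assert!(check_identifier("while").is_err());
}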

Optimize brainfuck output

We want to generate the smallest brainfuck files possible while still using as few brainfuck cells as possible. Fewer instructions are a greater priority than memory efficiency, since brainfuck doesn't use much memory to begin with. That being said, memory is still important and must be taken into account.

  • Make optimization a pure operation Instructions -> Instructions and write tests
  • Make optimization level a command line argument
  • Cancel out opposite operations with no other instruction between them (cancel out opposing >< or <> and opposing +- or -+; see the sketch after this list)
  • Remove any instructions on cells that never get written out or get overwritten by reads (especially at the end of the instructions)
    • At the end, you can safely remove any ><+- instructions since these have no consequences at the end of the program
    • Dead code removal
    • Example: ++++++++++++++, is completely useless since , overwrites all the +
  • Modify the order in which cells are modified in order to avoid unnecessary movement instructions (e.g. when copying)
  • Use allocated cells as temporary cells when copying string
    • Example: Let's say you want to copy "hello". The memory might look like [h][e][l][l][o][ ][ ][ ][ ][ ][ ] where you have the original cells for "hello", enough cells for the copy and a temporary space to use during the copy to keep the original letter. Instead of using a temporary cell at the very end, you can use the adjacent spot to the letter you are copying. So if you were copying "h" into its destination cell, instead of using the temporary cell allocated at the very end, you could use the space for "o". That cell would automatically be cleared at the end of the copy so this is a safe operation. This saves a lot of instructions going back and forth over completely empty cells and will produce even higher savings when the cells to copy aren't right next to each other on the tape. Note: You still need an empty temporary cell at the end for the last character.
  • Interlacing: Instead of ,>,>,>,<<<.>.>.>., we can probably do ,.>,.>,.>,. (needs thought on the consequences of this since it might not be 100% safe)
  • Replace long strings of repeated instructions with loops
  • Loop fusion
  • Expanded output mode of compilation - add a mode to the compiler that explicitly outputs long runs of repeated instructions instead of rolling them into loops using temporary cells. This saves memory while potentially making the output file huge. The real reason for introducing it, however, is that long runs of the same instruction are easier to optimize and more decidable than loops, which can only be unrolled in the simplest of cases.
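As a concrete illustration of the cancellation item above, a single peephole pass could look like the sketch below. It operates on raw brainfuck text rather than on the compiler's Instruction enum, so it is only a model of the idea.

// Remove adjacent instruction pairs that cancel out: "><", "<>", "+-" and "-+".
// Using a stack also collapses nested cases like ">><<" in one pass.
fn cancel_opposites(program: &str) -> String {
    let mut out: Vec<char> = Vec::with_capacity(program.len());
    for c in program.chars() {
        match (out.last(), c) {
            (Some('>'), '<') | (Some('<'), '>') |
            (Some('+'), '-') | (Some('-'), '+') => {
                out.pop(); // the new instruction cancels the previous one
            }
            _ => out.push(c),
        }
    }
    out.into_iter().collect()
}

fn main() {
    assert_eq!(cancel_opposites(">><<+-."), ".");     // everything cancels
    assert_eq!(cancel_opposites("+>+<-."), "+>+<-."); // nothing adjacent cancels
}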

Resources:

User Defined Structs

Support basic struct syntax including nested structs:

struct Foo {
    x: u8,
    y: u8,
}

struct Bar {
    pos: Foo,
    data: u8,
}

struct Unit;

Memory layout will likely be sequential though we shouldn't put any guarantees on ordering. This is essentially a way to group variables when passing them around to functions.

  • Be able to use structs where other types used to be used (arrays, functions #16, etc.)
  • Struct names cannot be paths
  • Struct names cannot be keywords

References, Mutable References, Moves/Copies

Reference: &Foo
Mutable reference: &mut Foo
Move/Copy: Foo

The data should be moved if type does not implement Copy, copy otherwise.

  • Fix all variables are mutable references #67
  • Consider making variables immutable by default and add a mut keyword
    • This may not constitute a major semver change depending on the breaking impact
  • Static analysis should prevent using a value after it has been moved
    • Could start by not supporting moves if the static analysis isn't mature enough yet (this would mean non-copy types would always have to be passed by reference--not bad)
  • Automatic referencing/dereferencing

Verbose errors

Enable nom's verbose errors feature and produce syntax errors that actually contain the line and character where things went wrong.

  • Add errors to operation generation
  • remove any bare unwrap() calls from the brain.rs executable
  • Error messages contain the filename, line number and character number of the error
  • AST nodes all have Span structs with information about the source line they came from (for code generation errors)
  • Pass source position information through every stage of compilation (as much as possible)
  • Pass source position information back through errors when errors are detected

Dynamically sized array

There needs to be some thought put into the design of this and how the required operations would work.

  • Moving the array to another memory location
  • Pushing elements to the beginning of the array
  • Inserting elements at certain indexes
  • Dynamically sized on the stack? (Might not even be possible)

Possibly useful: https://www.youtube.com/watch?v=kPR8h4-qZdk

On hold until #11 is complete.

Traits and Operator Overloading

Currently, we fake traits in the compiler and just assume a bunch of stuff about every type. No custom types are allowed to decide which traits they fulfill.

  • Trait resolution
  • Defining new traits
  • Generic traits
  • Implementing traits for custom types
  • Traits for operators
  • Trait objects as function parameters
  • Static vs. dynamic dispatch

Basic loops and conditional statements

Support the following syntax (and the examples below):

// cat program
// while condition can be an `in` statement, or valid expression of size 1 byte
// Continues so long as the given byte is not zero
while in ch[1] {
    out ch;
}
  • basic if statements
  • while loops
  • infinite loops
  • nested loops

Implementation design

if statements

// general form
if <expr (size = 1)> {
    // <statements>
}

// examples:
in b[1];
if b {
    ...statements...
}

if "a" {
    ...statements...
}
  • evaluates the given expression, the resulting size must be 1
  • wraps the body of the if in brainfuck loop instructions so that the body is only evaluated if the result of the expression is non-zero (see the sketch after this list)
  • the first part of this loop body must set the condition result cell to zero so that it doesn't run again
  • else if and else statements will be implemented in #20
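A rough sketch of that lowering, emitting raw brainfuck text: the helper is hypothetical (it is not the compiler's actual code generator) and it assumes the tape pointer sits on the condition cell both before and after the body.

// Lower `if <cond> { <body> }` to brainfuck. Assumes `condition` leaves its
// one-byte result in the current cell and `body` returns the pointer to it.
fn lower_if(condition: &str, body: &str) -> String {
    let mut out = String::new();
    out.push_str(condition); // evaluate the condition into the current cell
    out.push('[');           // enter only if that cell is non-zero
    out.push_str("[-]");     // zero the condition cell so the loop runs at most once
    out.push_str(body);      // the if body
    out.push(']');           // the condition cell is zero here, so we fall through
    out
}

fn main() {
    // condition: read one byte from stdin; body: print the current cell
    assert_eq!(lower_if(",", "."), ",[[-].]");
}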

while loops

// general form
while <expr (size = 1)> {
    // <statements>
}

// examples:
in b[1];
while b {
    ...statements...
}

// not actually a good idea...
while "a" {
    ...statements...
}
  • pretty much exactly a brainfuck loop using the [ and ] instructions
  • implemented much like an if statement
  • steps:
    1. generates code for the condition expression and puts the resulting value in a cell (the condition cell)
    2. starts a brainfuck loop [
    3. generates code for the loop body statements
    4. then generates code to zero out the condition cell
    5. generates code for the condition expression again
    6. ends the brainfuck loop ]
  • break and continue are not supported yet. We may be able to implement them based on the clever ideas in #20 but there are no concrete plans to support these yet. If someone can figure out the branching logic, please comment in #20.
  • while loops also support type declarations with the in keyword before them as the condition which is basically equivalent to:
    in a[b];
    while a { ...stuff... }
    

nested loops

  • nested loops should work with this design out of the box with no further consideration

Generic Types

Depends on #16 before this can be implemented. Probably need #44 before this is particularly useful.

  • allow definition of generic type parameters in functions, structs (#39), enums (#41), etc.
  • generic type bounds
  • Remove ItemType::Any and replace any usages with type bounds
  • Make sure types work at declaration and don't have to be ordered in a specific way (you can order types in any order in the file)
  • Defaults for generic types

Explicit Type Declarations

Types! 😀

Initially types were left out because there was only one type: raw byte arrays.

Before 1.0.0, in order to avoid immediately jumping to 2.0.0, we should support the following syntax:

let a: byte[] = "Hello, world";
let b: byte[3] = "abc";

// these two are the same
let c: byte = "c";
let d: byte[1] = "d";

// this syntax needs work
in q: byte[1];
// might just want to support:
//let q: byte[1] = read();
// where read is a special function?
// if we do this, we should also change out:
//out("a =", a, "\n");

Eventually, we'll support more types like int and maybe even user defined types! This syntax will help us do that in the future.

  • Implement this in the parser
  • Update all examples to use the new syntax

Split into multiple crates

This needs some detailed investigation and planning. Priority is low until MVP is finished.

Cargo supports workspaces. Investigate splitting the repo into multiple crates. Brain and brainfuck are pretty different projects and don't need to share very much code. To deal with this, brainfuck has evolved to be one big file containing all of its code. It could benefit from some splitting and refactoring.

Ideally we should have a structure like so:

<repo root>
    - compiler/         (formerly the files in src/)
    - brainfuck/
    - debugger/        (formerly brainfuck-visualizer)
  • Split brain and brainfuck into different crates
  • Move brainfuck-visualizer into its own repo

Advanced Branching (if, else if, else)

There are some really smart people coming up with interesting ways to implement various features in brainfuck. The Brainfuck algorithms wiki page is particularly useful.

It contains several example implementations of if (x) { code1 } else { code2 }.

We can use these people's techniques to implement more advanced branching patterns into brainfuck.

  • Implement if, else if and else statements with support for an arbitrary number of branches
  • Investigate break and continue after this implementation and possibly create an issue for implementing those statements (see #28)

Low priority until MVP is finished.

Dynamic Memory Allocator

This is the beginning of being able to have dynamically allocated objects in brain (instead of the static things we have now).

We need a malloc-esque API for memory allocation (without the downsides)

Once we have this kind of "heap" memory, not every type will be copy anymore. The code generation will need to be fixed to only copy when it is appropriate.

The heap could start at the index before 0 since all static allocation happens to the right of that and the Brainfuck tape is infinite in both directions.

Resources:

Atom Grammar

Create an Atom grammar for the brain programming language. It would be nice to have some decent syntax highlighting for all .brn files. The easiest way would probably be to convert a TextMate grammar to Atom syntax. Still need to research this more.

Publish under language-brain

More information here: https://discuss.atom.io/t/creating-a-custom-grammar-or-language-package/16711

Could just copy language-rust to a new repo and change it slightly until the language actually deviates significantly from Rust syntax.

Modules

Must be implemented in tandem with: #59

Support the following syntax.

For declaring modules within a package.

// private module
mod motion;

// inline module
mod foo {
    // code for this module goes here
    // can refer to scope above using super
    use super::*;
    //pub blah = ...
}
// foo::blah can be used in this module

// Exposed as just modules since these are less often used and more sensible to access this way
pub mod screen;
pub mod shapes;
pub mod pen;

pub use motion::*;

For linking an external library:

// use package is a compound keyword specially used for linking
// It is only valid to use this statement in the root module for a package
use package something;

use something;
use something::foo;
use something as bar;
use something::foo as bar;
use something::{foo as bar, spam};
use something::{self, foo as bar, spam};
use something::*;
  • support spans across files (implement a code map like rust)
  • local module resolution
  • being able to use functions from other files/directories
  • using external libraries (depends on #38)
  • every module has its own scope/namespace
  • able to access module public items using paths
  • make prelude modules actual modules instead of just abusing scopes
    • namespace prelude modules under std:: and use std::<modname>::* for each of them in prelude's root

Implementation Notes

  • Scopes/modules really only need to contain type definitions and size definitions. The definitions should all be compiled once into a reasonable intermediate form and then reused during code generation whenever that function is called in another module.

  • A module is really just a synonym for a single isolated level of scope.

  • The use keyword is a type of declaration which declares a name in the current scope that maps to a module definition

  • When as is used in use to rename a module in the current scope, it's really just changing the name that the module is being declared as

  • Any path can be included with the use keyword because it simply declares things in the current scope

  • To avoid unnecessary copying, it may be prudent to optimize for when an entire module is included in a scope (using *). Rather than copying the declarations of the entire module into the current scope, we should implement "linked scopes". With linked scopes, each scope level in the scope stack also maintains a vector of references (ids or otherwise) to other scopes. These are used as backups for lookups that aren't found in that scope level before traversing down to other scope levels. This could be implemented by upgrading the Scope type to an actual struct rather than just a type alias. Since modules are just single scopes, they can work well with this idiom. Each scope would have to have an ID and modules would have to be declared elsewhere in order to avoid re-evaluating the same module over and over again.

    • If this is too complicated, it may be sufficient in the short term to just do the long copy
    • something like this is also necessary to avoid accidentally overwriting things from other modules. We still want all of those options to match. The linked modules should be added in order so that the correct definitions take precedence over others. Test this.

Compiling Libraries and Linking

In order to support libraries, we want to be able to compile brain code into a special library format akin to C/C++ object files and Rust rlib files. There should be a command line argument which compiles some code into a library and outputs a file in the library file format.

Library Format

The library file format should be well defined. It should at minimum contain:

  • A magic number or pair of numbers to differentiate it from any other binary file
  • Version number that updates if and only if there are breaking changes made to the structures that will get stored in the file
  • All public functions exposed by the library and ideally only those functions (this might lead to some bloat)
  • Type definitions for all functions in the file
  • Body definitions which are low-level enough to not take a toll on compilation but high level enough for optimization to still take place
    • In a compact binary format like msgpack
    • Parameters should be in a format that is position independent (see #16)
    • Format should be resilient enough to not change too dramatically when things like structures and unions/enums are added
  • Debug information if at all possible
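Purely to illustrate the first two bullets (a magic number plus a format version), a header could be written and validated as sketched below; the magic bytes and version value are made up, and the real format would of course contain much more.

use std::io::{self, Read, Write};

// Hypothetical values -- not a real brain library format.
const MAGIC: [u8; 4] = *b"BRLB";
const FORMAT_VERSION: u16 = 1;

// Write the header: four magic bytes followed by a little-endian version number.
fn write_header<W: Write>(mut w: W) -> io::Result<()> {
    w.write_all(&MAGIC)?;
    w.write_all(&FORMAT_VERSION.to_le_bytes())
}

// Read and validate the header, returning the format version on success.
fn read_header<R: Read>(mut r: R) -> io::Result<u16> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a brain library file"));
    }
    let mut version = [0u8; 2];
    r.read_exact(&mut version)?;
    Ok(u16::from_le_bytes(version))
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    write_header(&mut buf)?;
    assert_eq!(read_header(&buf[..])?, FORMAT_VERSION);
    Ok(())
}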

TODO:

  • Add linking stage to compiler which resolves external types for static analysis and external definitions for actual compilation and optimization
  • Precompiled prelude distributed with the compiler
  • Be able to distinguish between user code and library code for better error messages and compiler errors (don't complain about outside crate warnings)

Full Type Inference

Currently, we require every variable to have an explicit type annotation. This can get quite tedious and add a lot of extra noise to programs.

Support:

let foo = 8;

Derive the type from the usage of variables in expressions and function types.

Indexing and Slicing

Depends on #46 before implementation.

The assignment syntax on its own is quite limiting because you can only assign to buffers that are the same size as each other. Indexing allows us to slice and copy parts of buffers into parts of other buffers. This is convenient while remaining memory-safe and easy to reason about.

Support the following syntax:

let a: [u8; _] = "Hello, world";
let b: [u8; _] = "abc";

// outputs "ello"
stdout.write(a[1..5]);

// still invalid:
// a = b;
// b = a;

// After this, a will be "Habco, world"
a[1..4] = b; // equivalent a[1..4] = b[..]

// After this, b will be "abd"
b[-1] = a[-1];

// Error to assign different sizes
//a[1..5] = b[1..];

The range syntax [start..end] contains the range of cells from start to end-1. You can leave off the start or end of the range to imply either 0 or (size - 1) respectively. Using a negative index -idx is the same as indexing at length - idx, so b[-1] refers to the last cell.
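A small sketch of how those indexing rules might be normalized to absolute cell positions (the function is illustrative, not part of the compiler):

// Resolve a possibly-negative index to an absolute cell position, following
// the rule above: a negative index counts back from the end of the buffer.
fn resolve_index(idx: isize, len: usize) -> Option<usize> {
    if idx >= 0 {
        let i = idx as usize;
        if i < len { Some(i) } else { None }
    } else {
        // b[-1] is the last cell, b[-2] the one before it, and so on.
        len.checked_sub(idx.unsigned_abs())
    }
}

fn main() {
    let len = 3; // e.g. b = "abc"
    assert_eq!(resolve_index(0, len), Some(0));
    assert_eq!(resolve_index(-1, len), Some(2)); // b[-1] == b[len - 1]
    assert_eq!(resolve_index(5, len), None);     // out of bounds
    assert_eq!(resolve_index(-4, len), None);    // too far back
}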

Brain Debugger / Visualizer

An important part of every great language is its tooling. We need a debugger so we can figure out what is going wrong in our brainfuck programs.

1. Brainfuck Interpreter Better Implementation + Debug Mode

  • Implement an infinite tape in the brainfuck interpreter using VecDeque
  • Implement a debug mode (-D/--debug) which given a program, runs that program, then outputs to stderr a CSV of info like instruction number, instruction as a char, pointer address (index in the tape), memory dump (dump of the entire tape so far)
  • Additional arguments set up IPC so that debugging commands (pausing, resuming, stepping, setting the delay, and setting breakpoints) can be sent to the running interpreter

This will give people the ability to hand debug their brain programs without the interface if they so desire.
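A minimal sketch of the infinite-tape idea from the first bullet above: a VecDeque<u8> plus an index, growing at whichever end the pointer walks off. The type and method names are made up for illustration.

use std::collections::VecDeque;

// A tape that is conceptually infinite in both directions:
// new cells are appended at either end on demand.
struct Tape {
    cells: VecDeque<u8>,
    pos: usize, // index of the current cell within `cells`
}

impl Tape {
    fn new() -> Self {
        Tape { cells: VecDeque::from(vec![0u8]), pos: 0 }
    }

    fn move_right(&mut self) {
        self.pos += 1;
        if self.pos == self.cells.len() {
            self.cells.push_back(0); // grow to the right
        }
    }

    fn move_left(&mut self) {
        if self.pos == 0 {
            self.cells.push_front(0); // grow to the left instead of underflowing
        } else {
            self.pos -= 1;
        }
    }

    fn current(&mut self) -> &mut u8 {
        &mut self.cells[self.pos]
    }
}

fn main() {
    let mut tape = Tape::new();
    tape.move_left();               // step "before" cell 0: the tape grows leftwards
    *tape.current() = 7;
    tape.move_right();
    assert_eq!(*tape.current(), 0); // back on the original starting cell
}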

2. Brainfuck Visualizer

An electron UI that uses the debugger in a subprocess to output this information.

  • Breakpoints
  • View brain statements and corresponding brainfuck instructions
  • Highlight statements and instructions as they run
  • Display output and errors

3. Sourcemaps (optional)

Booleans and Advanced Conditional Expressions

  • Add a boolean type and the two keywords true and false which evaluate to a single cell with values 00000001 and 00000000 respectively
  • Support not x
  • Support x and y
  • Support x or y
  • Support x == y for all types
  • Support x != y for all types
  • Support x <= y for numeric types #11
  • Support x < y for numeric types
  • Support x >= y for numeric types
  • Support x > y for numeric types
  • etc.

These are all new expressions that should be added to the expression parsing.

Unit Testing

We've been good about noting what is valid or invalid in the example files, but we don't have any tests that verify whether these cases actually work as expected. We're only testing on perfect input.

We want to test things like

  • syntax errors
  • whitespace parsing robustness (allowing whitespace in all the right places)
  • string parsing edge cases (white space on the ends, escapes, etc.)
  • and more

Part of writing unit tests will depend on having #5 - Verbose errors implemented since we'll want to make sure we can give proper feedback when an error occurs instead of panicking.

  • Setup code coverage with kcov and codecov
  • Setup travis build and add badge to README

Variable Declaration

Support the following syntax:

// automatically determine the length of the right
a[] = "Hello, world\n";
// Outputs "Hello, world\n"
out a;
// error if rhs is not the length declared on the left
b[3] = "abc";
out a;
out b;
out "\n";

// invalid because different sizes (addressed in #3)
//a = b;

// copies a to c. c is the same size as a.
c = a;

out "b = ";
out b;
out "\n";
b = "dbd";
out "b = ";
out b;
out "\n";

// Should accept any [a-zA-Z0-9_] characters in identifier names
// Cannot start with a number

Struct Methods

Depends on #39 before implementation.

Essentially enables the definition and use of methods on structs. The syntax is already supported for things like stdin and stdout.

  • Support implementation of these methods (impl)
  • Only allow one definition per method name (no overrides)
  • Static struct methods accessed via path (::) syntax (no self argument)

Built-in Testing Framework

Testing framework for writing and running simple unit tests.

Defined in a tests submodule in every module or in a separate tests directory (similar to Cargo and Rust).

  • Assert methods
  • Custom error messages
  • Automatic test discovery
  • Useful unit test output report
  • Add tests to some packages to try it out

Blocked on dynamic string support (#13) and package manager support (#31).

Online Playground

Similar to: https://play.rust-lang.org/

It would be cool if there could be a website where people could try the brain compiler. They should be able to type brain code, run the compiler and then see/download the brainfuck code. Eventually they should even be able to run the brainfuck in the browser and even provide input interactively. (cpp.sh is an excellent example of a site that does this really well)

This will eventually belong in a separate repo, however this issue is the starting point of planning the project. The repo will probably be called brain-lang/brain-playground or something.

Ideas

Backend

Server: https://rocket.rs/
Hosting: Heroku with Rust buildpack

  • Uses brain as a library and configures it using the options provided by the frontend
  • Returns the compiler output (error or not)

Frontend

Usual technologies
Editor: https://ace.c9.io
Hosting: GitHub pages

  • We could potentially buy a domain too

Input Statements

Supports the following code:

// input requires explicit sizing
// always reads exactly this many characters or panics if EOF is reached before then
// if this many characters aren't available yet, it waits for you to send that many
in b[5];
out "b = " b "\n";

c[1] = "c";
in c;
out "c = " c "\n";

// You can reuse allocated space again
in b;
out "b = " b "\n";

Use Rust compiler to avoid common mistakes

We should see if there is some way to get the compiler and type system to prevent us from forgetting things like closing a loop, freeing allocated memory, or moving back to the left after moving right.

It's too easy to forget to do that right now and it's causing tons of bugs. See codegen/if_condition.rs for a file full of these problems.

The giant documentation comment at the top of codegen isn't enough.

Possible Options

Closures

instructions.move_to(pos, || {
     //...generate more instructions...
});

instructions.jumps(cond_pos, || {
     //...generate more instructions...
});

Pros:

  • nice syntax
  • automatically ensures balance declaratively, no type system magic required

Cons:

  • might run into lifetime issues if we have multiple of these in a row (needs to be tested)
  • not totally flexible since we might actually want one of those instructions without the other (we could probably just leave in the current functions for that use case)
    What's the equivalent of this?
    instructions.move_right_by(cond_cell);
    instructions.jump_forward_if_zero();
    instructions.move_left_by(cond_cell);

Structs

This is probably the best option. It's reasonably flexible and doesn't suffer the downsides of the closure method.

// must_use prevents this from being ignored when returned by a function
#[must_use]
struct Movement {
    right: usize,
}

impl Movement {
    // This consumes self, so the borrow checker will complain if this is used more than once
    fn left(self) {
        //...
    }
}

fn move_right_by(n: usize) -> Movement {
    //...
    Movement {right: n}
}

Pros:

  • Completely flexible, easy enough to add to existing code

Cons:

  • Lots of initial boilerplate to setup (individual structs for each operation) - not that bad
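To make the struct option concrete, here is a small self-contained sketch of the #[must_use] plus consume-on-use idiom. The Instructions type and the method names are invented for illustration; the real code generator's API will differ.

// Collects raw brainfuck text; a stand-in for the real instruction builder.
#[derive(Default)]
struct Instructions {
    code: String,
}

// Dropping this without spending it triggers the #[must_use] warning, and it
// can only be spent once because `move_back` takes it by value.
#[must_use = "every move to the right must be balanced by a move back"]
struct Movement {
    right: usize,
}

impl Instructions {
    fn move_right_by(&mut self, n: usize) -> Movement {
        self.code.push_str(&">".repeat(n));
        Movement { right: n }
    }

    fn move_back(&mut self, movement: Movement) {
        // Consumes the Movement, so the balance can only be paid once.
        self.code.push_str(&"<".repeat(movement.right));
    }
}

fn main() {
    let mut instructions = Instructions::default();
    let movement = instructions.move_right_by(3);
    instructions.code.push('+'); // ...generate more instructions...
    instructions.move_back(movement);
    assert_eq!(instructions.code, ">>>+<<<");

    // Ignoring the returned Movement, e.g. `instructions.move_right_by(3);`,
    // is caught at compile time by the #[must_use] attribute.
}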

Pattern Matching and Match Statements

Pretty much required to make #41 usable.

  • new match statement added to grammar
  • pattern matching on struct fields, tuples, enum variants, etc.
  • Ignore struct fields with ..
  • Wildcard _ or identifier to match all
  • Multiple patterns separated by |
  • Bind names with @
  • irrefutable patterns can be used in assignment or as function parameters

Reference:

Refactoring

The first pass through implementing the compiler was good enough, but the approach isn't very robust. This prevents us from growing the codebase in a reasonable way without a lot of issues.

In particular, the following is hard right now:

  • implementing other types
    • everything was done with arrays of bytes in mind, so UTF-8 strings, numbers, custom data types, etc. are difficult to implement
  • error handling
    • the nom parsing framework doesn't generate the best error messages on its own, we also don't pass position information very well between stages of compilation
    • this prevents us from providing valuable error messages.
  • code generation
    • code generation is a very naive process
    • we are basically appending large amounts of Instruction enums to a Vec and then converting that to a string
    • we need something higher level to avoid using too much memory and to be able to generate smarter code more robustly (without so many silly bugs)
  • optimization
    • Complex optimizations are almost impossible because of the problems outlined above
    • Without context, it's difficult to make more interesting optimization decisions

As you can see, there are problems in nearly every stage of compilation.

This issue will outline a plan to address each of these issues and improve the overall quality of the code.

Plan

New Compilation Stages

  1. Source - the brain source code to be compiled
  2. Parsing --> AST
    • Position information is embedded into every node of the AST denoting the exact index in the source (and possibly the filename) where that node was found
    • This is the closest representation to the original syntax
    • Desugaring:
      • continue and break
      • operators like +, -, /, *
  3. Static Checking - "Complete" AST (AST, HashMap<Name, Item>)
    // Forces us to enforce that variables are initialized before used
    enum Item {
        Defined {type_def},
        Initialized {type_def},
    }
    • Desugared AST with all names resolved and complete type information
    • Looks up names and determines the type of each
    • Any type inference is resolved here
    • Any size inference is resolved here
    • We check for type compatibility and make sure everything makes sense
    • Types are associated with some Rust struct which knows how to perform operations
    • The AST is immutable from this point on -- we don't want to make any changes
    • Complains if names defined in a non-global scope are used where they are not accessible
  4. Operation IR --> Vec<Operation>
    • AST is transformed into a tree of operations
    • We define our own set of primitive operations which (for the most part) we can later translate into brainfuck
    • We use these operations to transparently handle memory without explicitly building the static memory layout here
    • These operations are easier to work with and more sophisticated than the default brainfuck instructions
    • A lot of the pitfalls are avoided like forgetting to close loops, dealing with relative movement and memory allocation
    • Rust/pseudocode Operation enum:
    enum Operation {
        /// Allocates the given size in bytes on the tape so that it is not used by any other code accidentally
        /// The id is generated automatically and represents the position which will eventually be determined when the memory layout is generated
        /// Nothing is guaranteed about the position other than that `size` consecutive cells including the position will be available for use without conflicts
        /// This is also used to ensure memory is automatically dropped at the end of its scope (stack frame)
        Allocate {id, size},
    
        /// While most allocations can be dropped at the end of their scope, temporary cells
        /// should be dropped as soon as possible so that they are available in the memory
        /// layout again as soon as possible
        /// This way we can avoid a lot of unnecessary move operations over cells
        /// that aren't being used anymore
        /// While this could be an optimization as well, temporary cells in particular are
        /// *known* to have this property since we generate temporary cells in the compiler itself
        /// The temporary cells are guaranteed to last for the duration of the given body
        /// They are then freed immediately afterwards
        TempAllocate {id, size, body: Vec<Operation>},
    
        /// Frees the given memory id and all cells associated with it
        /// Typically not used unless an explicit free is necessary before the end of the scope
        /// Freeing means both marking those cells available and zeroing their values
        Free {id},
    
        /// Move to the position associated with a particular memory id
        MoveTo {id},
    
        /// Perform a movement that returns to the reference position (usually cell 0)
        ReturnToReference,
    
        /// Increment the value of the current cell by a certain amount (relative to whatever the current amount in the cell is)
        Increment {amount},
    
        /// Decrement the value of the current cell by a certain amount (relative to whatever the current amount in the cell is)
        Decrement {amount},
    
        /// Read bytes into `size` consecutive cells
        /// Note: this generates both read instructions and move right instructions
        Read {size},
    
        /// Write bytes into `size` consecutive cells
        /// Note: this generates both write instructions and move right instructions
        Write {size},
    
        /// Set the value of `size` consecutive cells including the current cell to zero
        /// Note: this generates instructions to zero the value and move to the right
        Zero {size},
    
        /// Loop with the given operations as the loop body
        /// A loose restriction on loops is that they should return to the reference position whenever that makes sense
        /// In well behaved code, this can usually be done by inserting a ReturnToReference operation at the end of the loop body
        Loop {body: Vec<Operation>},
    
        /// Copy the cells at the position of sourceId to the position of targetId using a single temporary
        /// cell at the position of tempId
        /// Only copies up to the size of sourceId
        /// sourceId and targetId must have the same size
        Copy {sourceId, targetId, tempId},
    
        /// Relocate the value at the position of sourceId to the position of targetId
        /// Leaving zeros where the source data was originally
        /// sourceId and targetId must have the same size
        Relocate {sourceId, targetId},
    }
  5. Static Memory Layout --> (Vec<Operation>, HashMap<id, Cell {position, size}>)
    • We use the operations to figure out a memory layout and map the allocated ids to actual positions on the tape
    • This is a separate stage so we have the most information possible when doing this
    • Drops allocated memory at the end of its scope
      • The end of scope is the end of a loop or the end of the main program
    • Complains if memory is used after it is freed
  6. Optimization --> (Vec<Operation>, HashMap<id, Cell {position, size}>)
    • Operations are performed on the operations tree
    • Each optimizer walks the tree and returns a new one with the operations performed
  7. Compilation --> Vec<Instruction>
    • Operations are transformed into Instruction instances representing the actual brainfuck instructions
    • Compilation can accurately keep track of position information and reliably generate code because it has the entire operation tree to interpret
    • Drops all values declared in loop bodies
    • Operations that can be composed of other operations should be converted first during generation
      • In other words, generation is a pipeline operation of first converting operations to lower level operations and then converting those lower level operations directly to instructions
  8. Brainfuck Optimizations (to be added later)
    • Further brainfuck optimizations specific to streams of brainfuck instructions are performed
  9. Brainfuck
    • The Instruction struct is transformed into a string representation which can then be written into a file

Representing Types

We need to do several things here:

  1. Associate type names like "u8" or "[u8; 7]" with internal structs which represent those types
  2. Be able to lookup method names like "write" in internal structs and type signatures
  3. Be able to apply functions to internal type structs

Approach:

/// Operators are things like Add, Subtract, Concatenate, Call, Write, Read, etc.
#[derive(Clone, Copy)]
enum Operator {
    /// Read values into the type using Read operations
    Read,

    /// Write the type using Write operations
    Write,

    /// Decrement the given type by 1
    ///TODO: This is essentially a placeholder for use before we get proper numeric support
    ///TODO: Eventually this will be replaced by a Subtract operator or something
    Decrement,

    /// Call a specific method name (e.g. len) on the type
    /// The method name to call is the first argument in the args Vec
    /// Return ApplyError::UnsupportedOperator if the given method name is not supported
    /// Return ApplyError::UnsupportedArgumentTypes if the types of the given arguments are not supported
    CallMethod,
}

trait BType {
    /// Returns the name that can be used in brain code to refer to this type
    fn type_name() -> &'static str;
    /// Apply the given function or operator
    fn apply(f: Operator, args: Vec<&BType>) -> Result<Vec<Operation>, ApplyError>;
}

enum ApplyError {
    /// When the given operator isn't supported by this type
    /// Example: Use this when CallMethod is passed with "len" but the type doesn't support len()
    UnsupportedOperator,
    /// When the types of the given arguments are not supported
    UnsupportedArgumentTypes,
}

To Do

  • Tests
  • Travis CI build

Declaration without initialization

Sometimes it's useful to declare a variable first and then initialize it later. The syntax for that would be something like what follows:

// declaration only cannot have an unspecified width
let a: [u8; 5];

a = "12345";
  • appropriate static checking to ensure that variables cannot be used uninitialized (particularly with branches and loops)
  • Allocation occurs on initialization, not on declaration (so we don't allocate needlessly)
  • lint for when a variable is declared but never used, initialized but never used, or updated and then never used

For Loops and Range Syntax

// The type of i is usize
// i goes from 1 <= i < 10
// This currently must be a static, finite counter
// Variables are not supported as counter limits
// This is basically implemented as a brainfuck loop that goes from
// 0 to (end-start) and start is added to the cell *within* the loop
for i in 1..10 {
    stdout.writeln("i = ", counter);
}

New Syntax

The current syntax isn't perfect. It was designed when we only really had one type (bytes). We eventually want to support an entire type system. I'll probably base the design off of Rust since I love the language but I'm open to all ideas. Most importantly, this has to be a language suitable for generating brainfuck programs and meet the optimization goals of the compiler while still being ergonomic and usable.

All the syntax (syntax.brn)

// complete type inference isn't currently supported, so this doesn't work:
//let foo = "bar";
// statically allocated array of bytes initialized to the given string
// length is automatically determined by the compiler
let s: [u8; _] = "foo bar";
// We can get the length of a string using the len() property
// The type that len() returns is `usize` and for the time-being that is u8
// This mean
// writeln outputs a "\n" at the end
stdout.writeln(s.len());
// The write statement supports a variable number of both identifier and string literal arguments
stdout.write(s, "\n");

// if the length cannot be determined automatically, it must be specified
let a: [u8; 4];
// length must be greater than zero
let b: [u8; 1];
// read needs to be a statement so the type information can be used to determine the length
stdin.read(a, b);

// both sides have to have the same length
if b == "a" {
    stdout.writeln("equal");
}
else if a == "fooo" {
    stdout.writeln("foo");
}
else {
    stdout.writeln("not equal");
}

// A single byte-sized numeric type is supported
// value must be in the range for the type
let counter: u8 = 200;
// the while condition must evaluate to a boolean
while counter > 0 {
    // This is a placeholder function that mutates counter and subtracts one
    // This exists because we don't want to implement a complete set of numeric operations right now
    decrement(counter);
    stdout.writeln(counter);
}

// the type of i is usize
// i goes from 1 <= i < 10
// This currently must be a static, finite counter
// variables are not supported as counter limits
// This is basically implemented as a brainfuck loop that goes from
// 0 to (end-start) and start is added to the cell *within* the loop
for i in 1..10 {
    stdout.writeln("i = ", counter);
}

Name guesser (names.brn)

// Asks for the first letter of your name, then tries to guess your name
stdout.writeln("MAGIC NAME GUESSER");

let prompt: [u8; _] = "Enter the first letter of your name:";

let c: [u8; 1];
stdout.writeln(prompt);
stdin.read(c);

while c {
    stdout.writeln("My guess is that your name is:");
    if c == "a" {
        stdout.writeln("Alexander");
    }
    else if c == "e" {
        stdout.writeln("Emily");
    }
    else if c == "m" {
        stdout.writeln("Matthew");
    }
    else {
        stdout.writeln("I don't know.");
    }

    stdout.writeln(prompt);
    stdin.read(c);
}
