fluent-prototype's Issues

Images: a generic execution model

In the development of fluent, one thing that has created messiness and more haphazard solutions than I would like is the question of where fluent values 'live'. Currently they persist in three places throughout the compiler backend: as Exprs in Env, as raw Values in SSA, and as immediates in compiled bytecode. Since I've mostly been working on pure computation with numbers to reach a proof-of-concept stage, this hasn't been a huge issue, but it's becoming clear that I need a long-term solution.

the Image

Taking some inspiration from Smalltalk, the ELF executable format, and Linux processes:

  • All of the memory for Exprs and their associated data, as they are semantically analyzed and stored through the VM, should be stored in a simulated 'static' address space
    • While the compiler and dynamic runtime should be able to manipulate this data, when code is statically generated this will be observable and addressable but not writable (e.g. .rodata)
  • The stack and heap for the VM can be given their own associated sections as well
  • Dynamic 'pointers' can be handles into these sections, looking something like this:
const Section = enum(u2) {
    static,
    stack,
    heap,
    /// relative to the current stack frame
    frame,
};

const Ptr = packed struct(u64) {
    section: Section,
    index: u62,
};

This design would let me compile pointers down to bytecode and back trivially, and it means pointers carry a high-level understanding of where they come from. That, in turn, makes it possible to error-check things like returning a pointer to a local function variable.
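
Since Ptr is a packed struct backed by a u64, lowering a pointer to a bytecode immediate and recovering it is a single bitcast. A rough sketch, reusing the Section/Ptr definitions above (the function names here are made up, not real compiler code):

const std = @import("std");

/// lower a Ptr to a bytecode immediate
fn toImmediate(ptr: Ptr) u64 {
    return @bitCast(u64, ptr);
}

/// recover a Ptr from a bytecode immediate
fn fromImmediate(imm: u64) Ptr {
    return @bitCast(Ptr, imm);
}

test "ptr round trips through a bytecode immediate" {
    const ptr = Ptr{ .section = .frame, .index = 16 };
    const back = fromImmediate(toImmediate(ptr));
    try std.testing.expectEqual(ptr.section, back.section);
    try std.testing.expectEqual(ptr.index, back.index);
}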

implementing Image

A super naive solution would be as simple as:

const std = @import("std");

const Image = struct {
    static: std.ArrayListUnmanaged(u8),
    stack: std.ArrayListUnmanaged(u8),
    heap: std.ArrayListUnmanaged(u8),
};

Even with a naive implementation like this, Images would fit nicely into Env, would let me remove a lot of complexity from the Env scoping mechanisms and the SSA data format, and would grant a lot of immediate power to the bytecode VM.
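
As a sketch of how this could hook into the Ptr handles above (allocInto and its behavior are hypothetical, not existing compiler code), allocation into a section could just be a bump into the right list:

const std = @import("std");

/// bump-allocate zeroed space in one of the Image sections and hand back a Ptr.
/// a real implementation would need alignment, stack frames, and heap freeing.
fn allocInto(
    image: *Image,
    ally: std.mem.Allocator,
    section: Section,
    nbytes: usize,
) !Ptr {
    const list = switch (section) {
        .static => &image.static,
        .stack => &image.stack,
        .heap => &image.heap,
        // frame pointers are derived from the stack section at runtime
        .frame => unreachable,
    };

    const index = list.items.len;
    try list.appendNTimes(ally, 0, nbytes);

    return Ptr{
        .section = section,
        .index = @intCast(u62, index),
    };
}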

Aliases

Image immediately solves the issue of aliasing values outside of a function scope. An Expr can take the type of whatever it references and carry a variant field alias: *Expr pointing at memory that outlives the current scope. This is trivially compilable to SSA/bytecode.
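
In code, the alias case might look something like this (a guess at the shape, not the actual Expr definition in Env):

const TypeId = u32; // stand-in for the compiler's real type handle

const Expr = struct {
    ty: TypeId,
    data: union(enum) {
        // ... other expression variants ...
        /// points at memory that outlives the current scope
        alias: *Expr,
    },
};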

Static Codegen

With Image, Fluent's execution model gets even closer to ELF binaries, and it becomes very clear how to translate all of the data stored in the Env into a binary.

Aspirations

  • Replacing raw pointers with a handle-style implementation for the dynamic runtime means:
    • At the time of static codegen, I should be able to mark and sweep the entire Image, allowing me to produce really tiny binaries.
    • Pointers always know where they come from, so even in really complex cases I can very straightforwardly check whether a pointer is live, whether it points to a dead stack address, and so on.

quote/unquote

Metaprogramming is a central goal of fluent. I think the syntax I want looks something like this:

a :: '(3 + 4) # quote with unary `'`
b :: $a # unquote with unary `$`

This will compile to fluent data structures that represent the same information as a TExpr. The type of such an expression should be something like code T, where T is the type the code has when executed.
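
One way this could appear in the compiler's type representation (purely a sketch; the real Type in fluent will look different) is as a wrapper around the inner type:

const TypeId = u32; // stand-in for the compiler's real type handle

const Type = union(enum) {
    // ... numbers, pointers, functions, etc. ...
    /// `code T`: a quoted expression that produces a T when unquoted and executed
    code: TypeId,
};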

`comptime_string`

In low-level languages, a common problem is how to semantically analyze a string literal. Under different circumstances, you may want this string as an array (value) or as a const slice. This leads to awkward situations.

In C, where a string literal is effectively a const char array that decays to a pointer:

// initializing a mutable array from a literal works:
char str[] = "my string";
// but filling an existing buffer needs an explicit copy:
char buf[64];
strcpy(buf, "my string");

// this works like I want it to:
const char *str = "my string";

In Zig, where string literals are typed as *const [N:0]u8:

// I want to do this:
const str: [_]u8 = "my string";
// but I have to do this (dereferencing the literal copies the array out):
const str = "my string".*;

// I want to do this:
const str: [*:0]u8 = "my string";
// but string literals are const, so the closest I can get is:
const str: [*:0]const u8 = "my string";

// this works like I want it to:
const str: []const u8 = "my string";

Zig made a step up from C here, and Zig also innovated with the comptime_int and comptime_float types, which let you express number literals in a way that is transparent both linguistically and to the type checker. So here I want to propose that string literals be typed as comptime_string, which coerces to a variety of appropriate types:

// sometimes I want all of these things
str :: as []const u8 "my string"
str :: as [64]u8 "my string"
str :: as [*]u8 "my string"

This is basically just a simple addition to the type system that removes unnecessary noise.
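
A rough sketch of the coercion rule a type checker could apply to a comptime_string literal (the Type shape and names here are invented for illustration):

const Elem = enum { byte, other };

const Type = union(enum) {
    slice: Elem, // e.g. []const u8
    array: struct { elem: Elem, len: usize }, // e.g. [64]u8
    many_ptr: Elem, // e.g. [*]u8
    other,
};

/// can a comptime_string literal of length `len` coerce to `dst`?
fn comptimeStringCoercesTo(len: usize, dst: Type) bool {
    return switch (dst) {
        .slice => |el| el == .byte,
        .array => |arr| arr.elem == .byte and arr.len >= len,
        .many_ptr => |el| el == .byte,
        .other => false,
    };
}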

Separate mapping names to types from mapping names to values

Currently, types and values are stored alongside each other in both the Env and the sema stage. This is intuitive, but immediately creates problems for things like out-of-order definitions. The inseparability of type and value also means I can't typecheck an AST expression before generating the typed expression associated with it.

Expr itself is already just a wrapper over a TypeId and a pointer, so splitting the two mappings is a clean, data-oriented solution that fits these problems like a glove. It also paves the way for interesting language features like traits/typeclasses, which are type information that is agnostic of value semantics.
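
A minimal sketch of the split (Name, TypeId, and ValueId here are stand-ins for however the compiler actually interns these):

const std = @import("std");

const Name = u32; // stand-in for an interned name
const TypeId = u32; // stand-in for an interned type
const ValueId = u32; // handle to wherever values actually live (e.g. an Image)

const Scope = struct {
    /// a name can gain a type before it has a value, which is what enables
    /// out-of-order definitions and typechecking before codegen
    types: std.AutoHashMapUnmanaged(Name, TypeId) = .{},
    values: std.AutoHashMapUnmanaged(Name, ValueId) = .{},
};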

This is a perfect fit for the fluent compiler architecture. I need to just keep reading more compilers, wow.

static codegen

SSA generation is more than mature enough to start compiling to static code. I think qbe is the natural choice for my first stab at this, since my SSA maps almost 1-to-1 onto it.
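
As a taste of how direct the mapping could be, emitting qbe's text IL for a single SSA add is roughly one formatted line (the temporary numbering here is invented, not fluent's actual SSA representation):

const std = @import("std");

/// emit a 32-bit add in qbe IL, e.g. `%t2 =w add %t0, %t1`
fn emitAdd(writer: anytype, dst: u32, lhs: u32, rhs: u32) !void {
    try writer.print("    %t{d} =w add %t{d}, %t{d}\n", .{ dst, lhs, rhs });
}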

a better prophet

SSA's Prophecy and bytecode's RegisterMap both analyze fluent's SSA IR to allocate registers for the bytecode VM, and because they are disconnected, each is limited in its own way.

A better design would replace these two data structures by giving Prophecy knowledge of the VM register count and tracking the stack size for SSA during lowering. Because this makes each function's stack size known before compiling its SSA, it also means that instead of using alloca for every stack allocation, I can alloca a static amount at the beginning of the function and directly compile in stack offsets.
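
A sketch of the bookkeeping this implies (FrameLayout and reserve are hypothetical names, not existing code):

const std = @import("std");

/// track the total frame size while lowering SSA so the function prologue can
/// allocate it all at once and every local becomes a fixed offset
const FrameLayout = struct {
    size: usize = 0,

    /// reserve `nbytes` in the current frame, returning its fixed offset
    fn reserve(self: *FrameLayout, nbytes: usize, alignment: usize) usize {
        const offset = std.mem.alignForward(self.size, alignment);
        self.size = offset + nbytes;
        return offset;
    }
};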

structured data types

  • structs
  • algebraic types
    • atoms
    • tuples (product types)
    • variants (sum types, tagged unions under the hood)

handle table for managing TExprs

Data representation in memory and TExpr have diverged a bit when it comes to things like pointers, arrays, and slices. This mostly stems from the current convention that TExprs own all of the data they reference, which definitely does not match the execution model. I think the most intuitive replacement is a handle table, which maps much more closely to the memory model.
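
A minimal sketch of the handle table idea, generic over whatever TExpr ends up looking like (the names here are made up):

const std = @import("std");

const TExprId = enum(u32) { _ };

fn TExprTable(comptime TExpr: type) type {
    return struct {
        const Self = @This();

        exprs: std.ArrayListUnmanaged(TExpr) = .{},

        /// store an expr and get a small, copyable handle back
        fn put(self: *Self, ally: std.mem.Allocator, expr: TExpr) !TExprId {
            const id = @intToEnum(TExprId, @intCast(u32, self.exprs.items.len));
            try self.exprs.append(ally, expr);
            return id;
        }

        fn get(self: Self, id: TExprId) TExpr {
            return self.exprs.items[@enumToInt(id)];
        }
    };
}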
