
jparoz / huck


Purely functional language which compiles to Lua

Home Page: https://github.com/users/jparoz/projects/1

Rust 99.40% Shell 0.60%
functional-programming huck language lua programming-language transpiler


huck's Issues

Add a Char type

Is this necessary? The Lua representation is likely to still be just a string, but it might be useful for some operations to return a distinct Huck type.

Lazy values

Maybe with a lazy language keyword.

Implement a framework for program optimisations

Essentially we want to be able to provide plugins which transform IR to IR, which (hopefully) make the generated code better in some way. At this point in the compiler, the IR is guaranteed to be correct; so any of these optimisations must be correctness-preserving.

  • Implement the framework around these optimisation modules (called plugins?)
    • Decide on the mechanism used to determine which optimisation to run at any given moment, possibly multiple times or not at all.
  • Use property testing to ensure that all optimisations are semantics-preserving at a Lua level
  • Implement some optimisations
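A minimal sketch of what such a framework could look like, written in Rust since that is the compiler's implementation language. Everything here is hypothetical: the toy `Ir` enum stands in for Huck's real IR, `Optimisation` is an assumed trait name for the "plugins", and the run-until-nothing-changes loop (with a round cap) is just one possible answer to the "which optimisation runs when" question. The `eval` function shows the shape of the property the tests would check: evaluating the IR before and after optimisation must give the same result.

```rust
use std::collections::HashMap;

// Hypothetical toy IR; Huck's real IR is much richer.
#[derive(Clone, Debug, PartialEq)]
pub enum Ir {
    Int(i64),
    Var(String),
    Add(Box<Ir>, Box<Ir>),
}

/// An IR -> IR transformation. Implementations must preserve semantics.
pub trait Optimisation {
    fn name(&self) -> &'static str;
    /// Returns the rewritten IR and whether anything changed.
    fn run(&self, ir: Ir) -> (Ir, bool);
}

/// Folds `Add` of two integer literals into one literal (skipping overflow).
pub struct ConstantFold;

impl Optimisation for ConstantFold {
    fn name(&self) -> &'static str {
        "constant-fold"
    }

    fn run(&self, ir: Ir) -> (Ir, bool) {
        match ir {
            Ir::Add(a, b) => {
                let (a, ca) = self.run(*a);
                let (b, cb) = self.run(*b);
                match (a, b) {
                    (Ir::Int(x), Ir::Int(y)) => match x.checked_add(y) {
                        Some(sum) => (Ir::Int(sum), true),
                        // Leave overflowing additions alone rather than guess runtime semantics.
                        None => (Ir::Add(Box::new(Ir::Int(x)), Box::new(Ir::Int(y))), ca || cb),
                    },
                    (a, b) => (Ir::Add(Box::new(a), Box::new(b)), ca || cb),
                }
            }
            other => (other, false),
        }
    }
}

/// Runs every pass in order, repeating until no pass reports a change
/// (or `max_rounds` is reached, in case two passes undo each other).
pub fn optimise(mut ir: Ir, passes: &[Box<dyn Optimisation>], max_rounds: usize) -> Ir {
    for _ in 0..max_rounds {
        let mut changed = false;
        for pass in passes {
            let (next, c) = pass.run(ir);
            ir = next;
            changed |= c;
        }
        if !changed {
            break;
        }
    }
    ir
}

/// Reference interpreter used to check that a pass preserved meaning.
pub fn eval(ir: &Ir, env: &HashMap<String, i64>) -> Option<i64> {
    match ir {
        Ir::Int(n) => Some(*n),
        Ir::Var(v) => env.get(v).copied(),
        Ir::Add(a, b) => eval(a, env)?.checked_add(eval(b, env)?),
    }
}
```

A property test would generate random `Ir` values and environments and assert `eval(&before, &env) == eval(&after, &env)`; the real check proposed above would compare the behaviour of the generated Lua instead.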

Testing framework

This might be nice to have; it's really nice in cargo! But it's unclear exactly what shape this would take.

Type-level binops

Enable creating types such as type a :+: b = Left a | Right b. Specifically, this does not refer to binop type constructors (see #12 for those), but type-level binary operators.

  • Make sure that precedence declarations work somehow (maybe type infix :+: 3;?)
  • The function type -> can probably be made an instance of these type binops.

Change the way that `foreign import "string"` works

Instead of passing the string to require without any processing, we should:

  • Require that any imported Lua files be passed (by expected filepath?) to huck
  • Build a collection of all the given Lua files
  • When we process e.g. foreign import path/init.lua, check each of the given Lua files to see if the end of their filepath matches the given path after foreign import
    • If there is exactly one match, give that file to loadfile (?) in the generated code
    • If there is more than one match, or zero matches, throw an error
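The matching step above can be sketched as follows (Rust, hypothetical function and error names). It relies on `std::path::Path::ends_with`, which compares whole path components, so a request for `init.lua` does not accidentally match `myinit.lua`.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NotFound(String),
    Ambiguous(String, Vec<PathBuf>),
}

/// Finds the unique given Lua file whose path ends with `requested`
/// (the string after `foreign import`). Zero or several matches is an error.
pub fn resolve_foreign(given: &[PathBuf], requested: &str) -> Result<PathBuf, ResolveError> {
    let matches: Vec<PathBuf> = given
        .iter()
        .filter(|p| p.ends_with(Path::new(requested)))
        .cloned()
        .collect();
    match matches.len() {
        1 => Ok(matches.into_iter().next().unwrap()),
        0 => Err(ResolveError::NotFound(requested.to_string())),
        _ => Err(ResolveError::Ambiguous(requested.to_string(), matches)),
    }
}
```

For example, with `vendor/path/init.lua` and `src/other/init.lua` both given, `path/init.lua` resolves uniquely, while plain `init.lua` is ambiguous.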

Questions:

  • Is this the best way of resolving modules?
    • Note that with loadfile, there's no way to import a Lua module written in C. This is a significant loss.
    • Maybe the best way to handle all this is just to make the user pinky swear that their environment is configured such that all needed modules will be on their package.path or package.cpath as appropriate, including compiled Huck modules; and just give the module name to require everywhere. That is, for a Lua module from file foobar.lua, the module name is foobar; for a module compiled from baz.c, the module name is baz; and for a Huck module, the module name is defined at the top of its source, e.g. module Quux; has module name Quux. Thus, if require is given the module name require("Quux"), then it should find the compiled Huck module Quux;.
      • A consequence of this is that require("Foo.Bar") to search for a Huck module Foo.Bar; will look in the filepaths Foo/Bar.lua and Foo/Bar/init.lua by default. This will have to be expected as default behaviour, and should be documented so that environments with different searchers can adjust accordingly.
  • How does a user communicate that they actually need to use require, instead of loadfile?
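For reference, the default lookup that `require("Foo.Bar")` relies on is Lua's `package.searchpath` behaviour: dots in the module name are replaced by the directory separator, and each `;`-separated template in `package.path` (or `package.cpath` for C modules) has its `?` replaced by the result. A small Rust model of that expansion, assuming `/` as the separator and using `./?.lua;./?/init.lua` only as an example path, not Lua's actual default:

```rust
/// Models the candidate paths Lua's `package.searchpath` would try for `module`.
/// Assumes "/" as the directory separator.
pub fn candidate_paths(module: &str, package_path: &str) -> Vec<String> {
    let as_path = module.replace('.', "/");
    package_path
        .split(';')
        // Skip empty templates produced by stray or doubled separators.
        .filter(|t| !t.is_empty())
        .map(|template| template.replace('?', &as_path))
        .collect()
}
```

So with that example path, a Huck module `Foo.Bar;` would be looked for at `./Foo/Bar.lua` and then `./Foo/Bar/init.lua`, matching the behaviour described above.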

After the above thoughts, closing in favour of #48.

REPL

Do we do this, or do we just say "use your runtime's Lua REPL"? I think there's probably good development value in having a dedicated REPL, although it will take a fair bit of work to implement; but being able to query types of things is great by itself, along with general convenience of typing code in a REPL. A lot of the work will probably be shared with LSP features anyway.

Possibly should be a separate executable, maybe hucki, or maybe part of a build tool (#23).

See also #32

Write even more tests

  • Parsing edge cases
  • Codegen edge cases
  • Typed expressions, e.g. foo = (bar : Int); bar = unsafe lua {nil};
  • foreign exports
  • Property testing?
  • Test that the behaviour of the compiled Lua code is expected (basically integration testing of compiled Huck libraries)
    • Standard library

Make a new build tool: `finn`

Separate out project management concerns into a build tool, which finds and prepares files according to some sort of project manifest, and then hands it all to huck. Similar to the separation between rustc and cargo.

Structuring this as a separate build tool means that different workflows can be customised as specifically as needed, while we can still provide a somewhat turnkey experience in general. The project manifest format should be fairly flexible from the start, but the build tool should technically be optional for using the language; i.e. the build tool should be replaceable without modifying the compiler.

Subcommands

  • finn new
    • .gitignore ignoring /output
    • src directory
    • finn.yaml with sensible defaults
  • finn build
  • finn repl #22
    • Should automatically bring all modules from the manifest into scope (qualified)
  • finn test
    • What should this do? #26
  • finn add https://github.com/example/Module (adds a dependency to the manifest)
    • Need a version number?
    • Should this instead refer to some central package registry?

Features

  • Package management
    • Probably using git repos as the hosting method
  • Local file structure
    • Need to be able to specify (with sensible defaults) for each module:
      • input filepath (.hk file)
      • output filepath (generated .lua file)
      • input file directory (folder of .hk files)
      • output file directory (folder of generated .lua files)
    • Input file search strategies
      • All .hk files in a given subdirectory of the project folder
      • Every input filename specified manually
    • Output filename generation strategies
      • Each output next to its input file
      • All outputs in one big directory
      • Every output filename specified manually
  • #21

Finalise language syntax

Decisions to make:

  • Semicolons and curly braces vs significant indentation
    • Both, or just one? Probably pick one and stick with it.
  • Comments
    • Probably change to -- comment to match Lua
    • Remove/change (* comment *) syntax?
  • Custom binops
    • A la Haskell (a ++ b = ...; infixr 4 ++;)?
      • PRO: Currently implemented this way
      • PRO: Haskell is a much bigger language to draw from
      • CON: Can never remember the order of the things
    • A la Purescript (concat a b = ...; infixr 4 concat as ++;)
      • PRO: Easier to remember the order (because the "as" goes at the end)
      • PRO: Functions are always defined with a name, which helps with documentation and readability
      • PRO: Possibly less ambiguity (mental, not parsing ambiguity) WRT what is a pattern match and what is part of the defined name on a LHS
      • CON: Not really necessary

Change the way output filenames work (by default)

Rather than copying the Huck file's name and just changing the extension, we should take the Huck module's path (e.g. module Foo.Bar;) and do something similar to what Lua does by default: replace dots with directory separators, and write the compiled output to the filepath Foo/Bar.lua (making directories as needed). Then we can give Huck module names to require, and it will just work as long as the generated output folder gets put into the user's package.path.

  • Generate output filepaths from the ModulePath
  • Make all the needed directories before writing the files
  • Change what codegen gives to require
  • Have a default (and configurable via CLI option) output directory for compiled files

This also obviates any changes needed for #45, so it closes that issue as well.

See some thoughts in #47
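The first two checklist items could look roughly like this in Rust. The function names are hypothetical, and the output directory is simply passed in; how its default is chosen (or overridden on the command line) is left open, as in the issue.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Maps a Huck module path such as "Foo.Bar" to "<out_dir>/Foo/Bar.lua",
/// mirroring how Lua's default searchers turn dots into directories.
pub fn output_path(out_dir: &Path, module_path: &str) -> PathBuf {
    let mut path = out_dir.to_path_buf();
    for segment in module_path.split('.') {
        path.push(segment);
    }
    path.set_extension("lua");
    path
}

/// Writes the generated Lua, creating any missing parent directories first.
pub fn write_module(out_dir: &Path, module_path: &str, lua_source: &str) -> io::Result<PathBuf> {
    let path = output_path(out_dir, module_path);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(&path, lua_source)?;
    Ok(path)
}
```

With the output directory on the user's `package.path` (as `<out_dir>/?.lua`), `require("Foo.Bar")` would then find the file written for `module Foo.Bar;`.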

Use immutable data structures from `im`

im

Particular opportunities:

  • name::resolve: instead of manually tracking what's in scope and using bind and unbind, use a new method subscope (or something)
  • typecheck::Typechecker's m_stack field
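To illustrate why an immutable scope removes the bind/unbind dance, here is a small persistent scope in Rust. To stay self-contained it uses a `std::rc::Rc` linked list as a stand-in for the `im` collections; `Scope`, `subscope` and the `usize` binding id are all hypothetical names. Because `subscope` returns a new scope and never mutates the parent, there is no `unbind` step to forget: leaving a scope is just dropping the child.

```rust
use std::rc::Rc;

/// A persistent scope: bindings form a linked list shared via `Rc`.
#[derive(Clone, Default)]
pub struct Scope {
    head: Option<Rc<Binding>>,
}

struct Binding {
    name: String,
    id: usize,
    parent: Option<Rc<Binding>>,
}

impl Scope {
    pub fn new() -> Self {
        Scope { head: None }
    }

    /// Returns a child scope with `name` bound to `id`, shadowing any outer
    /// binding of the same name. `self` is left untouched.
    pub fn subscope(&self, name: &str, id: usize) -> Scope {
        Scope {
            head: Some(Rc::new(Binding {
                name: name.to_string(),
                id,
                parent: self.head.clone(),
            })),
        }
    }

    /// Looks a name up, innermost binding first.
    pub fn lookup(&self, name: &str) -> Option<usize> {
        let mut cur = self.head.as_deref();
        while let Some(b) = cur {
            if b.name == name {
                return Some(b.id);
            }
            cur = b.parent.as_deref();
        }
        None
    }
}
```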

Fix up the semantics of IO

Conceptually, an IO Int represents an effectful action, which when executed, yields an Int.
The Lua function function() return 5 end seems like something which should have the type IO Int; but as currently implemented, the function has type () -> IO Int.

The probable fix for this is to change the representation of an IO Int from being "an Int which possibly comes with side-effects" into being "a Lua function which takes no arguments, and returns an Int, possibly also causing side-effects".

Basically, in Lua, to "execute an IO action", you do a function call.
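The proposed representation can be modelled in Rust with closures standing in for Lua functions: an `Io<T>` is a zero-argument function returning `T`, building one has no effect, and executing it is a call. This `Io` type and its `and_then` combinator are hypothetical, shown only to make the semantics concrete.

```rust
/// An IO action modelled as a zero-argument function (the proposed Lua
/// representation), using a boxed Rust closure in place of a Lua function.
pub struct Io<T>(Box<dyn Fn() -> T>);

impl<T: 'static> Io<T> {
    pub fn new(f: impl Fn() -> T + 'static) -> Self {
        Io(Box::new(f))
    }

    /// Executing an IO action is just a function call.
    pub fn run(&self) -> T {
        (self.0)()
    }

    /// Sequencing builds a new action; nothing runs until `run` is called.
    pub fn and_then<U: 'static>(self, k: impl Fn(T) -> Io<U> + 'static) -> Io<U> {
        Io::new(move || k(self.run()).run())
    }
}
```

Under this model, Lua's `function() return 5 end` corresponds directly to an `IO Int`, and running the same action twice performs its side-effects twice.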

Consider how hygiene works around `foreign import`s

e.g. is any old module allowed to return absolutely any type from a foreign import, or may it only return types defined in the same module?

I don't see any issues with a foreign import accepting any type as an argument, but it should still be considered further whether that's okay also.

Incremental compilation

Basically, if a Huck module hasn't changed, and none of its dependencies have changed, we don't need to regenerate it.

This is probably pretty low priority by itself; but the work needed to do this would have compiler development benefits when running the test suite, as well as probably having a lot of overlap with things needed for the REPL (#22).
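The rebuild rule above amounts to propagating "changed" along reverse dependency edges. A minimal Rust sketch, assuming (hypothetically) that we already have each module's direct dependencies and the set of modules whose source changed:

```rust
use std::collections::{HashMap, HashSet};

/// Returns every module that must be regenerated: the changed ones plus
/// anything that (transitively) depends on them. Everything else is reused.
pub fn needs_rebuild(
    deps: &HashMap<String, Vec<String>>,
    changed: &HashSet<String>,
) -> HashSet<String> {
    // Invert the graph: for each module, which modules depend on it?
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (module, ds) in deps {
        for d in ds {
            dependents.entry(d.as_str()).or_default().push(module.as_str());
        }
    }
    // Depth-first walk from every changed module through its dependents.
    let mut dirty: HashSet<String> = changed.clone();
    let mut stack: Vec<&str> = changed.iter().map(|s| s.as_str()).collect();
    while let Some(m) = stack.pop() {
        for &user in dependents.get(m).map(|v| v.as_slice()).unwrap_or(&[]) {
            if dirty.insert(user.to_string()) {
                stack.push(user);
            }
        }
    }
    dirty
}
```

Detecting "changed" in the first place (timestamps, content hashes, or compiler version) is a separate decision not covered here.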

Type classes

Implement type classes, similar to those found in Haskell and other languages.

These could be implemented as a new type of Constraint, and integrated into the existing constraint solver; or possibly something more similar to ExplicitTypeConstraint with a separate constraint queue, if the solving doesn't need to be interleaved with the existing type inference constraints.

  • Do notation
    • Is this notation tied to a Monad typeclass, or Applicative, or some other more abstract thing (e.g. it uses whichever >>= and >> operators are in scope)? Needs more investigation.

Interesting paper from SPJ: Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell

Specify (i.e. decide) what is in scope for a `lua {...}` block

The lua {...} syntax was intended as a sort of escape hatch, to allow for use of Lua libraries/methods/etc. which fundamentally don't have any way to model in Huck. However, it is currently unclear what exactly the programmer should expect to be in the Lua scope into which the lua block will be expanded.

In scope:

  • Any locally-bound variables in scope at the lua block (e.g. function arguments, case arm bindings)
  • Any Lua globals that your runtime provides (e.g. string.find, require, io.popen)

Is this list complete? Should there be any more items? Should any of these items not be allowed?

We can probably enforce this a bit after #31.

Here is an example of a Lua interface to be wrapped in Huck (lua_library.lua):

local Object = {}
Object.__index = Object

function Object.new()
    return setmetatable({x = 0}, Object)
end

function Object:activate()
    self.x = self.x + 1
    print("Incremented x.")
end

function Object:getVal()
    return self.x
end

return Object

Huck wrapper:

foreign import "lua_library" (new as newObject : () -> IO Object);
type Object = Object;

activateObject : Object -> IO ();
activateObject obj = lua { obj:activate() };

getObjectVal : Object -> IO Int;
getObjectVal obj = lua { obj:getVal() };

Flesh out standard libraries to be more useful

  • IO functions
    • print and println
    • read and similar helpers, probably part of a typeclass, for reading from an IO handle (e.g. file handle, stdin)
    • write and similar helpers, probably part of a typeclass, for writing to an IO handle (e.g. file handle, stdout)
  • Numeric typeclass stack (names TBC)
    • Integral and Real etc. (?)
    • Enum (?)
    • Divide, Multiply, Add, Subtract (maybe group some of these?)
    • Ord and Eq
    • others
  • Functional combinators
    • fold, unfold
    • flip
    • id, const, etc.
    • Functor with map
    • Semigroup with append
    • Semigroup a => Monoid a with empty (or zero)
    • Applicative, Alternative, Traversable, Monad, etc.
  • Documentation of standard library
  • How much to provide? Probably anything which needs language support, or anything which will be used in every program.
  • How much should be in the Prelude?

LSP

Is this out of scope for the compiler? Should it be a separate project which uses huck as a Rust library?

For this, we'll need to cache the type of every single AST node, so that we can find out exactly the type of any expression, regardless of where it appears, or whether it's named or not. To support this, we probably want to change most AST nodes to be {} structs and variants (with named fields), because tuple structs will be obnoxious to use with an extra () argument everywhere.

Exhaustiveness checking

e.g.

  • foo 1 = True; foo 2 = False; should be a warning or error;
  • bar True = 1; bar False = 2; should be A-OK;
  • baz 1 = True; baz _ = False; should be A-OK.

Make sure this works for:

  • Definitions
  • Case expressions
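For the three single-argument examples above, a deliberately tiny check is enough to get the expected answers. This Rust sketch uses a hypothetical `Pat` type covering only wildcards, `Bool` literals and `Int` literals; a real checker would also need nested patterns, tuples and user-defined constructors.

```rust
/// A minimal pattern language, just enough for the examples above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Pat {
    Wildcard,   // `_` or a plain variable binding
    Bool(bool), // `True` / `False`
    Int(i64),   // an integer literal
}

/// Returns true if these arm patterns (of one definition or one case
/// expression) cover every possible value of the argument.
pub fn is_exhaustive(arms: &[Pat]) -> bool {
    if arms.iter().any(|p| *p == Pat::Wildcard) {
        return true;
    }
    // Bool has exactly two constructors, so both must appear.
    // Int literals alone can never cover Int, so an all-literal Int match
    // falls through to `false` here.
    arms.contains(&Pat::Bool(true)) && arms.contains(&Pat::Bool(false))
}
```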

Don't use `lua-format` for output normalisation

We can probably just use an off-the-shelf Lua parser to parse, and implement some simple renderer of the parsed tree.

This would also be useful when parsing our own source files, because then we can validate that the contents of lua {...} blocks are at least valid Lua syntax, or that they form a valid Lua expression, or even that they form a valid Lua expression which only depends on things it's allowed to use from the Huck scope! The same parser could also validate foreign export paths.

Binop type constructors

Allow type constructors which are binops, e.g. type MyList e = e :: MyList e | Nil;

This does not refer to type-level binops (see #11 for those).

  • Make sure that precedence declarations work, e.g. type Complex = Float +++ Float; infix +++ 9;

Reduce usage of intermediate variables in generated code

Instead of generating

function(...)
  local _Module_1 = select(1, ...)
  local x = _Module_1
  return x + x
end

we should generate

function(...)
  local x = select(1, ...)
  return x + x
end

  • Note: it's often unclear when this is actually possible. A helpful thing to reference would be the transpilation of a Huck program with lots of definitions with multiple assignments each. This is the main reason we use these intermediate variables: so that each arm of a case can use its locally-bound variable names, while still having multiple Huck assignments be turned into a single Lua function.
  • We should at least be able to stop using the module name so often, in favour of something more generic (e.g. instead of _Short_2, use _arg_2)
    • Should be able to restart the count on each top-level function, instead of a globally-unique ID
      • Actually it's probably not on each top-level function, but on each function being closed.
      • Could maybe increment a counter when we start a function, and decrement when we end a function.
    • Will have to be sure that names generated during IR conversion don't clash with names generated during code generation
      • Maybe make a new variant for ResolvedName, which represents a name to be chosen by the code generator. This should clean up the use of leak! in IR conversion as well
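One way to combine the "restart the count" and "increment/decrement per function" ideas, sketched in Rust with a hypothetical `NameGen`: keep a stack of counters, one per function currently being generated. Sibling functions restart their numbering, but a nested function continues from its enclosing function's count, so it can never shadow an `_arg_N` the enclosing function already declared (which a closure might still reference). The `_arg_` prefix is assumed to be reserved for codegen so it cannot collide with names from IR conversion.

```rust
/// Hypothetical fresh-name generator for the code generator.
#[derive(Default)]
pub struct NameGen {
    counters: Vec<u32>,
}

impl NameGen {
    /// Called when codegen starts emitting a function body.
    pub fn enter_function(&mut self) {
        let start = self.counters.last().copied().unwrap_or(0);
        self.counters.push(start);
    }

    /// Called when the function body is closed.
    pub fn exit_function(&mut self) {
        self.counters.pop();
    }

    /// Returns a fresh name such as `_arg_1`. Panics outside any function.
    pub fn fresh(&mut self) -> String {
        let n = self
            .counters
            .last_mut()
            .expect("fresh() called outside a function");
        *n += 1;
        format!("_arg_{}", n)
    }
}
```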

Add types `Map`, `List`, and `Table` to the prelude

Map

Map compiles to a Lua dictionary-style table.

  • insert : forall k v. k -> v -> Map k v -> Map k v
  • remove : forall k v. k -> Map k v -> Map k v
  • get : forall k v. k -> Map k v -> Maybe v
  • replace : forall k v. k -> v -> Map k v -> (Map k v, Maybe v)

List

List compiles to a Lua list-style table.

  • get : forall e. Int -> List e -> e
  • insert : forall e. Int -> e -> List e -> List e
  • replace : forall e. Int -> e -> List e -> List e
  • push : forall e. e -> List e -> List e
  • pop : forall e. List e -> (List e, Maybe e)

Table

Basically a product of Map and List, but uses the same underlying Lua table.
e.g. type Table e k v = Table (List e) (Map k v)

  • getList : forall e k v. Table e k v -> List e
  • getMap : forall e k v. Table e k v -> Map k v
  • withList : forall e k v. (List e -> List e) -> Table e k v -> Table e k v
  • withMap : forall e k v. (Map k v -> Map k v) -> Table e k v -> Table e k v

Questions

  • How do we tell the compiler to compile them to Lua tables in the way we want? Some options:
    • Make them builtins (probably pretty neat to implement and use, but could be more elegant)
    • Add a new syntax to describe how to compile a data type, similar to lua { foo:bar() } for describing how to compile functions (not that elegant either, and probably big overkill when we could just make them builtins, and implement a Huck function toList : MyType -> List).
    • Something else?
      • Solved this using foreign imports. If you say you return a List a from a foreign function, and that's the only way to construct a List a, then a List a is defined as whatever is returned from that function.

Decide what is exported from each Huck module

Currently all definitions (including type constructors) are included in the returned table; as well as any foreign exports being assigned. We should either:

  1. Switch to a method of explicit exports, where nothing is included in the returned table except for things mentioned in an export (foo, bar); statement;
  2. Specify precisely what is automatically included. Particularly, are imports automatically exported? Currently no, but how do we do re-exports then?

Probably option 1 is best; we could even add a glob argument e.g. export (..); which exports everything [1].

Footnotes

  1. This really just kicks the can down the road, and forces us to specify what the glob operator does. But it might be more predictable for programmers this way.

Make it possible to have no prelude

Probably using a command line option. Or, maybe this ties in with shifting to a model more like rustc/cargo, where we have a separate build tool to source and provide the default Prelude, which is provided to huck with a command line argument. huck definitely needs to know which module is the prelude, so that it can implicitly import it.

Allow importing type constructors from another module

import Basic (LinkedList, Cons, Nil, map);

results in

Name resolution error: Identifier `Cons` doesn't exist in module `Basic`

Probably want to use Haskell-like syntax:

import Basic (LinkedList (Cons, Nil), map);

Change builtin list syntax `[x]` to describe Lua iterators

This would mean that [1, 2, 3] isn't necessarily an actual list, but a way to iterate over a list. This matches the common inductive pattern used to define functions.

Possibly this type is best pronounced as "stream", e.g. "[Int] represents a stream of integers."

map : forall a b. (a -> b) -> [a] -> [b];
map f (x::xs) = f x :: map f xs;
map _ [] = [];

Currently, map over the builtin list type actually can't be defined this way, because there's no way to pattern match on its head, only on an entire list.

If we want to represent this as a Lua iterator, then it will need to keep track of:

  • the initial source of values;
  • each operation used on the stream.

The initial source might be another collection (e.g. a List a), which will compile to a call to ipairs; or it might be an unfold, which will compile to a manually-written (i.e. generated) next function (and accompanying data) in Lua.

When the stream is evaluated, it should compile to a for loop. In each iteration of the for loop, the initial source is transformed through each of the operations used on the stream; and then at the end of the operations, add the final resulting value to the collection type given by the final call to collect (assuming that the stream was collected).
Notably, one of the operations needed is filter; this will correspond to an early return in the for-loop body (ideally a continue, but Lua doesn't have continue, so probably nested ifs).

Note that the collection as described is inherently lazy: the whole computation may be thrown away if it is never evaluated. We need to document this carefully, and treat it consistently with any other lazy values that are implemented.

Functions involving [a]

  • map : forall a b. (a -> b) -> [a] -> [b]
  • filter : forall a. (a -> Bool) -> [a] -> [a]
  • flatten : forall a. [[a]] -> [a]
  • flatMap : forall a b. (a -> [b]) -> [a] -> [b]
  • fold : forall a b. ((a, b) -> b) -> b -> [a] -> b
  • unfold : forall a b. (b -> Maybe (a, b)) -> b -> [a]
  • iter : forall a. List a -> [a] (probably instead of List, using a typeclass called Iterable)
  • collect : forall a. [a] -> List a (similar to above)
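The loop shape described above (one for loop over the source, each operation applied in turn, filters as nested ifs, then an insert into the collected result) can be sketched as a tiny Rust code generator. `Op`, `compile_stream` and the generated local names `_elem` and `_v` are hypothetical; the sketch only handles a list-style table source via `ipairs` and collecting into a new list, not `unfold` sources. The loop value is copied into `_v` rather than reassigned, to avoid modifying the loop variable itself.

```rust
/// Hypothetical stream operations; each names a Lua function already in scope.
pub enum Op {
    Map(String),
    Filter(String),
}

/// Emits Lua that runs `source` (a list-style table) through `ops` in a single
/// `for` loop and collects the surviving values into a new table `result`.
/// Each `Filter` opens a nested `if`, standing in for the missing `continue`.
pub fn compile_stream(source: &str, ops: &[Op], result: &str) -> String {
    let mut out = format!(
        "local {} = {{}}\nfor _, _elem in ipairs({}) do\n  local _v = _elem\n",
        result, source
    );
    let mut depth = 1;
    let mut opened = 0;
    for op in ops {
        let indent = "  ".repeat(depth);
        match op {
            Op::Map(f) => out.push_str(&format!("{}_v = {}(_v)\n", indent, f)),
            Op::Filter(p) => {
                out.push_str(&format!("{}if {}(_v) then\n", indent, p));
                depth += 1;
                opened += 1;
            }
        }
    }
    out.push_str(&format!("{}table.insert({}, _v)\n", "  ".repeat(depth), result));
    // Close every `if` opened by a filter, then the loop itself.
    for _ in 0..opened {
        depth -= 1;
        out.push_str(&format!("{}end\n", "  ".repeat(depth)));
    }
    out.push_str("end\n");
    out
}
```

For example, `map g (filter p (map f xs))` collected into `ys` would produce:

    local ys = {}
    for _, _elem in ipairs(xs) do
      local _v = _elem
      _v = f(_v)
      if p(_v) then
        _v = g(_v)
        table.insert(ys, _v)
      end
    end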

Improve error messages

  • Include source code location and information in errors (this will have to be carried through from an early stage, probably parsing, and probably stored in the AST somehow)
    • We will need to store the id: usize from Source::Local on each of the binding sites, in order to point error messages to the correct binding site (and maybe for other reasons I can't remember)
      • This is probably a good opportunity to make the binding/unbinding of variables a bit more statically-verified in some way, to avoid having to do the whole dance: bindings: Vec<_>; for b in bindings.iter() {...}
  • Implement type graphs (from section 6 of HHS02) to detect more specific/better type errors
  • Command line misuse errors with suggestions (really test the thing out)
  • Lua runtime error messages, for when a Huck function is given a mistyped argument
  • grep for 'name clashes'
  • grep for @Error and @Warn
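As a starting point for the first item, AST nodes could carry a byte-offset span from the parser, converted to a line and column only when an error is rendered. A Rust sketch; `Span`, `line_col`, `render_error` and the `file:line:col: error:` message format are all assumptions, not existing Huck code.

```rust
/// A byte range into a source file, attached to AST nodes at parse time.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Converts a byte offset into a 1-based (line, column), counting columns
/// in chars. `offset` must lie on a char boundary.
pub fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() + 1;
    (line, col)
}

/// Renders an error message pointing at the start of `span`.
pub fn render_error(file: &str, source: &str, span: Span, message: &str) -> String {
    let (line, col) = line_col(source, span.start);
    format!("{}:{}:{}: error: {}", file, line, col, message)
}
```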

Add record types

Perhaps similar to those found in Elm, these record types will be used to model Lua tables used as objects. While Map String (IO ()) could be considered a sort of similar object in Huck, really we need to have an object which has named fields, and which has different types for each field.

The interface should be something like the following:

type Point = Point { x : Float, y : Float };
thePoint : Point = Point { x = 1.23, y = 4.56 };
theX : Float = #x thePoint;
theY = #y thePoint;
(* #x : HasField #x r v => r -> v *)

#fieldname can be thought of as an accessor function: given a record of type r which has a field named #fieldname of type v, it returns the value of that field. To be clear, in this system, the syntax #fieldname has two meanings:

  • When used in a value-level expression, representing a field accessor function;
  • When used in a type-level expression (specifically type class constraints), representing a field name. This syntax is basically a type-level label, which could possibly be useful elsewhere; so maybe this should be separated out.
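Rust can express roughly the same idea, which may help when thinking about how the constraint gets solved: the label becomes a marker type, `HasField<Label>` is a trait implemented per field, and the associated type `Value` plays the role that the functional dependency `x r -> a` plays in GHC's `HasField`. All names below are illustrative only.

```rust
/// Type-level field labels, standing in for `#x` and `#y`.
pub struct X;
pub struct Y;

/// Analogue of the constraint `HasField #label r v`.
pub trait HasField<Label> {
    type Value;
    fn get_field(&self) -> Self::Value;
}

pub struct Point {
    pub x: f64,
    pub y: f64,
}

impl HasField<X> for Point {
    type Value = f64;
    fn get_field(&self) -> f64 {
        self.x
    }
}

impl HasField<Y> for Point {
    type Value = f64;
    fn get_field(&self) -> f64 {
        self.y
    }
}

/// Analogue of the accessor `#label : HasField #label r v => r -> v`.
pub fn field<Label, R: HasField<Label>>(record: &R) -> R::Value {
    record.get_field()
}
```

Here `field::<X, _>(&the_point)` corresponds to `#x thePoint`.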

Records should also be able to be used anonymously:

aPoint : { x : Float, y : Float } = { x = 1.23, y = 4.56 };
foo = #x aPoint;

  • Implement type classes #13
  • How does pattern matching work?
  • Should a named record (like Point {...} in the first example) be given special treatment, such that Point implements HasField #x Point Float; or should it be only the inner anonymous record implements it? i.e. thePoint : Point = Point { x = 1.23, y = 4.56 }; getX (Point {x, ..}) = x;
    • As written, this is obviously unacceptable. So either need a special case as described; or at the very least, some easier way to pattern match on a named record (e.g. theX = thePoint @ Point {x, ..} -> x;)

See this page on the historical Haskell wiki for some thoughts on possible implementation.

Write more test cases

Particular things to cover:

  • Test that each Error variant is caught in a prototypical case.
  • Codegen
  • Name resolution
  • Backtick binops
  • Module imports
    • Both Huck and foreign
    • Renaming using as
    • Qualified and unqualified
  • Dependency resolution
