Better allocator
Currently, the C allocator just reserves 8 GB of memory when the program starts and, if there are multiple threads, that memory is split between them. So, for example, if there are 8 threads, each one gets 1 GB to work with. A worker can read from other workers' memories (which may happen for shared data, i.e., when they pass through a DP0/DP1 pointer), but only the owner of a slice can allocate memory from it. Each worker also keeps its own freelist.
As such, a huge improvement would be a thread-safe, global, possibly arena-based alloc and free.
Better scheduler
Right now, the task scheduler just partitions threads evenly among the normal form of the result. So, for example, if the result is (Pair A B), and there are 8 threads, 4 will be assigned to reduce A and 4 to reduce B. This is obviously very limited: if A reduces quickly, its threads won't be re-assigned to help reduce B, so the program will only use half of the cores from that point on. And so on. This is why algorithms that return lists are often slower: they aren't using the extra threads at all. In the worst case, it just falls back to being single-threaded.
A huge improvement would be a proper scheduler that adds potential redexes to a task pool. When I attempted to do that, the cost of synchronization added too much overhead, ruining the performance. Perhaps a heuristic to consider would be to limit the depth of the redexes for which global tasks are emitted; anything below, say, depth=8, would just be reduced by the thread that reaches it. Many improvements can be made, though.
I32, I64, U64, F32, F64
Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit tag), but they shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is welcome.
On #81.
Improve the left-hand side flattener
On #54.
A nice API to use as a Rust library
Using HVM as a pure Rust lib inside other projects should be very easy, especially considering the interpreter performs fairly well (only about 50% slower than the single-threaded compiled version). We must think of the right API to expose these features in a Rust-idiomatic, future-proof way. Input from the Rust community is appreciated.
A JIT compiler
Right now, HVM compiles to C. That means a C compiler like clang or gcc is necessary to produce executables, which means HVM isn't portable when used as a lib. Of course, libs can just use the interpreter, but it is ~3x slower than the compiled output, and is not parallel. Ideally, we'd instead JIT-compile HVM files using WASM or perhaps Cranelift, allowing lib users to enjoy the full speed of HVM in a portable way.
IO
Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type and performs IO operations as demanded.
Windows support
The compiler doesn't support Windows yet. The use of -lpthreads may be an issue. The interpreter should work fine, though.
On #52.
GPU compilers
Compiling to the GPU would be amazing, and I see no reason this shouldn't be possible. The only complications I can think of are:
- How do we alloc inside a GPU work unit?
- Reduce() is highly branching, which may destroy performance. But it can be altered to group tasks by category and merge all branches in one go. The reduce function essentially does this: branch -> match -> alloc -> subst -> free. The last 4 steps are very similar regardless of which branch it falls into. So, instead, the branch part can just prepare some data, and the match -> alloc -> subst -> free passes are all moved to the same branch, just with different indices and parameters. This will avoid thread divergence.
- GPGPU is a laborious mess. I'm on OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and, from what I could see, doesn't have the ability to read and write the same buffer in the same pass!?). Regardless, we'd probably need different versions: Metal for OSX, CUDA or OpenCL for Windows depending on the GPU, etc. So, that's a lot of work that I can't do myself; it would require expensive engineers.
Compile Kind to HVM
HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level; "low-level" here means users shouldn't be writing it directly. Ideally, Kind will compile to HVM, but it needs a lot of adjustments before that is possible.
Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule
See this issue for more information, as well as the superposed duplication section on HOW.md.
On #61.
Tests
This project has basically no tests. Write tests!
I will edit this post as I think of more things. Feel free to add to this thread.