junlarsen / llvm4j
Experimental LLVM FFI bindings for the Java Platform
License: Apache License 2.0
The Target APIs (see branch api/target) are not easily testable right now because there are no registered targets in the shipped binaries.
These tests and the above branch should be merged once the JavaCPP binaries ship with targets.
To ensure the LLVM Callbacks actually work, we need to find a way to call them to provide meaningful tests for them.
This may be doable with LLVMContext's DiagnosticHandler callback. Research what triggers this callback.
Find a generic way to match an OrderedEnum<*>.value instead of calling .values().firstOrNull.
This should be done without reflection.
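One possible direction, sketched below, is to build a value-to-constant map once per enum with a reified type parameter. The OrderedEnum interface shown here is a simplified stand-in, and Opcode, its numeric values, and enumIndex are hypothetical names used only for illustration.

```kotlin
// Hypothetical stand-in for the project's OrderedEnum interface.
interface OrderedEnum<T> {
    val value: T
}

// Example enum; the names and numeric values are illustrative only.
enum class Opcode(override val value: Int) : OrderedEnum<Int> {
    Ret(1), Br(2), Add(8)
}

// Builds a value -> constant index once. enumValues<E>() is an intrinsic that
// the compiler expands for the reified type E, so no reflection is involved.
inline fun <T, reified E> enumIndex(): Map<T, E>
        where E : Enum<E>, E : OrderedEnum<T> =
    enumValues<E>().associateBy { it.value }

// Computed once and reused for every lookup.
val opcodes: Map<Int, Opcode> = enumIndex<Int, Opcode>()
```

With this sketch, opcodes[8] yields Opcode.Add and an unknown value yields null, without scanning .values() on each call.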
LLVM-C provides some quality-of-life functions for easier usage. Some of these have been skipped when implementing their function family. These functions are the following:
Additional functions which may be wrapped up into a Container
The GitHub workflow does not automatically run on PRs. When it runs Gradle, it also starts a Gradle daemon, which is unnecessary because the build only runs once before the machine shuts down.
We have ktlint to ensure the code style in the code base stays consistent. There is a ktlint.jar inside the /assets folder which can be used to run ktlint. This should be automated through GitHub Actions.
Right now, most types have a constructor that takes an argument of the same type.
For example:
class PointerType {
public constructor(type: Type): this(type.ref)
}
This is deeply problematic because of the following:
val voidPointer = PointerType(VoidType()) // this is a void instead of void*. Yikes
This caused my language's test suites to segfault on generated code. I suggest removing these constructors and casting explicitly the old way, using .asSomeType() and so on.
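A minimal sketch of what the explicit approach could look like follows. The types are a simplified, hypothetical model rather than the actual llvm4j classes: TypeKind and TypeRef stand in for the native type reference, and toPointerType is an assumed name for "build a pointer to this type".

```kotlin
// Hypothetical, simplified model of the wrapper types; not the llvm4j API.
enum class TypeKind { Void, Pointer }

// Stand-in for a native LLVMTypeRef handle.
class TypeRef(val kind: TypeKind)

open class Type(val ref: TypeRef) {
    // Explicit, checked conversion: fails loudly instead of silently
    // reinterpreting a void type as a pointer type.
    fun asPointerType(): PointerType {
        require(ref.kind == TypeKind.Pointer) {
            "Expected a pointer type but got ${ref.kind}"
        }
        return PointerType(ref)
    }

    // Building a pointer *to* this type is a separate, clearly named operation.
    fun toPointerType(): PointerType = PointerType(TypeRef(TypeKind.Pointer))
}

class VoidType : Type(TypeRef(TypeKind.Void))

// Only constructible from a raw reference, never from another Type.
class PointerType internal constructor(ref: TypeRef) : Type(ref)
```

In this model, VoidType().asPointerType() throws an IllegalArgumentException, so the mistake surfaces at the call site rather than as a segfault in generated code.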
This continues the topic which was slightly touched upon in #62
We have a couple of options when it comes to designing the wrapper for Instruction values. We could group every single type of instruction under one single class, InstructionValue, or we could have many child classes which all extend InstructionValue, similar to how we have Value and *Value types.
Pros
Cons
Abstract
The JavaCPP preset is missing a lot of the LLVM-C API which we should have available. This means that the pre-generated bindings are not going to be sufficient for us and that we will have to roll our own JNI bindings to the LLVM-C API.
Research
We are going to have to get our own binaries for LLVM for each platform LLVM supports instead of relying on the ones JavaCPP provided us while using their preset.
I suggest we write a build script for each platform LLVM supports to download the prebuilt binaries for that platform from its releases page, linking each archive together to get a static library which we will dynamically load into JavaCPP.
We also have many options when it comes to accessing JNI, but I think for the sake of control and customizability we want to roll our own hand-written or C++ macro-generated JNI bindings. This would potentially allow for custom LLVM extensions to be used, possibly C++ LLVM Passes accessed from Kotlin, and more.
LLVM currently supports around 15 different host platforms which we should be able to run our library on. This means that we need to find a reliable way to build LLVM for each platform and link it to our JNI code. This should preferably be done via CI as GitHub Actions is capable of building most of the popular targets natively. We could also run the builds inside a Docker container running QEMU to build for other architectures.
Discovery
JavaCPP has an amazing and well developed JNI Core, but I do not think we want to use their code generator as it fails to properly read the LLVM header files (see above). If we go this route we need to find out how we can link their JNI code to ours as all of their Pointer types rely on native code.
We can build our own JNI bindings, wrapping all foreign pointer types into a long which we send back to the Java world. This is the common approach for wrapping C/C++ types for JNI and we should use it as well. For this we might want to control de-allocation of objects, which means we need to hook into the native environment again like JavaCPP does.
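The handle-as-long idea can be sketched as follows. Everything here is hypothetical: FakeNative simulates the native side in plain Kotlin so the example runs on its own, whereas a real binding would declare these as external functions implemented in C/C++ via JNI.

```kotlin
// Stand-in for the native side. In a real binding these would be `external`
// functions backed by JNI code; all names here are hypothetical.
object FakeNative {
    private val live = mutableSetOf<Long>()
    private var next = 1L

    fun contextCreate(): Long {
        val handle = next++
        live += handle
        return handle
    }

    fun contextDispose(handle: Long) {
        live -= handle
    }

    fun isLive(handle: Long): Boolean = handle in live
}

// Wrapper that owns a native pointer stored as a Long and frees it on close().
class Context : AutoCloseable {
    private var handle: Long = FakeNative.contextCreate()

    val isValid: Boolean get() = handle != 0L

    fun rawHandle(): Long {
        check(handle != 0L) { "Context has already been disposed" }
        return handle
    }

    override fun close() {
        if (handle != 0L) {
            FakeNative.contextDispose(handle)
            handle = 0L // guard against a double free
        }
    }
}
```

Because the wrapper implements AutoCloseable, callers can scope it with Context().use { ... } and the handle is released when the block exits.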
Ongoing Work
This is currently being worked on in the experimental/build-tools branch.
Useful Links
Optimizing JNI Construction
It seems some of these APIs are missing. They should be implemented.
The APIs in question are these: https://llvm.org/doxygen/group__LLVMCCoreContext.html
The ConstantVector class does not currently support any floating point operations.
This should implement all of the LLVM ConstantExpression operations which work on floating point vectors. See ConstantVector for the Integer implementation.
The Kotlin version should be updated as Kotlin 1.4.10 has been released.
Right now there are many parts of the codebase which use a pointer constructor of size 0. Dereferencing a buffer returned by such an allocation is undefined behavior.
While implementing a lot of the Value APIs I have noticed that the values we're working with are constants, which have their own APIs that are only available for Constants (see docs).
I have been considering adding a Constant class/namespace which would allow us to add these Constant-specific APIs.
This way we'd be able to differentiate between constant values (integers, arrays, structs etc.) and non-constant values like instructions and users.
Pros
isConstant()
Cons
LLVM has a lot of standalone functions for initializing various things. These should be wrapped up into a container which will take care of these.
We do this so the end user doesn't have to depend on bytedeco's raw llvm bindings
LLVM supports generating DWARF type Debug Info for LLVM programs. LLVM-C ships an entire interface to LLVM-C++'s Debug Information builder which we should be building a wrapper for.
All the relevant types and functions which should be implemented are declared in the llvm-c/DebugInfo.h header. See https://llvm.org/doxygen/c_2DebugInfo_8h.html
Right now there is something wrong with the type casting mechanism we've implemented. It takes over 3 seconds to run a basic casting test, most likely because the Kotlin code gets compiled into code which uses reflective lookups.
This should be rewritten in a way that the Kotlin compiler doesn't generate
The samples currently depend on an older version in the vexelabs Maven repository, which means the samples will run regardless of breaking changes. This needs to be solved, preferably by automatically publishing to the repositories during the workflow run.
KDoc comments have the option to include a @throws tag for any function which may throw an exception. This PR requests any function which has a throw statement inside of it to document the thrown exception with a @throws tag.
The functions in question can easily be found by grepping for throw.
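As an illustration of the requested style, a function with a throw statement could be documented like this. The function itself is a made-up example and not taken from the codebase.

```kotlin
/**
 * Returns the element at [index] in [elements].
 *
 * @throws IndexOutOfBoundsException if [index] is outside the bounds of [elements]
 */
fun elementAt(elements: List<String>, index: Int): String {
    if (index !in elements.indices) {
        throw IndexOutOfBoundsException(
            "Index $index is out of bounds for size ${elements.size}"
        )
    }
    return elements[index]
}
```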
A lot of the ConstantExpressions for Values are LLVM Instructions which return another constant. A lot of these are operators like Add, Sub, Mul and Div. Would it be reasonable to add operator overloads for these?
Pros
Cons
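To make the question concrete, here is a minimal sketch of what such overloads might look like. ConstantInt below is a hypothetical stand-in that does the arithmetic in Kotlin. A real implementation would presumably delegate to the LLVM-C constant expression builders such as LLVMConstAdd instead.

```kotlin
// Hypothetical constant wrapper used only to show the operator syntax.
// A real wrapper would build LLVM constant expressions rather than
// computing the result on the JVM.
data class ConstantInt(val value: Long) {
    operator fun plus(rhs: ConstantInt) = ConstantInt(value + rhs.value)
    operator fun minus(rhs: ConstantInt) = ConstantInt(value - rhs.value)
    operator fun times(rhs: ConstantInt) = ConstantInt(value * rhs.value)
    operator fun div(rhs: ConstantInt) = ConstantInt(value / rhs.value)
}
```

Call sites then read as ConstantInt(6) + ConstantInt(2) * ConstantInt(3), with the usual Kotlin operator precedence. One thing to weigh is that a single div operator cannot express the difference between signed and unsigned division, which LLVM treats as separate operations.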
This should upgrade the org.bytedeco.llvm-platform dependency to 11.0.0-1.5.5-SNAPSHOT which is fetched from the Sonatype Snapshot repository. The artifact in question is https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/llvm/11.0.0-1.5.5-SNAPSHOT/
This also requires the sonatype snapshot repository to be added to the gradle build.
As of right now, each push triggers 16 workflows (soon to be 24) across multiple sub-projects, even when no code was changed in those sub-projects and they were not affected. Optimizing the workflows will reduce the number of jobs triggered by each push.
We could probably use path specifiers to trigger only certain workflows (see documentation).
Many different KDocs have different styles and different content. These should be consistent across the entire project.
@see tags are the final item in the KDoc, apart from TODOs
Bytedeco JavaCPP has been updated to support LLVM 10.
We would want to upgrade our LLVM version to 10 or we could optionally wait for LLVM 11.0.0 bindings.
LLVM 10 Changelog
https://releases.llvm.org/10.0.0/docs/ReleaseNotes.html
Bytedeco Presets
https://github.com/bytedeco/javacpp-presets/tree/master/llvm#the-pomxml-build-file
Functions to Implement
This is a list of functions which LLVM 10 ships, which 9 does not. This list is not complete.
New Features
Freeze Instruction
When updating to LLVM 10.0.1/llvm-platform 10.0.1-1.5.4, a regression occurred for LLVMMemoryManagerFinalizeCallback.
Previously, this function had a signature of (Pointer?, BytePointer?) -> Int, but in the most recent update it has a signature of (Pointer?, PointerPointer<*>?) -> Int.
Investigating this should compare the two versions (10.0.0-1.5.3, 10.0.1-1.5.4). This could be LLVM changing the function signature.
The current src/test directory is a mess and it needs to be cleaned up. It should be refactored into a unit and integration package.
/unit
This directory should be flattened and reduced a bit because it's more complicated than it needs to be plus it lacks coverage over a lot of units. This should also be dealt with.
/integration
This directory should mostly contain operations you would probably perform when actually working with LLVM. Code like this can be ported over from LLVM's own unit tests or other existing compilers.
This is a tracker for all the changes which were introduced in the LLVM 11.0.0 C-API.
The release we're currently using (10.0.1) was released on 6th Aug 2020.
The ConstantPropagation pass was removed. Users should use the InstSimplify pass instead.
llvm-c/Core.h
llvm-c/Orc.h
llvm-c/Transforms/Scalar.h
llvm-c/DebugInfo.h