mochilibraries / biohazrd
A framework for automatically generating binding wrappers for C/C++ libraries
License: MIT License
TranslatedLibraryBuilder treats file paths as case-insensitive regardless of the case-sensitivity of the file system. This was done to avoid problems arising from differences in casing between what is provided to TranslatedLibraryBuilder, the file system, and the #include directives in the C++ source. (We regularly compare filenames to resolve which Clang cursors correspond to which input files, or to determine if a cursor is out of scope.)
We do not expect well-formed C++ code to use incorrect casing or to contain multiple files whose names differ only in casing, as both practices are non-portable. (The former triggers the -Wnonportable-include-path warning in Clang.) As such, this is not a high priority: it's only problematic with poorly-written C++ libraries on Linux or on unusual macOS/Windows systems.
The ideal solution would be to normalize paths to their actual casing once dotnet/runtime#14321 is realized.
I am hesitant to conditionally use case-sensitive comparisons based on the OSPlatform because case-sensitivity is an attribute of the file system, not the OS. (ext4 even allows case-sensitivity as an attribute of a directory.)
Right now an anonymous (legacy-style) enum is translated as loose constants in all situations, including when it is used to type a field. For example:
class AnonymousEnumWithField
{
public:
enum
{
Red,
Green,
Blue
} AnonymousEnumField;
int FieldAfterEnum;
};
While this is accurate to how they work in C++, it isn't a particularly friendly translation. We should provide a transformation that fixes up the enum to meet C# expectations.
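A small C++ check of the semantics described above, repeating the class so the snippet compiles on its own. It shows both halves of the problem: the enumerators really are class-scope constants, but the field still has its own unnamed enum type, which a fix-up transformation could presumably give a synthesized name. The alias AnonymousEnumFieldType is hypothetical.

```cpp
#include <type_traits>

// Same class as above, repeated so this snippet stands alone.
class AnonymousEnumWithField
{
public:
    enum
    {
        Red,
        Green,
        Blue
    } AnonymousEnumField;
    int FieldAfterEnum;
};

// The enumerators are injected into the class scope as plain constants...
static_assert(AnonymousEnumWithField::Green == 1);

// ...but the field itself still has a distinct, unnamed enum type.
// (Hypothetical alias; C++ only lets us name it through decltype.)
using AnonymousEnumFieldType = decltype(AnonymousEnumWithField::AnonymousEnumField);
static_assert(std::is_enum_v<AnonymousEnumFieldType>);
static_assert(!std::is_same_v<AnonymousEnumFieldType, int>);
```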
These aren't consistent between calling conventions/platforms, so leaking this implementation detail will cause API breakage between platforms.
Not sure how this can be solved for virtual methods when they're overridden from C#.
This should most likely be optional. If memory serves, asking Clang to retain preprocessor info slows things down noticeably.
Identifying numeric/string constants vs macros will probably have to be fairly heuristic, especially if we want to capture constants built from macros.
The transformation infrastructure is designed with concurrency in mind, but no such concurrency has been implemented.
This type is unlikely to ever show up in declarations, but it is possible.
auto with a static field:
static auto AutoNullPtr = nullptr;
VarDecl VarDecl Var - AutoNullPtr Mangled=AutoNullPtr
^---- Variable type: AutoType (CXType_Auto) 'nullptr_t' SizeOf=8
BuiltinType (CXType_NullPtr) 'nullptr_t' SizeOf=8
CXXNullPtrLiteralExpr CXXNullPtrLiteralExpr -
nullptr_t explicitly:
#include <cstddef>
static std::nullptr_t NullPtr = nullptr;
VarDecl VarDecl Var - NullPtr Mangled=NullPtr
^---- Variable type: ElaboratedType (CXType_Elaborated) 'std::nullptr_t' SizeOf=8
TypedefType (CXType_Typedef) 'std::nullptr_t' SizeOf=8
BuiltinType (CXType_NullPtr) 'nullptr_t' SizeOf=8
Ref NamespaceRef - std
Ref TypeRef - std::nullptr_t
CXXNullPtrLiteralExpr CXXNullPtrLiteralExpr -
decltype(nullptr):
static decltype(nullptr) DeclTypeNullPtr = nullptr;
VarDecl VarDecl Var - DeclTypeNullPtr Mangled=DeclTypeNullPtr
^---- Variable type: DecltypeType (CXType_Unexposed) 'decltype(nullptr)' SizeOf=8
BuiltinType (CXType_NullPtr) 'nullptr_t' SizeOf=8
CXXNullPtrLiteralExpr CXXNullPtrLiteralExpr -
CXXNullPtrLiteralExpr CXXNullPtrLiteralExpr -
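A minimal C++ sketch confirming that the three spellings above all resolve to the same builtin type, which is why each dump bottoms out in BuiltinType (CXType_NullPtr). The size check relies on the standard's requirement that sizeof(std::nullptr_t) equal sizeof(void*); the SizeOf=8 in the dumps corresponds to a 64-bit target.

```cpp
#include <cstddef>
#include <type_traits>

// The three declarations from the AST dumps above.
static auto AutoNullPtr = nullptr;
static std::nullptr_t NullPtr = nullptr;
static decltype(nullptr) DeclTypeNullPtr = nullptr;

// All three have exactly the same type, despite the different sugar Clang reports.
static_assert(std::is_same_v<decltype(AutoNullPtr), std::nullptr_t>);
static_assert(std::is_same_v<decltype(NullPtr), std::nullptr_t>);
static_assert(std::is_same_v<decltype(DeclTypeNullPtr), std::nullptr_t>);

// Required by the standard, so a pointer-sized translation is always correct.
static_assert(sizeof(std::nullptr_t) == sizeof(void*));
```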
C#-side trampolines can pollute stack traces. Consider adding DebuggerHiddenAttribute, DebuggerStepThroughAttribute, and StackTraceHiddenAttribute (depends on dotnet/runtime#29681) to these methods. (Similar example from the BCL: https://github.com/dotnet/runtime/pull/32353/files)
DebuggerHiddenAttribute
DebuggerStepThroughAttribute
StackTraceHiddenAttribute
(This will probably happen before the prototype refactor is complete, but I'm creating this issue since it doesn't block anything and I don't want to forget.)
Right now every TranslatedDeclaration implements IEnumerable<TranslatedDeclaration>. This certainly makes it easier to enumerate children when a declaration has them, but the vast majority of declarations can't even have children. Instead, perhaps we should only implement IEnumerable<TranslatedDeclaration> when the declaration can actually have children, and use is checks in contexts where enumeration is required.
If we do this, VisitorContext/TransformationContext will need to be updated to reflect this new paradigm. I'm thinking the following:
Rename Parent to Siblings
Rename ParentDeclaration to Parent
Right now the EnumerateRecursive extension method flattens the entire declaration hierarchy without providing context. However, some simpler enumerations could benefit from context information without having to jump all the way to implementing a DeclarationVisitor. It should be relatively easy to provide a version of EnumerateRecursive that yields a (VisitorContext, TranslatedDeclaration) tuple instead of just contextless declarations.
If this version was added, it should probably only be provided for TranslatedLibrary rather than any random IEnumerable<TranslatedDeclaration>. (Either that or there should be two versions: one for TranslatedLibrary and one for IEnumerable<TranslatedDeclaration> that takes an initial VisitorContext.)
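A rough C++ model of the two ideas above, assuming a heavily simplified declaration tree: only container declarations expose children (found with a dynamic_cast, the C++ analogue of an is check), and the recursive enumeration pairs each declaration with the chain of parents that led to it. Declaration, DeclarationContainer, Record, and Context are hypothetical stand-ins, not Biohazrd's actual types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TranslatedDeclaration.
struct Declaration
{
    explicit Declaration(std::string name) : Name(std::move(name)) {}
    virtual ~Declaration() = default;
    std::string Name;
};

// Only declarations that can actually have children implement this.
struct DeclarationContainer
{
    virtual ~DeclarationContainer() = default;
    std::vector<std::shared_ptr<Declaration>> Children;
};

struct Record : Declaration, DeclarationContainer
{
    using Declaration::Declaration;
};

// Stand-in for VisitorContext: the chain of parents leading to a declaration.
using Context = std::vector<const Declaration*>;

// Depth-first flattening that yields (context, declaration) pairs.
void EnumerateRecursive(const std::vector<std::shared_ptr<Declaration>>& declarations,
                        Context& context,
                        std::vector<std::pair<Context, const Declaration*>>& results)
{
    for (const auto& declaration : declarations)
    {
        results.emplace_back(context, declaration.get());

        // Equivalent of `declaration is IEnumerable<TranslatedDeclaration>` in C#.
        if (auto container = dynamic_cast<const DeclarationContainer*>(declaration.get()))
        {
            context.push_back(declaration.get());
            EnumerateRecursive(container->Children, context, results);
            context.pop_back();
        }
    }
}
```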
In the original prototype, GenerateModuleDefinitionTransformation had this logic:
// Static non-method functions cannot be exported
if (methodDeclaration is null && functionDeclaration.StorageClass == CX_StorageClass.CX_SC_Static)
{ return null; }
Unfortunately I did not elaborate why and I'm not sure this is actually true. I think maybe I incorrectly made this assumption when I was investigating why module definitions weren't solving all the problems I had with exporting inline functions and this check is no longer valid.
It's not immediately clear if this is ever different from the record type that contains the method (or maybe I'm misunderstanding its purpose.) It caught my eye as something that might be able to improve correctness in weird situations we aren't handling yet when it was added to libClangSharp beta2.
Clang has a basic understanding of Doxygen comments and attaches them to the appropriate declarations. We should parse these comments and turn them into XML documentation on the generated output.
Constant-sized arrays are currently not supported for fields. (They were in the prototype, but it was a huge hack.)
Additionally, constant-sized arrays are translated as bare pointers. This discards the array's element count, which isn't super ideal since there's no reasonable way to discover it.
We should synthesize types to represent the various sized constant arrays.
Useful resources:
(Note that the issues around the size of the element struct changing do not affect Biohazrd because our assemblies are already architecture and platform-dependent.)
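One possible shape for a synthesized constant-array type, sketched in C++ for illustration only: the element count lives in the type while the layout stays identical to T[N]. On the C# side this would presumably become a struct with N consecutive elements (or a fixed buffer for primitive element types). The name ConstantArray is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical synthesized type: same layout as T[N], but N is not lost.
template <typename T, std::size_t N>
struct ConstantArray
{
    T Elements[N];

    static constexpr std::size_t Length = N;

    T& operator[](std::size_t index) { return Elements[index]; }
    const T& operator[](std::size_t index) const { return Elements[index]; }
};

// Holds on mainstream ABIs, so the wrapper can stand in for a raw array field.
static_assert(sizeof(ConstantArray<int, 4>) == sizeof(int[4]));
static_assert(alignof(ConstantArray<int, 4>) == alignof(int[4]));
```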
In the prototype this was controlled by GlobalConfiguration.DumpClangDetails. During the refactor, it was changed to only emit on debug builds for simplicity. Instead, we should allow configuring whether or not it is emitted.
We only handle a handful of the possible Type variants exposed by ClangSharp. Below is a list of all of them; we need to eventually investigate all of the unchecked ones to determine what they are and whether we need to handle them:
AdjustedType
DecayedType
ArrayType
ConstantArrayType
DependentSizedArrayType - These are arrays sized by a template parameter
IncompleteArrayType - Unsized arrays (IE: int x[])
VariableArrayType
AtomicType
AttributedType #124
BlockPointerType
BuiltinType -- BuiltinType is not handled in any one location, and not all types are supported. We do use it to filter, but we have to use Type.Kind to get the actual builtin type. See #46 for details.
ComplexType
DecltypeType -- decltype(T)
DeducedType
AutoType -- The magic auto type
DeducedTemplateSpecializationType
DependentAddressSpaceType
DependentSizedExtVectorType
DependentVectorType
ExtVectorType
FunctionType
FunctionNoProtoType
FunctionProtoType
InjectedClassNameType
MacroQualifiedType
MemberPointerType
ObjCObjectPointerType
ObjCObjectType
ObjCInterfaceType
ObjCTypeParamType
PackExpansionType
ParenType
PipeType
PointerType
ReferenceType
LValueReferenceType
RValueReferenceType -- We handle these the same as LValueReferenceType, but I'm not sure that's actually correct, so a warning is emitted as well.
SubstTemplateTypeParmPackType
SubstTemplateTypeParmType
TagType
EnumType
RecordType
TemplateSpecializationType
TemplateTypeParmType
TypedefType -- If the Clang decl is attached to a TranslatedDeclaration, we translate as a type reference. Otherwise we ignore the typedef and translate as the canonical type.
TypeOfExprType
TypeOfType
TypeWithKeyword
DependentNameType
DependentTemplateSpecializationType
ElaboratedType - These are namespace-qualified types (like physx::PxU32)
UnaryTransformType
UnresolvedUsingType
VectorType
I suspect many of them probably can't even appear in declaration contexts, but we should check for sure and explicitly error if they appear.
For types without constructors, the C++ compiler may initialize the vtable pointer wherever the type is initialized.
This was encountered with PxDefaultAllocator.
Here's a simple example: (Godbolt)
class AbstractBase
{
public:
virtual void* allocate(int size);
};
class DefaultImpl : public AbstractBase
{
public:
void* allocate(int size)
{
return nullptr;
}
};
void UseAllocator(AbstractBase* allocator)
{
allocator->allocate(100);
}
void Test()
{
DefaultImpl impl;
UseAllocator(&impl);
}
With GCC x86-64 10.2, Test is generated as follows:
Test():
push rbp
mov rbp, rsp
sub rsp, 16
mov eax, OFFSET FLAT:vtable for DefaultImpl+16
mov QWORD PTR [rbp-8], rax
lea rax, [rbp-8]
mov rdi, rax
call UseAllocator(AbstractBase*)
nop
leave
ret
The easiest solution here is probably to use C trampolines for object construction (when necessary?)
Related: #51
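A sketch of what a C construction trampoline might look like, assuming a small generated C++ file is compiled alongside the library. Placement new runs the implicit constructor, so the compiler emits the vtable pointer store inside the library itself instead of leaving it to the caller. DefaultImpl_Construct and DefaultImpl_Destruct are hypothetical names, the base method is made pure virtual so the snippet links on its own, and real exports would also need the appropriate visibility/dllexport annotations.

```cpp
#include <cassert>
#include <new>

class AbstractBase
{
public:
    virtual ~AbstractBase() = default;
    virtual void* allocate(int size) = 0;
};

class DefaultImpl : public AbstractBase
{
public:
    void* allocate(int size) override
    {
        lastRequestedSize = size;
        return nullptr;
    }

    int lastRequestedSize = 0;
};

void UseAllocator(AbstractBase* allocator)
{
    allocator->allocate(100);
}

// Hypothetical trampoline with C linkage. Running the constructor here means
// the compiler initializes the vtable pointer, so C# only supplies storage.
extern "C" void DefaultImpl_Construct(void* memory)
{
    new (memory) DefaultImpl();
}

extern "C" void DefaultImpl_Destruct(void* memory)
{
    std::launder(static_cast<DefaultImpl*>(memory))->~DefaultImpl();
}
```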
Right now a vtable pointer gets synthesized in the child type when a parent holds the vtable pointer. This is probably OK since it allows specifying the child's more-specific vtable layout for the field, but I'm not sure if I'm happy with how it gets synthesized. It might not be a bad idea to revisit this.
We use the constant values computed by Clang for enum constants. The only "nice" thing we do is try to use hex if the original C++ declaration used hex.
For more complex enum constants, we should embed the original expression as a documentation comment on the constant. This would be especially nice in the case of composite flag constants.
IE:
enum class Colors
{
Red = 1,
Green = 1 << 1,
Blue = 1 << 2,
Yellow = Red | Green
// ...
};
becomes
[Flags]
enum Colors
{
Red = 1,
/// <remarks>Original C++ value: <c>1 &lt;&lt; 1</c></remarks>
Green = 2,
/// <remarks>Original C++ value: <c>1 &lt;&lt; 2</c></remarks>
Blue = 4,
/// <remarks>Original C++ value: <c>Red | Green</c></remarks>
Yellow = 3
}
We can use logic similar to our hex-declaration-detection to determine whether or not a constant declaration is trivial.
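A sketch of one possible triviality heuristic, assuming the constant's original source text is available as a string: an expression counts as trivial when it is a lone (optionally negated) decimal or hex integer literal with an optional u/l suffix, so only expressions like 1 << 1 or Red | Green would get a remarks comment. IsTrivialConstantExpression is a hypothetical name, and digit separators are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical heuristic: true for a lone integer literal such as "1", "-5",
// "0x10", or "100u"; false for anything involving operators or identifiers.
bool IsTrivialConstantExpression(std::string_view text)
{
    // Trim surrounding whitespace.
    while (!text.empty() && std::isspace(static_cast<unsigned char>(text.front())))
        text.remove_prefix(1);
    while (!text.empty() && std::isspace(static_cast<unsigned char>(text.back())))
        text.remove_suffix(1);

    // Allow a single leading minus sign.
    if (!text.empty() && text.front() == '-')
        text.remove_prefix(1);

    // Strip integer suffixes such as u, l, ul, ULL.
    while (!text.empty() && (text.back() == 'u' || text.back() == 'U' || text.back() == 'l' || text.back() == 'L'))
        text.remove_suffix(1);

    if (text.empty())
        return false;

    const bool isHex = text.size() > 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X');
    if (isHex)
        text.remove_prefix(2);

    for (char c : text)
    {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (isHex ? !std::isxdigit(uc) : !std::isdigit(uc))
            return false;
    }
    return true;
}
```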
Right now constant-sized arrays specified as parameters are always passed as a pointer to the first element. Is this accurate? Can they be passed by value when a similar struct would?
IE:
void SomeFunc(int32_t x[1])
{
// ...
}
void SomeFunc(int32_t x[2]) // on x64
{
// ...
}
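At the language level, a parameter declared with an array type is adjusted to a pointer to the element type, which can be checked with static_assert. (This adjustment is also why the two SomeFunc definitions above would be redefinitions of the same function if they appeared in one file.) Wrapping the array in a struct is what turns it into a genuine by-value parameter subject to the ABI's small-aggregate rules. The function and struct names below are hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

// Declared with array types, but both are adjusted to std::int32_t* by the language.
void SomeFuncOne(std::int32_t x[1]);
void SomeFuncTwo(std::int32_t x[2]);

static_assert(std::is_same_v<decltype(SomeFuncOne), void(std::int32_t*)>);
static_assert(std::is_same_v<decltype(SomeFuncTwo), void(std::int32_t*)>);

// A struct wrapping the same array really is passed by value, so this is where
// the ABI's rules for small aggregates (for example, a register on x64) apply.
struct Int32Pair
{
    std::int32_t values[2];
};

void SomeFuncByValue(Int32Pair x);

static_assert(std::is_same_v<decltype(SomeFuncByValue), void(Int32Pair)>);
```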
Things like the RTTI pointer are stored in the vTable before the entry where the vTable pointer points. Right now we just assume the vTable pointer points to the first occurrence of a function pointer in the table. This is probably fine, but in the interest of correctness we should figure out where this information is hidden within Clang.
If I remember right, -fdump-vtable-layouts points it out for the Microsoft ABI but not Linux. I think I investigated how it was determining it, but it was non-trivial and I decided to make an assumption for now instead.
Note that right now CSharpLibraryGenerator is what is handling this assumption. Ideally this should be encoded in TranslatedVTable instead.
Right now TypeTransformationBase will ignore type references in declarations it doesn't recognize. This is not ideal since it might not be immediately obvious that it needs to be aware of special declarations.
Without an architecture change, the easiest thing would be either:
TypeReference fields. (Debug builds only?)
Right now all conversion operator overloads are named ____ConversionOperator. Ideally we should name them with the types involved in the conversion.
The analyzer should also error if a field/variable of a non-pointer type is created for these structs.
Doubleplussame for TranslatedUndefinedRecord
since they couldn't even be dereferenced in C++.
The fields of an anonymous union should be inlined into the containing record so they act like they do in C++. (Right now they get emitted as a nested type and a field with automatically generated names.)
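A short C++ illustration of the behavior the translation should mimic: members of an anonymous union are named directly through the enclosing record, and they all share the same storage. In C#, one way to reproduce this would presumably be an explicit struct layout with [FieldOffset] on each inlined field. The Variant struct is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Variant
{
    std::int32_t kind;

    // Anonymous union: intValue and floatValue are looked up as members of
    // Variant itself and occupy the same bytes.
    union
    {
        std::int32_t intValue;
        float floatValue;
    };
};
```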
__fp16 can be used in ARM NEON and I imagine it should be compatible with System.Half.
__fp16 corresponds to CXType_Half. Can/should we support CXType_Float16 (_Float16) too? It's unclear why they're separate in the first place. (They share the same "singleton type" within Clang; it's not immediately clear what that means for us.)
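For reference, a C++ sketch that decodes IEEE 754 binary16 bits, the format System.Half uses. It assumes __fp16 is in the IEEE format; GCC on ARM also supports an alternative half-precision format without infinities or NaNs (selected via -mfp16-format), which would not be compatible. HalfBitsToFloat is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode binary16: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
float HalfBitsToFloat(std::uint16_t bits)
{
    const bool negative = (bits & 0x8000) != 0;
    const int exponent = (bits >> 10) & 0x1F;
    const int mantissa = bits & 0x3FF;

    float magnitude;
    if (exponent == 0)
        magnitude = std::ldexp(static_cast<float>(mantissa), -24); // subnormal: m * 2^-24
    else if (exponent == 0x1F)
        magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    else
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25); // (1024 + m) * 2^(e - 25)

    return negative ? -magnitude : magnitude;
}
```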
Right now we use C#9 records for several types to allow using with to clone+mutate them, since with can only be used with records. The LDM has expressed interest in adding with support for cloneable non-record types, but it doesn't sound like it'll happen in the C#9 timeframe.
Since we don't need/want the value-equality semantics of records, we should really just ditch them once C# supports with elsewhere. (Potentially with a source generator used to generate the boilerplate required to support it.)
IE:
const int SomeConst = 100;
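One consideration to be made for these types: C++ guarantees that &SomeConst is the same across all translation units as long as the constant has a single definition with external linkage (for example, extern const int or a C++17 inline variable); a plain namespace-scope const int has internal linkage, so each translation unit technically gets its own copy. Making this guarantee means we can't let the constant be a C# const, since C# does not allow taking the address of a constant. (And even if it did, the address would be different.)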
There are essentially three ways to represent C++-style constants in C#:
NativeLibrary.GetExport
static readonly field
Type.TypeInitializer or RuntimeHelpers.RunClassConstructor)
RuntimeHelpers.RunModuleConstructor
I think the right solution here is C, but will revisit when I go to implement.
This issue tracks the support for individual Clang built-in types. (For type classes, see #38)
The list below was gathered from CXTypeKind using the range CXType_FirstBuiltin..CXType_LastBuiltin. (Clang also considers some OpenCL stuff to be builtins, but they're out of scope for us.)
The documentation on the libclang side of things is pretty barren; BuiltinTypes.def is much more verbose. (Note that not all types defined in BuiltinTypes.def are exposed by libclang. It may be worthwhile to enumerate these too.)
Char_S -- char (with -fsigned-char), always translated as byte since a bare char usually indicates a C-style string and .NET's encoding utilities use byte
Char_U -- char (with -fno-signed-char)
Char8 -- char8_t, UTF-8 character (C++20 feature); should translate to System.Char8 once both are released.
Char16 -- char16_t, translated as C# char
WChar -- wchar_t (16-bit, on Windows platforms)
WChar -- wchar_t (32-bit, on Itanium platforms) #45
Char32 -- char32_t #45
UChar -- unsigned char / uint8_t
UShort
UInt
ULong
ULongLong
UInt128
SChar -- signed char / int8_t
Short
Int
Long
LongLong
Int128
Float16 -- _Float16
Half -- half (OpenCL) / __fp16 (ARM NEON) #44
Float
Double
Float128 -- __float128
LongDouble
Bool
NullPtr -- #43
Void
Dependent
Overload
In addition to _Accum, Clang also internally supports similar _Fract and _Sat types, which aren't even exposed in libclang.
These are apparently for use on processors which support fixed-point types.
I am assuming these will definitely be out of scope for Biohazrd. If a library needs these, I assume they will need special library/language-specific transformation.
I assume we could kludge them into same-sized builtins, but without real samples I'm not a fan of doing that.
(Anyone who happens across a library that uses these can use KludgeUnknownClangTypesIntoBuiltinTypesTransformation
until they figure out their library-specific translation.)
Accum
UAccum
ShortAccum
UShortAccum
LongAccum
ULongAccum
ObjCClass
ObjCId
ObjCSel
CXType_OCL* types
This name is based on the old prototype function that served a similar role. However, this transformation is also responsible for replacing basic ClangTypeReferences with Biohazrd equivalents like CXType_Void -> VoidTypeReference and FunctionProtoType -> FunctionPointerTypeReference.
Function pointers could be construed as some sort of reduction, but I think it'd be better for this to be the transformation that handles converting Clang type references to common Biohazrd equivalents instead.
Alternatively, maybe this transformation needs to be broken up anyway. I can see the array stuff not always being desirable.
PhysX is our current proof-of-concept, but it's pretty heavy. We should provide a more minimal example to demonstrate using Biohazrd to generate a project that supports multiple platforms and architectures.
It'd be a good idea to provide examples of generating projects for both multi-target single-package and multi-target multi-package.
Calling constructors on abstract types is necessary when extending an abstract type from C#.
Right now they are skipped by the module definition generator since that's how it was in the prototype. Not sure if I did this because I wasn't thinking about extending abstract types from C# or if I was just trying to avoid dealing with them at the time. Either way, we need a strategy for calling inline constructors on abstract types, which will be trickier than with concrete inline constructors.
Edit: I was about to close this as out-of-scope, but I think my intent was to support the barebones basic stuff like indent styles and encoding, which should be simple enough to support.
I do not think Biohazrd should attempt to support the .NET extensions to editorconfig. There are just too many things that can be configured, and it'd unnecessarily complicate the emit logic in Biohazrd. Biohazrd emits code that is compliant with typical C# coding standards. (Well, as compliant as it can be considering it's mapping C/C++ code, so the naming conventions are usually murdered.)
If someone wants their generated code to match their personal coding standards more closely, I think it's best that it's done as a post-process. I think Roslyn provides support for auto-formatting C# documents like that, but I've never touched it. If someone wants this I'm willing to entertain extensibility points in Biohazrd to allow such a post-process step.
No attempt at supporting variable arguments was made in the original prototype. Supporting these will be non-trivial because if I remember right, the runtime doesn't officially support P/Invoke with variable arguments outside of Windows.
char32_t represents UTF-32 code points. It isn't supported yet, purely out of lack of need and an unclear best translation.
How these might translate to C# is somewhat unclear. Older methods such as Char.ConvertFromUtf32 use int for UTF-32 code points.
In theory System.Text.Rune might be a good choice here, but it doesn't allow surrogates. I'm unsure if there's a legitimate reason for code points from the surrogate range to appear in UTF-32. However, we would be allowing C++ to create invalid runes, since the protection against surrogates lives in the Rune constructor, which native code bypasses.
More research needs to be done to determine if a reasonable ideal default translation can be supported in Biohazrd. Until then it most likely makes more sense for people to provide their own purpose-built translation. (If you make one, please comment on how you did it and how it worked out for you!)
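System.Text.Rune only accepts Unicode scalar values: code points up to 0x10FFFF, excluding the surrogate range 0xD800 to 0xDFFF. A char32_t coming from C++ carries no such guarantee, so any translation that exposes it as a Rune would presumably need a check like this hypothetical helper before constructing one.

```cpp
// Hypothetical helper: true when the value could legally become a Rune,
// i.e. it is in [0, 0x10FFFF] and outside the surrogate range.
constexpr bool IsUnicodeScalarValue(char32_t codePoint)
{
    return codePoint <= 0x10FFFF && !(codePoint >= 0xD800 && codePoint <= 0xDFFF);
}
```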
Related: #9
The libclang Pathogen extensions provide information about bit fields, but we don't really use it beyond marking the fields as unusable.
I think the best approach here is to make an additional TranslatedBitField type to represent them rather than trying to wedge them into TranslatedNormalField. That way the relatively rare bit field stuff can live separately from the extremely common case of normal fields.
C# does not support bit fields, so we'd need to implement them manually using properties. This should be fine since the only thing this really prevents is taking a reference to the bit field, which isn't something you can do in C++ regardless.
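A sketch of the shift-and-mask logic a generated bit field property would presumably need, written as constexpr C++ over a 32-bit storage unit. It assumes the bit offset and width come from Clang's record layout information, and it only covers unsigned bit fields; signed ones would also need sign extension on read. BitMask, GetBits, and SetBits are hypothetical names.

```cpp
#include <cstdint>

// Mask with the low `width` bits set.
constexpr std::uint32_t BitMask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

// Property getter: extract `width` bits starting at `offset`.
constexpr std::uint32_t GetBits(std::uint32_t storage, unsigned offset, unsigned width)
{
    return (storage >> offset) & BitMask(width);
}

// Property setter: replace those bits, truncating the value like C++ does for
// unsigned bit fields, and leave neighboring fields untouched.
constexpr std::uint32_t SetBits(std::uint32_t storage, unsigned offset, unsigned width, std::uint32_t value)
{
    const std::uint32_t mask = BitMask(width) << offset;
    return (storage & ~mask) | ((value << offset) & mask);
}
```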
#17 alleviates most cases where this mistake can be made, but neither of these methods should generally be called anyway.
Basically the rules are:
Transform must only be called by TransformRecursively or an override of Transform
TransformRecursively must only be called by Transform(TranslatedLibrary) or a Transform*Children method
Alternatively we might consider renaming Transform to make it less attractive.
There isn't a way I'm aware of to convince MSVC or Clang to export an inline destructor without modifying its definition.
Destructors were disabled from module definition generation entirely (even non-inline ones) in the prototype and it's unclear exactly why that is.
Worth noting: Virtual destructors are always available in the output.
Ideally it should be simple to attach extra information to a declaration in a transformer without needing to introduce a new type. This could be useful for attaching optional, extra information to a declaration that might not be meaningful to all contexts and doesn't necessitate creating a whole new declaration.
For example, this could replace some C#-specific functionality like TranslatedFunction.HideFromIntellisense
. It'd also allow these sorts of things to be used with other declarations.
I am thinking there should be two variants of this metadata:
Complex data is represented by a user-defined struct. Only one of each struct can be attached to a declaration. Simple data is keyed on either an empty type or a magic string. (I'm inclined to prefer the former.)
Tentative design:
public record TranslatedDeclaration
{
// ...
public TranslatedMetadata Extra { get; init; }
// ...
}
// Support extending TranslatedLibrary too?
public record TranslatedLibrary
{
// ...
public TranslatedMetadata Extra { get; init; }
// ...
}
// This interface is used for simple data keys and ensures people don't accidentally mix up TValue.
public interface IMetadataKey<TValue> { }
// Like all of the translated object model, this is an immutable type.
public sealed class TranslatedMetadata
{
// ==========================================
// Self-keyed/complex overloads
// ==========================================
// Use an analyzer to warn if T is IMetadataKey<TValue>
public T? TryGet<T>()
where T : class
=> throw null;
public bool TryGet<T>(out T value)
=> throw null;
// ==========================================
// Tagged/simple overloads
// ==========================================
public TValue? TryGet<TKey, TValue>()
where TKey : IMetadataKey<TValue>
where TValue : class
=> throw null;
public bool TryGet<TKey, TValue>(out TValue value)
where TKey : IMetadataKey<TValue>
=> throw null;
// ==========================================
// Remove
// ==========================================
public TranslatedMetadata RemoveKey<TKey>()
=> throw null;
// ==========================================
// Update
// ==========================================
public TranslatedMetadata Update<TKey>(TKey newValue)
=> throw null;
public TranslatedMetadata Update<TKey, TValue>(TValue newValue)
where TKey : IMetadataKey<TValue>
=> throw null;
// ... similar methods for Add(value)
}
TranslatedMetadata is essentially a dictionary where the type of the key determines the type of the value.
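To make the "type of the key determines the type of the value" idea concrete, here is a rough C++ analogue built on std::type_index and std::any. It is a sketch under simplifying assumptions rather than a translation of the C# design: a key type with a nested Value alias stands in for IMetadataKey<TValue>, and every update returns a modified copy to keep the container immutable. Metadata, HideFromIntellisense, and OriginalNameKey are hypothetical examples of the container, self-keyed data, and tagged data respectively.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical immutable, type-keyed metadata container.
class Metadata
{
public:
    // Self-keyed/complex data: the value's own type is the key.
    template <typename T>
    std::optional<T> TryGet() const
    {
        auto it = entries_.find(std::type_index(typeid(T)));
        if (it == entries_.end())
            return std::nullopt;
        return std::any_cast<T>(it->second);
    }

    // Tagged/simple data: TKey::Value fixes the value type for that key.
    template <typename TKey>
    std::optional<typename TKey::Value> TryGetTagged() const
    {
        auto it = entries_.find(std::type_index(typeid(TKey)));
        if (it == entries_.end())
            return std::nullopt;
        return std::any_cast<typename TKey::Value>(it->second);
    }

    // Every "mutation" returns an updated copy and leaves *this untouched.
    template <typename T>
    Metadata Update(T value) const
    {
        return With(std::type_index(typeid(T)), std::any(std::move(value)));
    }

    template <typename TKey>
    Metadata UpdateTagged(typename TKey::Value value) const
    {
        return With(std::type_index(typeid(TKey)), std::any(std::move(value)));
    }

    template <typename TKey>
    Metadata RemoveKey() const
    {
        Metadata copy = *this;
        copy.entries_.erase(std::type_index(typeid(TKey)));
        return copy;
    }

private:
    Metadata With(std::type_index key, std::any value) const
    {
        Metadata copy = *this;
        copy.entries_.insert_or_assign(key, std::move(value));
        return copy;
    }

    std::map<std::type_index, std::any> entries_;
};

// Hypothetical example keys.
struct HideFromIntellisense {};                        // self-keyed marker
struct OriginalNameKey { using Value = std::string; }; // tagged key
```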
Originally I was on the fence about this, but using the API is quite annoying and unnatural without this. For instance, my PhysX demo has a ton of whatever.Base.Base.Base.thingIWanted all over.
Easiest way to do this will probably be to (for example):
This generator should ideally be written so that non-standard declarations can be added easily. Custom declaration and type reference support is handled through special interfaces, so we don't need to worry about supporting them. (Although we could automagically implement these interfaces.)
ClangSharpTypereference are present anywhere in the entire tree
VoidTypeReference are present for fields, parameters, or underlying enum types. (Probably easier to check if they're allowed rather than when they're not.)
TranslatedTypeReference actually resolve #109
TranslatedRecord.Members doesn't have any unexpected declarations.
TranslatedVTableField.Accessibility is <= TranslatedVTable.Accessibility
__UNICODE_0123__, so this should only be a warning. See CSharpCodeWriter.IsLegalIdentifier)
A medium/long-term goal of Biohazrd is to automate verification of the calling convention to ensure that everything is kosher across the entire API surface.
The current translation of static fields is very user-unfriendly and leaks NativeLibrary handles.
One of the large parts of Biohazrd that was spared during the big prototype refactor was diagnostics. I changed how/where they get emitted, but they're still very Clang-centric and a bit annoying to work with.
TranslatedDeclaration when they aren't being attached to the declaration. (IE: during generation.)
ObsoleteAttribute can be applied to generated types. (For instance: The definition of an enum can be translated if the underlying type is unsupported, but the enum shouldn't be usable.)
Right now global variables are very inefficient and user-unfriendly, leak NativeLibrary handles, and aren't considered by the module definition generator.