cdisselkoen / llvm-ir

LLVM IR in natural Rust data structures

License: MIT License

Rust 45.78% Makefile 0.99% C 0.22% LLVM 52.95% Shell 0.04% C++ 0.02%

llvm-ir's Introduction

llvm-ir: LLVM IR in natural Rust data structures

llvm-ir seeks to provide a Rust-y representation of LLVM IR. It's based on the idea that an LLVM Instruction shouldn't be an opaque datatype, but rather an enum with variants like Add, Call, and Store. Likewise, types like BasicBlock, Function, and Module should be Rust structs containing as much information as possible.
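To make the "enum with variants" idea concrete, here is a minimal sketch. The types below are hypothetical, heavily simplified stand-ins written for this illustration, not llvm-ir's actual definitions (the real variants carry typed operands, debug locations, and more). The function name count_calls_to is also made up.

```rust
// Hypothetical, heavily simplified stand-ins; the real llvm-ir types carry
// far more information (typed operands, debug locations, attributes, ...).
#[allow(dead_code)]
#[derive(Debug)]
enum Instruction {
    Add { dest: String, lhs: String, rhs: String },
    Call { dest: Option<String>, callee: String, args: Vec<String> },
    Store { address: String, value: String },
}

/// Counts the calls to `target` by matching on the enum variant directly,
/// with no opaque handle or FFI call involved.
fn count_calls_to(instrs: &[Instruction], target: &str) -> usize {
    instrs
        .iter()
        .filter(|i| matches!(i, Instruction::Call { callee, .. } if callee.as_str() == target))
        .count()
}
```

Because the instruction is an ordinary Rust enum, an analysis like this is just a pattern match over a slice.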

Unlike other safe LLVM bindings such as inkwell, llvm-ir does not rely on continuous FFI to the LLVM API. It uses the LLVM API only for its initial parsing step, to pull in all the data it needs to construct its rich representation of LLVM IR. Once llvm-ir creates a Module data structure by parsing an LLVM file (using the excellent llvm-sys low-level LLVM bindings), it drops the LLVM FFI objects and makes no further FFI calls. This allows you to work with the resulting LLVM IR in pure safe Rust.

llvm-ir is intended for consumption of LLVM IR, and not necessarily production of LLVM IR (yet). That is, it is aimed at program analysis and related applications which want to read and analyze LLVM IR. In the future, perhaps llvm-ir could be able to output its Modules back into LLVM files, or even send them directly to the LLVM library for compiling. If this interests you, contributions are welcome! (Or in the meantime, check out inkwell for a different safe interface for producing LLVM IR.) But if you're looking for a nice read-oriented representation of LLVM IR for working in pure Rust, that's exactly what llvm-ir can provide today.

Getting started

This crate is on crates.io, so you can simply add it as a dependency in your Cargo.toml, selecting the feature corresponding to the LLVM version you want:

[dependencies]
llvm-ir = { version = "0.11.1", features = ["llvm-18"] }

Currently, the supported LLVM versions are llvm-9, llvm-10, llvm-11, llvm-12, llvm-13, llvm-14, llvm-15, llvm-16, llvm-17, and llvm-18.

Then, the easiest way to get started is to parse some existing LLVM IR into this crate's data structures. To do this, you need LLVM bitcode (*.bc) or text-format IR (*.ll) files. If you currently have C/C++ sources (say, source.c), you can generate *.bc files with clang's -c and -emit-llvm flags:

clang -c -emit-llvm source.c -o source.bc

Alternately, to compile Rust sources to LLVM bitcode, you can use rustc's --emit=llvm-bc flag.

In either case, once you have a bitcode file, then you can use llvm-ir's Module::from_bc_path function:

use llvm_ir::Module;
let module = Module::from_bc_path("path/to/my/file.bc")?;

or if you have a text-format IR file, you can use Module::from_ir_path().

You may also be interested in the llvm-ir-analysis crate, which computes control-flow graphs, dominator trees, etc for llvm-ir functions.

Documentation

Documentation for llvm-ir can be found on docs.rs, or of course you can generate local documentation with cargo doc --open. The documentation includes links to relevant parts of the LLVM documentation when appropriate.

Note that some data structures differ slightly depending on your choice of LLVM version. The docs.rs documentation is generated with the llvm-10 feature; for other LLVM versions, you can get appropriate documentation with cargo doc --features=llvm-<x> --open where <x> is the LLVM version you're using.

Compatibility

Starting with llvm-ir 0.7.0, LLVM versions are selected by a Cargo feature flag. This means that a single crate version can be used for any supported LLVM version. Currently, llvm-ir supports LLVM versions 9 through 18, selected by feature flags llvm-9 through llvm-18.

You should select the LLVM version corresponding to the version of the LLVM library you are linking against (i.e., the one that is available on your system). Newer LLVMs should be able to read bitcode produced by older LLVMs, so you should be able to use this crate to parse bitcode older than the LLVM version you select via crate feature, even bitcode produced by LLVMs older than LLVM 9. However, this is not extensively tested by us.

llvm-ir works on stable Rust. As of this writing, it requires Rust 1.65+.

Development/Debugging

For development or debugging, you may want LLVM text-format (*.ll) files in addition to *.bc files.

For C/C++ sources, you can generate these by passing -S -emit-llvm to clang, instead of -c -emit-llvm. E.g.,

clang -S -emit-llvm source.c -o source.ll

For Rust sources, you can use rustc's --emit=llvm-ir flag.

Additionally, you may want to pass the -g flag to clang, clang++, or rustc when generating bitcode. This will generate LLVM bitcode with debuginfo, which will ensure that Instructions, Terminators, GlobalVariables, and Functions have valid DebugLocs attached. (See the HasDebugLoc trait.) Also note that these DebugLocs are only available in LLVM 9 and newer; previous versions of LLVM had a bug in this interface in the C API which would cause segfaults.

Limitations

A few features of LLVM IR are not yet represented in llvm-ir's data structures.

Most notably, llvm-ir recovers debug-location metadata (for mapping back to source locations), but makes no attempt to recover any other debug metadata. LLVM files containing metadata can still be parsed in with no problems, but the resulting Module structures will not contain any of the metadata, except debug locations.

A few other features are missing from llvm-ir's data structures because getters for them are missing from the LLVM C API and the Rust llvm-sys crate, only being present in the LLVM C++ API. These include but are not limited to:

  • the "fast-math flags" on various floating-point operations
  • contents of inline assembly functions
  • information about the clauses in the variadic LandingPad instruction
  • information about the operands of a BlockAddress constant expression
  • information about TargetExtType types
  • the "prefix data" associated with a function
  • the values of constant integers which are larger than 64 bits (and don't fit in 64 bits) -- see #5
  • the "other labels" reachable from a CallBr terminator (which was introduced in LLVM 9)
  • (LLVM 16 and lower -- fixed in LLVM 17 and later) the nsw and nuw flags on Add, Sub, Mul, and Shl, and likewise the exact flag on UDiv, SDiv, LShr, and AShr. The C API has functionality to create new instructions specifying values of these flags, but not to query the values of these flags on existing instructions.
  • (LLVM 9 and lower -- fixed in LLVM 10 and later) the opcode for the AtomicRMW instruction, i.e., Xchg, Add, Max, Min, and the like.

More discussion about this is in LLVM bug #42692. Any contributions to filling these gaps in the C API are greatly appreciated!

Acknowledgments

llvm-ir took its original inspiration from the llvm-hs-pure Haskell package. Most of the data structures in the original release of llvm-ir were essentially translations from Haskell to Rust of the data structures in llvm-hs-pure (with some tweaks).

Changelog for 0.7.0

llvm-ir 0.7.0 includes several fairly major changes from previous versions, which are outlined here.

  • LLVM versions are now selected via Cargo features. You must select exactly one of the features llvm-8, llvm-9, or llvm-10. Previously, we had the 0.6.x branch for LLVM 10, the 0.5.x branch for LLVM 9, and didn't officially support LLVM 8. Now, a single release supports LLVM 8, 9, and 10.
    • (Note: Versions of this crate beyond 0.7.0 have added support for later LLVM versions as well. For instance, 0.7.3 and later also support LLVM 11; and 0.7.5 and later also support LLVM 12. Crate version 0.11.0 removed support for LLVM 8.)
  • FunctionAttribute and ParameterAttribute are now proper enums with descriptive variants such as NoInline, StackProtect, etc. Previously, attributes were opaque numeric codes which were difficult to interpret.
  • Several changes to improve runtime performance and especially memory consumption, particularly when parsing large LLVM modules. This involves a number of breaking changes to the public interface:
    • Most users of Type now own a TypeRef rather than a Type directly. This includes Operand::LocalOperand, GlobalVariable, many variants of Instruction, many variants of Constant, and some variants of Type itself, among others. See the documentation on TypeRef.
    • Similarly, most users of Constant now own a ConstantRef rather than a Constant directly. See the documentation on ConstantRef.
    • To get the type of Typed objects, the provided .get_type() method now requires an additional argument; most users will probably prefer module.type_of() (or module.types.type_of()).
    • Type::NamedStructType no longer carries a weak reference to its inner type; instead, you can look up the name using module.types.named_struct_def() to get the definition for any named struct type in the module.
  • The required Rust version increased from 1.36+ to 1.39+.
    • (Note: Versions of this crate beyond 0.7.0 have increased this requirement further. For the current required Rust version, see "Compatibility" above.)

llvm-ir's People

Contributors

00xc, apmasell, benjins, bitmagier, bramverb, cdisselkoen, jedisct1, langston-barrett, romfouq, smoelius, thezoq2, woodruffw

llvm-ir's Issues

Functions returning large structs have sret attribute on the wrong argument

First of all, thanks for this awesome library. Being able to work with rust instead of C++ has been a huge improvement to my work project :)

I did come across a strange bug however, though I may just be misunderstanding something.

I have a function that looks like this:

struct output {
    double a0;
    double a1;
    double a2;
};

output run(float a, float b) {
    output o;
    o.a0 = a;
    o.a1 = b;
    return o;
}

Which I compile to bitcode using

clang++ code.cpp -emit-llvm -c -O2 -fno-discard-value-names -o module.bc

and llvm assembly using

clang++ code.cpp -emit-llvm -c -O2 -fno-discard-value-names -o module.ll -S && cat module.ll

which gives me the following assembly:

; ModuleID = 'code.cpp'
source_filename = "code.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

%struct.output = type { double, double, double }

; Function Attrs: nofree norecurse nounwind sspstrong uwtable writeonly
define dso_local void @_Z3runff(%struct.output* noalias nocapture sret %agg.result, float %a, float %b) local_unnamed_addr #0 {
entry:
  %0 = insertelement <2 x float> undef, float %a, i32 0
  %1 = insertelement <2 x float> %0, float %b, i32 1
  %2 = fpext <2 x float> %1 to <2 x double>
  %3 = bitcast %struct.output* %agg.result to <2 x double>*
  store <2 x double> %2, <2 x double>* %3, align 8, !tbaa !4
  ret void
}

attributes #0 = { nofree norecurse nounwind sspstrong uwtable writeonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{!"clang version 10.0.1 "}
!4 = !{!5, !5, i64 0}
!5 = !{!"double", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C++ TBAA"}

The relevant part is the input arguments: the first argument is agg.result, which has a few attributes, followed by two floats which have no attributes.

Now, when I run the following rust code:

use llvm_ir::Module;
fn main() {
    let module = Module::from_bc_path("module.bc").unwrap();

    let function = &module.functions[0];

    println!("{:#?}", function.parameters);

    println!("Hello, world!");
}

it prints the following:

[
    Parameter {
        name: Name(
            "agg.result",
        ),
        ty: PointerType {
            pointee_type: NamedStructType {
                name: "struct.output",
                ty: Some(
                    (Weak),
                ),
            },
            addr_space: 0,
        },
        attributes: [],
    },
    Parameter {
        name: Name(
            "a",
        ),
        ty: FPType(
            Single,
        ),
        attributes: [
            EnumAttribute {
                kind: 21,
                value: None,
            },
            EnumAttribute {
                kind: 23,
                value: None,
            },
            EnumAttribute {
                kind: 58,
                value: None,
            },
        ],
    },
    Parameter {
        name: Name(
            "b",
        ),
        ty: FPType(
            Single,
        ),
        attributes: [],
    },
]

As far as I can tell, the attributes of the first argument have been moved to the second argument.

This feels wrong, but I also found the function hasStructRetAttr in llvm which checks for the StructRet attribute on both the first and second argument https://llvm.org/doxygen/Function_8h_source.html

Is this the expected behaviour or did I stumble across a bug?

Test failures from LLVM C API accessor issues

I tried running the tests via cargo test --features=llvm-15, and was getting a SIGILL termination. I tracked it down to what seems to be a bug in the LLVM C API, where LLVMGetOrdering does not support Fence instructions. Running a build of LLVM with assertions enabled also revealed a few other issues.

The main issue: LLVMGetOrdering does not support Fence instructions. It will instead cast it to an unrelated type AtomicRMWInst which will read a garbage value for the memory order (see https://github.com/llvm/llvm-project/blob/77c67436d9e7e63c50752d990a271cf860ba9a0e/llvm/lib/IR/Core.cpp#L3754-L3764 ). The problem is that if this is not a valid enum value, then MemoryOrdering::from_llvm will cause UB since none of the match arms will be hit (LLVMGetOrdering returns the LLVMAtomicOrdering enum):

llvm-ir/src/instruction.rs

Lines 3377 to 3380 in 23dac16

impl MemoryOrdering {
    #[rustfmt::skip] // each one on one line, even if lines get a little long
    pub(crate) fn from_llvm(ao: LLVMAtomicOrdering) -> Self {
        match ao {

I believe this is what triggered the SIGILL for me. The exact test can be run with cargo test --features=llvm-15 --test llvm_10_tests atomicrmw_binops (the atomics function in tests/llvm_bc/compatibility.ll.bc).

I also found some other, more benign issues with the C API, which trigger assertion failures. I'm mentioning them in case anyone else runs the tests w/ a copy of LLVM that has assertions enabled

  1. LLVMIsAtomicSingleThread does not properly support Fence, Load, or Store instructions (called via SynchronizationScope::from_llvm_ref). This is a similar issue where the C API does not check for those types. It happens to work now because loads/stores, fences, and AtomicCmpXchgInst all define the sync scope ID field at the same offset
  2. LLVMGetNormalDest does not support CallBr instructions (called via CallBr::from_llvm_ref). It casts the value to an InvokeInst. This is technically an unrelated type, but it seems to work out because the methods seem to be pulling the operand from the same place, but I'm not sure if this will work in all cases
  3. LLVMGetThreadLocalMode does an overly-restrictive cast to GlobalVariable, when it should be using GlobalValue. It is called via GlobalAlias::from_llvm_ref. Global aliases are subclasses of GlobalValue, but not GlobalVariable. However this isn't really an issue since it will only access a method that is on GlobalValue, so global aliases work out

I've filed an upstream LLVM issue for the atomic-related issues, since the memory order one is causing problems for me on a normal build: llvm/llvm-project#65227. However, even if this lands promptly, it likely won't make it to a release until LLVM 18.x. Not sure what the preferred path is for handling it in the meantime: the UB issue could at least be detected by transmuting LLVMAtomicOrdering to an integer and checking whether it's an expected value. However, even then the enum value is probably not going to be correct.
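The "check the integer first" idea can be sketched as follows. This is only an illustration under stated assumptions: MemoryOrdering here is a local stand-in, not the crate's type; memory_ordering_from_raw is a made-up helper; and the numeric values are the LLVMAtomicOrdering discriminants from llvm-c/Core.h as I understand them, so they are worth checking against your LLVM headers. How the raw integer is obtained from the C API is deliberately left out.

```rust
/// Local stand-in for the crate's `MemoryOrdering`; not the real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryOrdering {
    NotAtomic,
    Unordered,
    Monotonic,
    Acquire,
    Release,
    AcquireRelease,
    SequentiallyConsistent,
}

/// Converts a raw ordering value into the enum, returning `None` for any
/// value that is not a known discriminant instead of hitting undefined behavior.
/// Assumption: these numbers mirror `LLVMAtomicOrdering` in llvm-c/Core.h.
fn memory_ordering_from_raw(raw: u32) -> Option<MemoryOrdering> {
    match raw {
        0 => Some(MemoryOrdering::NotAtomic),
        1 => Some(MemoryOrdering::Unordered),
        2 => Some(MemoryOrdering::Monotonic),
        // 3 is intentionally absent: it is not a valid ordering value.
        4 => Some(MemoryOrdering::Acquire),
        5 => Some(MemoryOrdering::Release),
        6 => Some(MemoryOrdering::AcquireRelease),
        7 => Some(MemoryOrdering::SequentiallyConsistent),
        _ => None,
    }
}
```

As the issue notes, this only detects the garbage value; it cannot recover the fence's real ordering.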

Support for LLVM 15?

First of all, thank you for maintaining this crate -- it's been incredibly useful on some of our projects!

Recent versions of rustc have upgraded to LLVM 15, meaning that the IR generated by rustc can't be used with this crate's bindings by default.

Would it be possible to add LLVM 15 support to this crate? I'm happy to take a stab at it first as well, if it's easier for you to review changes 🙂

Feature: Type sizes

I have an analysis that could make use of information about the sizes of types, in particular, what LLVM calls DataLayout::getTypeAllocSize. LLVM has a whole variety of methods that calculate different sorts of sizes for various types; it looks to me like they're all fundamentally based on DataLayout::getTypeSizeInBits.

Support for LLVM 17

Looks like rustc bumped up to LLVM 17, starting with 1.73.0.

I'll try and get a PR opened for this in the coming days.

Instantiating types for tests

Hello again.

I'm in the process of porting my code over to 0.7 and ran into a problem when trying to port my tests. Most of them do something like this:

let instruction = BitCast {
    operand: LlvmOperand::LocalOperand {
        name: "retval".into(),
        ty: Type::PointerType { // This no longer compiles
            pointee_type: Box::new(Type::NamedStructType {
                name: "struct.Result".into(),
                ty: None,
            }),
            addr_space: 0,
        },
    },

At first, I thought I would just be able to create a new typeref using
TypeRef::new(Type::...) but that seems to not be the case.

I also found the TypesBuilder and Types struct, however the TypesBuilder
seems to be private and Types, as far as I can tell, has no constructor.

Am I missing something here?

Module::ThinLTO Summary

It would be very nice to have support for the ThinLTO Summary section. I have not seen that data inside a parsed module yet.
LLVM specifies this section, which contains essential linker information - e.g. a "guid" for referenced functions. This section is written e.g. by clang -flto ...

Version 0.7 and pattern matching

I started bumping my current project to the 0.7 version of this crate which has introduced the ConstantRef and TypeRef structs. Unfortunately a lot of my code does a whole bunch of pattern matching on things, for example:

    let min = if let LlvmOperand::ConstantOperand(Constant::Float(Float::Double(val))) = min_ {
        val
    } else {
        Err(Error::NonConstantInputMin)?
    };

Obviously this code no longer compiles, and as far as I know, there is no way
to maintain this neat pattern matching thing. Instead, I'd have to add an inner
match, with additional handling of other constant types.

I know this is a hard problem to solve, so feel free to ignore it. If it's
something that may be interesting to others, I might look into working out a
solution.
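For illustration, the inner match described above can be sketched with stand-in types. Everything below is defined locally for this example: ConstantRef is modeled as a newtype around Arc<Constant> that implements AsRef<Constant>, which is an assumption about the general shape rather than the crate's actual definition, and constant_double is a hypothetical helper.

```rust
use std::sync::Arc;

// Local stand-ins that only mirror the general shape of the 0.7 types.
#[allow(dead_code)]
enum Float {
    Single(f32),
    Double(f64),
}

#[allow(dead_code)]
enum Constant {
    Int { bits: u32, value: u64 },
    Float(Float),
}

/// Assumed shape: a shared, reference-counted handle to a `Constant`.
struct ConstantRef(Arc<Constant>);

impl AsRef<Constant> for ConstantRef {
    fn as_ref(&self) -> &Constant {
        &self.0
    }
}

#[allow(dead_code)]
enum Operand {
    LocalOperand { name: String },
    ConstantOperand(ConstantRef),
}

/// Extracts a constant double, or `None` for any other operand.
/// The nested pattern from 0.6 becomes an outer match plus an inner match.
fn constant_double(op: &Operand) -> Option<f64> {
    match op {
        Operand::ConstantOperand(cref) => match cref.as_ref() {
            Constant::Float(Float::Double(val)) => Some(*val),
            _ => None,
        },
        _ => None,
    }
}
```

Wrapping the two-level match in a small helper like this keeps call sites close to the original one-line if let, at the cost of one helper per pattern of interest.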

Query for LLVM version?

Is there a way to ask llvm-ir for the LLVM version it was configured for?

I am imagining something like:

pub fn llvm_version() -> u32 {
    if cfg!(feature = "llvm-8") {
        return 8;
    }
    ...
}

Clearly, this could be handled in the caller (since it chose the feature). But I am wondering if there is a way to handle this without having to do extra work in the caller.
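A caller-side version of this idea might look like the sketch below. It is not an API of llvm-ir; the function is hypothetical, and it assumes the calling crate defines Cargo features with the same llvm-9 through llvm-18 names and forwards them to llvm-ir.

```rust
/// Returns the LLVM major version selected via Cargo features, or `None`
/// if no `llvm-*` feature is enabled for the crate this is compiled in.
/// Assumption: the enclosing crate declares features named `llvm-9` .. `llvm-18`.
pub fn llvm_version() -> Option<u32> {
    let candidates = [
        (cfg!(feature = "llvm-9"), 9),
        (cfg!(feature = "llvm-10"), 10),
        (cfg!(feature = "llvm-11"), 11),
        (cfg!(feature = "llvm-12"), 12),
        (cfg!(feature = "llvm-13"), 13),
        (cfg!(feature = "llvm-14"), 14),
        (cfg!(feature = "llvm-15"), 15),
        (cfg!(feature = "llvm-16"), 16),
        (cfg!(feature = "llvm-17"), 17),
        (cfg!(feature = "llvm-18"), 18),
    ];
    // `cfg!` expands to a compile-time boolean, so this is a constant lookup.
    candidates
        .iter()
        .find(|(enabled, _)| *enabled)
        .map(|&(_, version)| version)
}
```

Returning an Option avoids picking an arbitrary default when no feature is set.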

The `Name` field of `GlobalReference` could be a `String`

The GlobalReference constant has a Name field, where Name is a sum of u64 and String. I believe global variables and functions always have actual names, rather than numbers - perhaps the Name field of GlobalReference could be specialized to String?

Doc request and question regarding where and how to align LLVM versions

This seems like a great tool. I look forward to giving it a go! And thus my post.

Despite the documentation already in place that touches on how to align LLVM versions, I identified the following "gaps" (leave it to me to come from an unusual angle :)):

  1. I have a .bc file generated using a project specific rustup override set 14.0.6 => compatible with llvm-ir
    Note: I confirmed the rustc version from within the root directory cargo rustc -- --version --verbose and rustup settings.toml/[overrides]

  2. I have a separate project that follows the "getting-started" documentation for llvm-ir (or -analysis) using the feature-gated features = ["llvm-14"]

    • attempt 1: use rustc 1.70 and its paired LLVM version 16
      => it should be able to use the latest rustc to compile b/c I have already aligned the parser version with the .bc file I expect it to parse

    • attempt 2: use rustc 1.63.0 and its paired LLVM version 14.0.5
      => compiler error (in the build script):

    ...
    cargo:rustc-link-search=native=/opt/homebrew/Cellar/llvm/16.0.0/lib
    ...
    thread 'main' panicked at 'system library flag "/opt/homebrew/lib/libz3.dylib" does not look like a link library', /Users/edmund/.cargo/registry/src/github.com-1ecc6299db9ec823/llvm-sys-140.1.1/build.rs:284:17
    

I looked at the rustc configuration options... and other places.

A question and a request for information:

  1. Attempt 1 may have failed only because the .bc file was compiled using rustc 16 despite my use of the rustup override. But otherwise my logic is sound. Is it? It's not clear that the chosen feature-gate version means I must also use that LLVM to compile the parsing app.

  2. May I ask how to set the rustc path to the one I set using the rustup override command? (i.e., ignore the instance likely installed when I installed rustup).
    Perhaps as an aside: It failing here makes me think the feature-gate needs to align with the compiler not just of the .bc file from another project, but with the "viewer/parser" itself (that would be worth documenting one way or another).

One last potential clue (where my understanding of how things work gets fuzzy): I'm using an M1 where, early on, I needed to create rustflags for the C linkers (hosted in my .cargo/config.toml target-specific entries). I'm hesitant to point the blame here b/c I haven't had issues toggling between cargo versions, but maybe in this setting it might be a source of trouble (given an explicit/face-value interpretation of the error message)?

Thank you!

Panic while parsing in llvm-ir-0.10.0/src/constant.rs:1503

llvm-ir = { version = "0.10.0", features = ["llvm-16"] }

Parsing a bitcode file created from the clang-16-compiled glibc file "string/strstr.c" panics.
(repo: https://sourceware.org/git/glibc.git, branch azanella/clang)

2024-01-27T19:04:16.836+01:00 DEBUG [llvm_ir::module] Processing a GlobalVariable with type TypeRef(PointerType { addr_space: 0 })
thread 'main' panicked at /home/bitmagier/.cargo/registry/src/index.crates.io-6f17d22bba15001f/llvm-ir-0.10.0/src/constant.rs:1503:103:
Global not found in ctx.global_names; have names [Name("llvm.lifetime.start.p0"), Name("__strstr_sse2_unaligned"), Name("__strstr_generic"), Name("llvm.dbg.declare"), Name("__strstr_avx512"), Name("bcmp"), Name("__strnlen"), Name("strlen"), Name("two_way_long_needle"), Name("__libc_strstr_ifunc"), Name("strstr"), Name("llvm.dbg.value"), Name("strchr"), Name("llvm.memset.p0.i64"), Name("llvm.lifetime.end.p0"), Name("llvm.umax.i64"), Name("SYMBOL_NAME_ifunc_selector"), Name("_dl_x86_cpu_features")]

Bitcode: strstr.o.bc.gz
Full Log: trace.log

LLVM 12

It looks like LLVM 12 was released last month, do you have plans to update this library for it? I'd be happy to help out with the process if you'd like, but I might need some pointers to figure out what needs changing.

Looking at the LLVM-11 update, I suppose one needs to go through the changelog of llvm to find the things that need patching, or is there an easier way?

Linking fails with LLVM 12

Hello,

It seems that LLVM 12 does not have the functions LLVMIsTypeAttribute and LLVMGetTypeAttributeValue defined in the source code (https://github.com/llvm/llvm-project/releases/download/llvmorg-12.0.0/llvm-12.0.0.src.tar.xz). Building with cargo build --features llvm-12 creates a library with both these symbols as undefined and causes a linking error when trying to link against llvm 12.

nm libllvm_ir.rlib | less
                 U LLVMGetAlignment
                 U LLVMGetAttributeCountAtIndex
                 U LLVMGetAttributesAtIndex
                 U LLVMGetComdat
                 U LLVMGetDLLStorageClass
                 U LLVMGetElementType
                 U LLVMGetEnumAttributeKind
                 U LLVMGetEnumAttributeValue
                 U LLVMGetFunctionCallConv
                 U LLVMGetLinkage
                 U LLVMGetPersonalityFn
                 U LLVMGetReturnType
                 U LLVMGetTypeAttributeValue <--
                 U LLVMGetVisibility
                 U LLVMHasPersonalityFn
                 U LLVMIsAFunction
                 U LLVMIsEnumAttribute
                 U LLVMIsFunctionVarArg
                 U LLVMIsStringAttribute
                 U LLVMIsTypeAttribute <---
                 U LLVMTypeOf

The cause seems to be the feature flags in src/function.rs (https://github.com/cdisselkoen/llvm-ir/blob/main/src/function.rs#L847) and (https://github.com/cdisselkoen/llvm-ir/blob/main/src/function.rs#L874) referring to both these symbols with the llvm-12 feature enabled.

Both the functions seem to be present from LLVM 13 onwards.

Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!

I got this error when running cargo run.

Error message

rust-llvm-ir-test: llvm-project/llvm/include/llvm/Support/Casting.h:269: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::GetElementPtrInst; Y = llvm::Value; typename llvm::cast_retty<X, Y*>::ret_type = llvm::GetElementPtrInst*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
zsh: abort (core dumped)  cargo run

Debug Log

Creating a Module from path "./irs/client.bc"
Created a MemoryBuffer
Parsed bitcode to llvm_sys module
Creating a Module from an LLVMModuleRef
Processing func "__traceiter_9p_client_req"
unknown enum param attr 41
Collected info on 4 parameters
Collected names of 7 basic blocks
Collected names of 16 values
Processing a basic block named Name("entry")
Processing instruction "  call void @__sanitizer_cov_trace_pc() #14"
Processing instruction "  %0 = load volatile %struct.tracepoint_func*, %struct.tracepoint_func** getelementptr inbounds ({ i8*, { %struct.atomic_t, { %struct.device_dma_parameters* } }, %struct.static_call_key*, i8*, i8*, i32 ()*, void ()*, %struct.tracepoint_func* }, { i8*, { %struct.atomic_t, { %struct.device_dma_parameters* } }, %struct.static_call_key*, i8*, i8*, i32 ()*, void ()*, %struct.tracepoint_func* }* @__tracepoint_9p_client_req, i64 0, i32 7), align 8"
rust-llvm-ir-test: llvm-project/llvm/include/llvm/Support/Casting.h:269: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::GetElementPtrInst; Y = llvm::Value; typename llvm::cast_retty<X, Y*>::ret_type = llvm::GetElementPtrInst*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
zsh: abort (core dumped)  cargo run

Source code
[image not preserved]

Global variables can be named with numbers

I am trying to use llvm-ir to analyse some code compiled by zig, and I get this error message:

Expected global variable or function to have a real name, not a number 7

The message is the same for either .bc or .ll. In my .ll I have

@7 = internal unnamed_addr constant %"[]u8" { i8* getelementptr inbounds ([25 x i8], [25 x i8]* @6, i64 0, i64 0), i64 24 }, align 8

So it looks like a number is actually a valid name?
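As a small illustration of why a global's name needs both a string form and a numeric form, here is a sketch that classifies a global identifier as it appears in text-format IR. The Name enum is a local stand-in modeled on the "sum of u64 and String" described in the related issue, and parse_global_name is a hypothetical helper. Quoted identifiers such as @"foo bar" are not handled specially here.

```rust
/// Local stand-in for a name that is either a string or a number.
#[derive(Debug, PartialEq, Eq)]
enum Name {
    Name(String),
    Number(u64),
}

/// Classifies a global identifier such as `@7` or `@foo`.
/// Returns `None` if the text does not start with `@`, has an empty body,
/// or is an all-digit name too large for a `u64`.
fn parse_global_name(ident: &str) -> Option<Name> {
    let body = ident.strip_prefix('@')?;
    if body.is_empty() {
        return None;
    }
    if body.bytes().all(|b| b.is_ascii_digit()) {
        // Unnamed globals are numbered: `@7 = internal unnamed_addr constant ...`
        body.parse::<u64>().ok().map(Name::Number)
    } else {
        Some(Name::Name(body.to_string()))
    }
}
```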

Passing an invalid bitcode file results in an error that can't be handled by the caller

It seems that if you call llvm_ir::Module::from_bc_path() with a path to a file that does not contain valid bitcode, an error occurs which cannot be handled at the call site. Instead the program panics without a backtrace and with the message "error: Invalid bitcode signature". The signature of from_bc_path would lead one to believe that it returns errors that can be handled as a typical rust error.

New release for support LLVM 13

First of all thanks a lot for this crate!

The current latest release (v0.8.1) does not contain some recent changes for supporting LLVM 13 (like #17). Also, the docs in this release still reference LLVM 12. Is it possible to release v0.8.2 containing all changes made since then?

Reading bitcode causes stack overflow

Hi,

I have the following code:

#[no_mangle]
fn foo() -> i32 {
    let a = 41 + 42;
    return a;
}

fn main() {
    println!("The magic number {}", foo());
}

which I compile with

RUSTFLAGS="--emit=llvm-bc --emit=llvm-ir -Clto -Cembed-bitcode=yes" cargo build

where the rustc version is 1.63.

When trying to read this file with the simple program

use llvm_ir::Module;

fn main() {
    let path = "foo.bc";
    let module = Module::from_bc_path(path).unwrap();
    println!("Done parsing");
}

it fails with a stack overflow error when I run cargo run. I've tried increasing my stack with ulimit -s all the way up to 2097152 kB, but with the same result.

Digging through the code, I assume the stack blows up while the various from_llvm_ref functions collect their results. I guess the solution is to wrap each Vec in a Box or Rc so that it is put on the heap, or is there another solution?

Running the latest commit on llvm-ir (552d11e) and on a Ubuntu 20.04.

Cheers!

New release for LLVM 17?

Hello! As always, thanks a ton for maintaining llvm-ir!

Now that #55 is merged, it'd be great to have a release that includes the llvm-17 feature 🙂

Indexed type of `GetElementPtr`

As noted in the docs:

The interpretation of each index is dependent on the type being indexed into.

One can't calculate, e.g., the offset that the GEP adds to the pointer without knowing which type it is indexing into. The solution is probably just to add an indexed_type: TypeRef field to GetElementPtr.

This would enable a method like LLVM's accumulateConstantOffset, which would be quite useful for my purposes!
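To show why the indexed type matters, here is a rough sketch of a constant-offset computation over a toy type model. Nothing here is llvm-ir or LLVM API: Ty, packed_size, and constant_offset are hypothetical, indices are treated as unsigned, and the layout is assumed to be packed (no padding or alignment), which a real implementation would have to take from the module's data layout.

```rust
/// Toy type model; a stand-in for the real IR types.
#[derive(Clone, Debug)]
enum Ty {
    Scalar { bytes: u64 },
    Array { elem: Box<Ty>, len: u64 },
    Struct { fields: Vec<Ty> },
}

/// Size in bytes under the simplifying assumption of a packed layout.
fn packed_size(ty: &Ty) -> u64 {
    match ty {
        Ty::Scalar { bytes } => *bytes,
        Ty::Array { elem, len } => packed_size(&**elem) * *len,
        Ty::Struct { fields } => fields.iter().map(packed_size).sum(),
    }
}

/// Byte offset added by a GEP with constant `indices` into `indexed`.
/// The first index steps over whole `indexed` values; each later index
/// descends into the current array or struct. Returns `None` for an empty
/// index list, an out-of-range field, or an index into a scalar.
fn constant_offset(indexed: &Ty, indices: &[u64]) -> Option<u64> {
    let (first, rest) = indices.split_first()?;
    let mut offset = *first * packed_size(indexed);
    let mut current = indexed;
    for &idx in rest {
        match current {
            Ty::Array { elem, .. } => {
                offset += idx * packed_size(&**elem);
                current = &**elem;
            }
            Ty::Struct { fields } => {
                let i = idx as usize;
                let field = fields.get(i)?;
                // Skip over all fields that precede the selected one.
                offset += fields[..i].iter().map(packed_size).sum::<u64>();
                current = field;
            }
            Ty::Scalar { .. } => return None,
        }
    }
    Some(offset)
}
```

The same index list gives different offsets for different indexed types, which is exactly the information the proposed field would supply.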

Support for LLVM 16

Rust 1.70 uses LLVM 16 by default, so support here would be nice!

I'll try and do a PR similar to #25.

Support for LLVM 14?

First of all, thanks a ton for this crate!

rustc recently switched to LLVM 14, meaning that the IR produced by recent releases of rustc can't be consumed by this crate.

I can take a stab at this in a couple of days, following the changes needed for LLVM 13: e6fc719

Producing LLVM IR files

I noticed your note about being open to contributions for .ll file generation. I'm interested in working on this feature - it would enable some useful capabilities. If you're still open to it being added, I'd be glad to contribute.

Function declarations

Module.functions contains only the functions defined in the module, and it doesn't look like there's anywhere else to get the declared functions. This is an issue for certain analyses that need e.g., parameter types for declared functions.

A reasonable way to add this to the API would be to add a new field (declarations?) to Module of type Vec<Declaration> for some struct Declaration that shares most of the fields of Function.
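The proposed shape could be sketched as follows. Every type here is a simplified, hypothetical stand-in (the crate's real Module and Function carry far more information), and the helper param_types is invented for illustration of how an analysis might use the new field.

```rust
/// Simplified stand-in for a type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Void,
    Int(u32),
    Ptr,
}

/// A function that is defined in the module (simplified).
#[derive(Debug, Clone)]
struct Function {
    name: String,
    parameters: Vec<Ty>,
    return_type: Ty,
    basic_blocks: Vec<String>, // placeholder for the function body
}

/// A function that is only declared: the same signature data, no body.
#[derive(Debug, Clone)]
struct Declaration {
    name: String,
    parameters: Vec<Ty>,
    return_type: Ty,
}

/// Module with the proposed extra field (sketch only).
#[derive(Debug, Clone, Default)]
struct Module {
    functions: Vec<Function>,
    declarations: Vec<Declaration>,
}

impl Module {
    /// Parameter types of `name`, whether it is defined or only declared.
    fn param_types(&self, name: &str) -> Option<&[Ty]> {
        self.functions
            .iter()
            .find(|f| f.name == name)
            .map(|f| f.parameters.as_slice())
            .or_else(|| {
                self.declarations
                    .iter()
                    .find(|d| d.name == name)
                    .map(|d| d.parameters.as_slice())
            })
    }
}
```

With such a field, an analysis that encounters a call to an external function could still look up its signature instead of treating the callee as unknown.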

Illegal instruction in debug mode when processing MemoryOrdering::from_llvm

When processing the following instruction:

[2021-11-17T13:52:02Z DEBUG llvm_ir::instruction] Processing instruction "  fence seq_cst, !dbg !164"

MemoryOrdering::from_llvm receives a parameter with value 3, which corresponds to no variant of the enum, causing an illegal instruction.

This bug can only be triggered in debug mode.

LLVM version 13.0.0

OS: ArchLinux
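A general way to avoid undefined behavior when a raw integer has no matching enum variant is a checked conversion that returns an error instead. The sketch below is not the crate's actual code: the enum is a local stand-in, and the numeric values are assumed to mirror LLVM-C's LLVMAtomicOrdering, in which 3 is unused.

```rust
use std::convert::TryFrom;

/// Local stand-in for a memory-ordering enum (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryOrdering {
    NotAtomic,
    Unordered,
    Monotonic,
    Acquire,
    Release,
    AcquireRelease,
    SequentiallyConsistent,
}

/// Error carrying the raw value that matched no variant.
#[derive(Debug, PartialEq, Eq)]
struct UnknownOrdering(u32);

impl TryFrom<u32> for MemoryOrdering {
    type Error = UnknownOrdering;

    /// Checked conversion; raw values are an assumption based on LLVM-C.
    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(MemoryOrdering::NotAtomic),
            1 => Ok(MemoryOrdering::Unordered),
            2 => Ok(MemoryOrdering::Monotonic),
            4 => Ok(MemoryOrdering::Acquire),
            5 => Ok(MemoryOrdering::Release),
            6 => Ok(MemoryOrdering::AcquireRelease),
            7 => Ok(MemoryOrdering::SequentiallyConsistent),
            other => Err(UnknownOrdering(other)), // e.g. 3: no variant
        }
    }
}
```

With this shape, an unexpected value such as 3 surfaces as Err(UnknownOrdering(3)) that the caller can report, rather than as an invalid enum value.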

How are values larger than u64 handled for Constant::Int?

Hey,

first of all thank you for this lib, it is very nice to work with!

I noticed that LLVM can have integer types of arbitrary size; one example mentions an integer type of more than a million bits. In the case that raised this question for me, I have a constant integer that is 128 bits wide, in a normally compiled program. However, the actual value fits inside a u64 and is used as part of a mul instruction.
https://releases.llvm.org/10.0.0/docs/LangRef.html#integer-type

The internal field value, which holds the constant, is of type u64.

value: u64, // If the Int is less than 64 bits, the value will be zero-extended to create the Rust `u64` `value` (so if `bits` is 8, the lowest 8 bits of `value` are the relevant bits, and the others are all zeroes). Note that LLVM integers aren't signed or unsigned; each individual instruction indicates whether it's treating the integer as signed or unsigned if necessary (e.g., UDiv vs SDiv).

So my question is, what happens for a constant int value that is larger than a u64?
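The zero-extension convention described in the doc comment above can be illustrated with a few small helpers. This sketch does not answer what llvm-ir does with constants wider than 64 bits; the function names are hypothetical, and narrow only shows the kind of checked conversion a caller might apply to a wider value.

```rust
use std::convert::TryFrom;

/// Reads a zero-extended `value` as an unsigned integer of width `bits`
/// by keeping only the lowest `bits` bits.
fn as_unsigned(bits: u32, value: u64) -> u64 {
    assert!((1..=64).contains(&bits));
    if bits == 64 {
        value
    } else {
        value & ((1u64 << bits) - 1)
    }
}

/// Reads the lowest `bits` bits of `value` as a two's-complement signed
/// integer, sign-extending to `i64`.
fn as_signed(bits: u32, value: u64) -> i64 {
    assert!((1..=64).contains(&bits));
    let shift = 64 - bits;
    // Move the sign bit to bit 63, then shift back arithmetically.
    ((value << shift) as i64) >> shift
}

/// Narrows a wider constant to `u64` only if no bits would be lost.
fn narrow(wide: u128) -> Option<u64> {
    u64::try_from(wide).ok()
}
```

For an 8-bit constant stored as 0xFF, as_unsigned(8, 0xFF) gives 255 and as_signed(8, 0xFF) gives -1, which matches the note that the instruction, not the constant, decides signedness. A 128-bit constant whose value fits in 64 bits passes through narrow unchanged, while 1 << 64 yields None.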

[FEATURE REQUEST] ability to send the Module struct to the LLVM lib

As pointed out in the README: this library is currently unable to send the Rust-y and safe Module struct back to the LLVM library, and thus can't be used for LLVM IR generation nor for compilation of LLVM IR.

I think this library would benefit from the ability to send the module struct back to the LLVM library.
