
Comments (8)

RyanGlScott commented on August 25, 2024

Thanks, this is helpful to know.

For the time being, I suppose we could simply not support opaque pointers in uc-crux-llvm, and we could have uc-crux-llvm always invoke Clang with the necessary options to disable opaque pointers in LLVM bitcode. Unfortunately, this is not a viable long-term strategy, as LLVM 17 will drop support for non-opaque pointers entirely. We will need to have some kind of coping strategy if excising uc-crux-llvm of pointee types proves too difficult.

from crucible.

kquick commented on August 25, 2024

I agree on changing the equality perspective for overrides.


langston-barrett commented on August 25, 2024

The dependency on pointer types is deeply, deeply ingrained into UC-Crux-LLVM. UC-Crux attempts to uncover function preconditions by repeatedly executing a function starting from "minimal" symbolic inputs and attempting to "fix" any undefined behavior it encounters (see this post). In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound). Excising the use of pointer types would be a substantial undertaking.


RyanGlScott commented on August 25, 2024

I certainly believe you when you say that this would be a substantial undertaking. That being said, I am curious to know how uc-crux-llvm's current approach to tracking pointee types works, if only to better my understanding of how much work would be required:

In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound).

At first glance, this approach seems to be compatible with opaque pointers. Even if pointer types are themselves opaque, their allocation sites are always annotated with their type. For example, an alloca <ty> instruction will return something of type ptr, but the <ty> should tell you everything you need to know about size, preconditions, etc.

The part where this gets tricky is when LLVM reinterprets the underlying memory at a different type, be it through bitcasts, getelementptr, or some other mechanism. For example, see here for a tricky example where a pointer is allocated with a "pointee type" of [2 x [2 x i8]], but a subsequent store instruction reinterprets the underlying memory at type i32. How does uc-crux-llvm deal with this sort of situation?
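The linked example is not reproduced in this thread, so the following is only a hypothetical C analogue of that kind of reinterpretation. The function name is made up for illustration, and whether Clang lowers the `memcpy` to a single `store i32` depends on the compiler version and optimization level. The buffer is declared as a 2x2 byte matrix (the `[2 x [2 x i8]]` of the example), yet it is written as one 32-bit value:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: memory declared as a 2x2 byte matrix
 * ([2 x [2 x i8]] in LLVM terms) is written as a single 32-bit value. */
uint32_t store_i32_into_byte_matrix(uint32_t value) {
    uint8_t bytes[2][2];

    /* Both views cover exactly the same four bytes. */
    assert(sizeof bytes == sizeof value);

    /* The "store at type i32": nothing in the declared type of `bytes`
     * hints that the memory will be used this way. */
    memcpy(bytes, &value, sizeof value);

    /* Read the same four bytes back as one 32-bit value. */
    uint32_t round_trip;
    memcpy(&round_trip, bytes, sizeof round_trip);
    return round_trip;
}
```

A tool that sizes and constrains memory purely from the declared type sees four independent bytes here, while the program treats them as one integer.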


langston-barrett commented on August 25, 2024

Even if pointer types are themselves opaque, their allocation sites are always annotated with their type.

Except for heap allocations 🙂 But anyway, the problem is that UC-Crux attempts to execute individual functions from the program, and must allocate memory itself to do so. Consider this function:

int f(int *x) {
    return x[5] + 1;
}

UC-Crux would first execute this with an "initial pointer" (see the blog post). It would fail when reading from x, because there wouldn't be any allocation backing it. On the next attempt, UC-Crux would use the function's signature to deduce that x is a pointer to an array of integers, and allocate an array of integers with length 6. Executing the function this time results in a signed integer overflow, so UC-Crux would deduce that the integer at index 5 must be signed-less-than INT_MAX. After this modification, execution would succeed.
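To make the end result of that process concrete, here is a hand-written sketch of the two deduced preconditions expressed as a C harness. This is not how UC-Crux represents or checks preconditions internally; `call_f_with_preconditions` is a hypothetical helper written only for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Function under analysis, copied from the comment above. */
int f(int *x) {
    return x[5] + 1;
}

/* Hypothetical harness that encodes both deduced preconditions by hand. */
int call_f_with_preconditions(int value_at_index_5) {
    /* Precondition 1: x points to an array of at least 6 ints,
     * so that reading x[5] stays in bounds. */
    int *x = calloc(6, sizeof *x);
    assert(x != NULL);

    /* Precondition 2: x[5] is signed-less-than INT_MAX,
     * so that x[5] + 1 cannot overflow. */
    assert(value_at_index_5 < INT_MAX);
    x[5] = value_at_index_5;

    int result = f(x);
    free(x);
    return result;
}
```

Calling `call_f_with_preconditions(41)` returns 42; passing `INT_MAX` trips the second assertion instead of reaching the overflowing addition.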

These preconditions are stored in a Shape, as you highlighted above. The Shape is tied to (and type-indexed by) the LLVM type of the function argument, and the logic for deducing the preconditions depends on the type/shape involved. Shapes are indexed by Cursors, which also use high-level notions like struct and array types, rather than byte offsets.

To handle opaque pointers, this logic would all need to be based on offsets. Additionally, UC-Crux would either have to make shot-in-the-dark guesses about allocation sizes, or would have to instrument load/store/getelementptr instructions to track the types used and relate them back to the corresponding part of the input being loaded from.
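As a rough sketch of what "based on offsets" could mean, the following C code records each observed access as a byte offset and width, and keeps the smallest allocation size that covers all of them. The `ByteShape` type and both functions are hypothetical; they are not part of UC-Crux, whose actual Shape and Cursor types are structured around LLVM types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, offset-based summary of one pointer argument. */
typedef struct {
    size_t min_size; /* smallest allocation, in bytes, covering every access seen */
} ByteShape;

/* Record an access of `width` bytes starting `offset` bytes past the pointer. */
void record_access(ByteShape *shape, size_t offset, size_t width) {
    size_t end = offset + width;
    if (end > shape->min_size) {
        shape->min_size = end;
    }
}

/* The access x[5] from `int f(int *x)`, expressed without any pointee type:
 * a load of sizeof(int) bytes at byte offset 5 * sizeof(int). */
size_t min_size_for_f(void) {
    ByteShape shape = {0};
    record_access(&shape, 5 * sizeof(int), sizeof(int));
    return shape.min_size;
}
```

With a 4-byte `int`, `min_size_for_f()` yields 24 bytes, matching the six-element array from the typed approach. The hard part described above remains: relating each load, store, or getelementptr back to the input it came from.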

How does uc-crux-llvm deal with this sort of situation?

It gives up a lot! It can simply fail to find adequate preconditions.


langston-barrett commented on August 25, 2024

Yeah for sure!

We will need to have some kind of coping strategy if excising uc-crux-llvm of pointee types proves too difficult.

We can always replace pointer types with a pointer to an unbounded array of bytes; UC-Crux will just give you bad results (and give up often).

The original UC-symex paper had a lazy memory model that let them avoid this issue around allocation sizes, but again, that's a big project.


kquick commented on August 25, 2024

I think another, and possibly more sound, approach would be to use the type specified at the usage points rather than the function declaration. Using the previous example:

int f(int *x) {
  return x[5] + 1;
}

the resulting LLVM bitcode (from Clang 17, with only opaque pointers) is:

define i32 @f(ptr nocapture noundef readonly %0) ... {
  %2 = getelementptr i32, ptr %0, i64 5
  ...
}

From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 i32 elements and the value of that 5th element may have constraints.

The soundness aspect I referred to earlier is that if the pointer is dereferenced with different types, uc-crux-llvm should be able to identify preconditions for all of the type uses rather than just those for the declared input type.

I realize this is probably a significant extension of the current capabilities of uc-crux-llvm (or perhaps not: I haven't looked at the implementation details), but I think it is probably a reasonable path forward. Let me know if I've oversimplified or overlooked something here, however.


langston-barrett commented on August 25, 2024

From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 i32 elements and the value of that 5th element may have constraints.

Note that pointer type information was static, so UC-Crux had an idea of what shape to make arguments before executing the function. While in this simple example one could reasonably propagate the type at the use-site back up to the argument, in general that would require heap-aware data-flow analysis: for example, the pointer coming from the function argument could be stored to the heap, only later to be read out and used in some callee of a callee of the function under analysis.

We could consider dynamic tracking of pointer usages, but this gets weird if, e.g., we add a symbolic offset to a pointer before casting it to a particular type.

Regardless, anything based on GEP types would only work until they get rid of GEP.

IMO the correct solution is to view memory as untyped, and to expand allocations when the function under analysis hits a memory error that indicates that an allocation was too small.
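A minimal sketch of that idea in C, under heavy simplifying assumptions: memory is a plain byte buffer, the "function under analysis" is a bounds-checked stand-in for `f`, and a failed access reports exactly how many bytes it needed. All names here (`Allocation`, `RunResult`, `run_f`, `grow_until_success`) are hypothetical and do not come from UC-Crux or Crucible:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Untyped allocation: just bytes and a length, no pointee type. */
typedef struct {
    unsigned char *bytes;
    size_t size;
} Allocation;

/* Outcome of one attempted run. */
typedef struct {
    bool ok;
    size_t needed; /* if !ok: allocation size that would make the failing access valid */
    int value;     /* if ok: the function's result */
} RunResult;

/* Stand-in for `return x[5] + 1;` with an explicit bounds check,
 * playing the role of the memory model reporting a memory error. */
RunResult run_f(const Allocation *x) {
    RunResult r = {false, 0, 0};
    size_t offset = 5 * sizeof(int);
    if (offset + sizeof(int) > x->size) {
        r.needed = offset + sizeof(int);
        return r;
    }
    int loaded;
    memcpy(&loaded, x->bytes + offset, sizeof loaded);
    r.ok = true;
    r.value = loaded + 1;
    return r;
}

/* Start from an empty allocation and grow it whenever a run reports that
 * the allocation was too small. Returns the final size (0 if out of memory). */
size_t grow_until_success(int *attempts) {
    Allocation x = {NULL, 0};
    *attempts = 0;
    for (;;) {
        ++*attempts;
        RunResult r = run_f(&x);
        if (r.ok) {
            break;
        }
        unsigned char *bigger = calloc(r.needed, 1); /* zero-filled bytes */
        free(x.bytes);
        if (bigger == NULL) {
            return 0;
        }
        x.bytes = bigger;
        x.size = r.needed;
    }
    size_t final_size = x.size;
    free(x.bytes);
    return final_size;
}
```

In this toy setting the loop needs two attempts: the first fails on the empty allocation, and the second succeeds once the buffer covers the access at index 5. No pointee type is consulted at any point; the allocation size comes entirely from the failing access.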

