GithubHelp home page GithubHelp logo

UB when sdiv overflows about remill HOT 6 OPEN

katrinafyi avatar katrinafyi commented on June 15, 2024
UB when sdiv overflows

from remill.

Comments (6)

pgoodman avatar pgoodman commented on June 15, 2024

Here's where we implement it for aarch64: https://github.com/lifting-bits/remill/blob/master/lib/Arch/AArch64/Semantics/BINARY.cpp#L140-L151

For x86, I think we explicitly detect signed underflow, though it adds a bunch of complexity to the semantics:
https://github.com/lifting-bits/remill/blob/master/lib/Arch/X86/Semantics/BINARY.cpp#L488-L489

Also the auto-formatting with MAKE_IDIVxax on that file looks awful but that is unrelated.

@katrinafyi I see your llvm-translator project using Remill. What are you trying to accomplish where (not?) flagging this UB is significant? Are later optimization passes replacing the sdiv with undef? Also, where in that linked document is the underflow behavior specified?

Perhaps one thing to consider is that we try do the equivalent of a freezing operation, to mirror LLVM's freeze instruction. @2over12 thoughts?

from remill.

katrinafyi avatar katrinafyi commented on June 15, 2024

Thanks for the reply, I'm not aware of any major problems this causes. We're just working towards validating lifters using the ARM specifications as a trustworthy source of truth. So there's no concrete application asides from the goal of being as accurate as possible.

In the ARM ISA, it is specified implicitly. Here, the integer type is an arbitrary precision two's complement integer. result<datasize-1:0> then extracts the lower 64 bits. The INT_MIN result is obtained from the lower 64 bits of the two's complement of the true numeric result +2^63.

from remill.

pgoodman avatar pgoodman commented on June 15, 2024

Ah interesting! @regehr had previously done a bit of translation validation with remill (through anvill) and alive2 with opt-fuzz. I believe they eventually switched to their own custom AArch64 binary lifter/translator.

@lkorenc does a bit of translation validation-type stuff internally with a separate but related system. It might be worth a longer discussion on the #binary-lifting channel of the Empire Hacking Slack.

from remill.

regehr avatar regehr commented on June 15, 2024

hi this is interesting! as Peter says, we have a lifter from AArch64 to LLVM, but we're not trying to find bugs in the lifter, but rather using the lifter + Alive2 to find bugs in LLVM, you can see our current list here, these are all silent miscompiles:

https://github.com/regehr/alive2/blob/arm-tv/BugList.md#miscompilation-bugs-found-in-arm64-backend

I don't think we've done anything particularly interesting while lifting sdiv, but we definitely had to be careful when lifting things like shift exponents, and that Remill had to be similarly careful.

the thing I'm currently a bit stuck on with this project is the assembly-level memory model is sufficiently different from the LLVM memory model that we're going to need to rip out the Alive2 memory model and make it pluggable, so that it can use one set of rules for LLVM IR and another for lifted ARM code, which of course freely mixes integers and pointers in ways that make it pretty difficult to reason about.

@katrinafyi that's interesting to hear that you're using the ARM semantics. we're currently not, but I've spent a bunch of time getting executable code out of Sail and Isla (tools from Peter Sewell's group that make it easier to work with ARM's semantics). currently that's a bit on the back burner since my postdoc who was supporting this project had to leave his position, but I want to keep working on it.

from remill.

katrinafyi avatar katrinafyi commented on June 15, 2024

Very cool! Our project is to derive a trustworthy lifter for ARMv8, based on Arm's machine-readable specifications. This is coming along nicely and it lifts a large subset of AArch64 to LLVM IR. We also use Alive2 for bidirectional translation validation to compare its results to other lifters, which is how we found these edge cases. Overall, we found Remill's semantics were actually very accurate; this and #656 being the only discrepancies of note.

@regehr That's an interesting project and an impressive list of compilation bugs. This is definitely a case where you'd need to trust the lifter you're using! We had looked at Sail for our work too but had trouble along the way; its definitions can be cumbersome and its effect type is difficult to translate and reason about.

Working with the ARM semantics directly is actually not too bad. The language is relatively simple and Alastair Reid has published tools for parsing and manipulating the ASL specification language, which we make use of in our lifter.

from remill.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.