immunant / ibresolver

A QEMU TCG plugin for resolving indirect branches.

License: BSD 3-Clause "New" or "Revised" License

Makefile 3.44% C++ 32.23% C 52.62% Shell 2.86% Python 8.86%

ibresolver's People

Contributors

ayrtonm, thedataking

Forkers

learner0x5a

ibresolver's Issues

Remove need to pass in indirect callsites

It'd be nice to have this tool find indirect callsites automatically instead of passing in a list of callsites. The two options are:

  1. Shell out to objdump and grep for indirect branches like in find_indirect_*.sh. The grepping could probably be done from the plugin in the post-install initialization, but using objdump would require cross-compiling binutils for the target arch, which isn't as user-friendly. I'd also need to expand the ARM regex since it's missing some indirect jumps (e.g. ldr pc, Rn).
  2. In the translate block callback, pattern-match each instruction against the target arch's indirect branches. For x64 this is reasonable since only unconditional jumps/calls can be indirect, so there are only a few patterns to check. I'm not sure how involved this would be for ARM (i.e. how many patterns we'd have to match against), but insn sizes are limited to 2 or 4 bytes and the instruction encoding manual is much easier to follow than Intel's.

Fixing this issue also means we don't have to pass in the binary name twice as explained here so the plugin would only require one arg for the output file.
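
A minimal sketch of option 2 for x64, assuming the plugin can get at the raw instruction bytes in the translate block callback. On x86-64 the register/memory-indirect calls and jumps share opcode 0xFF, with the ModRM reg field set to 2 or 3 (call) or 4 or 5 (jmp). The helper names here are hypothetical and not part of the plugin, and ret is deliberately left out.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if `b` is a legacy prefix byte that may precede the opcode.
static bool is_legacy_prefix(uint8_t b) {
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             // lock, repne, rep
    case 0x2E: case 0x36: case 0x3E: case 0x26:  // segment overrides (0x3E doubles as notrack)
    case 0x64: case 0x65:                        // fs, gs overrides
    case 0x66: case 0x67:                        // operand-size, address-size
        return true;
    default:
        return false;
    }
}

// Hypothetical helper: true if the bytes at `insn` encode an indirect call or
// jump, i.e. opcode 0xFF with ModRM.reg in 2..5. Does not treat ret as a branch.
bool is_indirect_branch_x64(const uint8_t *insn, size_t len) {
    size_t i = 0;
    while (i < len && is_legacy_prefix(insn[i])) ++i;
    if (i < len && (insn[i] & 0xF0) == 0x40) ++i;  // optional REX prefix
    if (i + 1 >= len || insn[i] != 0xFF) return false;
    uint8_t reg = (insn[i + 1] >> 3) & 0x7;        // ModRM.reg selects the operation
    return reg >= 2 && reg <= 5;
}
```

For example, jmp *%rax (FF E0) and callq *%rax (FF D0) match, while inc %eax (FF C0) and a direct jmp rel32 (E9 ...) do not.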

Make resolving jumps to dynamically linked shared objects less arch-specific

To resolve jumps to dynamically linked shared objects we currently trace mmap and openat syscalls to track what is in memory. Aside from the overhead of the extra tracing, the syscalls made by ELF interpreters for different architectures vary slightly which makes it harder to support more architectures. Also it seems that vaddrs for non-native binaries can't be dereferenced without adding an offset provided by QEMU so even just checking the filename passed to openat on arm32 is a hassle.

An easier alternative may be to check what's in memory by looking at /proc/$PID/maps. It turns out that QEMU (and the plugin) and the emulated process have the same pid, so instead of tracing syscalls and manually tracking what's in memory we can look at /proc/self/maps from the plugin when we need to resolve any jump. For more info on the format we'd need to parse look for "/proc/[pid]/maps" on this page.
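
A rough sketch of what the parsing could look like, assuming C++17 and the usual "start-end perms offset dev inode pathname" line layout of /proc/[pid]/maps. MapEntry, parse_maps_line and find_mapping are hypothetical names, not existing plugin code.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// One line of /proc/[pid]/maps (hypothetical struct for illustration).
struct MapEntry {
    uint64_t start = 0;   // inclusive start address
    uint64_t end = 0;     // exclusive end address
    std::string perms;    // e.g. "r-xp"
    uint64_t offset = 0;  // offset into the mapped file
    std::string path;     // empty for anonymous mappings
};

// Parses "start-end perms offset dev inode [pathname]"; nullopt on malformed input.
std::optional<MapEntry> parse_maps_line(const std::string &line) {
    std::istringstream ss(line);
    MapEntry e;
    char dash = 0;
    std::string dev, inode;
    ss >> std::hex >> e.start >> dash >> e.end >> e.perms >> e.offset >> dev >> inode;
    if (!ss || dash != '-') return std::nullopt;
    std::getline(ss, e.path);  // rest of the line, possibly empty
    auto first = e.path.find_first_not_of(" \t");
    e.path = (first == std::string::npos) ? std::string() : e.path.substr(first);
    return e;
}

// Returns the first mapping that contains `addr`, reading lines from `maps`.
std::optional<MapEntry> find_mapping(std::istream &maps, uint64_t addr) {
    std::string line;
    while (std::getline(maps, line)) {
        auto e = parse_maps_line(line);
        if (e && addr >= e->start && addr < e->end) return e;
    }
    return std::nullopt;
}
```

In the plugin this would presumably be fed a std::ifstream opened on /proc/self/maps each time a jump needs resolving; taking a std::istream keeps the parser testable against canned input.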

Consecutive indirect branches are not handled properly

When the plugin encounters consecutive indirect branches, the indirect_branch_exec callback is registered for both, but branch_skipped is also registered for the second. This means that results may vary depending on which callback is executed first.

This could be fixed by not registering the branch_skipped for the second instruction (i.e. the branch_skipped callback that corresponds to skipping the first branch). Since this scenario is rare in practice, the plugin currently just emits a warning to stdout when it runs into this. It'd be good to have some test cases before making the fix to verify it'll work as expected.
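
To make the proposed fix concrete, here is a small stand-alone model of the registration decision. It does not use the QEMU plugin API; CallbackPlan and plan_callbacks are hypothetical, and the input is simply a per-instruction flag saying whether that instruction is an indirect branch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of which callbacks would be registered for one instruction.
struct CallbackPlan {
    bool indirect_branch_exec = false;  // instruction is itself an indirect branch
    bool branch_skipped = false;        // instruction follows an indirect branch
};

// Decides callbacks for a run of instructions given per-instruction
// "is indirect branch" flags, applying the fix described above.
std::vector<CallbackPlan> plan_callbacks(const std::vector<bool> &is_indirect) {
    std::vector<CallbackPlan> plan(is_indirect.size());
    for (std::size_t i = 0; i < is_indirect.size(); ++i) {
        plan[i].indirect_branch_exec = is_indirect[i];
        bool follows_indirect = i > 0 && is_indirect[i - 1];
        // Fix: do not register branch_skipped on an instruction that is
        // itself an indirect branch, so only one callback fires for it.
        plan[i].branch_skipped = follows_indirect && !is_indirect[i];
    }
    return plan;
}
```

The trade-off in this sketch is that a skipped first branch would go unnoticed when it is immediately followed by another indirect branch, which is one reason test cases would help before committing to it.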

Check if branches on ARM32 switch between THUMB/ARM mode

It might be possible to use binja like in #3 to check this. Basically for branch destinations with 32-bit instructions, we'd try to parse them with binja in both ARM and THUMB mode and see if one case fails. u32s that are valid in both may require a more involved solution (e.g. looking at cpu registers), but using binja would be a good first step.

Handle branches that occur in the middle of a block

The tracer assumes that indirect branches always occur at the end of the translation blocks defined by QEMU to avoid the need for single-step mode. Currently input addresses that are found in the middle of a block will not show up in the output .csv even if they're executed.

While this assumption will very likely always be true for unconditional indirect branches, it'd be nice to log a warning to stderr when one of these inputs is encountered in block_trans_handler. To avoid needing to iterate through all input callsites in block_trans_handler, we should sort the callsites in qemu_plugin_install, then limit the callsites checked to those within the block being translated.
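
The sort-then-narrow idea could look roughly like this, assuming the callsites are kept as a sorted std::vector<uint64_t> and the block is described by a half-open address range plus the address of its last instruction. The function names and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using CallsiteIter = std::vector<uint64_t>::const_iterator;

// Returns the sub-range of `sorted` (ascending) that falls inside
// [block_start, block_end), using two binary searches.
std::pair<CallsiteIter, CallsiteIter> callsites_in_block(
        const std::vector<uint64_t> &sorted,
        uint64_t block_start, uint64_t block_end) {
    auto first = std::lower_bound(sorted.begin(), sorted.end(), block_start);
    auto last = std::lower_bound(first, sorted.end(), block_end);
    return {first, last};
}

// Returns input callsites inside the block that are not its last instruction,
// i.e. the ones that would deserve a warning on stderr.
std::vector<uint64_t> mid_block_callsites(
        const std::vector<uint64_t> &sorted,
        uint64_t block_start, uint64_t block_end, uint64_t last_insn_addr) {
    std::vector<uint64_t> out;
    auto range = callsites_in_block(sorted, block_start, block_end);
    for (auto it = range.first; it != range.second; ++it) {
        if (*it != last_insn_addr) out.push_back(*it);
    }
    return out;
}
```

With the list sorted once (e.g. with std::sort at install time), each translated block costs two binary searches instead of a scan over every input callsite.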

Fix support for non-native binaries

Switching from tracing syscalls to reading /proc/self/maps in #2 seems to have broken support for non-native binaries. The args to the syscalls we were tracing had addresses in terms of the guest's memory map which is what we want. For non-native binaries (e.g. arm32), these addresses don't correspond to the addresses in /proc/self/maps so arm32 doesn't work anymore.

Adding (or subtracting) QEMU's guest_base lets you go from guest to host addresses and seems to solve the issue. Implementing this fix requires two things:

  1. Find a way to get access to guest_base, probably by patching QEMU and modifying the plugin API. So far I've tested by adding extern uintptr_t guest_base to plugin.cpp but this probably isn't reliable.
  2. Add guest_base where required in block_trans_handler and mark_indirect_branch. We should probably use newtypes instead of uint64_t for the different types of addresses to make things explicit and avoid breaking other use cases.
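
One way the newtypes from item 2 might look, as a sketch only. It assumes a host address is the guest address plus guest_base, and it takes guest_base as a plain parameter since how the plugin would obtain it is still open. GuestAddr, HostAddr, to_host and to_guest are hypothetical names.

```cpp
#include <cstdint>
#include <type_traits>

// Distinct wrapper types so guest and host addresses cannot be mixed up.
struct GuestAddr { uint64_t value; };
struct HostAddr { uint64_t value; };

// Assumption: host address = guest address + guest_base.
constexpr HostAddr to_host(GuestAddr g, uint64_t guest_base) {
    return HostAddr{g.value + guest_base};
}

// Inverse conversion, for comparing against addresses seen by the guest.
constexpr GuestAddr to_guest(HostAddr h, uint64_t guest_base) {
    return GuestAddr{h.value - guest_base};
}

// The two types do not convert into each other implicitly.
static_assert(!std::is_convertible<GuestAddr, HostAddr>::value,
              "guest and host addresses must be converted explicitly");
```

Functions that compare against /proc/self/maps would then take HostAddr, and passing a raw guest address to them fails to compile instead of silently resolving to the wrong mapping.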

The Binaryninja backend doesn't mark indirect calls correctly

For some reason binaryninja doesn't mark indirect calls (blx on arm32 or callq on x64) as indirect branches, so all tests are failing with this backend. I'm not sure if this is a bug or the expected behavior, but it's the same in both is_indirect_branch_default_impl in src/binaryninja_backend.cpp and the Python equivalent of that function.

It'd be good to make a list of instructions that binaryninja marks as indirect branches to know what to expect in the results and add calls to that list if possible. Binaryninja does mark jmp *%rax, ldr pc ... and other indirect jumps correctly though, so a temporary workaround might be to use a custom backend to catch the instructions that binaryninja misses. This would require slight changes to the Makefile to allow custom backends to have reverse dependencies (i.e. allow dynamically loaded code to call is_indirect_branch_default_impl).
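
A sketch of what such a workaround could look like for ARM-mode blx, assuming the custom backend is handed the 32-bit instruction word and some callable wrapping the default implementation. The real signature of is_indirect_branch_default_impl isn't shown here, so the std::function parameter is a stand-in, and both function names are hypothetical. The check uses the A32 encoding of BLX (register): cond 0001 0010 1111 1111 1111 0011 Rm.

```cpp
#include <cstdint>
#include <functional>

// True for ARM-mode "blx Rm" (A32 encoding). A cond field of 0b1111 belongs
// to the unconditional instruction space, so it is excluded.
bool is_arm_blx_register(uint32_t insn) {
    bool cond_ok = (insn >> 28) != 0xF;
    return cond_ok && (insn & 0x0FFFFFF0u) == 0x012FFF30u;
}

// Hypothetical custom-backend entry point: defer to the default
// implementation first, then catch the indirect calls it misses.
bool is_indirect_branch_with_fallback(
        uint32_t insn, const std::function<bool(uint32_t)> &default_impl) {
    return default_impl(insn) || is_arm_blx_register(insn);
}
```

Thumb-mode blx and x64 callq would need their own patterns; this only covers the 4-byte ARM encoding.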
