immunant / ibresolver
A QEMU TCG plugin for resolving indirect branches.
License: BSD 3-Clause "New" or "Revised" License
It'd be nice to have this tool find indirect callsites automatically instead of passing in a list of callsites. The two options are
objdump: this would require cross-compiling binutils for the target arch, which isn't as user-friendly. I'd also need to expand the arm regex since it's missing some indirect jumps (e.g. ldr pc, Rn).
Fixing this issue also means we don't have to pass in the binary name twice, as explained here, so the plugin would only require one arg for the output file.
To resolve jumps to dynamically linked shared objects we currently trace mmap and openat syscalls to track what is in memory. Aside from the overhead of the extra tracing, the syscalls made by ELF interpreters for different architectures vary slightly which makes it harder to support more architectures. Also it seems that vaddrs for non-native binaries can't be dereferenced without adding an offset provided by QEMU so even just checking the filename passed to openat on arm32 is a hassle.
An easier alternative may be to check what's in memory by looking at /proc/$PID/maps. It turns out that QEMU (and the plugin) and the emulated process have the same pid, so instead of tracing syscalls and manually tracking what's in memory we can look at /proc/self/maps from the plugin when we need to resolve any jump. For more info on the format we'd need to parse, look for "/proc/[pid]/maps" on this page.
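As a rough illustration of the parsing this would involve, here is a minimal sketch that reads /proc/self/maps and finds the mapping containing an address. The names MapEntry, parse_maps_line, read_self_maps and find_mapping are hypothetical and not part of the plugin. The sketch assumes the usual "start-end perms offset dev inode path" row layout.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One row of /proc/[pid]/maps (hypothetical type, for illustration only).
struct MapEntry {
    uint64_t start = 0;   // first address of the mapping
    uint64_t end = 0;     // one past the last address
    uint64_t offset = 0;  // offset into the mapped file
    std::string perms;    // e.g. "r-xp"
    std::string path;     // empty for anonymous mappings
};

// Parses a single row; returns std::nullopt if the row is malformed.
std::optional<MapEntry> parse_maps_line(const std::string &line) {
    std::istringstream in(line);
    MapEntry e;
    char dash = 0;
    std::string dev;
    uint64_t inode = 0;
    in >> std::hex >> e.start >> dash >> e.end >> e.perms >> e.offset >> dev
       >> std::dec >> inode;
    if (!in || dash != '-') return std::nullopt;
    // The path is optional and may contain spaces, so take the rest of the line.
    std::getline(in >> std::ws, e.path);
    return e;
}

// Reads the current process's mappings (QEMU, the plugin and the guest share a pid).
std::vector<MapEntry> read_self_maps() {
    std::vector<MapEntry> out;
    std::ifstream maps("/proc/self/maps");
    for (std::string line; std::getline(maps, line);)
        if (auto e = parse_maps_line(line)) out.push_back(*e);
    return out;
}

// Returns the mapping that contains addr, or nullptr if none does.
const MapEntry *find_mapping(const std::vector<MapEntry> &maps, uint64_t addr) {
    for (const auto &e : maps)
        if (addr >= e.start && addr < e.end) return &e;
    return nullptr;
}
```

Re-reading the file on every resolved jump keeps the logic simple. Caching the parsed entries would be a possible optimization if that turns out to be slow.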
When the plugin encounters consecutive indirect branches, the indirect_branch_exec callback is registered for both, but branch_skipped is also registered for the second. This means that results may vary depending on which callback is executed first.
This could be fixed by not registering branch_skipped for the second instruction (i.e. the branch_skipped callback that corresponds to skipping the first branch). Since this scenario is rare in practice, the plugin currently just emits a warning to stdout when it runs into this. It'd be good to have some test cases before making the fix to verify it'll work as expected.
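The proposed fix can be modeled without any QEMU API as a pure decision function. This is only a sketch of the registration logic under that simplification: CallbackPlan and plan_callbacks are hypothetical names, and the real plugin would register the callbacks through the TCG plugin interface rather than fill in a table.

```cpp
#include <cstddef>
#include <vector>

// Which callbacks would be attached to one instruction (hypothetical model).
struct CallbackPlan {
    bool indirect_branch_exec = false;  // the instruction is an indirect branch
    bool branch_skipped = false;        // the previous indirect branch was skipped
};

// is_indirect[i] says whether instruction i is an indirect branch.
std::vector<CallbackPlan> plan_callbacks(const std::vector<bool> &is_indirect) {
    std::vector<CallbackPlan> plan(is_indirect.size());
    for (std::size_t i = 0; i < is_indirect.size(); ++i) {
        plan[i].indirect_branch_exec = is_indirect[i];
        // Proposed fix: attach branch_skipped only when this instruction is not
        // an indirect branch itself, so no instruction carries both callbacks.
        plan[i].branch_skipped = i > 0 && is_indirect[i - 1] && !is_indirect[i];
    }
    return plan;
}
```

With this rule the second of two consecutive indirect branches only gets indirect_branch_exec, so the result no longer depends on callback ordering. Whether a skipped first branch is still handled correctly is exactly what the test cases mentioned above would need to check.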
It might be possible to use binja like in #3 to check this. Basically for branch destinations with 32-bit instructions, we'd try to parse them with binja in both ARM and THUMB mode and see if one case fails. u32s that are valid in both may require a more involved solution (e.g. looking at cpu registers), but using binja would be a good first step.
Something to consider once there is a clear need.
The tracer assumes that indirect branches always occur at the end of the translation blocks defined by QEMU to avoid the need for single-step mode. Currently input addresses that are found in the middle of a block will not show up in the output .csv even if they're executed.
While this assumption will very likely always be true for unconditional indirect branches, it'd be nice to log a warning to stderr when one of these inputs is encountered in block_trans_handler. To avoid needing to iterate through all input callsites in block_trans_handler, we should sort the callsites in qemu_plugin_install, then limit the callsites checked to those within the block being translated.
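One way to limit the check to the current block is a binary search over the sorted callsites. The sketch below assumes the callsites were sorted once at install time and that the block's address range and last instruction address are known during translation. The function name mid_block_callsites and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns the input callsites that fall inside [block_start, block_end) but are
// not the block's last instruction, i.e. the ones that would deserve a warning.
// Precondition: sorted_callsites is sorted in ascending order.
std::vector<uint64_t> mid_block_callsites(const std::vector<uint64_t> &sorted_callsites,
                                          uint64_t block_start, uint64_t block_end,
                                          uint64_t last_insn_addr) {
    // First callsite at or after the start of the block.
    auto first = std::lower_bound(sorted_callsites.begin(), sorted_callsites.end(),
                                  block_start);
    // First callsite at or after the end of the block.
    auto last = std::lower_bound(first, sorted_callsites.end(), block_end);
    std::vector<uint64_t> out;
    for (auto it = first; it != last; ++it)
        if (*it != last_insn_addr) out.push_back(*it);
    return out;
}
```

Each translated block then costs two binary searches plus a scan of only the callsites inside it, instead of a pass over the whole input list.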
Switching from tracing syscalls to reading /proc/self/maps in #2 seems to have broken support for non-native binaries. The args to the syscalls we were tracing had addresses in terms of the guest's memory map, which is what we want. For non-native binaries (e.g. arm32), these addresses don't correspond to the addresses in /proc/self/maps, so arm32 doesn't work anymore.
Adding (or subtracting) QEMU's guest_base lets you go from guest to host addresses and seems to solve the issue. Implementing this fix requires two things. First, access to guest_base, probably by patching QEMU and modifying the plugin API. So far I've tested by adding extern uintptr_t guest_base to plugin.cpp, but this probably isn't reliable. Second, applying guest_base where required in block_trans_handler and mark_indirect_branch. We should probably use newtypes instead of uint64_t for the different types of addresses to make things explicit and avoid breaking other use cases.
For x86_64 indirect calls like call rax, binary ninja does not identify them as branch instructions. Thus even BNBranchType::CallDestination cannot record these indirect calls.
Maybe parsing the disassembly text token types can resolve this issue, as shown in https://api.binary.ninja/binaryninja.architecture-module.html#binaryninja.architecture.InstructionTextToken
For some reason binaryninja doesn't mark indirect calls (blx on arm32 or callq on x64) as indirect branches, so all tests are failing with this backend. I'm not sure if this is a bug or the expected behavior, but it's the same in both is_indirect_branch_default_impl in src/binaryninja_backend.cpp and the python equivalent of that function.
It'd be good to make a list of instructions that binaryninja marks as indirect branches to know what to expect in the results, and add calls to that list if possible. Binaryninja does mark jmp *%rax, ldr pc ... and other indirect jumps correctly though, so a temporary workaround might be to use a custom backend to catch the instructions that binaryninja misses. This would require a slight change to the Makefile to allow custom backends to have reverse dependencies (i.e. allow dynamically loaded code to call is_indirect_branch_default_impl).