galoisinc / renovate
A library for binary analysis and rewriting
A few test cases in the SFE Embrittle test suite have been failing for a while, and it seems Renovate is to blame. Here's a session showing how refurbish produces a segfaulting binary when applied to tests/binaries/linked-list.noopt.nostdlib.x86_64.exe, but not when applied to tests/binaries/linked-list.noopt.stdlib.x86_64.exe:
stack exec -- refurbish -o tmp/ll.noopt.nostdlib.refurbish tests/binaries/linked-list.noopt.nostdlib.x86_64.exe
stack exec -- refurbish -o tmp/ll.noopt.stdlib.refurbish tests/binaries/linked-list.noopt.stdlib.x86_64.exe
tests/binaries/linked-list.noopt.stdlib.x86_64.exe
chmod +x tmp/ll.noopt.stdlib.refurbish && tmp/ll.noopt.stdlib.refurbish
tests/binaries/linked-list.noopt.nostdlib.x86_64.exe
chmod +x tmp/ll.noopt.nostdlib.refurbish && tmp/ll.noopt.nostdlib.refurbish
zsh: segmentation fault tmp/ll.noopt.nostdlib.refurbish
The nostdlib variant defines its own _start, but we have other tests that do this which refurbish and embrittle handle just fine.
Permalink to SFE tests dir with artifacts used for above example: https://github.com/GaloisInc/sfe/tree/22eddcc9e98719d177ea84c0ea6500db7ff29631/tests/binaries .
NOTE: I have not tested this on Renovate HEAD, but it's been going on for at least two weeks and I'm testing on a Renovate commit that's 4 days old.
Making the rewrite action pure (or with restricted effects instead of being in RewriteM) would enable parallel rewriting.
It seems like the main use of the more elaborate monad context is logging. That can be accommodated in other ways.
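One way to accommodate logging without a monadic context is to have the pure rewrite return its log messages alongside the result. The following is only a sketch under that assumption: Block, rewriteBlock, and rewriteAll are hypothetical stand-ins, not renovate's actual types or API.

```haskell
-- Hypothetical stand-in for a basic block: just a list of mnemonics.
newtype Block = Block [String]
  deriving (Eq, Show)

-- A pure rewrite: returns the new block together with its log messages
-- instead of emitting them through a monad.
rewriteBlock :: Block -> (Block, [String])
rewriteBlock (Block is) =
  ( Block (is ++ ["nop"])
  , ["appended nop to block of " ++ show (length is) ++ " instructions"]
  )

-- Because rewriteBlock is pure, each block is rewritten independently;
-- the logs are collected afterwards in block order.
rewriteAll :: [Block] -> ([Block], [String])
rewriteAll blocks = (map fst results, concatMap snd results)
  where
    results = map rewriteBlock blocks
```

Since each call is independent, the plain map in rewriteAll could be swapped for a parallel map (for example parMap from the parallel package) without changing the result.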
There are a number of functions that have been added over time to support some arbitrary patching in binaries. They aren't used in renovate and are a bit too specific to belong in renovate itself. This is conspicuous because they are only implemented for x86. These extra functions should be removed (any clients should have their own set of abstractions that they need).
We should be using internal libraries to expose these internal functions for testing without giving library clients access.
See 3cedd6a
Currently, renovate re-assembles basic blocks that the caller has not changed to produce the bytes that will be written back to the new binary. We could instead just copy the bytes from the original, which would be robust against re-assembly bugs (especially relevant for the x86_64 backend).
We would need to be a bit careful here to ensure that this is only applied to completely unchanged blocks (i.e., that the block successor does not need to change).
It isn't clear that this would be a significant win, but it could avoid some annoying re-assembly problems.
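The idea can be sketched as follows. This is a minimal illustration, assuming each block remembers its original bytes and whether it or its successor changed; the Block record, its field names, and the assemble stand-in are hypothetical and do not reflect renovate's real data structures.

```haskell
import qualified Data.ByteString as B

-- Hypothetical instruction: just its encoded bytes.
newtype Instr = Instr B.ByteString

-- Hypothetical block record (not renovate's real type).
data Block = Block
  { originalBytes  :: B.ByteString  -- bytes as found in the input binary
  , instructions   :: [Instr]       -- (possibly rewritten) instructions
  , modified       :: Bool          -- did the client change any instruction?
  , successorMoved :: Bool          -- does the block successor need to change?
  }

-- Stand-in for the real assembler.
assemble :: [Instr] -> B.ByteString
assemble = B.concat . map (\(Instr bs) -> bs)

-- Copy the original bytes only for completely unchanged blocks;
-- everything else goes through re-assembly as before.
emitBlock :: Block -> B.ByteString
emitBlock b
  | not (modified b) && not (successorMoved b) = originalBytes b
  | otherwise                                  = assemble (instructions b)
```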
Currently, neither implementation ever returns Nothing; each just panics if a non-jump is passed in or if the required jump is not realizable. There is a redundant check at the call sites if we want it to be able to just panic.
We can either make it "total" (modulo the panics) or return Nothing where we currently panic and push the panic to the caller. There is additional arch-specific information available to provide more accurate panics in the backend implementations, so that might be a useful consideration. On the other hand, the call site (mapJumpAddress) is in a context that could turn the failure into an exception that could be recoverable by a renovate client.
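A third option, not proposed above but compatible with both concerns, would be to return an Either carrying the arch-specific detail, so the caller decides whether to panic or raise a recoverable exception. The sketch below uses hypothetical types (Instr, JumpError) and a hypothetical modifyJumpTarget signature; it is not the signature of the real ISA field.

```haskell
import Data.Int (Int32)

-- Hypothetical failure reasons a backend could report.
data JumpError
  = NotAJump String         -- a non-jump instruction was passed in
  | JumpOutOfRange Integer  -- the required offset is not encodable
  deriving (Eq, Show)

-- Hypothetical toy instruction set with a 32-bit relative jump.
data Instr = Jmp Int32 | Nop
  deriving (Eq, Show)

-- Report failures as values instead of panicking.
modifyJumpTarget :: Instr -> Integer -> Either JumpError Instr
modifyJumpTarget Nop _ = Left (NotAJump "Nop")
modifyJumpTarget (Jmp _) off
  | off < lo || off > hi = Left (JumpOutOfRange off)
  | otherwise            = Right (Jmp (fromIntegral off))
  where
    lo = fromIntegral (minBound :: Int32)
    hi = fromIntegral (maxBound :: Int32)
```

A call site could then map Left values to a panic or to an exception type of its choosing.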
There are currently separate notions of jump table relocations (represented as the TaggedInstruction type) and data address relocations (represented as arch-specific instruction annotation types). We should unify these in a single arch-independent (but potentially arch-extensible) Relocation type. Both are currently processed in the same function (isaConcretizeAddresses), which is a good indication that they should just be one type now.
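As a rough illustration of what such a unified type might look like, here is a sketch with an extension parameter for arch-specific cases. The constructor names, the Addr newtype, and the relocate helper are assumptions made for this example, not an existing renovate design.

```haskell
import Data.Word (Word64)

-- Hypothetical address type.
newtype Addr = Addr Word64
  deriving (Eq, Show)

-- One relocation type covering both existing notions; 'ext' lets an
-- architecture add its own cases without changing the core type.
data Relocation ext
  = NoRelocation
  | JumpTableEntry Addr   -- formerly tracked via TaggedInstruction
  | DataAddress Addr      -- formerly an arch-specific annotation
  | ArchRelocation ext    -- arch-specific extension point
  deriving (Eq, Show)

-- A single pass can now handle both address-carrying cases uniformly.
relocate :: (Addr -> Addr) -> Relocation ext -> Relocation ext
relocate f r = case r of
  JumpTableEntry a -> JumpTableEntry (f a)
  DataAddress a    -> DataAddress (f a)
  other            -> other
```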
We could save a lot of memory by re-disassembling each block on-demand, while only storing bytestrings (or, better, slices of the original binary) and the disassembler function.
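A minimal sketch of that representation, assuming a block is stored only as an offset and length into the original image. BlockRef, blockBytes, and blockInstructions are hypothetical names, and the disassembler is passed in as a plain function.

```haskell
import qualified Data.ByteString as B

-- Hypothetical compact block reference: no decoded instructions stored.
data BlockRef = BlockRef
  { refOffset :: Int  -- offset of the block within the original image
  , refLength :: Int  -- length of the block in bytes
  }

-- Slicing a strict ByteString shares the underlying buffer, so this
-- does not copy the block's bytes.
blockBytes :: B.ByteString -> BlockRef -> B.ByteString
blockBytes image (BlockRef off len) = B.take len (B.drop off image)

-- Re-disassemble on demand using the supplied disassembler function.
blockInstructions :: (B.ByteString -> [i]) -> B.ByteString -> BlockRef -> [i]
blockInstructions disasm image = disasm . blockBytes image
```

The trade-off is repeated disassembly work each time a block's instructions are requested.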
The rewriter makes the implicit assumption that there is always a data section before the bss. It then extends that section to cover the bss address range. In a binary with no data, the text section can be expanded, which breaks the rewriter later.
The changes I introduced in #20 broke refurbish. I didn't notice this because the CI build for refurbish has been broken for a long time, so it simply stayed broken: http://fryingpan.dev.galois.com/hydra/jobset/sfe/master.HEADs-ghc844-7#tabs-jobs
Add an additional rewriter API that allows client code to receive notifications about uninstrumentable blocks.
This could be a callback or possibly just an additional return value.
Ideally, we can also include a reason why the block was considered uninstrumentable.
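The "additional return value" variant could look roughly like the sketch below. All names here (Block, BlockId, Reason, classify, rewriteAll) and the single size-based reason are hypothetical, chosen only to show the shape of the API.

```haskell
newtype BlockId = BlockId Int
  deriving (Eq, Show)

-- Hypothetical reason type; a real one would list the actual causes.
data Reason = BlockTooSmall Int
  deriving (Eq, Show)

data Block = Block { blockId :: BlockId, blockSize :: Int }
  deriving (Eq, Show)

-- Decide whether a block can be instrumented, and if not, why.
classify :: Int -> Block -> Either (BlockId, Reason) Block
classify minSize b
  | blockSize b < minSize = Left (blockId b, BlockTooSmall (blockSize b))
  | otherwise             = Right b

-- Rewrite instrumentable blocks, pass the others through unchanged, and
-- return the skipped blocks with their reasons as a second result.
rewriteAll :: Int -> (Block -> Block) -> [Block] -> ([Block], [(BlockId, Reason)])
rewriteAll minSize f = foldr step ([], [])
  where
    step b (out, skipped) = case classify minSize b of
      Left why -> (b : out, why : skipped)
      Right ok -> (f ok : out, skipped)
```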
The usage examples in the top level Renovate module are copied from source files under the examples directory, and are not checked against the current API. It would be really nice to enforce that the examples in the documentation still compile. It seems like we should be able to do this with doctest.
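For reference, doctest checks examples written in Haddock comments with the >>> prompt. The function below is a hypothetical placeholder, not part of renovate; it only shows the comment format doctest looks for.

```haskell
-- | Count the instructions in a block (hypothetical helper).
--
-- >>> countInstructions ["mov", "add", "ret"]
-- 3
countInstructions :: [String] -> Int
countInstructions = length
```

Whether the larger examples from the examples directory fit this expression-oriented format is an open question; they may need to be compiled as separate test targets instead.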
Neither implementation needs it at this point
To fix the issues in #48, @travitch suggests we revisit the segment ordering in https://github.com/GaloisInc/renovate/blob/master/renovate/src/Renovate/BinaryFormat/ELF.hs. Specifically:
It seems like at the very least EXIDX has to come before the first loadable segment, and perhaps even be the first (the documentation I found was unclear)
It could be the case that we need per-architecture layouts (I hope we can avoid it, but that is the worst case)
It might be that we can just make sure we leave EXIDX as the first segment if it is present
Ah one important thing to note: EXIDX needs to be an early segment, but its position in the file (represented by the offset in the output of readelf) need not be first. The ordering of segments is just their position in the program headers table
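The simplest of the options above (keep EXIDX first in the program header table, leave file offsets alone) can be sketched as a reordering of the header list. SegType, Segment, and exidxFirst are hypothetical names for illustration; whether this ordering is actually sufficient is exactly what the issue leaves open.

```haskell
import Data.List (partition)

-- Hypothetical, reduced view of a program header.
data SegType = Exidx | Load | Note
  deriving (Eq, Show)

data Segment = Segment
  { segType   :: SegType
  , segOffset :: Integer  -- file offset; deliberately left untouched
  }
  deriving (Eq, Show)

-- Move any EXIDX entry to the front of the program header table while
-- preserving the relative order of all other entries.
exidxFirst :: [Segment] -> [Segment]
exidxFirst segs = exidx ++ rest
  where
    (exidx, rest) = partition ((== Exidx) . segType) segs
```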
Right now, the rewriter lays out each basic block individually (unless the loop body locality optimization is enabled). This has a negative impact on performance in critical code, and also fragments the code space, making it harder to re-use space in the original text section by requiring more redirecting jumps (one for each block). We could recover a significant amount of space by redirecting execution at the start of a function, rather than at each basic block. Additionally, it would be faster, especially if we keep the same block layout within each function as the compiler generated.
The JumpType type is quite rich, but that data comes at a cost: to discover anything about a jump instruction, you must provide a Memory and the address of that instruction:
, isaJumpType :: forall t . Instruction arch t -> MM.Memory (MM.ArchAddrWidth arch) -> ConcreteAddress arch -> JumpType arch
However, some of the information in JumpType could likely be deduced without access to the Memory or ConcreteAddress, but just from the opcode and operands, such as whether the jump is
The test suite for refurbish uses Docker to contain individual test cases (in case they go rogue due to a binary rewriter failure). Unfortunately, this doesn't work inside a Docker container, as we have in GitHub Actions. We should have a special flag to just not use the Docker-based test runner that we can employ under CI.
There is a mix right now. This would also remove the located-base dependency.
With some recent changes, each architecture-specific backend provides its own functions that recover blocks based on macaw results. These functions produce precise error information in the case of failures, but the caller receives the reported errors as simply Nothing. The call site should change to preserve the more precise errors.
This is obsolete now that the refurbish tool exists in a separate package
Removing this tool will remove the dependency of the core on the PowerPC backend
The refurbish tool would be a good place to dump metrics for code discovery failures, along with detailed listings of the errors at each failure. We could also dump the machine code leading up to the failure to help triage.
Renovate fails silently and catastrophically if its re-assembled block size is inaccurate. This happens due to bugs in flexdis86 where re-assembly is incorrect. We should be checking the size as soon as we construct a renovate ConcreteBlock from a macaw ParsedBlock (using blockSize). We should fail if they do not match, since a mismatch otherwise silently introduces run-time errors.
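The check itself is small. The sketch below assumes we have the size reported for the discovered block and the sizes of the re-assembled instructions as plain integers; SizeMismatch and checkBlockSize are hypothetical names, and the real check would pull these numbers from the ParsedBlock and ConcreteBlock.

```haskell
-- Hypothetical error describing a disagreement between the two sizes.
data SizeMismatch = SizeMismatch
  { expectedSize    :: Int  -- size reported by code discovery
  , reassembledSize :: Int  -- total size of the re-assembled instructions
  }
  deriving (Eq, Show)

-- Fail early if re-assembly would produce a block of a different size.
checkBlockSize :: Int -> [Int] -> Either SizeMismatch ()
checkBlockSize expected instrSizes
  | actual == expected = Right ()
  | otherwise          = Left (SizeMismatch expected actual)
  where
    actual = sum instrSizes
```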
It looks like the layout of the ELF segments in the second writing pass violates some restrictions. It isn't clear how well this should work. It would be nice to support, but we'd have to think about what the API needs to look like.
The current sequence stores a 4 byte pointer in the instruction stream to permit jumps longer than 14 bits of offset. Storing a pointer is not position independent. We should see if we can store an offset instead, then perform a register PC-relative offset jump instead.
src/Renovate/Arch/PPC/ISA.hs:85:3: warning: [-Wmissing-fields]
• Fields of ‘R.ISA’ not initialised: isaMakeSymbolicCall, isaMove,
isaMoveImmediate, isaLoad, isaStore, isaStoreImmediate,
isaAddImmediate, isaSubtractImmediate
• In the expression:
R.ISA
{R.isaInstructionSize = ppcInstrSize,
R.isaPrettyInstruction = ppcPrettyInstruction,
R.isaMakePadding = ppcMakePadding,
R.isaMakeRelativeJumpTo = ppcMakeRelativeJumpTo,
R.isaMaxRelativeJumpSize = ppcMaxRelativeJumpSize,
R.isaJumpType = ppcJumpType,
R.isaModifyJumpTarget = ppcModifyJumpTarget,
R.isaMakeSymbolicJump = ppcMakeSymbolicJump,
R.isaConcretizeAddresses = ppcConcretizeAddresses,
R.isaSymbolizeAddresses = ppcSymbolizeAddresses}
In an equation for ‘isa’:
isa
= R.ISA
{R.isaInstructionSize = ppcInstrSize,
R.isaPrettyInstruction = ppcPrettyInstruction,
R.isaMakePadding = ppcMakePadding,
R.isaMakeRelativeJumpTo = ppcMakeRelativeJumpTo,
R.isaMaxRelativeJumpSize = ppcMaxRelativeJumpSize,
R.isaJumpType = ppcJumpType,
R.isaModifyJumpTarget = ppcModifyJumpTarget,
R.isaMakeSymbolicJump = ppcMakeSymbolicJump,
R.isaConcretizeAddresses = ppcConcretizeAddresses,
R.isaSymbolizeAddresses = ppcSymbolizeAddresses}
|
85 | R.ISA { R.isaInstructionSize = ppcInstrSize
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
Currently there is no support for rewriting Thumb. It is not inherently harder than ARM, but does require a bit more care because it only supports much shorter jumps. We may also have to be conservative in the presence of instructions like the IT family.
There are currently a large number of type aliases (and confusingly-named types) used internally in renovate core. Moving all of these to newtypes and cleaning up the names will significantly improve understandability.
This path doesn't exist on master: