galoisinc / renovate

A library for binary analysis and rewriting

Haskell 89.33% Makefile 0.37% C 1.40% Assembly 0.52% Shell 0.08% Dockerfile 0.08% Roff 8.23%

renovate's People

wonderzdh karabijavad pnwamk travitch galoisinc

renovate's Issues

Refurbish produces segfaulting binary

A few test cases in the SFE Embrittle test suite have been failing for a while, and it seems Renovate is to blame. Here's a session showing how refurbish produces a segfaulting binary when applied to tests/binaries/linked-list.noopt.nostdlib.x86_64.exe, but not when applied to tests/binaries/linked-list.noopt.stdlib.x86_64.exe:

    $ stack exec -- refurbish -o tmp/ll.noopt.nostdlib.refurbish tests/binaries/linked-list.noopt.nostdlib.x86_64.exe
    $ stack exec -- refurbish -o tmp/ll.noopt.stdlib.refurbish tests/binaries/linked-list.noopt.stdlib.x86_64.exe
    $ tests/binaries/linked-list.noopt.stdlib.x86_64.exe
    $ chmod +x tmp/ll.noopt.stdlib.refurbish && tmp/ll.noopt.stdlib.refurbish
    $ tests/binaries/linked-list.noopt.nostdlib.x86_64.exe
    $ chmod +x tmp/ll.noopt.nostdlib.refurbish && tmp/ll.noopt.nostdlib.refurbish
    zsh: segmentation fault  tmp/ll.noopt.nostdlib.refurbish

The nostdlib variant defines its own _start, but we have other tests that do this which refurbish and embrittle handle just fine.

Permalink to the SFE tests directory with the artifacts used in the example above: https://github.com/GaloisInc/sfe/tree/22eddcc9e98719d177ea84c0ea6500db7ff29631/tests/binaries

NOTE: I have not tested this on Renovate HEAD, but it's been going on for at least two weeks and I'm testing on a Renovate commit that's 4 days old.

Enable parallel rewriting

Making the rewrite action pure (or with restricted effects instead of being in RewriteM) would enable parallel rewriting.

It seems like the main use of the more elaborate monad context is logging, which can be accommodated in other ways.
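Below is a minimal sketch of this idea. It assumes a hypothetical `Block` type and a hypothetical pure per-block rewriter, `rewriteBlock`, that returns its log messages alongside the rewritten block instead of logging inside RewriteM. Because each call is independent, the blocks can be rewritten concurrently. The sketch uses only `base` (`forkIO`, `MVar`). `rewriteAll` is also hypothetical and is not renovate's API.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical stand-in for a basic block: an address and its instructions.
data Block = Block { blockAddr :: Int, blockInstrs :: [String] }
  deriving (Eq, Show)

-- A pure rewriter: instead of logging inside a monad such as RewriteM, it
-- returns the rewritten block together with the messages it would have logged.
rewriteBlock :: Block -> (Block, [String])
rewriteBlock b =
  (b { blockInstrs = concatMap instrument (blockInstrs b) }, [msg])
  where
    msg = "rewrote block at " ++ show (blockAddr b)
    -- Toy instrumentation: put a counter increment before every call.
    instrument i
      | take 4 i == "call" = ["inc counter", i]
      | otherwise = [i]

-- Rewrite all blocks concurrently, one lightweight thread per block. Each
-- worker forces its result before handing it back, so the work happens on
-- the worker thread. Logs are concatenated in the original block order.
rewriteAll :: [Block] -> IO ([Block], [String])
rewriteAll blocks = do
  vars <- mapM spawn blocks
  results <- mapM takeMVar vars
  return (map fst results, concatMap snd results)
  where
    spawn b = do
      v <- newEmptyMVar
      _ <- forkIO $ do
        let r@(b', logs) = rewriteBlock b
        _ <- evaluate (length (concat (blockInstrs b')) + length (concat logs))
        putMVar v r
      return v
```

The threads only run on multiple cores when the program is built with `-threaded` and run with `+RTS -N`. Error handling is omitted: if a worker throws, its `MVar` is never filled.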

Remove rewriting functions from the ISA that aren't used in renovate

There are a number of functions that have been added over time to support some arbitrary patching in binaries. They aren't used in renovate and are a bit too specific to renovate itself. This is conspicuous because they are only implemented for x86. These extra functions should be removed (any clients should have their own set of abstractions that they need).

Avoid exporting .Internal modules for testing

We should use internal libraries to expose these internal functions to the test suite without giving library clients access to them.

Additional comments for PC-relative data accesses with LDR

See 3cedd6a

Make re-assembly more robust

Currently, renovate re-assembles basic blocks that the caller has not changed to produce the bytes that will be written back to the new binary. We could instead just copy the bytes from the original, which would be robust against re-assembly bugs (especially relevant for the x86_64 backend).

We would need to be a bit careful here to ensure that this is only applied to completely unchanged blocks (i.e., that the block successor does not need to change).

It isn't clear that this would be a significant win, but it could avoid some annoying re-assembly problems.
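Here is a sketch of the byte-copying rule, under stated assumptions. The `Block` record is hypothetical: it stores the original byte slice, the original and the new instructions, and a flag saying whether the block's successor has to be re-targeted. The assembler is passed in as a plain function. None of these names are renovate's real types.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical block record. 'origBytes' is the slice of the original text
-- section and 'origInstrs' are the instructions disassembled from it.
data Block i = Block
  { origBytes      :: BS.ByteString
  , origInstrs     :: [i]
  , newInstrs      :: [i]
  , successorMoved :: Bool  -- True if the jump/fallthrough target must change
  }

-- A block is only byte-copyable if the client did not touch it and its
-- successor does not need to be re-targeted.
unchanged :: Eq i => Block i -> Bool
unchanged b = origInstrs b == newInstrs b && not (successorMoved b)

-- Emit the original bytes for unchanged blocks; otherwise fall back to the
-- assembler supplied by the caller.
blockBytes :: Eq i => (i -> BS.ByteString) -> Block i -> BS.ByteString
blockBytes assemble b
  | unchanged b = origBytes b
  | otherwise   = BS.concat (map assemble (newInstrs b))
```

Even with a buggy assembler, the unchanged block keeps its original bytes, which is exactly the robustness this issue is after.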

Change the type of isaModifyJumpTarget

Currently, neither implementation ever returns Nothing; both simply panic if a non-jump is passed in or if the required jump is not realizable. If panicking is the intended behavior, the Nothing check at the call sites is redundant.

We can either make it "total" (modulo the panics) or return Nothing where we currently panic and push the panic to the caller. The backend implementations have additional arch-specific information available for more precise panic messages, which argues for keeping the panics there. On the other hand, the call site (mapJumpAddress) is in a context that could turn the failure into an exception that a renovate client could recover from.
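The second option (return Nothing and let the caller decide) could look like the sketch below. It uses a toy instruction type and a hypothetical `modifyJumpTarget`, in which a 1024-byte limit stands in for "not realizable". `mapJump` plays the role of the call site and raises a typed exception that a client can catch. All names are illustrative, not renovate's API.

```haskell
import Control.Exception (Exception, throwIO, try)

-- Toy instruction set standing in for an architecture's instructions.
data Instr = Jmp Int | Call Int | Nop
  deriving (Eq, Show)

-- Backend side: return Nothing instead of panicking when the instruction
-- is not a jump or the new offset cannot be encoded.
modifyJumpTarget :: Int -> Instr -> Maybe Instr
modifyJumpTarget newOffset i
  | abs newOffset > maxOffset = Nothing  -- jump not realizable
  | otherwise = case i of
      Jmp _  -> Just (Jmp newOffset)
      Call _ -> Just (Call newOffset)
      Nop    -> Nothing                  -- not a jump
  where
    maxOffset = 1024  -- toy encoding limit

-- Caller side: turn the failure into a typed exception.
data JumpRewriteFailure = JumpRewriteFailure Instr Int
  deriving Show

instance Exception JumpRewriteFailure

mapJump :: Int -> Instr -> IO Instr
mapJump newOffset i =
  maybe (throwIO (JumpRewriteFailure i newOffset)) return (modifyJumpTarget newOffset i)

-- A client can then recover instead of crashing.
tryMapJump :: Int -> Instr -> IO (Either JumpRewriteFailure Instr)
tryMapJump off i = try (mapJump off i)
```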

Unify jump targets and instruction annotations as Relocations

There are currently separate notions of jump table relocations (represented as the TaggedInstruction type) and data address relocations (represented as arch-specific instruction annotation types). We should unify these in a single arch-independent (but potentially arch-extensible) Relocation type. Both are currently processed in the same function (isaConcretizeAddresses), which is a good indication that they should just be one type now.
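One way to make a single type "arch-independent but arch-extensible" is a type family used as an extension point. The sketch below assumes hypothetical names throughout (`Relocation`, `ArchReloc`, `ToyArch`, `resolve`). A single resolver handles every relocation kind, in the same way that isaConcretizeAddresses already handles both kinds today.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

-- Hypothetical symbolic address (e.g. a block or data symbol id).
newtype SymbolicAddr = SymbolicAddr Int deriving (Eq, Show)

-- Extension point: each backend can add its own relocation forms.
type family ArchReloc arch :: Type

-- One relocation type covering jump targets and data address references.
data Relocation arch
  = NoRelocation
  | JumpTargetRelocation SymbolicAddr
  | DataAddressRelocation SymbolicAddr
  | ArchRelocation (ArchReloc arch)

-- A toy backend with one extra relocation kind.
data ToyArch
data ToyReloc = PageOffset SymbolicAddr
type instance ArchReloc ToyArch = ToyReloc

-- One resolver for every kind of relocation.
resolve :: (SymbolicAddr -> Int) -> Relocation ToyArch -> Maybe Int
resolve addrOf r =
  case r of
    NoRelocation -> Nothing
    JumpTargetRelocation s -> Just (addrOf s)
    DataAddressRelocation s -> Just (addrOf s)
    ArchRelocation (PageOffset s) -> Just (addrOf s `mod` 4096)
```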

Reducing space use

We could save a lot of memory by re-disassembling each block on-demand, while only storing bytestrings (or, better, slices of the original binary) and the disassembler function.
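As a rough sketch of the on-demand approach: each block keeps only an offset and a length into the shared binary image, and instructions are re-decoded only when someone asks for them. `Data.ByteString`'s `take` and `drop` produce slices that share the underlying buffer instead of copying it. `LazyBlock`, `blockSlice` and `instructionsOf` are hypothetical names, and the disassembler is passed in as a plain function.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical compact block: only a location in the original binary.
data LazyBlock = LazyBlock
  { lbOffset :: Int
  , lbLength :: Int
  }

-- The image is shared by every block; take/drop slice it without copying.
blockSlice :: BS.ByteString -> LazyBlock -> BS.ByteString
blockSlice image b = BS.take (lbLength b) (BS.drop (lbOffset b) image)

-- Re-disassemble only when the instructions are actually needed.
instructionsOf :: (BS.ByteString -> [i]) -> BS.ByteString -> LazyBlock -> [i]
instructionsOf disasm image = disasm . blockSlice image
```

The trade-off is CPU for memory: every access repeats the decode, so callers that iterate over a block many times may still want to cache its result locally.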

Rewriting a binary with bss but no data fails

The rewriter implicitly assumes that there is always a data section immediately before the bss, and it extends that section to cover the bss address range. In a binary with no data section, the text section can end up being extended instead, which later breaks the rewriter.
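A possible guard, sketched with a hypothetical and heavily simplified `Section` record rather than real ELF types, is to find the section that precedes the bss and only extend it when it looks like data (writable, not executable). Otherwise the check fails loudly instead of growing the text section.

```haskell
-- Hypothetical, simplified section descriptor.
data Section = Section
  { secName  :: String
  , secAddr  :: Integer
  , secSize  :: Integer
  , secExec  :: Bool
  , secWrite :: Bool
  } deriving (Eq, Show)

-- The section that ends closest to (but not after) the start of the bss.
precedingSection :: [Section] -> Section -> Maybe Section
precedingSection secs bss =
  case filter before secs of
    []    -> Nothing
    cands -> Just (foldr1 later cands)
  where
    end s = secAddr s + secSize s
    before s = secName s /= secName bss && end s <= secAddr bss
    later a b = if end a >= end b then a else b

-- Extend the preceding section over the bss only if it is data-like.
sectionToExtend :: [Section] -> Section -> Either String Section
sectionToExtend secs bss =
  case precedingSection secs bss of
    Just s | secWrite s && not (secExec s) ->
      Right s { secSize = secAddr bss + secSize bss - secAddr s }
    Just s  -> Left ("refusing to extend non-data section " ++ secName s)
    Nothing -> Left "no section precedes the bss"
```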

Refurbish does not compile due to rewriteElf changes

The changes I introduced in #20 broke refurbish. I didn't notice because the CI build for refurbish has been broken for a long time, so it simply stayed broken: http://fryingpan.dev.galois.com/hydra/jobset/sfe/master.HEADs-ghc844-7#tabs-jobs

API for reporting uninstrumentable blocks

Add an additional rewriter API that allows client code to receive notifications about uninstrumentable blocks.

This could be a callback or possibly just an additional return value.

Ideally, we can also include a reason why the block was considered uninstrumentable.
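Both variants can be sketched on top of each other. The return-value variant yields the instrumentable blocks plus one report, with a reason, for every skipped block. The callback variant is derived from it. `Block`, `classify`, the reasons, and the function names are all hypothetical. The classification rules (a block too small for a redirecting jump, or containing an unsupported instruction) are toy examples.

```haskell
-- Hypothetical reasons a block could not be instrumented.
data UninstrumentableReason
  = TooSmallForRedirect Int        -- block size in bytes
  | UnsupportedInstruction String
  deriving (Eq, Show)

data BlockReport = BlockReport
  { reportAddr   :: Int
  , reportReason :: UninstrumentableReason
  } deriving (Eq, Show)

-- Toy block: address, size in bytes, instruction mnemonics.
data Block = Block { bAddr :: Int, bSize :: Int, bInstrs :: [String] }
  deriving (Eq, Show)

classify :: Int -> (String -> Bool) -> Block -> Maybe UninstrumentableReason
classify jumpSize supported b
  | bSize b < jumpSize = Just (TooSmallForRedirect (bSize b))
  | otherwise = case filter (not . supported) (bInstrs b) of
      (i : _) -> Just (UnsupportedInstruction i)
      []      -> Nothing

-- Return-value variant.
partitionBlocks :: Int -> (String -> Bool) -> [Block] -> ([Block], [BlockReport])
partitionBlocks jumpSize supported = foldr step ([], [])
  where
    step b (ok, bad) = case classify jumpSize supported b of
      Nothing -> (b : ok, bad)
      Just r  -> (ok, BlockReport (bAddr b) r : bad)

-- Callback variant, built on the return-value variant.
partitionBlocksWith :: Monad m => (BlockReport -> m ()) -> Int -> (String -> Bool) -> [Block] -> m [Block]
partitionBlocksWith notify jumpSize supported bs = do
  let (ok, bad) = partitionBlocks jumpSize supported bs
  mapM_ notify bad
  return ok
```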

Attempt to use doctest for the usage examples

The usage examples in the top level Renovate module are copied from source files under the examples directory, and are not checked against the current API. It would be really nice to enforce that the examples in the documentation still compile. It seems like we should be able to do this with doctest.
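For reference, doctest picks up `>>>` lines in Haddock comments, evaluates them, and compares the printed result with the line that follows. A doctest-checked example might look like this. `alignUp` is a made-up helper for illustration, not part of renovate. The usage examples would additionally need their imports set up for doctest.

```haskell
-- | Round an address up to a multiple of a positive alignment.
--
-- >>> alignUp 16 0x1001
-- 4112
-- >>> alignUp 16 0x1000
-- 4096
alignUp :: Int -> Int -> Int
alignUp align addr = (addr + align - 1) `div` align * align
```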

Remove the symbolic address lookup argument from isaSymbolizeAddresses

Neither implementation needs it at this point.

Revisit ordering of segments in the program header table on AArch32

To fix the issues in #48, @travitch suggests we revisit the segment ordering in https://github.com/GaloisInc/renovate/blob/master/renovate/src/Renovate/BinaryFormat/ELF.hs. Specifically:

It seems like at the very least EXIDX has to come before the first loadable segment, and perhaps even be the first (the documentation I found was unclear)
It could be the case that we need per-architecture layouts (I hope we can avoid it, but that is the worst case)
It might be that we can just make sure we leave EXIDX as the first segment if it is present
Ah, one important thing to note: EXIDX needs to be an early segment, but its position in the file (represented by the offset in the output of readelf) need not be first. The ordering of segments is just their position in the program header table.
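The "at the very least" option above could be sketched as follows: move the EXIDX entry to just before the first loadable entry in the program header table, and leave every file offset untouched. The `Phdr` record and `SegType` type are hypothetical simplifications, not the elf-edit types renovate uses. Whether "before the first PT_LOAD" is sufficient on AArch32 is exactly the open question here.

```haskell
import Data.List (partition)

-- Hypothetical, simplified program header entry.
data SegType = PtPhdr | PtLoad | PtArmExidx | PtOther String
  deriving (Eq, Show)

data Phdr = Phdr { phType :: SegType, phOffset :: Integer }
  deriving (Eq, Show)

-- Reorder table entries so EXIDX sits immediately before the first PT_LOAD.
-- Only table positions change; each segment's file offset stays the same.
placeExidxEarly :: [Phdr] -> [Phdr]
placeExidxEarly phdrs = beforeLoad ++ exidx ++ rest
  where
    (exidx, others) = partition ((== PtArmExidx) . phType) phdrs
    (beforeLoad, rest) = break ((== PtLoad) . phType) others
```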

Add function-level re-use

Right now, the rewriter lays out each basic block individually (unless the loop body locality optimization is enabled). This hurts performance in critical code and also fragments the code space, making it harder to re-use space in the original text section because more redirecting jumps are required (one for each block). We could recover a significant amount of space by redirecting execution at the start of a function, rather than at each basic block. Additionally, it would be faster, especially if we keep the same block layout within each function that the compiler generated.
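To make the space argument concrete, here is a small calculation under simplifying assumptions. It uses a hypothetical `Function` record, assumes each function's blocks are laid out contiguously, and assumes every redirecting jump costs `jumpSize` bytes. It compares the number of redirects, and the largest contiguous free chunk left behind, for per-block versus per-function redirection.

```haskell
-- Hypothetical function: entry address and sizes (bytes) of its blocks,
-- assumed to be contiguous in the original text section.
data Function = Function { fnEntry :: Int, fnBlockSizes :: [Int] }

-- Per-block layout: one redirecting jump per relocated block.
redirectsPerBlock :: [Function] -> Int
redirectsPerBlock = sum . map (length . fnBlockSizes)

-- Function-level layout: one redirecting jump per function entry.
redirectsPerFunction :: [Function] -> Int
redirectsPerFunction = length

-- Largest contiguous free chunk left in a function's original bytes.
largestFreeChunkPerBlock :: Int -> Function -> Int
largestFreeChunkPerBlock jumpSize f =
  maximum (0 : [s - jumpSize | s <- fnBlockSizes f, s > jumpSize])

largestFreeChunkPerFunction :: Int -> Function -> Int
largestFreeChunkPerFunction jumpSize f = max 0 (sum (fnBlockSizes f) - jumpSize)
```

For a function with blocks of 8, 12 and 6 bytes and a 5-byte jump, per-block redirection leaves at most a 7-byte hole. Redirecting only the entry frees a single 21-byte region.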

Expose more granular information about jumps through ISA

The JumpType type is quite rich, but that data comes at a cost: to discover anything about a jump instruction, you must provide a Memory and the address of that instruction:

  , isaJumpType :: forall t . Instruction arch t -> MM.Memory (MM.ArchAddrWidth arch) -> ConcreteAddress arch -> JumpType arch

However, some of the information in JumpType could likely be deduced without access to the Memory or ConcreteAddress, but just from the opcode and operands, such as whether the jump is

Direct or indirect
Conditional or unconditional
A return or not
Not a jump
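A coarse, opcode-only classification covering the four questions above might look like the sketch below. It uses a toy instruction syntax, and the names `JumpClass` and `classifyJump` are hypothetical. A real version would be a new ISA field next to isaJumpType and would pattern match on the architecture's actual opcodes and operands.

```haskell
-- Hypothetical coarse jump classification; no Memory or address needed.
data Target = Direct | Indirect deriving (Eq, Show)
data Cond = Conditional | Unconditional deriving (Eq, Show)
data JumpClass = NotAJump | Return | Jump Target Cond deriving (Eq, Show)

-- Toy instruction syntax: an opcode and its operands.
data Operand = Imm Int | Reg String deriving (Eq, Show)
data Instr = Instr String [Operand]

classifyJump :: Instr -> JumpClass
classifyJump (Instr op args)
  | op == "ret" = Return
  | op == "jmp" = Jump (target args) Unconditional
  | op `elem` ["je", "jne", "jl", "jg"] = Jump (target args) Conditional
  | otherwise = NotAJump
  where
    -- An immediate operand means the target is encoded in the instruction.
    target [Imm _] = Direct
    target _ = Indirect
```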

Avoid Docker in test suite under CI

The test suite for refurbish uses Docker to contain individual test cases (in case they go rogue due to a binary rewriter failure). Unfortunately, this doesn't work inside a Docker container, such as the one our GitHub Actions CI runs in. We should have a special flag that disables the Docker-based test runner, which we can then use under CI.
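A sketch of the runner selection follows. The `--no-docker` flag name is a hypothetical choice. As an optional fallback, it also honours the `CI` environment variable, which GitHub Actions sets to `true`. The decision is kept pure so that it is easy to test.

```haskell
import System.Environment (getArgs, lookupEnv)

data Runner = DockerRunner | LocalRunner
  deriving (Eq, Show)

-- Hypothetical flag '--no-docker'; CI=true is an optional fallback signal.
chooseRunner :: [String] -> Maybe String -> Runner
chooseRunner args ciEnv
  | "--no-docker" `elem` args = LocalRunner
  | ciEnv == Just "true" = LocalRunner
  | otherwise = DockerRunner

currentRunner :: IO Runner
currentRunner = chooseRunner <$> getArgs <*> lookupEnv "CI"
```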

Replace uses of error with panic

There is a mix right now. This would also remove the located-base dependency.

Specific errors are dropped during recovery

With some recent changes, each architecture-specific backend provides its own functions that recover blocks based on macaw results. These functions produce precise error information on failure, but the caller receives only Nothing in place of the reported errors. The call site should be changed to preserve the more precise errors.
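The fix amounts to keeping the `Either` all the way to the caller instead of collapsing it to `Maybe`. Below is a sketch with a hypothetical error type and a toy recovery function. In the toy function, odd addresses stand in for decoder failures. `recoverLossy` shows the current behaviour and `recoverAll` shows the error-preserving one.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical precise recovery errors.
data RecoveryError
  = UndecodableBytes Int
  | OutsideTextSection Int
  deriving (Eq, Show)

newtype Block = Block Int deriving (Eq, Show)

-- Toy backend recovery function.
recoverBlock :: (Int, Int) -> Int -> Either RecoveryError Block
recoverBlock (lo, hi) a
  | a < lo || a >= hi = Left (OutsideTextSection a)
  | odd a             = Left (UndecodableBytes a)
  | otherwise         = Right (Block a)

-- Current behaviour: the reason for the failure is lost.
recoverLossy :: (Int -> Either RecoveryError Block) -> Int -> Maybe Block
recoverLossy f = either (const Nothing) Just . f

-- Proposed: keep every failure, tagged with its address.
recoverAll :: (Int -> Either RecoveryError Block) -> [Int] -> ([(Int, RecoveryError)], [Block])
recoverAll f addrs = partitionEithers [either (\e -> Left (a, e)) Right (f a) | a <- addrs]
```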

Remove the run-discovery tool

This is obsolete now that the refurbish tool exists in a separate package.

Removing this tool will remove the dependency of the core on the PowerPC backend

Code discovery failure visualization

The refurbish tool would be a good place to dump metrics for code discovery failures, along with detailed listings of the errors at each failure. We could also dump the machine code leading up to the failure to help triage.

Add a consistency check when recovering blocks

Renovate fails silently and catastrophically if its re-assembled block size is inaccurate. This happens due to bugs in flexdis86 where re-assembly is incorrect. We should be checking as soon as we construct a renovate ConcreteBlock from a macaw ParsedBlock (using blockSize). Fail if they do not match, as it will just silently introduce run-time errors.
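The check itself is small. In the sketch below, the expected size stands in for the value macaw reports (via blockSize), and the assembler is passed in as a function. `checkBlockSize` and `BlockSizeMismatch` are hypothetical names.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical error describing an inconsistent block.
data BlockSizeMismatch = BlockSizeMismatch
  { mismatchAddr    :: Int
  , expectedSize    :: Int
  , reassembledSize :: Int
  } deriving (Eq, Show)

-- Re-assemble the recovered instructions and compare the total length with
-- the size macaw reported; reject the block on any difference.
checkBlockSize :: (i -> BS.ByteString) -> Int -> Int -> [i] -> Either BlockSizeMismatch [i]
checkBlockSize assemble addr macawSize instrs
  | actual == macawSize = Right instrs
  | otherwise = Left (BlockSizeMismatch addr macawSize actual)
  where
    actual = sum (map (BS.length . assemble) instrs)
```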

Rewriting an already rewritten binary breaks

It looks like the layout of the ELF segments in the second writing pass violates some restrictions. It isn't clear how well this should work. It would be nice to support, but we'd have to think about what the API needs to look like.

Investigate a better code sequence for long jumps on AArch32

The current sequence stores a 4-byte pointer in the instruction stream to permit jumps longer than 14 bits of offset. Storing a pointer is not position independent. We should see if we can store an offset instead and then perform a PC-relative jump through a register.
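The arithmetic for one candidate sequence can be sketched as follows. The sequence is an assumption, not what renovate currently emits: `ldr ip, [pc, #0]` followed by `add pc, pc, ip`, with the offset word stored right after them. In ARM state, reading PC yields the current instruction's address plus 8. The load therefore reads the word at `start + 8`, and the add jumps to `start + 12 + offset`. Note that this clobbers ip (r12), a register the pointer-based form does not need.

```haskell
import Data.Int (Int32)
import Data.Word (Word32)

-- Assumed ARM-state sequence starting at 'start':
--   start+0: ldr ip, [pc, #0]   -- PC reads as start+8, so this loads the word at start+8
--   start+4: add pc, pc, ip     -- PC reads as start+12, so this jumps to start+12+offset
--   start+8: .word offset
-- Returns the word to store, or Nothing if the offset does not fit in 32 bits.
longJumpOffset :: Integer -> Integer -> Maybe Word32
longJumpOffset start target
  | off < toInteger (minBound :: Int32) || off > toInteger (maxBound :: Int32) = Nothing
  | otherwise = Just (fromIntegral off)  -- two's-complement encoding
  where
    off = target - (start + 12)
```

Moving both the sequence and its target by the same amount leaves the stored word unchanged, which is the position-independence property the pointer-based sequence lacks.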

PPC: Some fields of ISA not initialized

src/Renovate/Arch/PPC/ISA.hs:85:3: warning: [-Wmissing-fields]
    • Fields of ‘R.ISA’ not initialised: isaMakeSymbolicCall, isaMove,
                                         isaMoveImmediate, isaLoad, isaStore, isaStoreImmediate,
                                         isaAddImmediate, isaSubtractImmediate
    • In the expression:
        R.ISA
          {R.isaInstructionSize = ppcInstrSize,
           R.isaPrettyInstruction = ppcPrettyInstruction,
           R.isaMakePadding = ppcMakePadding,
           R.isaMakeRelativeJumpTo = ppcMakeRelativeJumpTo,
           R.isaMaxRelativeJumpSize = ppcMaxRelativeJumpSize,
           R.isaJumpType = ppcJumpType,
           R.isaModifyJumpTarget = ppcModifyJumpTarget,
           R.isaMakeSymbolicJump = ppcMakeSymbolicJump,
           R.isaConcretizeAddresses = ppcConcretizeAddresses,
           R.isaSymbolizeAddresses = ppcSymbolizeAddresses}
      In an equation for ‘isa’:
          isa
            = R.ISA
                {R.isaInstructionSize = ppcInstrSize,
                 R.isaPrettyInstruction = ppcPrettyInstruction,
                 R.isaMakePadding = ppcMakePadding,
                 R.isaMakeRelativeJumpTo = ppcMakeRelativeJumpTo,
                 R.isaMaxRelativeJumpSize = ppcMaxRelativeJumpSize,
                 R.isaJumpType = ppcJumpType,
                 R.isaModifyJumpTarget = ppcModifyJumpTarget,
                 R.isaMakeSymbolicJump = ppcMakeSymbolicJump,
                 R.isaConcretizeAddresses = ppcConcretizeAddresses,
                 R.isaSymbolizeAddresses = ppcSymbolizeAddresses}
   |
85 |   R.ISA { R.isaInstructionSize = ppcInstrSize
   |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...

Support rewriting Thumb

Currently there is no support for rewriting Thumb. It is not inherently harder than ARM, but it requires a bit more care because Thumb supports only much shorter jumps. We may also have to be conservative in the presence of instructions like the IT family.

Improve internal type safety

There are currently a large number of type aliases (and confusingly-named types) used internally in renovate core. Moving all of these to newtypes and cleaning up the names will significantly improve understandability.
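As a small illustration of the aliases-to-newtypes change, consider two hypothetical address spaces that are easy to mix up: file offsets and virtual addresses. These are not renovate's actual types. With type aliases to the same integer type, the compiler accepts either one where the other is expected. With newtypes, a mix-up is a type error, and there is no run-time cost.

```haskell
-- Before (aliases): both are just Integer, so they are interchangeable.
--   type FileOffset = Integer
--   type VirtAddr   = Integer

-- After (newtypes): distinct types with the same representation.
newtype FileOffset = FileOffset Integer deriving (Eq, Ord, Show)
newtype VirtAddr = VirtAddr Integer deriving (Eq, Ord, Show)

-- Hypothetical segment mapping between the two spaces.
data Segment = Segment
  { segOffset :: FileOffset
  , segVAddr  :: VirtAddr
  , segSize   :: Integer
  }

-- Passing a VirtAddr here would be rejected by the type checker.
toVirt :: Segment -> FileOffset -> Maybe VirtAddr
toVirt seg (FileOffset o)
  | o >= base && o < base + segSize seg = Just (VirtAddr (v + (o - base)))
  | otherwise = Nothing
  where
    FileOffset base = segOffset seg
    VirtAddr v = segVAddr seg
```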

Outdated path in Refurbish.Tutorial

This path doesn't exist on master:

renovate/refurbish/src/Refurbish/Tutorial.hs

Line 8 in 73c4358

repository under the @renovate\/renovate\/examples@ directory.
