lifting-bits / remill

Library for lifting machine code to LLVM bitcode
License: Apache License 2.0
I think dependency management needs to be improved. Right now the bootstrap.sh script downloads and builds some packages from source. It would be preferable to use existing package managers to get these libraries.

- Install dependencies to /usr/local/lib. This should allow the build process to avoid nasty hacks related to linking against the third-party folder.
- Use pip to install the Python bindings of things.
- Check in the protoc-produced Python- and C++-generated code for using CFG.proto. Changes to CFG.proto should result in these auto-generated files being updated in the repo.

These steps will make it easier to have a bunch of simple binaries (e.g. remill-opt, remill-lift, etc.) that can be installed to system directories, without needing to reference stuff in the third_party directory. Ideally, this type of change will enable Remill itself to be packageable.
There should be a number of pre-defined algorithms (using an enum to list them all) that can be invoked by semantics functions. These would roughly correspond to instructions available in hardware, e.g. logarithms, tangents, etc.
I am not even sure what needs it. It may actually be LLVM libraries. If so, then this issue isn't really doable.
Conditional branches modify the BRANCH_TAKEN variable. The translator then uses the value of this variable to decide to tail-call to one block function or another. I think that conditional interrupt instructions, like into and bound, can be similarly implemented. There is already the INTERRUPT_TAKEN variable that is modified, as shown below:
DEF_ISEL_SEM(INTO) {
INTERRUPT_TAKEN = FLAG_OF;
INTERRUPT_VECTOR = 4;
}
In the above code, INTERRUPT_TAKEN maps to a field in the State structure. I think this is particularly ugly. A better solution would be to use the BRANCH_TAKEN variable in __remill_basic_block, thereby not polluting the State structure with fields that cannot opaquely [1] be represented across architectures.
I think instead we can just "take over" the BRANCH_TAKEN variable. This particular nuance is "hidden" by the code translator. For example, the semantics of JO (jump on overflow) are:
DEF_SEM(JO, R8W cond, PC taken_pc, PC not_taken_pc) {
auto take_branch = FLAG_OF;
Write(cond, take_branch);
Write(REG_PC, Select<addr_t>(take_branch, taken_pc, not_taken_pc));
}
The R8W cond argument is actually a pointer to the BRANCH_TAKEN variable. We can see the addition of this argument in the DecodeConditionalBranch code.
The __remill_interrupt_call intrinsic should not need to know the actual contents of the state structure itself. Imagine a scenario where you have a symbolic executor, and you point it at some lifted bitcode, as well as a shared library implementing a system call model. The implementation of __remill_interrupt_call should only "pass the buck" into the shared library's code. It should not need to inspect a field telling it if the interrupt should be taken, because then it needs to know the structure of the State struct, and thus would not be architecture-agnostic. Of course, we still have the pesky interrupt vector field, which is a nuisance. There is a solution to this, though. The State structure could be a derived class of an architecture-neutral class with common elements. The opaque implementations of intrinsics could generically operate on the State structure's base class, thus gaining extra info. I think this is acceptable for interrupt vector numbers, but not acceptable for conditional execution of an interrupt.

Use the Valgrind annotations to enable/disable checking around execution of native and lifted code. Periodically ensure an absence of errors.
The same should be done for cfg_to_bc.
Update the build system's --dry_run option (or add something similar) to produce a compilation database [1] that can be consumed by the Clang Static Analyzer.
There are some benefits to this:
Implement and test the following instructions:
This would probably be very useful, especially for implementations that want to access things like the program counter from within the memory intrinsics.
There used to be one, but it was overcomplicated and hacky (see the commit history). The proto format has since changed, so having a new script would be helpful.
Implement and test the following instructions:
An is_local flag, which semantically says that the source block and target block logically belong to the same function.

This should eliminate stores of undef values, as well as memsets of undef values.
It's possible that this would work better for ARM's NEON implementation of SIMD. Right now I use the vector_size attribute.
That way the stack register can be computed in terms of the data pointer and the index argument passed to the semantic function.
The XED kits can be found here:
https://software.intel.com/en-us/protected-download/267266/560870/step2
I have a hunch that this will further improve compiler optimizations, as well as tighten the optimized code to not include so many loads/stores.
The following pattern sometimes comes up:
%26 = mul nsw i128 %25, -8198552921648689607
%27 = trunc i128 %26 to i64
store i64 %27, i64* %3, align 8
store i64 4216879, i64* %5, align 8
%trunc = trunc i128 %26 to i32
The only uses of %26 are trunc instructions, so I should be able to strength-reduce the 128-bit multiplication to a 64-bit multiplication.
Also add accompanying test cases.
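The legality of that reduction can be checked with a small standalone model: truncation distributes over multiplication modulo 2^64, so the 64-bit product of the truncated operands equals the truncated 128-bit product. This sketch uses the GCC/Clang __int128 extension:

```cpp
#include <cstdint>

// If every use of the wide product is a trunc (as in the pattern above),
// these two computations are equivalent.
uint64_t MulThenTrunc(uint64_t a, int64_t b) {
  __int128 wide = static_cast<__int128>(a) * b;  // 128-bit multiply...
  return static_cast<uint64_t>(wide);            // ...then truncate.
}

uint64_t TruncThenMul(uint64_t a, int64_t b) {
  return a * static_cast<uint64_t>(b);  // Strength-reduced: 64-bit multiply.
}
```

The same argument covers truncations narrower than 64 bits, since they factor through the 64-bit truncation.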
Pre-process the instructions CFG before lifting to feature detect for things like SSEn, AVXn, etc. and broadly categorize into: no-AVX, AVX (includes AVX2), and AVX512.
Ideally, it would be cool if there were a way to "install" Remill somewhere.
Some of the advantages are:
Perhaps to be used by RDTSCP.
Implement and test the following MMX instructions:
In some cases it might be valuable to have a partially (un)implemented instruction. For example, some complex FPU instructions like FPATAN have two real components:
The former should be implemented in the partial instruction, as it is arch-specific and depends on no special features. The latter should be stubbed out in some way. McSema1 currently does something like this by using LLVM intrinsics. This is probably the most sensible approach.
This is related to the footnote in Issue #52. The idea here is that some information (e.g. the interrupt vector) cannot be passed through control-flow intrinsics (e.g. __remill_interrupt_call) because of the rigid argument requirements for control-flow intrinsics (State *state, Memory *memory, addr_t pc). I think an appropriate solution is to put these architecture-neutral "dirty details" into a base class, and have the State structure derive from this base class. Then, control-flow intrinsics can be defined as accepting pointers to the base class. Implementations of the intrinsics that require access to the actual machine-specific contents can then down-cast the pointer.
Sometimes it will miss basic blocks that it knows should exist (e.g. target of a direct call).
From a freshly installed remill repo, I followed the README and got to ./scripts/bootstrap.sh. It crashes while running scripts/compile_semantics.sh:
Building for x86
In file included from /remill/remill/Arch/X86/Runtime/Instructions.cpp:5:
In file included from /remill/remill/Arch/Runtime/Intrinsics.h:6:
/remill/remill/Arch/Runtime/Types.h:6:10: fatal error: 'cstdint' file not
found
#include <cstdint>
^
1 error generated.
In file included from /remill/remill/Arch/X86/Runtime/BasicBlock.cpp:3:
In file included from /remill/remill/Arch/X86/Runtime/State.h:21:
In file included from /remill/remill/Arch/Runtime/Runtime.h:14:
In file included from /remill/remill/Arch/Runtime/Intrinsics.h:6:
/remill/remill/Arch/Runtime/Types.h:6:10: fatal error: 'cstdint' file not
found
#include <cstdint>
^
1 error generated.
clang-3.8: error: no such file or directory: '/remill/generated/Arch/X86/Runtime/sem_x86_instr.bc'
clang-3.8: error: no input files
/remill/third_party/bin/llvm-link: /remill/generated/Arch/X86/Runtime/sem_x86_block.bc: error: Could not open input file: No such file or directory
0 llvm-link 0x0000000000574308
1 llvm-link 0x0000000000574977
2 libpthread.so.0 0x00007f57616da3d0
3 llvm-link 0x00000000004dd2e1
4 llvm-link 0x0000000000409257
5 llvm-link 0x000000000040845c
6 llvm-link 0x00000000004070b2
7 libc.so.6 0x00007f5760865830 __libc_start_main + 240
8 llvm-link 0x0000000000406e19
Stack dump:
0. Program arguments: /remill/third_party/bin/llvm-link -o=/remill/generated/sem_x86.bc /remill/generated/Arch/X86/Runtime/sem_x86_block.bc /remill/generated/Arch/X86/Runtime/sem_x86_instr.opt.bc
./scripts/compile_semantics.sh: line 23: 66120 Segmentation fault $DIR/third_party/bin/llvm-link -o=$DIR/generated/${FILE_NAME}.bc $DIR/generated/Arch/X86/Runtime/${FILE_NAME}_block.bc $DIR/generated/Arch/X86/Runtime/${FILE_NAME}_instr.opt.bc
Error: Building for x86
I attached a quick cxxflags change in compile_semantics.sh that fixed this for me.
cxxflags_diff.txt
Implement an unimplemented instruction intrinsic. This intrinsic should be treated as a control-flow intrinsic, be given the current and next program counters, the bytes of the instruction, and a reference to the State structure. In a practical application, this intrinsic could be implemented via micro-execution of the instruction, or via full VM-based emulation (a la Unicorn).
This should be an effective way to verify that (non-)zeroing of higher bits of SSE or AVX registers happens correctly.
Somehow cfg_to_bc produces bitcode that contains an invalid record here:
http://code.woboq.org/llvm/llvm/lib/Bitcode/Reader/BitcodeReader.cpp.html#3851
What is a good strategy to debug this? Some ideas:
- Add dump() invocations into the bitcode reader to see how far it gets.
- ... cfg_to_bc.

This may end up being a bit x86-specific, but maybe not. The key is to do this before force-inlining some of those flag functions.
This should be pretty straightforward since all it needs are basic blocks, a list of known code symbol names, and whether or not those symbols are exported.
Right now there are some macro-enabled flag optimizations directly in the instruction semantics code. This isn't the right place for them, but I want to eventually be able to use them. The goal of these optimizations is to "kill" the eflags based on the assumption that the code being lifted is produced by a "sane" compiler that doesn't use the flags after a conditional branch, function call, indirect jump, or function return.
Probably what we want is some kind of intrinsic for telling us that we're doing a direct control-flow transfer (e.g. for conditional branches, direct jumps, and direct function calls). Other flag-killing code can be placed in the existing intrinsics for indirect function call/return and indirect jumps.
Consider trying to do the equivalent of Intel's micro-op fusion to make the compiled code for compare-and-jump patterns more sane.
In general builds of Remill this doesn't really appear, but by shimming in a build of LLVM in debug mode with assertions and expensive checks, we see the following crash:
Program received signal SIGABRT, Aborted.
0x00007ffff5b35418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff5b35418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff5b3701a in __GI_abort () at abort.c:89
#2 0x00007ffff5b2dbd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0xf5a2e8 "(Flags & RF_IgnoreMissingEntries) && \"Referenced value not in value map!\"", file=file@entry=0xf5a0b8 "/home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp", line=line@entry=444, function=function@entry=0xf609c0 <llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)::__PRETTY_FUNCTION__> "void llvm::RemapInstruction(llvm::Instruction*, llvm::ValueToValueMapTy&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)") at assert.c:92
#3 0x00007ffff5b2dc82 in __GI___assert_fail (assertion=0xf5a2e8 "(Flags & RF_IgnoreMissingEntries) && \"Referenced value not in value map!\"", file=0xf5a0b8 "/home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp", line=444, function=0xf609c0 <llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)::__PRETTY_FUNCTION__> "void llvm::RemapInstruction(llvm::Instruction*, llvm::ValueToValueMapTy&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)") at assert.c:101
#4 0x00000000009126a1 in llvm::RemapInstruction (I=0x16e5970, VMap=..., Flags=llvm::RF_NoModuleLevelChanges, TypeMapper=0x0, Materializer=0x0) at /home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp:443
#5 0x00000000008a1ce7 in llvm::CloneFunctionInto (NewFunc=0x1643cc8, OldFunc=0x167b9a8, VMap=..., ModuleLevelChanges=false, Returns=..., NameSuffix=0xe622ea "", CodeInfo=0x0, TypeMapper=0x0, Materializer=0x0) at /home/pag/git/llvm/lib/Transforms/Utils/CloneFunction.cpp:163
#6 0x00000000006d5528 in remill::(anonymous namespace)::AddBlockInitializationCode (block_func=0x1643cc8, template_func=0x167b9a8) at /home/pag/Code/remill/remill/BC/Translator.cpp:155
#7 0x00000000006d4ce1 in remill::Translator::LiftBlock (this=0x7fffffffd9b0, cfg_block=0x16f09c0) at /home/pag/Code/remill/remill/BC/Translator.cpp:607
#8 0x00000000006d4a01 in remill::Translator::LiftBlocks (this=0x7fffffffd9b0, cfg_module=0x1690c50) at /home/pag/Code/remill/remill/BC/Translator.cpp:580
#9 0x00000000006d48be in remill::Translator::LiftCFG (this=0x7fffffffd9b0, cfg_module=0x1690c50) at /home/pag/Code/remill/remill/BC/Translator.cpp:565
#10 0x00000000006f242c in main (argc=1, argv=0x7fffffffdd08) at /home/pag/Code/remill/remill/Translate.cpp:85
Recently I've been mulling over the idea of introducing a new intrinsic, __remill_accept_function_return, and would appreciate feedback.
The idea is to be explicit about the function return target. I don't think this loses generality, even in the case of something like longjmp or the pattern of doing call +5; pop reg to get the current program counter.
The idea is to use a setjmp idiom for preparing function returns. It would be a kind of brother to __remill_function_return, which would presumably target this function (though it does not need to).
Suppose this is the code being translated. It is contrived but shows the point:
sub_0123abc:
0123abc call sub_0456def
0123ac1 ... // Target code of return at 0456def.
...
sub_0456def:
0456def ret // Targets 0123ac1.
Here's the gist of what code would look like, as if it were written in C++:
void __remill_sub_123abc(State *state, Memory *memory, addr_t pc) {
  CALL_NEAR_RELBRd(state, &memory, 0x456def);
  if (!__remill_accept_function_return(state, &memory, 0x123ac1)) {
    return __remill_sub_456def(state, memory, 0x456def);  // Target of call.
  } else {
    return __remill_sub_123ac1(state, memory, 0x123ac1);  // Target of return.
  }
}

void __remill_sub_456def(State *state, Memory *memory, addr_t pc) {
  RET_NEAR(state, memory);
  return __remill_function_return(state, memory, state->gpr.rip.qword);
}
One thing of note is that the __remill_accept_function_return intrinsic function takes memory by pointer, so that it can change the memory pointer used down the return path.
I think this may be nice from a static analysis perspective, especially for direct function calls. There's no real way to distinguish a direct function call from a direct jump in the optimised LLVM bitcode; this would provide such a way. It may also be useful for some kind of CFI-related instrumentation downstream, but that's not a compelling enough reason to do this.
The __remill_sub_123ac1 function would still be marked as an "indirect block", so I don't think there would be any loss of generality.
Finally, I think this structure is also easily removable by a downstream tool -- it commits us to nothing. Consider replacing all uses of __remill_accept_function_return with false. Dead code elimination will turn the result into what we already have.
Intermediate milestones:
- Update bootstrap.sh to download new code for LLVM.
- Change libOptimize from being an LLVM pass into being a tool. I think this will simplify the build process in a number of ways. Name this new tool remill-opt.
- Rename cfg_to_bc to remill-lift.

This will likely require adding a tar.xz into blob that is either the latest glog that supports CMake, or one with outright modifications that eliminate stack unwinding. This will be one less external dependency that doesn't provide more information than you already get with a debugger.
Tasks:
- ... the State structure.
- Compare State structures from native and lifted runs.

For example, use __builtin_parity for parity computation, as opposed to doing it manually. I think this will improve optimisation opportunities without loss of generality.
I think CPUID should be handled by a special control-flow intrinsic, __mcsema_arch_read_features. The purpose of making it a control-flow intrinsic, kind of like __mcsema_function_call, is to implicitly represent that the behaviour of the intrinsic is undefined (i.e. it can read/write the machine state in an arbitrary way) and therefore unobservable to static analysis. It also comes with the benefit that the synchronizing nature of the instruction would be somewhat implicit in its use as a flow intrinsic.
For example, STOS and SCAS in 64-bit mode can use [EDI] instead of [RDI] as the base address. If RDI != ZExtend(EDI), then there will be a translation transparency issue.
ZExtend, FMul, etc.

Replace order_t with an abstract Memory * variable, and pass it by value through the basic block functions and into instruction semantics functions. This will get closer to describing the small-step semantics of memory-modifying code, and clean up the optimized bitcode substantially.

Produce one .cfg proto file per function, as opposed to one large proto. Use the cfg_to_bc chaining functionality to build up a single bitcode file.

Make the command-line interface work like clang ..., e.g. remill .... It would be cool if you could use remill in a Makefile, sort of like you can with an everyday compiler. This issue relates to #19.