lifting-bits / remill

Library for lifting machine code to LLVM bitcode
License: Apache License 2.0
I think dependency management needs to be improved. Right now the bootstrap.sh script downloads and builds some packages from source. It would be preferable to use existing package managers to get these libraries.

- Install dependencies to /usr/local/lib. This should allow the build process to avoid nasty hacks related to linking against the third-party folder.
- Use pip to install the Python bindings of things.
- Check in the protoc-produced Python- and C++-generated code for using CFG.proto. Changes to CFG.proto should result in these auto-generated files being updated in the repo.

These steps will make it easier to have a bunch of simple binaries (e.g. remill-opt, remill-lift, etc.) that can be installed to system directories, without needing to reference stuff in the third_party directory. Ideally, this type of change will enable Remill itself to be packageable.
There should be a number of pre-defined algorithms (using an enum to list them all) that can be invoked by semantics functions. These would roughly correspond to instructions available in hardware, e.g. logarithms, tangents, etc.
I am not even sure what needs it. It may actually be LLVM libraries. If so, then this issue isn't really doable.
Conditional branches modify the BRANCH_TAKEN variable. The translator then uses the value of this variable to decide to tail-call to one block function or another. I think that conditional interrupt instructions, like into and bound, can be similarly implemented. There is already the INTERRUPT_TAKEN variable that is modified, as shown below:
DEF_ISEL_SEM(INTO) {
INTERRUPT_TAKEN = FLAG_OF;
INTERRUPT_VECTOR = 4;
}
In the above code, INTERRUPT_TAKEN maps to a field in the State structure. I think this is particularly ugly. A better solution would be to use the BRANCH_TAKEN variable in __remill_basic_block, thereby not polluting the State structure with fields that cannot opaquely [1] be represented across architectures.
I think instead we can just "take over" the BRANCH_TAKEN variable. This particular nuance is "hidden" by the code translator. For example, the semantics of JO (jump on overflow) are:
DEF_SEM(JO, R8W cond, PC taken_pc, PC not_taken_pc) {
auto take_branch = FLAG_OF;
Write(cond, take_branch);
Write(REG_PC, Select<addr_t>(take_branch, taken_pc, not_taken_pc));
}
The R8W cond argument is actually a pointer to the BRANCH_TAKEN variable. We can see the addition of this argument in the DecodeConditionalBranch code.
The __remill_interrupt_call intrinsic should not need to know the actual contents of the state structure itself. Imagine a scenario where you have a symbolic executor, and you point it at some lifted bitcode, as well as a shared library implementing a system call model. The implementation of __remill_interrupt_call should only "pass the buck" into the shared library's code. It should not need to inspect a field telling it if the interrupt should be taken, because then it needs to know the structure of the State struct, and thus would not be architecture-agnostic. Of course, we still have the pesky interrupt vector field, which is a nuisance. There is a solution to this, though. The State structure could be a derived class of an architecture-neutral class with common elements. The opaque implementations of intrinsics could generically operate on the State structure's base class, thus gaining extra info. I think this is acceptable for interrupt vector numbers, but not acceptable for conditional execution of an interrupt.

Use the Valgrind annotations to enable/disable checking around execution of native and lifted code. Periodically ensure an absence of errors.
The same should be done for cfg_to_bc.
Update the build system's --dry_run option (or add something similar) to produce a compilation database [1] that can be consumed by the Clang Static Analyzer.
There are some benefits to this:
Implement and test the following instructions:
This would probably be very useful, especially for implementations that want to access things like the program counter from within the memory intrinsics.
There used to be one, but it was overcomplicated and hacky (see the commit history). The proto format has since changed, so having a new script would be helpful.
Implement and test the following instructions:
An is_local flag, which semantically says that the source block and target block logically belong to the same function.

This should eliminate stores of undef values, as well as memsets of undef values.
It's possible that this would work better for ARM's NEON implementation of SIMD. Right now I use the vector_size attribute.
That way the stack register can be computed in terms of the data pointer and the index argument passed to the semantic function.
The XED kits can be found here:
https://software.intel.com/en-us/protected-download/267266/560870/step2
I have a hunch that this will further improve compiler optimizations, as well as tighten the optimized code to not include so many loads/stores.
The following pattern sometimes comes up:
%26 = mul nsw i128 %25, -8198552921648689607
%27 = trunc i128 %26 to i64
store i64 %27, i64* %3, align 8
store i64 4216879, i64* %5, align 8
%trunc = trunc i128 %26 to i32
The only uses of %26 are trunc instructions, so I should be able to strength-reduce the 128-bit multiplication to a 64-bit multiplication.
Also add accompanying test cases.
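The legality of that reduction can be checked with a small standalone model: truncation distributes over multiplication modulo 2^64, so the 64-bit product of the truncated operands equals the truncated 128-bit product. This sketch uses the GCC/Clang __int128 extension:

```cpp
#include <cstdint>

// If every use of the wide product is a trunc (as in the pattern above),
// these two computations are equivalent.
uint64_t MulThenTrunc(uint64_t a, int64_t b) {
  __int128 wide = static_cast<__int128>(a) * b;  // 128-bit multiply...
  return static_cast<uint64_t>(wide);            // ...then truncate.
}

uint64_t TruncThenMul(uint64_t a, int64_t b) {
  return a * static_cast<uint64_t>(b);  // Strength-reduced: 64-bit multiply.
}
```

The same argument covers truncations narrower than 64 bits, since they factor through the 64-bit truncation.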
Pre-process the instructions CFG before lifting to feature detect for things like SSEn, AVXn, etc. and broadly categorize into: no-AVX, AVX (includes AVX2), and AVX512.
Ideally, it would be cool if there were a way to "install" Remill somewhere.
Some of the advantages are:
Perhaps to be used by RDTSCP.
Implement and test the following MMX instructions:
In some cases it might be valuable to have a partially (un)implemented instruction. For example, some complex FPU instructions like FPATAN have two real components:
The former should be implemented in the partial instruction, as it is arch-specific and depends on no special features. The latter should be stubbed out in some way. McSema1 currently does something like this by using LLVM intrinsics. This is probably the most sensible approach.
This is related to the footnote in Issue #52. The idea here is that some information (e.g. the interrupt vector) cannot be passed through control-flow intrinsics (e.g. __remill_interrupt_call) because of the rigid argument requirements for control-flow intrinsics (State *state, Memory *memory, addr_t pc). I think an appropriate solution is to put these architecture-neutral "dirty details" into a base class, and have the State structure derive from this base class. Then, control-flow intrinsics can be defined as accepting pointers to the base class. Implementations of the intrinsics that require access to the actual machine-specific contents can then down-cast the pointer.
Sometimes it will miss basic blocks that it knows should exist (e.g. target of a direct call).
From a freshly installed remill repo, I followed the README and got to ./scripts/bootstrap.sh. It crashes while running scripts/compile_semantics.sh:
Building for x86
In file included from /remill/remill/Arch/X86/Runtime/Instructions.cpp:5:
In file included from /remill/remill/Arch/Runtime/Intrinsics.h:6:
/remill/remill/Arch/Runtime/Types.h:6:10: fatal error: 'cstdint' file not
found
#include <cstdint>
^
1 error generated.
In file included from /remill/remill/Arch/X86/Runtime/BasicBlock.cpp:3:
In file included from /remill/remill/Arch/X86/Runtime/State.h:21:
In file included from /remill/remill/Arch/Runtime/Runtime.h:14:
In file included from /remill/remill/Arch/Runtime/Intrinsics.h:6:
/remill/remill/Arch/Runtime/Types.h:6:10: fatal error: 'cstdint' file not
found
#include <cstdint>
^
1 error generated.
clang-3.8: error: no such file or directory: '/remill/generated/Arch/X86/Runtime/sem_x86_instr.bc'
clang-3.8: error: no input files
/remill/third_party/bin/llvm-link: /remill/generated/Arch/X86/Runtime/sem_x86_block.bc: error: Could not open input file: No such file or directory
0 llvm-link 0x0000000000574308
1 llvm-link 0x0000000000574977
2 libpthread.so.0 0x00007f57616da3d0
3 llvm-link 0x00000000004dd2e1
4 llvm-link 0x0000000000409257
5 llvm-link 0x000000000040845c
6 llvm-link 0x00000000004070b2
7 libc.so.6 0x00007f5760865830 __libc_start_main + 240
8 llvm-link 0x0000000000406e19
Stack dump:
0. Program arguments: /remill/third_party/bin/llvm-link -o=/remill/generated/sem_x86.bc /remill/generated/Arch/X86/Runtime/sem_x86_block.bc /remill/generated/Arch/X86/Runtime/sem_x86_instr.opt.bc
./scripts/compile_semantics.sh: line 23: 66120 Segmentation fault $DIR/third_party/bin/llvm-link -o=$DIR/generated/${FILE_NAME}.bc $DIR/generated/Arch/X86/Runtime/${FILE_NAME}_block.bc $DIR/generated/Arch/X86/Runtime/${FILE_NAME}_instr.opt.bc
Error: Building for x86
I attached a quick cxxflags change in compile_semantics.sh that fixed this for me.
cxxflags_diff.txt
Implement an unimplemented instruction intrinsic. This intrinsic should be treated as a control-flow intrinsic, be given the current and next program counters, the bytes of the instruction, and a reference to the State structure. In a practical application, this intrinsic could be implemented via micro-execution of the instruction, or via full VM-based emulation (a la Unicorn).
This should be an effective way to verify that (non-)zeroing of higher bits of SSE or AVX registers happens correctly.
Somehow cfg_to_bc produces bitcode that contains an invalid record here:
http://code.woboq.org/llvm/llvm/lib/Bitcode/Reader/BitcodeReader.cpp.html#3851
What is a good strategy to debug this? Some ideas:
- Add dump() invocations into the bitcode reader to see how far it gets.
- ... cfg_to_bc.

This may end up being a bit x86-specific, but maybe not. The key is to do this before force-inlining some of those flag functions.
This should be pretty straightforward since all it needs are basic blocks, a list of known code symbol names, and whether or not those symbols are exported.
Right now there are some macro-enabled flag optimizations directly in the instruction semantics code. This isn't the right place for them, but I want to eventually be able to use them. The goal of these optimizations is to "kill" the eflags based on the assumption that the code being lifted is produced by a "sane" compiler that doesn't use the flags after a conditional branch, function call, indirect jump, or function return.
Probably what we want is some kind of intrinsic for telling us that we're doing a direct control-flow transfer (e.g. for conditional branches, direct jumps, and direct function calls). Other flag-killing code can be placed in the existing intrinsics for indirect function call/return and indirect jumps.
Consider trying to do the equivalent of Intel's micro-op fusion to make the compiled code for compare-and-jump patterns more sane.
In general builds of Remill this doesn't really appear, but by shimming in a build of LLVM in debug mode with assertions and expensive checks, we see the following crash:
Program received signal SIGABRT, Aborted.
0x00007ffff5b35418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff5b35418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff5b3701a in __GI_abort () at abort.c:89
#2 0x00007ffff5b2dbd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0xf5a2e8 "(Flags & RF_IgnoreMissingEntries) && \"Referenced value not in value map!\"", file=file@entry=0xf5a0b8 "/home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp", line=line@entry=444, function=function@entry=0xf609c0 <llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)::__PRETTY_FUNCTION__> "void llvm::RemapInstruction(llvm::Instruction*, llvm::ValueToValueMapTy&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)") at assert.c:92
#3 0x00007ffff5b2dc82 in __GI___assert_fail (assertion=0xf5a2e8 "(Flags & RF_IgnoreMissingEntries) && \"Referenced value not in value map!\"", file=0xf5a0b8 "/home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp", line=444, function=0xf609c0 <llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)::__PRETTY_FUNCTION__> "void llvm::RemapInstruction(llvm::Instruction*, llvm::ValueToValueMapTy&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*)") at assert.c:101
#4 0x00000000009126a1 in llvm::RemapInstruction (I=0x16e5970, VMap=..., Flags=llvm::RF_NoModuleLevelChanges, TypeMapper=0x0, Materializer=0x0) at /home/pag/git/llvm/lib/Transforms/Utils/ValueMapper.cpp:443
#5 0x00000000008a1ce7 in llvm::CloneFunctionInto (NewFunc=0x1643cc8, OldFunc=0x167b9a8, VMap=..., ModuleLevelChanges=false, Returns=..., NameSuffix=0xe622ea "", CodeInfo=0x0, TypeMapper=0x0, Materializer=0x0) at /home/pag/git/llvm/lib/Transforms/Utils/CloneFunction.cpp:163
#6 0x00000000006d5528 in remill::(anonymous namespace)::AddBlockInitializationCode (block_func=0x1643cc8, template_func=0x167b9a8) at /home/pag/Code/remill/remill/BC/Translator.cpp:155
#7 0x00000000006d4ce1 in remill::Translator::LiftBlock (this=0x7fffffffd9b0, cfg_block=0x16f09c0) at /home/pag/Code/remill/remill/BC/Translator.cpp:607
#8 0x00000000006d4a01 in remill::Translator::LiftBlocks (this=0x7fffffffd9b0, cfg_module=0x1690c50) at /home/pag/Code/remill/remill/BC/Translator.cpp:580
#9 0x00000000006d48be in remill::Translator::LiftCFG (this=0x7fffffffd9b0, cfg_module=0x1690c50) at /home/pag/Code/remill/remill/BC/Translator.cpp:565
#10 0x00000000006f242c in main (argc=1, argv=0x7fffffffdd08) at /home/pag/Code/remill/remill/Translate.cpp:85
Recently I've been mulling over the idea of introducing a new intrinsic, __remill_accept_function_return, and would appreciate feedback.
The idea is to be explicit about the function return target. I don't think this loses generality, even in the case of something like longjmp or the pattern of doing call +5; pop reg to get the current program counter.
The idea is to use a setjmp idiom for preparing function returns. It would be a kind of brother to __remill_function_return, which would presumably target this function (though it does not need to).
Suppose this is the code being translated. It is contrived but shows the point:
sub_0123abc:
0123abc call sub_0456def
0123ac1 ... // Target code of return at 0456def.
...
sub_0456def:
0456def ret // Targets 0123ac1.
Here's the gist of what code would look like, as if it were written in C++:
void __remill_sub_123abc(State *state, Memory *memory, addr_t pc) {
  CALL_NEAR_RELBRd(state, &memory, 0x456def);
  if (!__remill_accept_function_return(state, &memory, 0x123ac1)) {
    return __remill_sub_456def(state, memory, 0x456def);  // Target of call.
  } else {
    return __remill_sub_123ac1(state, memory, 0x123ac1);  // Target of return.
  }
}

void __remill_sub_456def(State *state, Memory *memory, addr_t pc) {
  RET_NEAR(state, memory);
  return __remill_function_return(state, memory, state->gpr.rip.qword);
}
One thing of note is that the __remill_accept_function_return intrinsic function takes memory by pointer, so that it can change the memory pointer used down the return path.
I think this may be nice from a static analysis perspective, especially for direct function calls. There's no real way to distinguish a direct function call from a direct jump in the optimised LLVM bitcode; this would provide such a way. It may also be useful for some kind of CFI-related instrumentation downstream, but that's not a compelling enough reason to do this.
The __remill_sub_123ac1 function would still be marked as an "indirect block", so I don't think there would be any loss of generality.
Finally, I think this structure is also easily removable by a downstream tool -- it commits us to nothing. Consider replacing all uses of __remill_accept_function_return with false. Dead code elimination will turn the result into what we already have.
Intermediate milestones:
- Update bootstrap.sh to download new code for LLVM.
- Change libOptimize from being an LLVM pass into being a tool. I think this will simplify the build process in a number of ways. Name this new tool remill-opt.
- Rename cfg_to_bc to remill-lift.

This will likely require adding a tar.xz into blob that is either the latest glog that supports CMake, or one with outright modifications that eliminate stack unwinding. This will be one less external dependency that doesn't provide more information than you already get with a debugger.
Tasks:
- ... the State structure.
- Compare State structures from native and lifted runs.

For example, use __builtin_parity for parity computation, as opposed to doing it manually. I think this will improve optimisation opportunities without loss of generality.
I think CPUID should be handled by a special control-flow intrinsic, __mcsema_arch_read_features. The purpose of making it a control-flow intrinsic, kind of like __mcsema_function_call, is to implicitly represent that the behaviour of the intrinsic is undefined (i.e. it can read/write the machine state in an arbitrary way) and therefore unobservable to static analysis. It also comes with the benefit that the synchronizing nature of the instruction would be somewhat implicit in its use as a flow intrinsic.
For example, STOS and SCAS in 64-bit mode can use [EDI] instead of [RDI] as the base address. If RDI != ZExtend(EDI), then there will be a translation transparency issue.
ZExtend, FMul, etc.

Replace order_t with an abstract Memory * variable, and pass it by value through the basic block functions and into instruction semantics functions. This will get closer to describing the small-step semantics of memory-modifying code, and clean up the optimized bitcode substantially.

Produce one .cfg proto file per function, as opposed to one large proto. Use the cfg_to_bc chaining functionality to build up a single bitcode file.

Make the command-line interface work like clang ..., e.g. remill .... It would be cool if you could use remill in a Makefile, sort of like you can with an everyday compiler. This issue relates to #19.