vtil-project / vtil-core
Virtual-machine Translation Intermediate Language
License: BSD 3-Clause "New" or "Revised" License
$ clang --version
clang version 10.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ gcc --version
gcc (GCC) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(Note that the following also occurs on GCC 10.)
I am not clear on why -stdlib=libc++ is explicitly specified here: https://github.com/vtil-project/VTIL-Core/blob/master/VTIL-Common/CMakeLists.txt#L37-L39. Regardless, it breaks the build and causes missing includes:
In file included from _deps/vtil-core-src/VTIL-Common/arch/arm64/arm64_assembler.cpp:39:
_deps/vtil-core-src/VTIL-Common/arch/arm64/arm64_assembler.hpp:40:10: fatal error: 'string' file not found
#include <string>
So it is necessary to remove that broken configuration from CMakeLists.txt before you are able to reproduce the following.
The forward declaration of cache_value here: https://github.com/vtil-project/VTIL-Core/blob/master/VTIL-SymEx/simplifier/simplifier.cpp#L94 causes errors during template expansion for std::pair, etc. Because of the declaration order, I defined cache_map as follows:
- using cache_map = std::unordered_map<expression::reference, cache_value, signature_hasher, cache_scanner>;
+ using cache_map = std::unordered_map<expression::reference, cache_value*, signature_hasher, cache_scanner>;
(And subsequently updated all accesses of cache_value to use the arrow operator.)
Then I updated the map to insert values of type cache_value*:
- auto [it, inserted] = map.emplace( exp, make_default<cache_value>() );
+ auto [it, inserted] = map.emplace( exp, new cache_value );
Obviously using new here is not the solution, but I am unfamiliar with the lifetime of this value, so I am wondering how you think this problem can best be solved.
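One way to avoid the raw new is to let the map own each value through std::unique_ptr. This sketch assumes cache_value does not need to be stored inline in the map. The key type below is a std::string stand-in for expression::reference, and cache_value's payload is made up, so neither reflects VTIL's actual definitions. If the definition of cache_value can simply be moved above cache_map, that would likely be the smaller change.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct cache_value;  // forward declaration, mirroring simplifier.cpp

// Hypothetical stand-in for expression::reference so the sketch compiles alone.
using key_type = std::string;

// std::unique_ptr<cache_value> is itself a complete type, so the map's
// value_type (std::pair<const key_type, std::unique_ptr<cache_value>>) does not
// require cache_value to be complete at this point.
using cache_map = std::unordered_map<key_type, std::unique_ptr<cache_value>>;

struct cache_value {
    int hits = 0;  // hypothetical payload
};

// Allocates only when the key is new; the map owns the value and frees it on
// erase or destruction, so no manual delete is needed.
inline cache_value& lookup_or_insert(cache_map& map, const key_type& key) {
    auto [it, inserted] = map.try_emplace(key);  // mapped value starts as nullptr
    if (inserted) it->second = std::make_unique<cache_value>();
    return *it->second;
}
```

With this layout, each cache_value lives exactly as long as its map entry, which matches the lifetime the by-value make_default<cache_value>() entry had.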
I'd be happy to make a PR to fix the build on both Clang and GCC; I just wasn't able to figure out the best way to do it. Also, if you'd be open to adding the latest versions of GCC and Clang to the CI, I'd be happy to add that as well. It looks like it's already using Clang 10.
vtil\VTIL-NativeLifters\Dependencies\VTIL-Core\VTIL-Compiler\includes\vtil../../common/auxiliaries.hpp(37,38): error C2149: 'vtil::optimizer::aux::branch_analysis_flags::pack': named bit field cannot have zero width
\vtil\VTIL-NativeLifters\Dependencies\VTIL-Core\VTIL-Compiler\includes\vtil../../common/auxiliaries.hpp(38,38): error C2106: '=': left operand must be l-value
This overflows the stack in dcep when optimizing a routine using apply_all.
CMake Error at C:/Program Files/CMake/share/cmake-3.18/Modules/ExternalProject.cmake:2350 (message):
error: could not find git for clone of capstone-populate
Call Stack (most recent call first):
C:/Program Files/CMake/share/cmake-3.18/Modules/ExternalProject.cmake:3206 (_ep_add_download_command)
CMakeLists.txt:13 (ExternalProject_Add)
Edit: I just forgot to install git.
For example, given this block, how do I correctly generate an expression for reg_ax?
vtil::register_desc reg_ax(vtil::register_physical, registers::ax, vtil::arch::bit_count, 0);
auto block = vtil::basic_block::begin(0x1337);
block->mov(reg_ax, 0x10);
block->add(reg_ax, 0x1);
block->vexit(0ull);
I tried using the tracer like this, but to no avail. Whether or not I use the tracer, I get the result "rax#0x1337"; the same happens without tracing if I just use variable.to_expression().
vtil::tracer tracer;
auto expression = tracer.rtrace_p({ block->begin(), reg_ax });
vtil::logger::log("%s\n", expression.to_string());
When I was looking at the vtil arch code, I noticed that this argument doesn't make sense, or am I wrong?
Can I use it to convert VMProtect-virtualized code to an intermediate language? I want to restore the original instructions after the VM.
The following code is producing corrupt results:
void run_err_test_1()
{
auto b = vtil::basic_block::begin(0);
auto first = vtil::register_desc(vtil::register_flag::register_local, 0, 64);
auto second_ptr = vtil::register_desc(vtil::register_flag::register_local, 5, 64);
auto second = vtil::register_desc(vtil::register_flag::register_local, 4, 64);
auto result = vtil::register_desc(vtil::register_flag::register_local, 6, 64);
auto dest = vtil::register_desc(vtil::register_flag::register_local, 37, 64);
// load first value
b->ldd(first, X86_REG_R11, 0x0);
// load second value
b->mov(second_ptr, X86_REG_R11);
b->add(second_ptr, 0x8);
b->ldd(second, second_ptr, 0x0);
// calculate the result
b->mov(result, first);
b->add(result, second);
// store the result
b->mov(dest, X86_REG_R11);
b->add(dest, 0x8);
b->str(dest, 0x0, result);
apply_optimizations(b, 0, optimization_type::optimization_type_symbolic_rewrite_pass_forced, 0);
vtil::debug::dump(b);
}
| | 0000: [ PSEUDO ] +0x0 movq t0 r11
| | 0001: [ PSEUDO ] +0x0 lddq t1 t0 0x0
| | 0002: [ PSEUDO ] +0x0 movq t2 t0
| | 0003: [ PSEUDO ] +0x0 addq t2 0x8
| | 0004: [ PSEUDO ] +0x0 lddq t3 t0 0x8
| | 0005: [ PSEUDO ] +0x0 movq t4 t1
| | 0006: [ PSEUDO ] +0x0 addq t4 t3
| | 0007: [ PSEUDO ] +0x0 movq t0 t1
| | 0008: [ PSEUDO ] +0x0 movq t5 t2
| | 0009: [ PSEUDO ] +0x0 movq t4 t3
| | 0010: [ PSEUDO ] +0x0 movq t6 t4
| | 0011: [ PSEUDO ] +0x0 movq t37 t2
| | 0012: [ PSEUDO ] +0x0 strq t0 0x8 t4
After a bit of messing around, I've narrowed down the cause to the register naming. Rewriting all register names to a higher index produces the following instead:
| | 0000: [ PSEUDO ] +0x0 movq t0 r11
| | 0001: [ PSEUDO ] +0x0 lddq t1 t0 0x0
| | 0002: [ PSEUDO ] +0x0 movq t2 t0
| | 0003: [ PSEUDO ] +0x0 addq t2 0x8
| | 0004: [ PSEUDO ] +0x0 lddq t3 t0 0x8
| | 0005: [ PSEUDO ] +0x0 movq t4 t1
| | 0006: [ PSEUDO ] +0x0 addq t4 t3
| | 0007: [ PSEUDO ] +0x0 movq t1500 t1
| | 0008: [ PSEUDO ] +0x0 movq t1501 t2
| | 0009: [ PSEUDO ] +0x0 movq t1502 t3
| | 0010: [ PSEUDO ] +0x0 movq t1503 t4
| | 0011: [ PSEUDO ] +0x0 movq t1504 t2
| | 0012: [ PSEUDO ] +0x0 strq t0 0x8 t4
The pass appears to be restoring to the original registers, and is clobbering them in the process.
expression X_00 = { {"X_00"}, 1 };
expression X_01 = { {"X_01"}, 1 };
expression X_02 = { {"X_02"}, 1 };
expression X_03 = { {"X_03"}, 1 };
expression X_04 = { {"X_04"}, 1 };
expression X_05 = { {"X_05"}, 1 };
expression X_06 = { {"X_06"}, 1 };
expression a = ((X_06 & X_00) | (X_05 & X_01));
expression b = ((((X_05 & X_00) | (X_04 & X_01)) & (((X_04 & X_00) | (X_03 & X_01)) & ((X_01 & X_00) & (X_03 | X_02)))) | ((X_05 & X_00) & (X_04 & X_01)));
expression c = ~(a & b);
log("c = %s\n", c.to_string());
Why do you say it is for binary de-obfuscation and de-virtualization? Can VTIL de-virtualize a binary protected by VMP (https://vmpsoft.com/)? Besides that, can it devirtualize a binary obfuscated by the LLVM obfuscator, for instance?
void test_vtil_crash()
{
expression X_00, X_01, X_02, X_03;
expression a,b,c;
X_00 = expression(unique_identifier("X_00"), 1);
X_01 = expression(unique_identifier("X_01"), 1);
X_02 = expression(unique_identifier("X_02"), 1);
X_03 = expression(unique_identifier("X_03"), 1);
a = ~(((X_03 & X_00) ^ (X_02 & X_01)) & (X_01 & X_00));
b = ((X_03 & X_00) ^ (X_02 & X_01));
printf("a = %s\n", a.to_string().c_str());
printf("b = %s\n", b.to_string().c_str());
c = ~(a & b); // crashes here
printf("c = %s\n", c.to_string().c_str());
}
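To check what the simplifier should produce for c, the 1-bit expressions can be evaluated on plain integers. The sketch below is an independent reference, not VTIL code. eval_c follows test_vtil_crash literally. eval_c_simplified uses a hand derivation with t = (X_03 & X_00) ^ (X_02 & X_01) and u = X_01 & X_00: a & b = ~(t & u) & t = t & ~u, and therefore c = ~t | u.

```cpp
// Literal evaluation of test_vtil_crash on 1-bit unsigned values.
inline unsigned eval_c(unsigned x0, unsigned x1, unsigned x2, unsigned x3) {
    unsigned t = (x3 & x0) ^ (x2 & x1);  // common subterm of a and b
    unsigned a = ~(t & (x1 & x0)) & 1u;  // mask to 1 bit, like a 1-bit expression
    unsigned b = t & 1u;
    return ~(a & b) & 1u;
}

// Hand-simplified form: c = ~t | u, with u = x1 & x0.
inline unsigned eval_c_simplified(unsigned x0, unsigned x1, unsigned x2, unsigned x3) {
    unsigned t = (x3 & x0) ^ (x2 & x1);
    unsigned u = x1 & x0;
    return (~t | u) & 1u;
}
```

Comparing both functions over all 16 assignments gives an oracle for whatever the simplifier returns once the crash is fixed.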
expression X_00 = { {"X_00"}, 1 };
expression X_01 = { {"X_01"}, 1 };
expression X_02 = { {"X_02"}, 1 };
expression X_03 = { {"X_03"}, 1 };
expression X_04 = { {"X_04"}, 1 };
expression X_05 = { {"X_05"}, 1 };
expression X_06 = { {"X_06"}, 1 };
expression X_07 = { {"X_07"}, 1 };
expression X_08 = { {"X_08"}, 1 };
expression X_09 = { {"X_09"}, 1 };
expression X_0A = { {"X_0A"}, 1 };
expression X_0B = { {"X_0B"}, 1 };
expression a = ((X_0B & X_00) | (X_0A & X_01));
expression b = ((((X_0A & X_00) | (X_09 & X_01)) & ((((X_09 & X_00) | (X_08 & X_01)) & ((((X_08 & X_00) | (X_07 & X_01)) & ((((X_07 | X_06) & ((((X_01 & X_00) & (X_05 | (X_04 & (X_00 & X_06)))) & (X_03 | ((X_03 | X_02) & (X_00 & X_04)))) | ((X_01 & X_04) & (X_05 & X_00)))) | ((X_01 & X_05) & (X_06 & X_00))) | ((X_07 & X_00) & (X_06 & X_01)))) | ((X_08 & X_00) & (X_07 & X_01)))) | ((X_09 & X_00) & (X_08 & X_01)))) | ((X_0A & X_00) & (X_09 & X_01)));
expression c = a & b;
log("c = %s\n", c.to_string());
Hello.
DOCTEST_TEST_CASE("dummy")
{
vtil::logger::log("\n\n>> %s \n", __FUNCTION__);
auto block = vtil::basic_block::begin(0);
auto [t0, t1, t2, t3] = block->tmp(64, 64, 1, 64);
auto rtn = block->owner;
block->mov(t0, vtil::REG_FLAGS);
block->bnot(t0);
block->ifs(t1, t0.select(1, 2), 0x1000);
block->mov(t2, t0.select(1, 2));
block->bnot(t2);
block->ifs(t3, t2, 0x2000);
block->add(t1, t3);
block->add(t1, vtil::REG_IMGBASE);
block->jmp(t1);
if (auto block_1000 = block->fork(0x1000)) {
block_1000->jmp(0x3000);
block_1000->fork(0x3000);
}
if (auto block_2000 = block->fork(0x2000)) {
block_2000->jmp(0x3000);
block_2000->fork(0x3000);
}
if (auto block_3000 = rtn->get_block(0x3000)) {
block_3000->vexit(uintptr_t(0xdeadc0de));
}
vtil::logger::log(":: Before:\n");
vtil::debug::dump(rtn);
vtil::optimizer::bblock_thunk_removal_pass{}(rtn);
vtil::optimizer::branch_correction_pass{}(rtn);
vtil::logger::log(":: After:\n");
vtil::debug::dump(rtn);
vtil::logger::log(":: Over:\n");
CHECK(1 == 1);
}
VTIL-Core/VTIL-Compiler/optimizer/branch_correction_pass.cpp, lines 71 to 98 in 6f21abb
The branch_correction_pass will remove branches that do not exist in the branch_info. I think this is correct, but the bblock_thunk_removal_pass is too aggressive and also handles jump instructions that have not been correctly converted to js instructions.
Is it possible to add support for the 32-bit x86 architecture?
Thanks to @wallds's demo.
Test code like this:
auto exp_a = __bt( variable_a, (uint32_t)0x6 ).simplify(true);
auto exp_b = __bt( variable_a, (uint8_t)0x6 ).simplify(true);
The hashes of exp_a and exp_b differ, but as bit-test indices, (uint32_t)0x6 and (uint8_t)0x6 should be the same.
This causes wrong destinations to be extracted from a vm_jcc expression if the jcc expression contains both exp_a and exp_b: when calculating the destinations of a vm_jcc, expressions are compared by hash, and the hashes differ.
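The idea behind the report can be sketched with a toy constant type. If the hash mixes in the operand width, two bit-test indices with the same value hash differently. constant_operand and both hash functions below are hypothetical and only illustrate the design choice; VTIL's own expression hashing is not modeled here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a constant operand: a value plus its bit width.
struct constant_operand {
    uint64_t value;
    unsigned bit_count;
};

// Width-sensitive hash: (uint32_t)0x6 and (uint8_t)0x6 get different hashes.
inline uint64_t hash_with_width(const constant_operand& c) {
    return (c.value * 0x100000001b3ull) ^ c.bit_count;  // width leaks into the hash
}

// For the index operand of a bit test only the numeric value matters, so a
// hash used for that position can ignore the width.
inline uint64_t hash_bit_index(const constant_operand& c) {
    return c.value * 0x100000001b3ull;
}
```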
vtil::symbolic::expression op1 = { {"op1"}, 4 };
vtil::symbolic::expression op2 = { {"op2"}, 4 };
// out: (op1==op2)
auto dst1 = op1 == op2;
vtil::logger::log("%s\n", dst1.simplify().to_string().c_str());
// out: 0x0
auto dst2 = (op1 - op2) == 0;
vtil::logger::log("%s\n", dst2.simplify().to_string().c_str());
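Under modular arithmetic, (op1 - op2) == 0 is true exactly when op1 == op2, so the second result should not fold to 0x0. The exhaustive check below confirms the equivalence on plain unsigned integers. It assumes the 4 in { {"op1"}, 4 } is the variable's bit width. sub_eq_zero_matches_eq is a hypothetical helper, not part of VTIL.

```cpp
// Returns true if ((a - b) mod 2^bits) == 0 agrees with a == b for every pair
// of `bits`-wide values. Intended for small widths (bits < 16) so the loop is cheap.
inline bool sub_eq_zero_matches_eq(unsigned bits) {
    const unsigned mask = (1u << bits) - 1;
    for (unsigned a = 0; a <= mask; ++a)
        for (unsigned b = 0; b <= mask; ++b) {
            bool lhs = ((a - b) & mask) == 0;  // unsigned wrap-around, then truncate
            bool rhs = a == b;
            if (lhs != rhs) return false;
        }
    return true;
}
```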
VTIL-Core/VTIL-Architecture/trace/tracer.cpp, lines 504 to 507 in 7e74109
VTIL-Core/VTIL-Architecture/trace/tracer.cpp, lines 526 to 529 in 7e74109
lvm.execute can call read_register:
VTIL-Core/VTIL-Architecture/vm/interface.cpp, lines 45 to 51 in 7e74109
read_register in turn calls the tracer:
VTIL-Core/VTIL-Architecture/vm/lambda.hpp, lines 66 to 71 in 7e74109
And so we end up with a stack overflow.
Assuming the multiplication operator has been implemented, the expression (x + x) & 1 can be transformed to (x * 2) & 1.
After x * 2, the lowest bit is known to be 0 (its bit in the unknown mask is 0), so (x * 2) & 1 should simplify to 0.
But what about (x * (x + 1)) & 1? Exactly one of x and x + 1 is even, and the product of an odd number and an even number is always even, yet the lowest bit would be unknown after a multiply operation.
Does this kind of operation need an extra pass?
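A minimal known-bits sketch, not VTIL's bitwise analysis, shows where the two cases differ. The known_bits struct and its helpers are hypothetical. The multiply rule adds up the operands' known trailing zeros. That is enough for x * 2, but not for x * (x + 1), where neither factor is known to be even on its own and a case split on the low bit of x is needed.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical known-bits model: `known_zero` has a 1 for every bit proven 0.
struct known_bits {
    uint64_t known_zero;
};

// Number of low bits proven to be zero.
inline unsigned known_trailing_zeros(known_bits k) {
    unsigned n = 0;
    while (n < 64 && ((k.known_zero >> n) & 1)) ++n;
    return n;
}

// If a = 2^i * a' and b = 2^j * b', then a * b = 2^(i+j) * a' * b', so the
// product has at least i + j trailing zeros (modulo 2^64).
inline known_bits mul_known_bits(known_bits a, known_bits b) {
    unsigned tz = std::min(64u, known_trailing_zeros(a) + known_trailing_zeros(b));
    uint64_t mask = tz >= 64 ? ~0ull : ((1ull << tz) - 1);
    return {mask};
}

// The rule above learns nothing about x * (x + 1); this exhaustive check over
// every 16-bit x confirms the product is still always even.
inline bool product_of_consecutive_is_always_even() {
    for (uint32_t x = 0; x <= 0xFFFF; ++x) {
        uint32_t p = x * (x + 1u);  // fits in 32 bits for x < 2^16
        if (p & 1u) return false;
    }
    return true;
}
```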
Here it is without the annoying function_view stuff obscuring the call stack:
I will update the issue with more information and code snippets as I acquire them. This is during a dead code elimination pass. By the way, the reason I say it is in test_access and not in the tracer is that the stack overflow starts with a call to test_access, but maybe it is more accurate to say the stack overflow is in the tracer? Kind of a chicken-and-egg situation.
It looks like the same 3 symbolic variables are being traced repeatedly and that the failure is happening in enum_paths. Tracing one of the symbolic variables causes enum_paths to recurse back into rtrace_primitive, tracing the same variable we started with, ad infinitum. I am guessing this has something to do with the linkage between these variables; maybe my input VTIL is invalid? I will check on it.
Small update: it looks like one expression has its paths enumerated and two paths are traced, and the second path links back to the first one, causing infinite recursion. The one preceding "Enumerating paths" followed immediately by "done enumerating paths" is irrelevant, as it is traced without incident.
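A common way to break this kind of rtrace_primitive -> enum_paths -> rtrace_primitive cycle is to track which variables are currently being traced and stop when one is re-entered. The sketch below uses a hypothetical string-keyed dependency graph rather than VTIL's tracer types, so it only illustrates the guard itself.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical dependency graph: each variable maps to the variables its value
// is traced through.
using dep_graph = std::unordered_map<std::string, std::vector<std::string>>;

// Counts distinct variables reachable from `var`, visiting each at most once.
// `in_progress` holds the variables on the current recursion path; re-entering
// one of them means a cycle, so that edge is not followed again.
std::size_t trace_reachable(const dep_graph& g, const std::string& var,
                            std::unordered_set<std::string>& in_progress,
                            std::unordered_set<std::string>& done) {
    if (done.count(var) || in_progress.count(var)) return 0;  // finished or cycle
    in_progress.insert(var);
    std::size_t count = 1;
    if (auto it = g.find(var); it != g.end())
        for (const auto& dep : it->second)
            count += trace_reachable(g, dep, in_progress, done);
    in_progress.erase(var);
    done.insert(var);
    return count;
}
```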