vtil-project / vtil-core

Virtual-machine Translation Intermediate Language

License: BSD 3-Clause "New" or "Revised" License

C++ 99.91% CMake 0.08% C 0.01%
virtual-machine-translation optimizing-compilers compiler intermediate-language optimizer devirtualizer deobfuscation

vtil-core's Introduction

VTIL

Introduction

1) What is VTIL?

VTIL Project, standing for Virtual-machine Translation Intermediate Language, is a set of tools designed around an optimizing compiler to be used for binary de-obfuscation and de-virtualization.

The main difference between VTIL and other optimizing compilers such as LLVM is that it has an extremely versatile IL that makes it trivial to lift from any architecture including stack machines. Since it is built for translation, VTIL does not abstract away the native ISA and keeps the concept of the stack, physical registers, and the non-SSA architecture of a general-purpose CPU as is. Native instructions can be emitted in the middle of the IL stream and the physical registers can be addressed from VTIL instructions freely.

VTIL also makes it trivial to emit code back into the native format at any virtual address requested without being constrained to a specific file format.

2) What is this repository?

This repository contains the core components of the VTIL Project used across the toolchain.

It is currently incomplete as the initial release is not done yet, and documentation and FAQ will be within this repository and the organization website once they're done.

Until the initial release, you can keep up to date with the VTIL project by checking my personal twitter account or the VTIL website vtil.org.

Building (Windows)

cmake -B build

Then open build\VTIL-Core.sln. You can also open this folder in a CMake-compatible IDE (Visual Studio, CLion, Qt Creator, VS Code).

Building (Linux/Mac)

cmake -G Ninja -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

Note: To build on Apple Silicon one needs to use -DCMAKE_OSX_ARCHITECTURES=x86_64.

vtil-core's People

Contributors

abay12676, acurisu, altairq, arduinoidiot, can1357, georgeto, heinrich5991, ioncodes, l33t, meme, moepus, mrexodia, old-pigeon, pmeerw, staticinvocation, tai7sy, vmcall, wallds

vtil-core's Issues

Stack overflow during simplification. #2

expression X_00 = { {"X_00"}, 1 };
expression X_01 = { {"X_01"}, 1 };
expression X_02 = { {"X_02"}, 1 };
expression X_03 = { {"X_03"}, 1 };
expression X_04 = { {"X_04"}, 1 };
expression X_05 = { {"X_05"}, 1 };
expression X_06 = { {"X_06"}, 1 };

expression a = ((X_06 & X_00) | (X_05 & X_01));
expression b = ((((X_05 & X_00) | (X_04 & X_01)) & (((X_04 & X_00) | (X_03 & X_01)) & ((X_01 & X_00) & (X_03 | X_02)))) | ((X_05 & X_00) & (X_04 & X_01)));
expression c = ~(a & b);
log("c = %s\n", c.to_string());

How to extract an expression from a block?

For example, given this block, how do I correctly generate an expression for reg_ax?

vtil::register_desc reg_ax(vtil::register_physical, registers::ax, vtil::arch::bit_count, 0);

auto block = vtil::basic_block::begin(0x1337);

block->mov(reg_ax, 0x10);
block->add(reg_ax, 0x1);
block->vexit(0ull);

I tried using the tracer like this, but to no avail. Whether or not I use the tracer, I get the result "rax#0x1337"; the same happens without tracing if I just use variable.to_expression().

vtil::tracer tracer;
auto expression = tracer.rtrace_p({ block->begin(), reg_ax });

vtil::logger::log("%s\n", expression.to_string());

Hash problem in expression

Test code like this:

auto exp_a = __bt( variable_a, (uint32_t)0x6 ).simplify(true);
auto exp_b = __bt( variable_a, (uint8_t)0x6 ).simplify(true);

The hashes of exp_a and exp_b are different, but for a bit test, (uint32_t)0x6 and (uint8_t)0x6 should be treated as the same.

This causes vm_jcc destination extraction to produce wrong destinations if a jcc expression contains both exp_a and exp_b: when calculating the destinations of vm_jcc, expressions are compared by hash, and the hashes differ.

if ( exp->lhs->is_identical( *cnd_out ) )
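
The mismatch described above can be illustrated with a toy model. This is only a sketch under the assumption that the operand's bit width is mixed into the hash; toy_constant, hash_with_width and hash_bit_index are hypothetical names, not VTIL types or functions, and the sketch does not claim to reflect how VTIL actually hashes expressions.

```cpp
#include <cassert>
#include <cstdint>

// Toy constant operand (hypothetical): a value plus the bit width it was created with.
struct toy_constant
{
    uint64_t value;
    uint8_t bit_count;
};

// Toy hash that mixes the width into the result, so equal values of
// different widths hash differently.
constexpr uint64_t hash_with_width( toy_constant c )
{
    return ( c.value * 0x100000001B3ull ) ^ c.bit_count;
}

// Toy hash for an operand used as a bit index: the width is ignored,
// so only the index value matters.
constexpr uint64_t hash_bit_index( toy_constant c )
{
    return c.value * 0x100000001B3ull;
}
```

In this model, a 32-bit 0x6 and an 8-bit 0x6 get different results from hash_with_width but the same result from hash_bit_index, which is the behaviour the report asks for when the constant is the index operand of a bit test.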

Stack overflow during simplification.

void test_vtil_crash()
{
    expression X_00, X_01, X_02, X_03;
    expression a, b, c;
    X_00 = expression(unique_identifier("X_00"), 1);
    X_01 = expression(unique_identifier("X_01"), 1);
    X_02 = expression(unique_identifier("X_02"), 1);
    X_03 = expression(unique_identifier("X_03"), 1);
    a = ~(((X_03 & X_00) ^ (X_02 & X_01)) & (X_01 & X_00));
    b = ((X_03 & X_00) ^ (X_02 & X_01));
    printf("a = %s\n", a.to_string().c_str());
    printf("b = %s\n", b.to_string().c_str());
    c = ~(a & b); // crashes here
    printf("c = %s\n", c.to_string().c_str());
}

Error while compiling

CMake Error at C:/Program Files/CMake/share/cmake-3.18/Modules/ExternalProject.cmake:2350 (message):
  error: could not find git for clone of capstone-populate
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.18/Modules/ExternalProject.cmake:3206 (_ep_add_download_command)
  CMakeLists.txt:13 (ExternalProject_Add)

Edit: I just forgot to install git.

What can this be used for?

Can I use it to convert VMProtect-protected code to an intermediate language? I want to restore the original instructions after virtualization.

Stack overflow in Tracer

lvm.hooks.read_register = [ & ] ( const register_desc& desc )
{
    return trace( { it, desc } );
};

...
// Step one instruction, if result was successfully captured, return.
//
if ( lvm.execute( *it ), result )
    return result;

lvm.execute can do read_register:

if ( op.is_register() )
{
    // Trace the source register.
    //
    symbolic::expression::reference result = read_register( op.reg() );
    // If stack pointer, add the current virtual offset.

read_register in turn calls tracer:

symbolic::expression::reference read_register( const register_desc& desc ) const override
{
    return hooks.read_register
        ? hooks.read_register( desc )
        : vm_base::read_register( desc );
}

And so we end up with a stack overflow.
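
The cycle described above (trace installs a hook, execute reads a register, the hook calls trace again) can be reproduced in a few lines, along with one possible way to break it. This is a minimal sketch with hypothetical toy_vm and toy_tracer types that only mimic the call pattern; registers are plain integers, and the re-entrancy guard is an illustrative assumption, not the fix adopted by the project.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>

// Toy stand-in for the VM: reading a register defers to a hook when one is set.
struct toy_vm
{
    std::function<std::optional<int>( int )> read_register_hook;

    std::optional<int> read_register( int reg ) const
    {
        return read_register_hook ? read_register_hook( reg ) : std::optional<int>{ reg };
    }

    // "Executing" an instruction here just reads its source register.
    std::optional<int> execute( int reg ) const { return read_register( reg ); }
};

// Toy stand-in for the tracer, with a guard against re-entrant traces.
struct toy_tracer
{
    std::set<int> in_progress;
    int guard_hits = 0;

    std::optional<int> trace( int reg )
    {
        // If this register is already being traced further up the call stack,
        // give up instead of recursing forever.
        if ( !in_progress.insert( reg ).second )
        {
            guard_hits++;
            return std::nullopt;
        }

        // Same shape as the report: the hook calls back into trace().
        toy_vm vm;
        vm.read_register_hook = [ this ] ( int r ) { return trace( r ); };
        std::optional<int> result = vm.execute( reg );

        in_progress.erase( reg );
        return result;
    }
};
```

Without the in_progress check, trace( reg ) would call itself through the hook with the same argument until the stack is exhausted. With it, the nested call returns an empty result and the outer call unwinds normally.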

Forward declaration errors on Clang 10

$ clang --version
clang version 10.0.1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ gcc --version
gcc (GCC) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(Note that the following also occurs on GCC 10.)

I am not clear on why -stdlib=libc++ is explicitly specified here: https://github.com/vtil-project/VTIL-Core/blob/master/VTIL-Common/CMakeLists.txt#L37-L39. Regardless, it breaks the build and causes missing includes:

In file included from _deps/vtil-core-src/VTIL-Common/arch/arm64/arm64_assembler.cpp:39:
_deps/vtil-core-src/VTIL-Common/arch/arm64/arm64_assembler.hpp:40:10: fatal error: 'string' file not found
#include <string>

So it is necessary to remove that broken configuration from the CMakeLists.txt before you are able to reproduce the following.


The forward declaration of cache_value here:
https://github.com/vtil-project/VTIL-Core/blob/master/VTIL-SymEx/simplifier/simplifier.cpp#L94
causes errors during template expansion for std::pair, etc. because of the declaration order. To work around this, I defined cache_map as follows:

- using cache_map = std::unordered_map<expression::reference, cache_value, signature_hasher, cache_scanner>;
+ using cache_map = std::unordered_map<expression::reference, cache_value*, signature_hasher, cache_scanner>;

(And subsequently updated all accesses of cache_value to use the arrow operator.)

Then I updated the map insertion to store cache_value* values:

- auto [it, inserted] = map.emplace( exp, make_default<cache_value>() );
+ auto [it, inserted] = map.emplace( exp, new cache_value );

Obviously using new here is not the solution, but I am unfamiliar with the lifetimes of this value so I am wondering how you think this problem can best be solved.
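
One option that keeps the pointer indirection without a bare new is to let the map own its entries through std::unique_ptr, which may be declared on an incomplete type. The following is a minimal sketch, not the project's code: the key type is simplified to std::string, the fields of cache_value are invented for illustration, lookup is a hypothetical helper, and it assumes each entry should live exactly as long as the map.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Forward declaration only; the full definition comes after the map alias.
struct cache_value;

// Hypothetical simplification: std::string keys instead of expression references.
using cache_map = std::unordered_map<std::string, std::unique_ptr<cache_value>>;

// Illustrative fields only; the real structure differs.
struct cache_value
{
    int result = 0;
    bool is_simplified = false;
};

// Returns the entry for `key`, creating a default one on first use.
// The map owns the entry, so nothing has to be deleted by hand.
cache_value& lookup( cache_map& map, const std::string& key )
{
    auto [it, inserted] = map.try_emplace( key );
    if ( inserted )
        it->second = std::make_unique<cache_value>();
    return *it->second;
}
```

Accesses still go through the arrow operator as in the diff above, but entries are released when they are erased or when the map is destroyed. Whether that matches the intended lifetime of the cache is exactly the open question raised here.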

I'd be happy to make a PR to fix both Clang and GCC, just wasn't able to figure out the best way to do it. As well, if you'd be open to adding the latest version of GCC and Clang to the CI I'd be happy to add that as well. Looks like it's already using Clang 10.

Question about the design

Why do you say it is for binary de-obfuscation and de-virtualization? Can VTIL de-virtualize a binary protected by VMP (https://vmpsoft.com/)? Besides that, can it devirtualize a binary obfuscated by an LLVM obfuscator, for instance?

Symbolic rewrite corruption

The following code is producing corrupt results:

void run_err_test_1()
{
	auto b = vtil::basic_block::begin(0);
	auto first = vtil::register_desc(vtil::register_flag::register_local, 0, 64);
	auto second_ptr = vtil::register_desc(vtil::register_flag::register_local, 5, 64);
	auto second = vtil::register_desc(vtil::register_flag::register_local, 4, 64);
	auto result = vtil::register_desc(vtil::register_flag::register_local, 6, 64);
	auto dest = vtil::register_desc(vtil::register_flag::register_local, 37, 64);

	// load first value
	b->ldd(first, X86_REG_R11, 0x0);

	// load second value
	b->mov(second_ptr, X86_REG_R11);
	b->add(second_ptr, 0x8);
	b->ldd(second, second_ptr, 0x0);

	// calculate the result
	b->mov(result, first);
	b->add(result, second);

	// store the result
	b->mov(dest, X86_REG_R11);
	b->add(dest, 0x8);
	b->str(dest, 0x0, result);

	apply_optimizations(b, 0, optimization_type::optimization_type_symbolic_rewrite_pass_forced, 0);
	vtil::debug::dump(b);
}
 | | 0000: [ PSEUDO ]     +0x0     movq     t0           r11
 | | 0001: [ PSEUDO ]     +0x0     lddq     t1           t0           0x0
 | | 0002: [ PSEUDO ]     +0x0     movq     t2           t0
 | | 0003: [ PSEUDO ]     +0x0     addq     t2           0x8
 | | 0004: [ PSEUDO ]     +0x0     lddq     t3           t0           0x8
 | | 0005: [ PSEUDO ]     +0x0     movq     t4           t1
 | | 0006: [ PSEUDO ]     +0x0     addq     t4           t3
 | | 0007: [ PSEUDO ]     +0x0     movq     t0           t1
 | | 0008: [ PSEUDO ]     +0x0     movq     t5           t2
 | | 0009: [ PSEUDO ]     +0x0     movq     t4           t3
 | | 0010: [ PSEUDO ]     +0x0     movq     t6           t4
 | | 0011: [ PSEUDO ]     +0x0     movq     t37          t2
 | | 0012: [ PSEUDO ]     +0x0     strq     t0           0x8          t4

After a bit of messing around, I've narrowed down the cause to the register naming. Rewriting all register names to a higher index produces the following instead:

 | | 0000: [ PSEUDO ]     +0x0     movq     t0           r11
 | | 0001: [ PSEUDO ]     +0x0     lddq     t1           t0           0x0
 | | 0002: [ PSEUDO ]     +0x0     movq     t2           t0
 | | 0003: [ PSEUDO ]     +0x0     addq     t2           0x8
 | | 0004: [ PSEUDO ]     +0x0     lddq     t3           t0           0x8
 | | 0005: [ PSEUDO ]     +0x0     movq     t4           t1
 | | 0006: [ PSEUDO ]     +0x0     addq     t4           t3
 | | 0007: [ PSEUDO ]     +0x0     movq     t1500        t1
 | | 0008: [ PSEUDO ]     +0x0     movq     t1501        t2
 | | 0009: [ PSEUDO ]     +0x0     movq     t1502        t3
 | | 0010: [ PSEUDO ]     +0x0     movq     t1503        t4
 | | 0011: [ PSEUDO ]     +0x0     movq     t1504        t2
 | | 0012: [ PSEUDO ]     +0x0     strq     t0           0x8          t4

The pass appears to be restoring to the original registers, and is clobbering them in the process.

Optimize (x*(x+1)) & 1?

Assuming the multiplication operator has been implemented:

The expression (x + x) & 1 can be transformed to (x * 2) & 1.
After x * 2, the lowest bit is known to be 0, so (x * 2) & 1 should simplify to 0.

But what about (x*(x+1)) & 1? One of x and x+1 is always even, so their product must be even.
However, the lowest bit would be unknown after a generic multiply operation.
Does this kind of optimization need an extra pass?
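
The parity claim can be checked exhaustively, and the reason it holds can be stated as a bit-level rule: the low bit of a product is the AND of the operands' low bits, and x and x+1 always have opposite low bits. The sketch below is plain C++ and does not use VTIL; both function names are hypothetical, and 16-bit width is chosen only to keep the loop small.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when (x * (x + 1)) & 1 is 0 for every 16-bit x.
// The result is reduced modulo 2^16 to mimic a fixed-width register.
bool product_of_neighbours_is_always_even()
{
    for ( uint32_t i = 0; i <= 0xFFFF; i++ )
    {
        uint32_t x = i;
        uint32_t x_plus_1 = ( x + 1 ) & 0xFFFF;        // wraps at 16 bits
        uint32_t product = ( x * x_plus_1 ) & 0xFFFF;  // fits in 32 bits before masking
        if ( product & 1 )
            return false;
    }
    return true;
}

// The low bit of a product depends only on the low bits of the operands.
constexpr uint32_t low_bit_of_product( uint32_t a, uint32_t b )
{
    return ( a & 1 ) & ( b & 1 );
}
```

A per-operand known-bits analysis cannot reach this conclusion by itself, since the low bits of x and x+1 are each unknown in isolation; the result follows only from the relation between the two operands, which suggests a pattern-based rule rather than bit tracking alone.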

crash 0518

expression X_00 = { {"X_00"}, 1 };
expression X_01 = { {"X_01"}, 1 };
expression X_02 = { {"X_02"}, 1 };
expression X_03 = { {"X_03"}, 1 };
expression X_04 = { {"X_04"}, 1 };
expression X_05 = { {"X_05"}, 1 };
expression X_06 = { {"X_06"}, 1 };
expression X_07 = { {"X_07"}, 1 };
expression X_08 = { {"X_08"}, 1 };
expression X_09 = { {"X_09"}, 1 };
expression X_0A = { {"X_0A"}, 1 };
expression X_0B = { {"X_0B"}, 1 };

expression a = ((X_0B & X_00) | (X_0A & X_01));
expression b = ((((X_0A & X_00) | (X_09 & X_01)) & ((((X_09 & X_00) | (X_08 & X_01)) & ((((X_08 & X_00) | (X_07 & X_01)) & ((((X_07 | X_06) & ((((X_01 & X_00) & (X_05 | (X_04 & (X_00 & X_06)))) & (X_03 | ((X_03 | X_02) & (X_00 & X_04)))) | ((X_01 & X_04) & (X_05 & X_00)))) | ((X_01 & X_05) & (X_06 & X_00))) | ((X_07 & X_00) & (X_06 & X_01)))) | ((X_08 & X_00) & (X_07 & X_01)))) | ((X_09 & X_00) & (X_08 & X_01)))) | ((X_0A & X_00) & (X_09 & X_01)));
expression c = a & b;
log("c = %s\n", c.to_string());

BUG: Some basic blocks have been incorrectly removed.

Hello.

DOCTEST_TEST_CASE("dummy")
{
    vtil::logger::log("\n\n>> %s \n", __FUNCTION__);
    auto block = vtil::basic_block::begin(0);
    auto [t0, t1, t2, t3] = block->tmp(64, 64, 1, 64);
    auto rtn = block->owner;
    block->mov(t0, vtil::REG_FLAGS);
    block->bnot(t0);
    block->ifs(t1, t0.select(1, 2), 0x1000);
    block->mov(t2, t0.select(1, 2));
    block->bnot(t2);
    block->ifs(t3, t2, 0x2000);
    block->add(t1, t3);
    block->add(t1, vtil::REG_IMGBASE);
    block->jmp(t1);

    if (auto block_1000 = block->fork(0x1000)) {
        block_1000->jmp(0x3000);
        block_1000->fork(0x3000);
    }
    if (auto block_2000 = block->fork(0x2000)) {
        block_2000->jmp(0x3000);
        block_2000->fork(0x3000);
    }
    if (auto block_3000 = rtn->get_block(0x3000)) {
        block_3000->vexit(uintptr_t(0xdeadc0de));
    }

    vtil::logger::log(":: Before:\n");
    vtil::debug::dump(rtn);
    vtil::optimizer::bblock_thunk_removal_pass{}(rtn);
    vtil::optimizer::branch_correction_pass{}(rtn);
    vtil::logger::log(":: After:\n");
    vtil::debug::dump(rtn);
    vtil::logger::log(":: Over:\n");
    CHECK(1 == 1);
}

Result

(Screenshot not preserved.)

Relevant code

for ( auto it = blk->next.begin(); it != blk->next.end(); )
{
    // Check if this destination is plausible or not.
    //
    vip_t target = ( *it )->entry_vip;
    bool plausible = false;
    for ( auto& branch : branch_info.destinations )
        plausible |= ( branch == target ).get<bool>().value_or( true );
    // If it is not:
    //
    if ( !plausible )
    {
        // Delete prev and next links.
        //
        ( *it )->prev.erase( std::remove( ( *it )->prev.begin(), ( *it )->prev.end(), blk ), ( *it )->prev.end() );
        it = blk->next.erase( it );
        // Increment counter and continue.
        //
        cnt++;
        continue;
    }
    // Otherwise increment iterator and continue.
    //
    ++it;
}

Description

The branch_correction_pass will remove branches that do not exist in the branch_info.
I think this is correct, but the bblock_thunk_removal_pass is too aggressive and also handles jump instructions that have not been correctly converted to js instructions.
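
The plausibility filter in the relevant code can be modelled in isolation to make its behaviour easier to reason about. In this sketch a destination is a std::optional<uint64_t>, where an empty value stands in for a symbolic destination that cannot be resolved to a constant (mirroring the value_or( true ) in the loop above). The destination alias and prune_successors are hypothetical names, and blocks are reduced to bare addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A branch destination is either a known address or unknown (std::nullopt).
using destination = std::optional<uint64_t>;

// Removes every successor address that no destination could possibly equal.
// Returns the number of successors removed.
std::size_t prune_successors( std::vector<uint64_t>& successors,
                              const std::vector<destination>& destinations )
{
    auto implausible = [ & ] ( uint64_t target )
    {
        for ( const destination& d : destinations )
        {
            // An unknown destination might equal anything, so it keeps the edge.
            if ( !d.has_value() || *d == target )
                return false;
        }
        return true;
    };

    std::size_t before = successors.size();
    successors.erase( std::remove_if( successors.begin(), successors.end(), implausible ),
                      successors.end() );
    return before - successors.size();
}
```

With successors 0x1000, 0x2000 and 0x3000 and known destinations 0x1000 and 0x2000, only 0x3000 is pruned. As soon as one destination is unknown, nothing is pruned, so in this model an edge is lost only when every destination has been resolved to a constant that excludes it.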

Comparison between unknowns returns an immediate result.

vtil::symbolic::expression op1 = { {"op1"}, 4 };
vtil::symbolic::expression op2 = { {"op2"}, 4 };

// out: (op1==op2)
auto dst1 = op1 == op2;
vtil::logger::log("%s\n", dst1.simplify().to_string().c_str());

// out: 0x0
auto dst2 = (op1 - op2) == 0;
vtil::logger::log("%s\n", dst2.simplify().to_string().c_str());
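
For reference, (op1 - op2) == 0 and op1 == op2 are equivalent under wrap-around arithmetic, so folding the former to the constant 0x0 loses information. The brute-force check below is plain C++ and does not use VTIL; it uses 8-bit values purely for illustration, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Counts 8-bit pairs (a, b) for which `(a - b) == 0` and `a == b` disagree.
// The subtraction wraps modulo 2^8, as it would in a fixed-width expression.
int count_disagreements()
{
    int disagreements = 0;
    for ( int a = 0; a < 256; a++ )
    {
        for ( int b = 0; b < 256; b++ )
        {
            uint8_t difference = static_cast<uint8_t>( a - b );
            bool via_subtraction = ( difference == 0 );
            bool via_equality = ( a == b );
            if ( via_subtraction != via_equality )
                disagreements++;
        }
    }
    return disagreements;
}

// Counts the pairs for which the wrapped difference is zero.
int count_zero_differences()
{
    int zeros = 0;
    for ( int a = 0; a < 256; a++ )
        for ( int b = 0; b < 256; b++ )
            if ( static_cast<uint8_t>( a - b ) == 0 )
                zeros++;
    return zeros;
}
```

The two forms never disagree, and the difference is zero for exactly the 256 pairs with a equal to b, so the comparison is not constant and a simplifier should leave it symbolic, as it does for op1 == op2.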

Stack overflow on test_access

(Screenshot: stack overflow call stack; image not preserved.)

Here it is without the annoying function_view stuff obscuring the call stack:

(Screenshot: call stack without invocable; image not preserved.)

I will update the issue with more information and code snippets as I acquire them. This is during a dead code elimination pass. BTW, the reason I say it is on test_access and not on the tracer is that the stack overflow starts with a call to test_access, but maybe it is more accurate to say the stack overflow is in the tracer? Kind of a chicken-and-egg situation.

UPDATE:
(Screenshot: rtrace output; image not preserved.)

It looks like the same 3 symbolic variables are being repeatedly traced and that the failure is happening in enum_paths. Tracing one of the symbolic variables is causing enum_paths to recurse back into rtrace_primitive, tracing the same variable we started with, ad infinitum. I am guessing this has something to do with the linkage between these variables; maybe my input VTIL is invalid? I will check on it.

Small update: It looks like one expression has its paths enumerated, two paths are traced, and the second path links back up to the first one, causing infinite recursion. The trace preceding "Enumerating paths" followed immediately by "done enumerating paths" is irrelevant, as it completes without incident.

error while compiling

vtil\VTIL-NativeLifters\Dependencies\VTIL-Core\VTIL-Compiler\includes\vtil../../common/auxiliaries.hpp(37,38): error C2149: 'vtil::optimizer::aux::branch_analysis_flags::pack': named bit field cannot have zero width
\vtil\VTIL-NativeLifters\Dependencies\VTIL-Core\VTIL-Compiler\includes\vtil../../common/auxiliaries.hpp(38,38): error C2106: '=': left operand must be l-value
