c3rb3ru5d3d53c / binlex

A Binary Genetic Traits Lexer Framework

License: The Unlicense

CMake 0.58% Makefile 0.11% C++ 96.90% Python 1.85% C 0.54% Dockerfile 0.04%
malware malware-research malware-analysis yara genetic-algorithm machine-learning genetic-programming reverse-engineering

binlex's Introduction

binlex

A Genetic Binary Trait Lexer Library and Utility

The purpose of binlex is to extract basic blocks and functions as traits from binaries for malware research, hunting and detection.

Most projects attempting this use Python to generate traits, but it is very slow.

The design philosophy behind binlex is to keep it simple and extensible.

The simple command-line interface lets malware researchers and analysts hunt traits across hundreds or thousands of potentially similar malware samples, saving time and money in production environments.

The C++ API lets developers get creative with their own detection solutions, completely unencumbered by license limitations.

To help combat malware, we firmly commit our work to the public domain for the greater good of the world.

Demos

Introduction Video

Get slides here.

Use Cases

  • YARA Signature Creation/Automation
  • Identifying Code-Reuse
  • Threat Hunting
  • Building Goodware Trait Corpus
  • Building Malware Trait Corpus
  • Genetic Programming
  • Machine Learning Malware Detection

Installation

This part of the guide will show you how to install and use binlex.

Dependencies

To get started you will need the following dependencies for binlex.

Linux

sudo apt install -y git build-essential \
                    cmake make parallel \
                    doxygen git-lfs rpm \
                    python3 python3-dev

macOS

brew install cmake parallel doxygen git-lfs

All Platforms

git clone --recursive https://github.com/c3rb3ru5d3d53c/binlex.git
cd binlex/

NOTE: binlex requires cmake >= 3.5, make >= 4.2.1 and Ubuntu >= 20.04.

Once you have installed the dependencies, cloned the repository and changed into the project directory, you can continue with the installation.

From Source

If you want to compile and install use cmake with the following commands:

cmake -B deps/build -S deps
cmake --build deps/build --config Release --parallel 8
cmake -B build -DBUILD_PYTHON_BINDINGS=ON
cmake --build build --config Release --parallel 8

build/binlex -m auto -i tests/elf/elf.x86

Binary Releases

See the releases page.

If you need the bleeding-edge binaries you can download them from our GitHub Actions here.

NOTE: Bleeding-edge binaries are subject to bugs. If you encounter one, please let us know!

Building Packages

Another option is to build Debian binary packages and install those.

To build packages use cpack, which comes with cmake.

cmake -B deps/build -S deps
cmake --build deps/build --config Release --parallel 8
cmake -B build -DBUILD_PYTHON_BINDINGS=ON
cmake --build build --config Release --parallel 8
sudo apt install ./build/binlex_1.1.1_amd64.deb
binlex -m elf:x86 -i tests/elf/elf.x86

You will then be provided with .deb, .rpm and .tar.gz packages for binlex.

Building Python Bindings

To get started using pybinlex:

virtualenv -p python3 venv
source venv/bin/activate
# Install Library
pip install -v .
# Build Wheel Package
pip wheel -v -w build/ .
python3
>>> import pybinlex

If you wish to compile the bindings with cmake:

make config=Release threads=4 args=-DBUILD_PYTHON_BINDINGS=ON

NOTE: we use pybind11 and support for python3.9 is experimental.

Examples of how to use pybinlex can be found in tests/tests.py.

Binlex Web API

docker build -t binlex:latest .
docker run --rm -p 8080:8080 -e LOG_LEVEL='debug' -it binlex:latest

Browse to http://127.0.0.1:8080 to view the web API documentation.

Example Requests

# Get Modes
curl http://127.0.0.1:8080/api/v1/modes

# Get Traits
curl -X POST http://127.0.0.1:8080/api/v1/<corpus>/<mode>/<tags> --upload-file <file>

Python Web API Wrapper Example

#!/usr/bin/env python

import json
from libpybinlex.webapi import WebAPIv1

api = WebAPIv1(url='http://127.0.0.1:8080')
# Read the sample and close the file handle when done
with open('sample.bin', 'rb') as sample:
    data = sample.read()
response = api.get_traits(
  data=data,
  corpus='default',
  mode='pe:x86',
  tags=['foo', 'bar'])
traits = json.loads(response.content)
print(json.dumps(traits, indent=4))

Test Files

  • To download all the test samples, run git lfs fetch
  • ZIP files in the tests/ directory can then be extracted using the password infected

NOTE: The tests/ directory contains malware; we assume you know what you are doing.

To download individual git-lfs files from a relative path, you can use the following git alias in ~/.gitconfig:

[alias]
download = "!ROOT=$(git rev-parse --show-toplevel); cd $ROOT; git lfs pull --include $GIT_PREFIX$1; cd $ROOT/$GIT_PREFIX"

You will then be able to do the following:

git download tests/pe/pe.zip

CLI Usage

binlex v1.1.1 - A Binary Genetic Traits Lexer
  -i  --input           input file              (required)
  -m  --mode            set mode                (optional)
  -lm --list-modes      list modes              (optional)
  -c  --corpus          corpus name             (optional)
  -g  --tag             add a tag               (optional)
                        (can be specified multiple times)
  -t  --threads         number of threads       (optional)
  -to --timeout         execution timeout in s  (optional)
  -h  --help            display help            (optional)
  -o  --output          output file             (optional)
  -p  --pretty          pretty output           (optional)
  -d  --debug           print debug info        (optional)
  -v  --version         display version         (optional)
Author: @c3rb3ru5d3d53c

Supported Modes

  • elf:x86
  • elf:x86_64
  • pe:x86
  • pe:x86_64
  • pe:cil
  • raw:x86
  • raw:x86_64
  • raw:cil
  • auto

NOTE: The raw modes can be used on shellcode.

NOTE: The auto mode cannot be used on shellcode.

Advanced

If you are hunting using binlex you can use jq to your advantage for advanced searches.

build/binlex -m auto -i tests/pe/pe.x86 | jq -r 'select((.size > 8 and .size < 16) and (.bytes_sha256 != .trait_sha256)) | .trait' | head -10
8b 48 ?? 03 c8 81 39 50 45 00 00 75 12
0f b7 41 ?? 3d 0b 01 00 00 74 1f
83 b9 ?? ?? ?? ?? ?? 76 f2
33 c0 39 b9 ?? ?? ?? ?? eb 0e
83 4d ?? ?? b8 ff 00 00 00 e9 ba 00 00 00
89 75 ?? 66 83 3e 22 75 45
03 f3 89 75 ?? 66 8b 06 66 3b c7 74 06
03 f3 89 75 ?? 66 8b 06 66 3b c7 74 06
56 ff 15 ?? ?? ?? ?? ff 15 ?? ?? ?? ?? eb 2d
55 8b ec 51 56 33 f6 66 89 33 8a 07 eb 29
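
The same selection can be sketched in Python for cases where the results feed further processing. This is a minimal sketch, assuming binlex writes one JSON object per line and that each record carries the size, bytes_sha256, trait_sha256 and trait fields shown in the JSON examples in this README. The function name and the two sample records (including their placeholder hashes) are made up for illustration.

```python
import json

def select_traits(lines, min_size=8, max_size=16):
    """Yield trait strings whose size lies strictly between the bounds and
    whose wildcarded trait hash differs from the raw bytes hash."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        in_range = min_size < record["size"] < max_size
        was_wildcarded = record["bytes_sha256"] != record["trait_sha256"]
        if in_range and was_wildcarded:
            yield record["trait"]

# Hypothetical records; the hashes are placeholders, not real digests.
sample_lines = [
    json.dumps({"size": 9, "bytes_sha256": "aa", "trait_sha256": "bb",
                "trait": "83 b9 ?? ?? ?? ?? ?? 76 f2"}),
    json.dumps({"size": 5, "bytes_sha256": "cc", "trait_sha256": "cc",
                "trait": "e9 9b ff ff ff"}),
]
selected = list(select_traits(sample_lines))
print(selected)
```

In practice the lines would come from sys.stdin or from a file written with the --output switch.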

Here are examples of additional queries.

# Block traits with a size between 0 and 32 bytes
jq -r 'select(.type == "block" and .size < 32 and .size > 0)'
# Function traits with a cyclomatic complexity greater than 32 (maybe obfuscation)
jq -r 'select(.type == "function" and .cyclomatic_complexity > 32)'
# Traits where bytes have high entropy
jq -r 'select(.bytes_entropy > 7)'
# Output all trait strings only
jq -r '.trait'
# Output only trait hashes
jq -r '.trait_sha256'
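
For context on the entropy query, the sketch below computes Shannon entropy in bits per byte over a hex string. It assumes that bytes_entropy is this measure. That assumption is consistent with the 20-byte block shown under Trait Format, whose reported bytes_entropy of 3.9219279289245605 this function reproduces to about six decimal places, but binlex's own implementation is not shown here. The maximum possible value is 8, which is why a threshold of 7 picks out nearly random-looking data.

```python
import math
from collections import Counter

def shannon_entropy(hex_bytes: str) -> float:
    """Shannon entropy, in bits per byte, of a space-separated hex string."""
    data = bytes.fromhex(hex_bytes.replace(" ", ""))
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# The "bytes" value from the Trait Format example in this README.
block = "8b 45 08 a3 10 52 02 10 8b 45 f8 e8 fb d7 00 00 85 c0 74 0d"
entropy = shannon_entropy(block)
print(round(entropy, 4))
```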

If you write just the traits you want to stdout, you can build a YARA signature on the fly with the included tool blyara:

build/binlex -m raw:x86 -i tests/raw/raw.x86 | jq -r 'select(.size > 16 and .size < 32) | .trait' | build/blyara --name example_0 -m author example -m tlp white -c 1
rule example_0 {
    metadata:
        author = "example"
        tlp = "white"
    strings:
        trait_0 = {52 57 8b 52 ?? 8b 42 ?? 01 d0 8b 40 ?? 85 c0 74 4c}
        trait_1 = {49 8b 34 8b 01 d6 31 ff 31 c0 c1 cf ?? ac 01 c7 38 e0 75 f4}
        trait_2 = {e8 67 00 00 00 6a 00 6a ?? 56 57 68 ?? ?? ?? ?? ff d5 83 f8 00 7e 36}
    condition:
        1 of them
}
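
For pipelines that already hold traits in Python, a rule can be assembled along the same lines. This is an illustrative sketch, not blyara itself: the function name and parameters are hypothetical, and it writes the meta: section name and $-prefixed string identifiers that YARA's rule grammar uses, so its text differs slightly from the listing above.

```python
def build_yara_rule(name, traits, metadata=None, count=1):
    """Return the text of a YARA rule with one hex string per trait."""
    lines = [f"rule {name} {{"]
    if metadata:
        lines.append("    meta:")
        for key, value in metadata.items():
            lines.append(f'        {key} = "{value}"')
    lines.append("    strings:")
    for index, trait in enumerate(traits):
        # Trait strings already use ?? wildcards, which YARA hex strings accept.
        lines.append(f"        $trait_{index} = {{{trait}}}")
    lines.append("    condition:")
    lines.append(f"        {count} of them")
    lines.append("}")
    return "\n".join(lines)

rule = build_yara_rule(
    "example_0",
    ["52 57 8b 52 ?? 8b 42 ?? 01 d0 8b 40 ?? 85 c0 74 4c"],
    metadata={"author": "example", "tlp": "white"},
    count=1,
)
print(rule)
```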

You can also use the --pretty switch, or pipe the output through jq, to pretty-print the JSON and identify more properties to query.

build/binlex -m auto -i tests/pe/pe.emotet.x86 -c malware -g malware:emotet -g malware:loader | head -1 | jq
{
  "average_instructions_per_block": 29,
  "blocks": 1,
  "bytes": "55 8b ec 83 ec 1c 83 65 f0 00 33 d2 c7 45 e4 68 5d df 00 c7 45 e8 43 c4 cb 00 c7 45 ec 8f 08 46 00 c7 45 f8 06 3b 43 00 81 45 f8 25 7a ff ff 81 75 f8 30 f4 44 00 c7 45 fc 22 51 53 00 8b 45 fc 6a 3f 59 f7 f1 6a 1c 89 45 fc 33 d2 8b 45 fc 59 f7 f1 89 45 fc 81 75 fc 3c 95 0e 00 c7 45 f4 0b 16 11 00 81 45 f4 e1 21 ff ff 81 75 f4 79 bd 15 00 ff 4d 0c 75 21",
  "bytes_entropy": 5.333979606628418,
  "bytes_sha256": "13e0463c5837bc5ce110990d69397662b82b8de8a9971f77b237f2a6dd2d8982",
  "corpus": "malware",
  "cyclomatic_complexity": 3,
  "edges": 2,
  "file_sha256": "7b01c7c835552b17f17ad85b8f900c006dd8811d708781b5f49f231448aaccd3",
  "file_tlsh": "42E34A10F3D341F7DC9608F219B6B22F9F791E023124DFA987981F57ADB5246A2B981C",
  "instructions": 29,
  "invalid_instructions": 0,
  "mode": "pe:x86",
  "offset": 49711,
  "size": 118,
  "tags": [
    "malware:emotet",
    "malware:loader"
  ],
  "trait": "55 8b ec 83 ec 1c 83 65 ?? ?? 33 d2 c7 45 ?? ?? ?? ?? ?? c7 45 ?? ?? ?? ?? ?? c7 45 ?? ?? ?? ?? ?? c7 45 ?? ?? ?? ?? ?? 81 45 ?? ?? ?? ?? ?? 81 75 ?? ?? ?? ?? ?? c7 45 ?? ?? ?? ?? ?? 8b 45 ?? 6a 3f 59 f7 f1 6a 1c 89 45 ?? 33 d2 8b 45 ?? 59 f7 f1 89 45 ?? 81 75 ?? ?? ?? ?? ?? c7 45 ?? ?? ?? ?? ?? 81 45 ?? ?? ?? ?? ?? 81 75 ?? ?? ?? ?? ?? ff 4d ?? 75 21",
  "trait_entropy": 3.9699645042419434,
  "trait_sha256": "7b04c2dbcc3cf23abfdd457b592b4517e4d98b5c83e692c836cde5b91899dd68",
  "type": "block"
}
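
The edges, blocks and cyclomatic_complexity fields in this record are consistent with McCabe's formula M = E - N + 2P, with 2 edges, 1 block and a complexity of 3. The sketch below treats blocks as graph nodes and assumes a single connected component. Both are assumptions about how binlex counts, not something the README states.

```python
def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe cyclomatic complexity: M = E - N + 2P."""
    return edges - nodes + 2 * components

# Values taken from the record above: "edges": 2, "blocks": 1.
complexity = cyclomatic_complexity(edges=2, nodes=1)
print(complexity)  # 3
```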

With binlex it is up to you to remove goodware traits from your extracted traits.

There have been many questions about removing "library code"; the make target shown below helps with this.

make traits-clean remove=goodware.traits source=sample.traits dest=malware.traits
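
Conceptually, this kind of cleaning is a set difference. The sketch below is a minimal illustration, assuming each traits file holds one trait string per line. What the make target actually does internally is not documented here, and the function name is hypothetical.

```python
def clean_traits(sample_traits, goodware_traits):
    """Return sample traits absent from the goodware set, keeping first-seen order."""
    goodware = set(goodware_traits)
    seen = set()
    cleaned = []
    for trait in sample_traits:
        if trait and trait not in goodware and trait not in seen:
            seen.add(trait)
            cleaned.append(trait)
    return cleaned

# Trait strings borrowed from the hunting output earlier in this README.
sample = ["55 8b ec 51 56 33 f6 66 89 33 8a 07 eb 29",
          "83 b9 ?? ?? ?? ?? ?? 76 f2",
          "83 b9 ?? ?? ?? ?? ?? 76 f2"]
goodware = ["55 8b ec 51 56 33 f6 66 89 33 8a 07 eb 29"]
malware_only = clean_traits(sample, goodware)
print(malware_only)
```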

With binlex the power is in your hands. "With great power comes great responsibility", so it is up to you!

Plugins

There has been some interest in making IDA, Ghidra and Cutter plugins for binlex.

This work will start as soon as we finish the HTTP API endpoints.

This README.md will be updated when they are ready to use.

General Usage Information

Binlex is designed to do one thing and one thing only: extract genetic traits from executable code in files. This means it is up to you, "the researcher" or "the data scientist", to determine which traits are good and which traits are bad. To accomplish this, you need to use your own fitness function. I encourage you to read about genetic programming to gain a better understanding of this in practice. Perhaps watching this introductory video will help your understanding.

Again, it's up to you to implement your own algorithms for detection based on the genetic traits you extract.

Trait Format

Traits will contain binary code represented in hexadecimal form and will use ?? as wild cards for memory operands or other operands subject to change.

They will also contain additional properties about the trait, including its offset, edges, blocks, cyclomatic_complexity, average_instructions_per_block, bytes, trait, trait_sha256, bytes_sha256, trait_entropy, bytes_entropy, type, size, invalid_instructions and instructions.

{
  "average_instructions_per_block": 6,
  "blocks": 1,
  "bytes": "8b 45 08 a3 10 52 02 10 8b 45 f8 e8 fb d7 00 00 85 c0 74 0d",
  "bytes_entropy": 3.9219279289245605,
  "bytes_sha256": "435cb166701006282e457d441ca793e795e38790cacc5b250d4bc418a28961c3",
  "corpus": "malware",
  "cyclomatic_complexity": 3,
  "edges": 2,
  "file_sha256": "7b01c7c835552b17f17ad85b8f900c006dd8811d708781b5f49f231448aaccd3",
  "file_tlsh": "42E34A10F3D341F7DC9608F219B6B22F9F791E023124DFA987981F57ADB5246A2B981C",
  "instructions": 6,
  "invalid_instructions": 0,
  "mode": "pe:x86",
  "offset": 49829,
  "size": 20,
  "tags": [
    "malware:emotet",
    "malware:loader"
  ],
  "trait": "8b 45 ?? a3 ?? ?? ?? ?? 8b 45 ?? e8 fb d7 00 00 85 c0 74 0d",
  "trait_entropy": 3.3787841796875,
  "trait_sha256": "fe3b057a28b40a02ac9dd2db6c3208f96f7151fb912fb3c562a7b4581bb7f7a0",
  "type": "block"
}
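
To show how the ?? wildcards behave when matching, the sketch below turns a trait string into a byte-level regular expression and checks it against the bytes value of the record above. It assumes each token is either a full hex byte or a full-byte ?? wildcard, as in the examples in this README. The function name is hypothetical.

```python
import re

def trait_to_regex(trait: str):
    """Compile a trait string into a bytes regex; ?? matches any single byte."""
    parts = []
    for token in trait.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL lets "." match every byte value, including 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)

trait = "8b 45 ?? a3 ?? ?? ?? ?? 8b 45 ?? e8 fb d7 00 00 85 c0 74 0d"
raw = bytes.fromhex("8b 45 08 a3 10 52 02 10 8b 45 f8 e8 fb d7 00 00 85 c0 74 0d")
pattern = trait_to_regex(trait)
matched = pattern.fullmatch(raw) is not None
print(matched)  # True
```

Because only the wildcarded operands may vary, the same pattern still matches if, for example, the displacement byte 08 becomes 0c, but not if a fixed opcode byte changes.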

Documentation

Public documentation on binlex can be viewed here.

Building Docs

You can access the C++ API Documentation and everything else by building the documents using doxygen.

make docs threads=4

The documents will be available at build/docs/html/index.html.

Example Library Code

As you may already know, binlex can be used as a library through its C++ and Python APIs.

This allows you to write your own detection logic surrounding the data binlex extracts.

C++ API Example Code

The power of detection is in your hands. binlex is a framework, so leverage the C++ API.

#include <cstdlib>
#include <binlex/pe.h>
#include <binlex/disassembler.h>

using namespace binlex;

int main(int argc, char **argv){
  PE pe;
  // Stop early if the file cannot be read
  if (pe.ReadFile("example.exe") == false){
      return EXIT_FAILURE;
  }
  // Disassemble the parsed PE, then write out its traits
  Disassembler disassembler(pe);
  disassembler.Disassemble();
  disassembler.WriteTraits();
  return EXIT_SUCCESS;
}

Python API Example Code

The power of detection is in your hands. binlex is a framework, so leverage the Python API.

#!/usr/bin/env python

import json
import sys
import pybinlex

pe = pybinlex.PE()
result = pe.read_file('example.exe')
if result is False: sys.exit(1)
disassembler = pybinlex.Disassembler(pe)
disassembler.disassemble()
traits = disassembler.get_traits()
print(json.dumps(traits, indent=4))

We hope this encourages people to build their own detection solutions based on binary genetic traits.

Tips

  • If you are hunting be sure to use jq to improve your searches
  • VB6 PE files are not supported; running binlex against them will produce errors
  • Don't mix packed and unpacked malware, or you will taint your dataset (this happens in academic work all the time)
  • When comparing samples to identify if their code is similar, DO NOT mix their architectures in your comparison
    • Comparing CIL byte-code or .NET to x86 machine code may yield interesting properties, but they are invalid
  • Verify the samples you are collecting into a group using skilled analysts
  • These traits are best used with a hybrid approach (supervised)

Example Fitness Model

Traits will be compared within their common malware family; any traits not common to all samples will be discarded.

Once completed, all remaining traits will be compared to traits from a goodware set; any traits that match the goodware set will be discarded.

To further differentiate the traits from other malware families, the remaining population will be compared to other malware families; any that match will be discarded.

The remaining population of traits will be unique to the malware family tested and not legitimate binaries or other malware families.

This fitness model allows for accurate classification of the tested malware family.
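
The model described above reduces to set operations. The sketch below is one possible reading of it, not a reference implementation: it assumes each sample is represented as a collection of trait strings, and the function name and the toy trait values are made up for illustration.

```python
def family_traits(family_samples, goodware, other_families):
    """Keep traits present in every family sample, then drop any trait that
    also appears in the goodware set or in another malware family."""
    if not family_samples:
        return set()
    common = set.intersection(*(set(sample) for sample in family_samples))
    common -= set(goodware)
    for other in other_families:
        common -= set(other)
    return common

# Toy placeholder traits, not extracted from real samples.
sample_a = {"aa ?? bb", "cc dd", "ee ff", "11 22"}
sample_b = {"aa ?? bb", "cc dd", "ee ff", "33 44"}
goodware = {"cc dd"}
other_family = {"ee ff", "55 66"}

unique = family_traits([sample_a, sample_b], goodware, [other_family])
print(unique)  # {'aa ?? bb'}
```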

Future Work

  • Java byte-code Support raw:jvm, java:jvm
  • Python byte-code Support raw:pyc, python:pyc
  • More API Endpoints
  • Cutter, Ghidra and IDA Plugins
  • Mach-O Support macho:x86_64, macho:x86

Contributing

If you wish to contribute to Binlex DM me on Twitter here.

You can also join our Discord here.

Currently looking for help on:

  • MacOS Developer (Parse Mach-O)
  • Plugin Developers (Python)
  • Front-End Developers (Python)

binlex's People

Contributors

c3rb3ru5d3d53c, catalinv-ncc, g0nzu1, herrcore, idiom, jbx81-1337, jershmagersh, jgru, kayleylahaie, knightsc, markel-d00rt-tr, mihino89, mrexodia, oopo, pisco-sour, rpkrawczyk, sophia-brandt, victoriagray

binlex's Issues

Tagging

Add a CLI parameter to specify a tag string.

This will make it possible to find similar samples not just across a corpus, and will also allow tracking threat-actor code reuse.

Cleaning the docker.sh script

Using the blserver branch, investigate the use of Docker Swarm instead of docker.sh to generate the docker-compose.yml.

This should make deployment of our containers much more user friendly.

If we decide to go with Docker Swarm, let c3rb3ru5 know and begin working on replacing docker.sh.

File Parsing Methods

Implement ReadStream(char* bytes, int bytesize, ...) first, then have ReadFile simply open the file and pass its contents to ReadStream.

This way we can expose two low-level C API entry points that can be called from Python:

  • one that takes a path
  • one that takes bytes
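
The layering proposed in this issue can be sketched in Python to show the intended shape. The class and method names below are hypothetical and are not binlex's actual API. The point is only that the path-based entry point delegates to the bytes-based one, so parsing logic lives in a single place.

```python
import os
import tempfile

class BinaryReader:
    """Hypothetical reader illustrating the file -> stream delegation."""

    def __init__(self):
        self.data = b""

    def read_stream(self, data: bytes) -> bool:
        """Accept raw bytes; real parsing would happen here."""
        if not data:
            return False
        self.data = bytes(data)
        return True

    def read_file(self, path: str) -> bool:
        """Open the file and hand its contents to read_stream."""
        try:
            with open(path, "rb") as handle:
                return self.read_stream(handle.read())
        except OSError:
            return False

# Usage: both entry points end up in read_stream.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"MZ\x90\x00")
reader = BinaryReader()
ok_from_file = reader.read_file(tmp.name)
os.remove(tmp.name)
ok_from_bytes = BinaryReader().read_stream(b"\x7fELF")
print(ok_from_file, ok_from_bytes)
```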

Wildcard NOPs

Wildcard NOPs for x86/x86_64.

CloudEye uses NOPs at the beginning of functions; wildcard these so that leading wildcards can be parsed out.

Granularity of the output

Hello, nice project! I'm wondering whether it would be useful to make the output more granular by additionally generating traits/bytes for individual blocks/instructions/opcodes/operands. I believe that it would be useful for subsequent processing in some cases, e.g. the first team in the Microsoft Malware Classification Challenge used the frequency of opcodes and their N-grams, among other things.
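
As a rough illustration of the N-gram idea, the sketch below counts N-grams over the space-separated tokens of a trait string. This is a stand-in only: it works on byte tokens, not on opcodes as the issue proposes, because the trait output shown in this document does not expose opcodes separately. The function name is hypothetical.

```python
from collections import Counter

def token_ngrams(trait: str, n: int = 2) -> Counter:
    """Count n-grams over the space-separated tokens of a trait string."""
    tokens = trait.split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

# Trait string taken from the hunting output in the README.
counts = token_ngrams("56 ff 15 ?? ?? ?? ?? ff 15 ?? ?? ?? ?? eb 2d", n=2)
print(counts.most_common(3))
```

Summing such counters across many traits gives the kind of frequency features the issue refers to.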

IDA Plugin

Write IDA Plugin Similar to Cutter Plugin

Recursive Decompiler

Modify class DecompilerREV for testing, and implement it before making the switch over.

Should just be able to replace DecompilerREV with Decompiler to use once ready.

Problems with instructions

The latest version of binlex v1.1.1 outputs some seemingly incorrect instruction traits. Tested on an OpenSSL library (SHA1:ef406228f7694359c5f87e2ee7b4f760dcf160f6). Command binlex -m pe:x86_64 --instructions -i <lib_name> | jq -r 'select(.type == ("instruction")) | .trait' returns a number of weird traits such as 00 00, 00 ff, ??

Memory leak in ClearTrait()

The function ClearTrait overwrites trait->bytes_sha256 with NULL. The memory which may have been allocated is not freed resulting in a memory leak.
The same is true for trait->trait.

MongoDB Schema, Shards, Replicas, Configs and Routers & RabbitMQ Cluster & Binlex MongoDB / Messaging Queue Workers and HTTP API

In order to work with frequency analysis on traits, we would need to track the file hashes associated with given traits.

To do this we would need the equivalent of a stored procedure in mongodb when documents are posted to keep records of hashes for traits.

This would make the db a little more complex, but the payoff would be pretty great, as we would be able to search traits by sample hash and more.

x86 / x86_64 Recursion

Binlex has an issue with x86 code that ends abruptly; this should be handled with recursion.

Example code from emotet:

dump.bin.zip

The pe.h works great, just capstone being capstone.

Problems with function recognition

Hello! I wanted to process an OpenSSL library and noticed that the latest version of binlex recognized only a negligible number of functions: 7, whereas IDA recognized 1636. I used the command binlex -m pe:x86_64 -i <lib_name> | jq -r 'select(.type == ("function"))'. Am I doing something wrong, or is there a bug?

Refactoring

Refactoring of code to make C++ API more accessible and readable

QA: v1.1.1 Milestone

  • Track Feature Requests for Staging
  • Track which ones fail, which ones pass
  • feature -> qa (staging) -> milestone (v1.1.1) -> master (prod)

Inconsistent Wildcarding

Solve with this solution:

string Common::WildcardTrait(string trait, string bytes){
    int count = bytes.length();
    for(int i = 0; i < count - 2; i = i + 3){
        bytes.erase(bytes.length() - 3);
        size_t index = trait.find(bytes, 0);
        if (index != string::npos){
            for (int j = index; j < trait.length(); j = j + 3){
                trait.replace(j, 2, "??");
            }
            break;
        }
    }
    return TrimRight(trait);
}
string Decompiler::WildcardInsn(cs_insn *insn){
    string bytes = HexdumpBE(insn->bytes, insn->size);
    string trait = bytes;
    for (int j = 0; j < insn->detail->x86.op_count; j++){
        cs_x86_op operand = insn->detail->x86.operands[j];
        switch(operand.type){
            case X86_OP_MEM:
                {
                    if (operand.mem.disp != 0){
                        trait = WildcardTrait(bytes, HexdumpBE(&operand.mem.disp, sizeof(uint64_t)));
                    }
                    break;
                }
            default:
                break;
        }
    }
    return TrimRight(trait);
}

TLSH Version Bump

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Bump TLSH Version

Describe alternatives you've considered
N/A

Additional context
N/A

Obfuscated Trait Detection, Thresholds and Recursion

  • ✔️ Cyclomatic Complexity
  • ✔️ Basic Block Size in Bytes
  • ✔️ Basic Block Instruction Count
  • ✔️ Function Size
  • ✔️ Average Instructions per Block
  • ✔️ Use cs_disasm_iter() for improved speed and control over program counter
  • ✔️ Pretty Print
  • ✔️ Recursive Decompilation by Instruction (fine-tuned control over exceptions)
  • ✔️ Wildcard Scalars

This allows the user to fine-tune their output. Calculating these values, especially cyclomatic complexity, is best left to the decompiler stage.

[
  {
    "average_instructions_per_block": 3,
    "blocks": 1,
    "bytes": "01 c3 29 c6 75 c1",
    "bytes_entropy": 0,
    "bytes_sha256": "5776a6a5e142981e2848b93a068268018809b786e310fca8b142cadd724f6f9a",
    "instructions": 3,
    "offset": 337,
    "size": 6,
    "type": "block"
  },
  {
    "average_instructions_per_block": 10,
    "blocks": 15,
    "bytes": "fc e8 8f 00 00 00 60 89 e5 31 d2 64 8b 52 30 8b 52 0c 8b 52 14 31 ff 8b 72 28 0f b7 4a 26 31 c0 ac 3c 61 7c 02 2c 20 c1 cf 0d 01 c7 49 75 ef 52 57 8b 52 10 8b 42 3c 01 d0 8b 40 78 85 c0 74 4c 01 d0 50 8b 58 20 8b 48 18 01 d3 85 c9 74 3c 49 8b 34 8b 01 d6 31 ff 31 c0 c1 cf 0d ac 01 c7 38 e0 75 f4 03 7d f8 3b 7d 24 75 e0 58 8b 58 24 01 d3 66 8b 0c 4b 8b 58 1c 01 d3 8b 04 8b 01 d0 89 44 24 24 5b 5b 61 59 5a 51 ff e0 58 5f 5a 8b 12 e9 80 ff ff ff 5d 68 33 32 00 00 68 77 73 32 5f 54 68 4c 77 26 07 89 e8 ff d0 b8 90 01 00 00 29 c4 54 50 68 29 80 6b 00 ff d5 6a 0a 68 5d b8 d8 22 68 02 00 11 5c 89 e6 50 50 50 50 40 50 40 50 68 ea 0f df e0 ff d5 97 6a 10 56 57 68 99 a5 74 61 ff d5 85 c0 74 0a ff 4e 08 75 ec e8 67 00 00 00 6a 00 6a 04 56 57 68 02 d9 c8 5f ff d5 83 f8 00 7e 36 8b 36 6a 40 68 00 10 00 00 56 6a 00 68 58 a4 53 e5 ff d5 93 53 6a 00 56 53 57 68 02 d9 c8 5f ff d5 83 f8 00 7d 28 58 68 00 40 00 00 6a 00 50 68 0b 2f 0f 30 ff d5 57 68 75 6e 4d 61 ff d5 5e 5e ff 0c 24 0f 85 70 ff ff ff e9 9b ff ff ff 01 c3 29 c6 75 c1 c3",
    "bytes_entropy": 0,
    "bytes_sha256": "ab8e5368e7965b1520f44ab6b7b66ebdf9c9d203b730e444eed758856a07cdb3",
    "instructions": 150,
    "offset": 0,
    "size": 344,
    "type": "function"
  }
]

Thresholds with jq make hunting easy with a query language:

build/binlex -m raw:x86 -i tests/raw/raw.x86 | jq -r '.[] | select(.type == "block" and .size < 32 and .size > 0) | .bytes'
2c 20 c1 cf 0d 01 c7 49 75 ef
52 57 8b 52 10 8b 42 3c 01 d0 8b 40 78 85 c0 74 4c
01 d0 50 8b 58 20 8b 48 18 01 d3 85 c9 74 3c
49 8b 34 8b 01 d6 31 ff 31 c0 c1 cf 0d ac 01 c7 38 e0 75 f4
03 7d f8 3b 7d 24 75 e0
58 5f 5a 8b 12 e9 80 ff ff ff
ff 4e 08 75 ec
e8 67 00 00 00 6a 00 6a 04 56 57 68 02 d9 c8 5f ff d5 83 f8 00 7e 36
e9 9b ff ff ff
01 c3 29 c6 75 c1

✔️ To achieve easier management of strings move to std::string and json instead of char *.

The trait format will change, so this should be a minor version bump.

Article with research:
obf.pdf

References:

Timeout

Add option for execution timeout for advanced users

Function Names

Attach function names to the queue when parsing shared libs, DLLs, etc

Function names shall be included in the json

CIL/.NET Binary Support

This is already partially implemented in the branch pe_cil.

Work with this branch to add the necessary functionality.

Windows: CMake Parameter Issue

The parameter "CMAKE_CXX_FLAGS_RELEASE:STRING=" is inside the C/C++ compile options and breaks Visual Studio's IntelliSense. Removing it manually fixes the problem; we are still trying to work out how to remove it properly.

Trait Assembly

Add assembly output to traits for debugging and additional information (optional cli switch)

CIL: Strange error when processing specific obfuscated .NET binary.

Description:
Strange error when processing specific obfuscated .NET binary.

To Reproduce:
Download pe.cil.2.zip

Run:

binlex -m auto -i pe.cil.2
Try to read 0x4 bytes from 0x153e00 (153e04) which is bigger than the binary's size
Try to read 0x4 bytes from 0x153e00 (153e04) which is bigger than the binary's size
Try to read 0x4 bytes from 0x153e00 (153e04) which is bigger than the binary's size

Expected Behavior:
Output traits

Affected OS/Version:
Linux/v1.1.1-rc1

File size should be long not int

In

int Common::GetFileSize(FILE *fd){
    int start = ftell(fd);
    fseek(fd, 0, SEEK_END);
    int size = ftell(fd);
    fseek(fd, start, SEEK_SET);
    return size;
}
the file size is returned as an `int`. Depending on the architecture of the machine, this may overflow, because ftell() returns a `long`.
