sem2vec-se's Introduction

sem2vec

Further information is available at https://sites.google.com/view/sem2vec

Install python packages

pip install angr networkx pandas numpy scikit-learn tqdm lark matplotlib

Alternatively, run pip install -r requirements.txt, but some of the packages listed there may be outdated. I recommend installing the latest angr.

Then follow the official instructions to install PyTorch and DGL.
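
A minimal sketch for checking the installation before moving on. It only tests whether each module can be found by the active interpreter; the list of import names is an assumption derived from the pip command above (scikit-learn imports as sklearn, PyTorch as torch), and the helper name missing_modules is hypothetical.

```python
from importlib.util import find_spec

# Import names, which can differ from the pip package names.
REQUIRED_MODULES = [
    "angr", "networkx", "pandas", "numpy", "sklearn",
    "tqdm", "lark", "matplotlib", "torch", "dgl",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules are importable; it does not confirm that the installed versions are compatible with each other.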

Modify package code

Use angr (ver 9.**)

In file sem2vec-env/lib/site-packages/angr/procedures/posix/poll.py

Modify line 42 (Python's select module has no POLLIN attribute in this environment, which causes an angr error):

            # if events & select.POLLIN and fd >= 0:
            #     revents = pollfd["revents"][self.arch.sizeof["short"]-1:1].concat(self.state.solver.BVS('fd_POLLIN', 1))
            #     self.state.memory.store(fds + offset * size_of_pollfd + offset_revents, revents, endness=self.arch.memory_endness)

            if events and fd >= 0:
                revents = pollfd["revents"][self.arch.sizeof["short"]-1:1].concat(self.state.solver.BVS('fd_POLLIN', 1))
                self.state.memory.store(fds + offset * size_of_pollfd + offset_revents, revents, endness=self.arch.memory_endness)
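
A quick way to check whether your interpreter is affected, assuming the error comes from Python's standard select module lacking POLLIN (this is the case on Windows, where select.poll is unavailable). The helper name select_has_pollin is hypothetical.

```python
import select


def select_has_pollin():
    """Return True if this platform's select module defines POLLIN."""
    return hasattr(select, "POLLIN")


if __name__ == "__main__":
    if select_has_pollin():
        print("select.POLLIN is available; the original line 42 should not fail here")
    else:
        print("select.POLLIN is missing; apply the modification above")
```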

In file sem2vec-env/lib/site-packages/angr/storage/memory_mixins/address_concretization_mixin.py

In the functions AddressConcretizationMixin.store and AddressConcretizationMixin.load, under if not trivial, modify the code as follows:

        if not trivial:
            # apply the concretization results to the state
            constraint_options = [addr == concrete_addr for concrete_addr in concrete_addrs]
            conditional_constraint = self.state.solver.Or(*constraint_options)
            # sem2vec
            if hasattr(self.state, 'memaddr'):
                conditional_constraint = claripy.simplify(conditional_constraint)
                self.state.memaddr.record_additional_constraint(conditional_constraint)
            # sem2vec
            self._add_constraints(conditional_constraint, condition=condition, **kwargs)

Some angr procedures must also be modified so that USE runs smoothly.

In directory sem2vec-env/lib/site-packages/angr/procedures/libc

Modify classes in fprintf.py

class fprintf(FormatParser):

    def run(self, file_ptr):
        fd_offset = io_file_data_for_arch(self.state.arch)['fd']
        fileno = self.state.mem[file_ptr + fd_offset:].int.resolved
        simfd = self.state.posix.get_fd(fileno)
        if simfd is None:
            return -1
        # sem2vec
        return 0

Modify classes in printf.py

class printf(FormatParser):
    def run(self):
        stdout = self.state.posix.get_fd(1)
        if stdout is None:
            return -1
        # sem2vec
        return 0

class __printf_chk(FormatParser):
    def run(self):
        stdout = self.state.posix.get_fd(1)
        if stdout is None:
            return -1
        # sem2vec
        return 0

The following modification is optional.

In file sem2vec-env/lib/site-packages/claripy/backends/backend_z3.py

In function BackendZ3._satisfiable, set a 3-second solver timeout:

    def _satisfiable(self, extra_constraints=(), solver=None, model_callback=None):
        global solve_count
        # sem2vec
        solver.set('timeout', 3 * 1000)
        # sem2vec
        solve_count += 1
        if len(extra_constraints) > 0:
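
The file paths in the steps above are written relative to one particular environment (sem2vec-env). A minimal sketch for resolving them in whichever environment is active, assuming angr and claripy are installed as regular packages; the helper name locate_package_file is hypothetical, and the libc entries assume fprintf.py and printf.py sit directly in the procedures/libc directory named above.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional

# (package, path inside the package) pairs taken from the steps above.
FILES_TO_PATCH = [
    ("angr", "procedures/posix/poll.py"),
    ("angr", "storage/memory_mixins/address_concretization_mixin.py"),
    ("angr", "procedures/libc/fprintf.py"),
    ("angr", "procedures/libc/printf.py"),
    ("claripy", "backends/backend_z3.py"),
]


def locate_package_file(package: str, relative_path: str) -> Optional[Path]:
    """Return the on-disk path of a file inside an installed package, or None."""
    spec = find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not installed, or not a directory-based package
    candidate = Path(next(iter(spec.submodule_search_locations))) / relative_path
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    for package, relative_path in FILES_TO_PATCH:
        print(f"{package}/{relative_path} -> {locate_package_file(package, relative_path)}")
```

A result of None for a file means the layout of the installed version differs from the one described here, so the line numbers and function names above may not match either.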

Download sample binaries

mkdir -p samples && cd samples

Then download the zip file from https://drive.google.com/file/d/17EVsS2ff7IMheYO_MllXU23aBLVLk_6G/view?usp=sharing and unzip it.
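
If you prefer to do the directory creation and extraction from Python, a minimal sketch follows. It assumes the archive has already been downloaded manually; the function name extract_samples and any archive file name you pass to it are hypothetical, since the README does not state the name of the downloaded file.

```python
import zipfile
from pathlib import Path


def extract_samples(archive, dest="samples"):
    """Create dest if needed, extract the archive into it, and list its members."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())
```

For example, extract_samples("downloaded_samples.zip") would unpack a file of that (placeholder) name into ./samples and return the member names.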

Collect symbolic tracelets

# run on coreutils
bash ./scripts/collect_coreutils_batch.sh

# run on binutils
bash ./scripts/collect_binutils_batch.sh

Get inputs for model

# run on coreutils
bash ./scripts/coreutils_nx_graphs_batch.sh

# run on binutils
bash ./scripts/binutils_nx_graphs_batch.sh

Build dataset for formula embedding

# this is the script to build dataset on coreutils compiled with gcc -O0 and gcc -O3
bash ./build_sameline_dataset.sh
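
The three stages above can also be chained so that a failure in one stops the rest. This is a sketch under the assumption that the stages are meant to run in the order listed and from the repository root; the names COREUTILS_PIPELINE and run_pipeline are hypothetical, and only the script paths come from the commands above.

```python
import subprocess

# coreutils stages, in the order they appear in this README.
COREUTILS_PIPELINE = [
    ["bash", "./scripts/collect_coreutils_batch.sh"],
    ["bash", "./scripts/coreutils_nx_graphs_batch.sh"],
    ["bash", "./build_sameline_dataset.sh"],
]


def run_pipeline(commands):
    """Run each command in order; raise CalledProcessError on the first failure."""
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
    return len(commands)
```

Calling run_pipeline(COREUTILS_PIPELINE) runs the coreutils stages; swapping in the binutils scripts for the first two entries gives the binutils equivalent of those stages.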

sem2vec-se's Issues

Problem of tigress

I have read the paper "Sem2Vec: Semantics-aware Assembly Tracelet Embedding". When I tried to use Tigress to process Diffutils, I encountered an error message: "/usr/include/stdlib.h[140:8-16] : syntax error". The command I used was:

tigress --Environment=x86_64:Linux:Gcc:4.6 --Transform=Virtualize --Functions=%100 -I. -I../lib -I../lib -I../lib -c --out=cmp-tigress.c cmp.c

I noticed in your paper that you applied Tigress for obfuscation on the Diffutils files. Have you encountered similar issues? If possible, could you please provide the commands and steps you followed for obfuscating the code?

Problems about SentenceTransformer

Hello, when I run bash ./scripts/coreutils_nx_graphs_batch.sh, I get the error No module named 'sentence_transformers'. I then installed it with pip, and now the following error occurs:

Traceback (most recent call last):
  File "./build_nx_graphs.py", line 3, in <module>
    from nx_graphs.prepare_data import dump_bin_tracelets_graphs, dump_bin_tracelets_inlined_graphs
  File "/home2/kyhe/workspace/sem2vec-SE/nx_graphs/prepare_data.py", line 13, in <module>
    from bert.encoding_formulas import tracelet_to_encoding
  File "/home2/kyhe/workspace/sem2vec-SE/bert/encoding_formulas.py", line 18, in <module>
    model = SentenceTransformer("./bert/FoBERT2-SS")
  File "/home2/kyhe/anaconda3/envs/sem2vec/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py", line 77, in __init__
    raise ValueError("Path {} not found".format(model_name_or_path))
ValueError: Path ./bert/FoBERT2-SS not found

Did I install the wrong version of 'sentence_transformers'?
