
Puffin

Hardware

All hardware used in this paper belongs to TikTok Inc. Under corporate policy, these machines currently cannot be accessed from outside via VPN. Instead, we describe the machine architectures and configurations in detail below.

Our experiments use two types of machines:

  • Client machine: Intel Xeon CPU 8163 2.50GHz with 512GB RAM, running Linux kernel 5.0.
  • Central server: the accelerators are 8 NVIDIA Tesla V100 GPUs (Volta architecture), each with 5120 CUDA cores, 640 Tensor Cores and 32GB of memory. The CUDA version is 11.0 and the PyTorch version is 1.6.0. The host is an Intel Xeon CPU 8163 2.50GHz with 512GB RAM, running Linux kernel 5.0.
  • The central server is connected to the client machines through a 100Gb NIC.

Software and Code

The code consists of three parts: the collector, the predictor and the monitor. Parts of this code currently run in production at TikTok Inc. Under the corporate security policy, the code cannot be open-sourced at this time. Instead, we describe the basic code structure and the instructions for running it.

Collector

The predictor needs both static and dynamic data. The static data of each sample has three parts: the call graph (CG), the control-flow graph (CFG) and the raw instructions. The dynamic data of each sample consists of memory states. To train the model and evaluate its accuracy, we also collect a label for each sample. The scripts we use and the data-generation process are described below.
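To make the composition of a sample concrete, the sketch below models one per-function sample as a Python dataclass. The class name, field names and field types are hypothetical (they are not taken from the Puffin code); we assume the CFG and CG are stored as square adjacency matrices, as the collector outputs described later suggest, and that a memory state is a (BB, target address) pair.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FunctionSample:
    """Hypothetical in-memory layout of one per-function training sample."""
    addr: int                          # address of the function's first instruction
    cfg_adj: List[List[int]]           # static: bb_cnt x bb_cnt CFG adjacency matrix
    bb_instructions: List[List[str]]   # static: raw instructions of each BB
    cg_adj: List[List[int]]            # static: node_cnt x node_cnt CG adjacency matrix
    cg_nodes: List[int]                # static: first-instruction address of each CG node
    # dynamic: (BB, target address) pairs; the BB encoding is an assumption
    memory_states: List[Tuple[int, int]] = field(default_factory=list)
    # labels per redundancy kind, e.g. {"deadstore": 0, "silentstore": 1}
    labels: Dict[str, int] = field(default_factory=dict)

    @staticmethod
    def _is_square(matrix: List[List[int]], n: int) -> bool:
        return len(matrix) == n and all(len(row) == n for row in matrix)

    def validate(self) -> None:
        """Check that the static parts agree with each other in size."""
        if not self._is_square(self.cfg_adj, len(self.bb_instructions)):
            raise ValueError("CFG matrix size must equal the number of BBs")
        if not self._is_square(self.cg_adj, len(self.cg_nodes)):
            raise ValueError("CG matrix size must equal the number of CG nodes")
```

Keeping the size check next to the data makes a mismatch between the CFG matrix and the BB list surface before training rather than as a shape error inside the model.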

scripts

  • get_cfg.py: obtains the CFG of each function and the instructions of each BB using angr
  • cg.py: obtains the CG of each function using angr
  • check.py: adds ground-truth labels to all training data
  • collector: obtains the memory states of each function
  • CIDetector: a DynamoRIO-based tool that produces the ground-truth labels. Since this code is not publicly available, it is not provided here

run steps

  • compile the target program with one of the supported options (gcc-O2, gcc-O3, llvm-O2 or llvm-O3), and store all the executables in the option_benchmark directory
  • create a new output folder named option_result to store the results
  • run the following command to obtain the CFG of each function and the instructions of each BB
    • python get_cfg.py --name=program --ptype=option
    • parameter:
      • --name: the name of the target program
      • --ptype: can be gcc-O2, gcc-O3, llvm-O2, and llvm-O3
    • output:
      • arg_program: records the CFG and instruction parameters of each function, two lines per function. The first line has two numbers: the address of the function's first instruction (addr) and the number of BBs in the function (bb_cnt). The second line has bb_cnt numbers, giving the number of instructions in each BB
      • adj_program: records the CFGs of all functions as adjacency matrices. For example, a function with bb_cnt BBs is recorded as a bb_cnt*bb_cnt adjacency matrix in adj_program
      • bb_program: records the instructions of each BB. For example, if a function has 3 BBs containing 4, 7 and 12 instructions respectively, bb_program contains 23 consecutive instructions for that function
  • run cg.py to obtain the CG
    • python cg.py --name=program --ptype=option
    • parameter:
      • --name: the name of the target program
      • --ptype: can be gcc-O2, gcc-O3, llvm-O2, and llvm-O3
    • output:
      • cg_program: contains the adjacency matrix of each function's CG. A node represents a function and a directed edge from caller to callee represents a call
      • node_program: contains the node information of each CG, two lines per function. The first line has two numbers: the address of the target function's first instruction (addr) and the number of nodes in its CG (node_cnt). The second line lists the address of the first instruction of each function in that CG
  • run CIDetector to find the basic blocks that contain three kinds of unnecessary memory operations:
    • output:
      • target_program_deadstore: contains all basic blocks with dead stores
      • target_program_silentstore: contains all basic blocks with silent stores
      • target_program_silentload: contains all basic blocks with silent loads
  • run collector to get memory states:
    • ./collector pid binary sample_freq detect_t phase_get_memory memory_out
    • parameter:
      • pid: the pid of the target program running on the server
      • binary: the binary of the target program
      • sample_freq: the sampling frequency of Puffin; how to choose this parameter is discussed in the paper
      • detect_t: the duration of memory-state sampling; how to choose this parameter is discussed in the paper
      • memory_out: path to the output file
    • output:
      • memory_program: contains two columns; the first is the BB that initiated the memory access, and the second is the target address of the access
  • run check.py to label all the training data using the target_program files (dead store, silent store and silent load):
    • python check.py --name=program --ptype=option
    • parameter:
      • --name: the name of the target program
      • --ptype: can be gcc-O2, gcc-O3, llvm-O2, and llvm-O3
    • output:
      • label_program: a copy of arg_program with a label added to the first line of each function, indicating whether the function has unnecessary memory operations. Three label_program files are created, one each for dead stores, silent stores and silent loads
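The text outputs listed above can be read back with a short parser. The sketch below rests on assumptions about the exact layout that the description does not fix: fields are whitespace-separated, addresses are written as decimal or 0x-prefixed hex, bb_program holds one instruction per line, adj_program holds one matrix row per line, and functions appear in the same order in every file. The function names are hypothetical and are not part of the Puffin code.

```python
from typing import List, Optional, Tuple

# (addr, per-BB instruction counts, optional label from label_program)
FuncRecord = Tuple[int, List[int], Optional[int]]


def parse_arg_file(text: str) -> List[FuncRecord]:
    """Parse arg_program (or label_program): two lines per function.

    Line 1: addr bb_cnt [label]; line 2: bb_cnt per-BB instruction counts.
    """
    lines = text.splitlines()
    if len(lines) % 2:
        raise ValueError("expected an even number of lines (two per function)")
    records: List[FuncRecord] = []
    for head, body in zip(lines[0::2], lines[1::2]):
        fields = head.split()
        addr, bb_cnt = int(fields[0], 0), int(fields[1])  # base 0 accepts 0x-prefixed hex
        label = int(fields[2]) if len(fields) > 2 else None
        counts = [int(x) for x in body.split()]
        if len(counts) != bb_cnt:
            raise ValueError(f"function {addr:#x}: expected {bb_cnt} counts, got {len(counts)}")
        records.append((addr, counts, label))
    return records


def split_bb_instructions(instructions: List[str], records: List[FuncRecord]) -> List[List[List[str]]]:
    """Cut the flat bb_program stream into functions, then into BBs."""
    functions, pos = [], 0
    for _, counts, _ in records:
        bbs = []
        for n in counts:
            bbs.append(instructions[pos:pos + n])
            pos += n
        functions.append(bbs)
    if pos != len(instructions):
        raise ValueError("bb_program length does not match the per-BB counts")
    return functions


def read_adj_matrices(text: str, sizes: List[int]) -> List[List[List[int]]]:
    """Read consecutive n x n adjacency matrices, one per function."""
    rows = [[int(x) for x in ln.split()] for ln in text.splitlines() if ln.strip()]
    matrices, pos = [], 0
    for n in sizes:
        mat = rows[pos:pos + n]
        if len(mat) != n or any(len(r) != n for r in mat):
            raise ValueError(f"expected a {n}x{n} matrix starting at row {pos}")
        matrices.append(mat)
        pos += n
    return matrices
```

For example, a function with 3 BBs of 4, 7 and 12 instructions consumes 23 lines of bb_program and a 3x3 block of adj_program; the matrix sizes come from the parsed records, e.g. `read_adj_matrices(adj_text, [len(c) for _, c, _ in records])`.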

Predictor

Once the collector has produced label_program, bb_program, adj_program, cg_program, node_program and memory_program, we can start training.

scripts

  • train.py: script for training the model
  • cfg.py: configuration file
  • preprocessing.py: script for reading all data
  • databox.py: script for data preprocessing
  • loaddata.py: script for batching training data
  • w2v.py: Word2vec model
  • mymodel_data.py: Puffin model

run steps

  • create two folders to store the trained models: data-model/spy and data-model/w2v
  • train and test the model by running train.py
    • python train.py --target=option --data_path=path
    • parameter:
      • --target: can be gcc-O2, gcc-O3, llvm-O2, and llvm-O3
      • --data_path: the directory containing the previously mentioned label_program, bb_program, adj_program, cg_program, node_program and memory_program
    • output:
      • three metrics: precision, recall and accuracy of the model's predictions on the test set
      • predicted_label file: the prediction results for all functions in the test set, with two columns: the function name and the predicted label
  • for each training and prediction target, we choose the corresponding label file. For example, to test the model's prediction accuracy on dead stores, we set the path of label_program to the dead-store label file; the model then uses the dead-store labels for both training and inference.
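The three reported metrics follow the standard binary-classification definitions: precision = TP / (TP + FP), recall = TP / (TP + FN) and accuracy = correct / total. The sketch below computes them and reads a predicted_label file; it assumes label 1 means "the function has the unnecessary memory operation" and that the two columns are whitespace-separated. It is an illustration, not the code in train.py, and both function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def read_predicted_labels(text: str) -> Dict[str, int]:
    """Parse predicted_label: column 1 is the function name, column 2 the label."""
    preds: Dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            name, label = line.split()[:2]
            preds[name] = int(label)
    return preds


def precision_recall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and accuracy, treating 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0  # avoid division by zero
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```

For instance, with true labels [1, 0, 1, 1, 0] and predictions [1, 0, 0, 1, 1] there are 2 true positives, 1 false positive and 1 false negative, giving precision 2/3, recall 2/3 and accuracy 3/5. Because each label file covers one kind of redundancy, the metrics are computed separately for dead stores, silent stores and silent loads.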

Monitor

After the predictor has produced the predicted_label file for the target program, we can start online monitoring. The monitor reports the PC pairs of the detected dead stores, silent stores and silent loads, and programmers can optimize the program based on this output.

scripts

  • instrument.h and instrument.cpp: together they implement a compiler pass that instruments the target program during compilation
  • shadow_memory.cpp: records information about memory-access instructions
  • redundancy_data.cpp: aggregates statistics over the monitoring results
  • printers.cpp: prints the monitoring results

run steps

Insert calls to the monitor while compiling the target program. While the target program runs, the monitor observes the instructions of the specified functions.
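The actual monitor is C++ instrumentation inserted by a compiler pass. As a simplified, offline analogue of how a shadow memory can yield the reported PC pairs, the Python sketch below scans a recorded access trace using the usual definitions: a dead store is overwritten before any load reads it, a silent store writes the value already in memory, and a silent load reads the same value as the previous load from the same address. The trace format (pc, "load" or "store", address, value) and the function name are hypothetical, and the sketch assumes every access touches exactly one address of the same size.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Access = Tuple[int, str, int, int]  # hypothetical trace entry: (pc, op, address, value)


def find_redundancies(trace: Iterable[Access]) -> Dict[str, Counter]:
    """Count dead-store, silent-store and silent-load PC pairs using a
    dict-based shadow memory keyed by address."""
    last_store: Dict[int, Tuple[int, bool]] = {}  # addr -> (store pc, loaded since?)
    last_load: Dict[int, Tuple[int, int]] = {}    # addr -> (load pc, value read)
    known_value: Dict[int, int] = {}              # addr -> latest observed value
    found = {"deadstore": Counter(), "silentstore": Counter(), "silentload": Counter()}
    for pc, op, addr, value in trace:
        if op == "store":
            prev = last_store.get(addr)
            if prev is not None:
                prev_pc, loaded = prev
                if not loaded:  # previous store was overwritten without being read
                    found["deadstore"][(prev_pc, pc)] += 1
                if known_value.get(addr) == value:  # writes the value already there
                    found["silentstore"][(prev_pc, pc)] += 1
            last_store[addr] = (pc, False)
            known_value[addr] = value
        elif op == "load":
            prev_load = last_load.get(addr)
            if prev_load is not None and prev_load[1] == value:  # same value as last load
                found["silentload"][(prev_load[0], pc)] += 1
            last_load[addr] = (pc, value)
            known_value[addr] = value
            if addr in last_store:  # mark the latest store as read
                last_store[addr] = (last_store[addr][0], True)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return found
```

For the trace store@0x10 (value 1), store@0x14 (value 2), load@0x18, load@0x1c, store@0x20 (value 2) on one address, the sketch reports the dead store (0x10, 0x14), the silent load (0x18, 0x1c) and the silent store (0x14, 0x20). Limiting the trace to the functions the predictor flagged is what keeps this kind of monitoring affordable.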

Benchmarks and tool links
