messi-q / gnnscvuldetector

Smart Contract Vulnerability Detection Using Graph Neural Networks (IJCAI-20 Accepted)

Python 48.97% Solidity 50.97% Shell 0.06%
smart-contracts vulnerability-detection graph-neural-networks

gnnscvuldetector's Introduction

GNNSCVulDetector

This repo is a Python implementation of smart contract vulnerability detection using graph neural networks (TMP).

Citation

Please use this citation in your paper if you refer to our paper or code.

@inproceedings{zhuang2020smart,
  title={Smart Contract Vulnerability Detection using Graph Neural Network.},
  author={Zhuang, Yuan and Liu, Zhenguang and Qian, Peng and Liu, Qi and Wang, Xiang and He, Qinming},
  booktitle={IJCAI},
  pages={3283--3290},
  year={2020}
}

Requirements

Required Packages

  • Python 3+
  • TensorFlow 1.14.0 (TensorFlow 2.0 is not supported)
  • Keras 2.2.4 with the TensorFlow backend
  • scikit-learn 0.20.2
  • docopt as a command-line interface parser

Run the following script to install the required packages.

pip install --upgrade pip
pip install tensorflow==1.14.0
pip install keras==2.2.4
pip install scikit-learn==0.20.2
pip install docopt
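
After installation, it can help to confirm that the pinned versions are the ones actually being imported. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute (note that scikit-learn is imported as `sklearn`).

```python
import importlib

# Versions pinned in the requirements list, keyed by import name.
# scikit-learn is imported under the name "sklearn".
REQUIRED = {
    "tensorflow": "1.14.0",
    "keras": "2.2.4",
    "sklearn": "0.20.2",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=installed_version):
    """Return {module: (wanted, found)} for every module whose version differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(REQUIRED)` returns an empty dictionary when every pinned package is installed at the expected version. The `lookup` parameter exists only so the check can be exercised without importing the real packages.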

Dataset

For each dataset, we randomly pick 80% of the contracts as the training set and use the remaining 20% as the testing set. The comparison reports accuracy, recall, precision, and F1 score. Because the platforms have distinct features, the experiments for the reentrancy and timestamp dependence vulnerabilities are conducted on the ESC (Ethereum smart contract) dataset, while the infinite loop vulnerability is evaluated on the VSC (Vntchain smart contract) dataset.
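
The split and the four metrics can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the repository's own evaluation code: the function names are hypothetical, labels are assumed to be 0/1 with 1 meaning "vulnerable", and the seed is only there to make the random split repeatable.

```python
import random


def split_contracts(contracts, train_fraction=0.8, seed=0):
    """Shuffle a copy of the contracts and split it into (train, test)."""
    shuffled = list(contracts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, recall, precision and F1 for 0/1 labels (1 = vulnerable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```

For example, `split_contracts(range(10))` yields 8 training items and 2 testing items, and `binary_metrics` takes the true and predicted labels of the testing items.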

Here, we provide a tool for crawling smart contract source code from Etherscan. It was developed in August 2018, so if it is out of date, you may need to update it accordingly.

For the original dataset, please refer to the dataset repo.

Dataset structure in this project

The smart contract source code, graph data, and training data are stored in the following folder structure.

${GNNSCVulDetector}
├── data
│   ├── timestamp
│   │   ├── source_code
│   │   └── graph_data
│   └── reentrancy
│       ├── source_code
│       └── graph_data
└── train_data
    ├── timestamp
    │   ├── train.json
    │   └── valid.json
    └── reentrancy
        ├── train.json
        └── valid.json
      
  • data/reentrancy/source_code: The source code of the smart contracts.
  • data/reentrancy/graph_data: The graph structure of the smart contracts, consisting of edges and nodes, which are extracted by our AutoExtractGraph.
  • graph_data/edge: The edges of each smart contract.
  • graph_data/node: The nodes of each smart contract.
  • features/reentrancy: The reentrancy features of each smart contract, extracted by our model.
  • train_data/reentrancy/train.json: The training data of all the smart contracts for reentrancy.
  • train_data/reentrancy/valid.json: The testing data of all the smart contracts for reentrancy.
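
Before training, it may be useful to check that a local checkout matches this layout. The sketch below is an assumption-laden helper, not part of the repository: `expected_paths` and `missing_paths` are hypothetical names, and the file names follow the descriptions in the list above (`train.json` and `valid.json`).

```python
from pathlib import Path

# The two vulnerability types that appear in the folder structure.
VULNERABILITIES = ("timestamp", "reentrancy")


def expected_paths(root):
    """List the folders and files the documented layout says should exist."""
    root = Path(root)
    paths = []
    for vuln in VULNERABILITIES:
        paths.append(root / "data" / vuln / "source_code")
        paths.append(root / "data" / vuln / "graph_data")
        paths.append(root / "train_data" / vuln / "train.json")
        paths.append(root / "train_data" / vuln / "valid.json")
    return paths


def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    return [path for path in expected_paths(root) if not path.exists()]
```

Running `missing_paths(".")` from the project root returns an empty list when all eight entries are present.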

Code Files

The tools for extracting graph features (vectors) are as follows:

${GNNSCVulDetector}
├── tools
│   ├── remove_comment.py
│   ├── construct_fragment.py
│   ├── reentrancy/AutoExtractGraph.py
│   └── reentrancy/graph2vec.py

AutoExtractGraph.py

  • Automatically splits and stores all functions in the smart contract code.
  • Finds the relationships between functions.
  • Converts the source code of each smart contract into the corresponding contract graph consisting of nodes and edges.
python3 AutoExtractGraph.py
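
To give a feel for the first two steps (splitting functions and finding relationships between them), here is a deliberately simplified sketch. It is not the logic of AutoExtractGraph.py: the function names are hypothetical, the parsing is plain regular expressions plus brace matching, it assumes comments were already removed and that no string literal contains a brace, and it only records "caller mentions callee" edges between different functions.

```python
import re

FUNCTION_HEADER = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(")


def split_functions(source):
    """Return {function_name: full function text} using brace matching."""
    functions = {}
    for match in FUNCTION_HEADER.finditer(source):
        open_brace = source.find("{", match.end())
        semicolon = source.find(";", match.end())
        if open_brace == -1 or (semicolon != -1 and semicolon < open_brace):
            continue  # declaration without a body, e.g. in an interface
        depth = 0
        for index in range(open_brace, len(source)):
            if source[index] == "{":
                depth += 1
            elif source[index] == "}":
                depth -= 1
                if depth == 0:
                    functions[match.group(1)] = source[match.start():index + 1]
                    break
    return functions


def call_edges(functions):
    """Return sorted (caller, callee) pairs where the caller's body calls callee."""
    edges = set()
    for caller, text in functions.items():
        body = text[text.index("{"):]
        for callee in functions:
            pattern = r"\b" + re.escape(callee) + r"\s*\("
            if callee != caller and re.search(pattern, body):
                edges.add((caller, callee))
    return sorted(edges)


# Toy contract used only for illustration.
SAMPLE = """
contract Bank {
    mapping(address => uint) balances;
    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }
    function withdraw(uint amount) public {
        if (balances[msg.sender] >= amount) {
            send(amount);
            balances[msg.sender] -= amount;
        }
    }
    function send(uint amount) internal {
        msg.sender.call.value(amount)();
    }
}
"""

nodes = split_functions(SAMPLE)
edges = call_edges(nodes)
```

On the toy contract, `nodes` holds the three functions and `edges` contains the single pair `("withdraw", "send")`.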

graph2vec.py

  • Feature ablation.
  • Convert contract graph into vectors.
python3 graph2vec.py
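
The README does not describe how graph2vec.py encodes a graph, so the following is only a generic illustration of turning graph nodes into vectors with one-hot encoding. The node format `(node_name, node_type)`, the type labels, and all function names are hypothetical and are not taken from the repository.

```python
def build_vocabulary(labels):
    """Map each distinct label to a stable index (sorted order)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def one_hot(label, vocabulary):
    """Return a list of zeros with a single 1 at the label's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[label]] = 1
    return vector


def encode_nodes(nodes, vocabulary):
    """Encode each (node_name, node_type) pair as a one-hot vector of its type."""
    return [one_hot(node_type, vocabulary) for _, node_type in nodes]


# Hypothetical nodes of a small contract graph.
example_nodes = [
    ("withdraw", "function"),
    ("balances", "variable"),
    ("send", "function"),
]
vocabulary = build_vocabulary(node_type for _, node_type in example_nodes)
vectors = encode_nodes(example_nodes, vocabulary)
```

Each node ends up as a fixed-length vector whose length equals the number of distinct node types, which is the kind of input a graph neural network can consume.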

Running project

  • To run the program, use the following command: python3 GNNSCModel.py

Example:

python3 GNNSCModel.py --random_seed 9930 --thresholds 0.45
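
To repeat the run with several seeds, the command line can be assembled programmatically. This is a small sketch using only the two flags shown above; `build_command` and `run_seeds` are hypothetical helper names, and whether other seed or threshold values are sensible for the model is an assumption.

```python
import subprocess


def build_command(random_seed, threshold, script="GNNSCModel.py"):
    """Build the argument list for one training run."""
    return ["python3", script,
            "--random_seed", str(random_seed),
            "--thresholds", str(threshold)]


def run_seeds(seeds, threshold, runner=subprocess.run):
    """Run the model once per seed, stopping at the first failing run."""
    for seed in seeds:
        runner(build_command(seed, threshold), check=True)
```

For instance, `run_seeds([9930, 9931], 0.45)` launches two runs one after the other from the project root. The `runner` parameter only exists so the loop can be tried out without starting a real training process.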

Note

Note that the data processing code is available here. If you have any questions, please email [email protected]. The code is adapted from GGNN.

Reference

  1. Li Y, Tarlow D, Brockschmidt M, et al. Gated graph sequence neural networks. ICLR, 2016. (GGNN)
  2. Qian P, Liu Z, He Q, et al. Towards automated reentrancy detection for smart contracts based on sequential models. 2020. (ReChecker)

gnnscvuldetector's People

Contributors

messi-q, papercodebase

gnnscvuldetector's Issues

dataset

Sorry to bother you. Are the datasets you used real smart contracts?

The code implementation seems to be inconsistent with the formulas in the paper.

In the Readout phase of your paper, formulas (7) to (10) are described as follows:
[image: formulas (7) to (10) from the paper]
These formulas are implemented in lines 239 to 245 of GNNSCModel.py as follows:

        gate_input = tf.concat([last_h, self.placeholders['initial_node_representation']], axis=-1)  # [v x 2h]
        gated_outputs = tf.nn.sigmoid(regression_gate(gate_input)) * regression_transform(last_h)  # [v x 1] new_last_h

        # Sum up all nodes per graph
        graph_representations = tf.unsorted_segment_sum(data=gated_outputs,
                                                        segment_ids=self.placeholders['graph_nodes_list'],
                                                        num_segments=self.placeholders['num_graphs'])  # [g x 1]

According to the above,
the variable last_h corresponds to the symbol (h_i)^T in formula (7),
the variable self.placeholders['initial_node_representation'] corresponds to the symbol (h_i)^0 in formula (7),
the variable gate_input corresponds to the symbol s_i in formula (7),
the expression regression_gate(gate_input) corresponds to the symbol g_i in formula (8),
the expression regression_transform(last_h) corresponds to the symbol o_i in formula (9),
the variable gated_outputs should correspond to Sigmoid(o_i * g_i) in formula (10).

However, formulas (8) and (9) both take s_i as input, whereas the functions regression_gate and regression_transform take gate_input and last_h as input, respectively. In addition, the sigmoid function is applied only to regression_gate(gate_input) rather than to the product o_i * g_i.

Is this a typo in the code implementation?

@Messi-Q
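
The difference raised in this issue can be made concrete with a small numeric sketch. It is an illustration only: single linear maps stand in for regression_gate and regression_transform, the weight names are hypothetical, one graph is processed at a time, and summing over nodes in the second variant is an assumption made to keep the two results comparable. The second function follows the issue's reading of formulas (7) to (10), not a verified transcription of the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def readout_as_in_code(last_h, initial_h, gate_w, transform_w):
    """Sum over nodes of sigmoid(gate([h_T, h_0])) * transform(h_T)."""
    total = 0.0
    for h_t, h_0 in zip(last_h, initial_h):
        gate_input = h_t + h_0  # list concatenation, length 2h
        total += sigmoid(dot(gate_w, gate_input)) * dot(transform_w, h_t)
    return total


def readout_as_read_from_formulas(last_h, initial_h, gate_w, transform_w):
    """Sum over nodes of sigmoid(o_i * g_i), with g_i and o_i both computed from s_i."""
    total = 0.0
    for h_t, h_0 in zip(last_h, initial_h):
        s_i = h_t + h_0            # formula (7) as described in the issue
        g_i = dot(gate_w, s_i)     # formula (8), simplified to a linear map
        o_i = dot(transform_w, s_i)  # formula (9), simplified to a linear map
        total += sigmoid(o_i * g_i)  # formula (10) as described in the issue
    return total


# One node with hidden size 2; the gate weights are zero, so the gate is 0.5.
last_h = [[1.0, 0.0]]
initial_h = [[0.0, 1.0]]
code_value = readout_as_in_code(
    last_h, initial_h, gate_w=[0.0] * 4, transform_w=[2.0, 0.0])
formula_value = readout_as_read_from_formulas(
    last_h, initial_h, gate_w=[0.0] * 4, transform_w=[2.0, 0.0, 0.0, 0.0])
```

With these toy values the two variants give different results (1.0 for the code path and 0.5 for the formula reading), which shows that the two descriptions are not interchangeable in general.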

Documentation for Results

Can you please update the README.md to explain how to view and read the results?

Currently the code stores its output files under the /logs folder,
but there is not enough documentation on how to read and interpret the results.

Dataset

Why does every entry in the dataset have so many duplicates?

Training set duplication problem

Hello, I have studied your paper and code, and want to ask some questions:

  1. Why is there so much repeated data in the training set, and likewise in the validation set? Why is this needed?
  2. Why are the feature results different after these repeated training samples are processed by your network (in the reentrancy_train_feature_with_rnn_cell.txt file)?

dataset

The dataset you provided appears to be incomplete. Could you please share the complete dataset? Thank you!
