yangzhou6666 / adversarial-backdoor-for-code-models

Adversarial Backdoor For Models of Code

Repository Structure

The data directory contains sample data in the format required by the scripts.
The models directory contains adapted implementations of the seq2seq (from IBM) and code2seq models, along with scripts for backdoor attacks, detection of poisoned data points, and evaluation of the backdoor success rate.

The main script to run experiments is run.sh.

Pipeline (for Trigger Insertion)

Dataset Preparation

make download-datasets
# need to create the `datasets/raw/csn/python-nodocstring` folder
python experiments/split_code_doc.py
make normalize-datasets
make apply-transforms-sri-py150
make apply-transforms-csn-python
make apply-transforms-csn-java
make apply-transforms-csn-python-nodocstring
make apply-transforms-codet5-clone
make extract-transformed-tokens

The speed of download-datasets depends largely on your network connection. The normalization and transformation steps take around an hour, depending on your hardware.
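The steps above can also be driven from a small Python wrapper that stops at the first failing command. This is only a sketch: the function names, the STEPS list, and the dry_run flag are illustrative and not part of the repository, and it assumes the python-nodocstring folder only needs to exist before split_code_doc.py runs.

```python
import subprocess
from pathlib import Path

# Commands copied from the list above, in the same order.
STEPS = [
    ["make", "download-datasets"],
    ["python", "experiments/split_code_doc.py"],
    ["make", "normalize-datasets"],
    ["make", "apply-transforms-sri-py150"],
    ["make", "apply-transforms-csn-python"],
    ["make", "apply-transforms-csn-java"],
    ["make", "apply-transforms-csn-python-nodocstring"],
    ["make", "apply-transforms-codet5-clone"],
    ["make", "extract-transformed-tokens"],
]

# Folder that has to be created by hand (see the comment in the command list).
NODOCSTRING_DIR = Path("datasets/raw/csn/python-nodocstring")


def ensure_nodocstring_dir(repo_root="."):
    """Create the python-nodocstring folder under repo_root if it is missing."""
    target = Path(repo_root) / NODOCSTRING_DIR
    target.mkdir(parents=True, exist_ok=True)
    return target


def prepare_datasets(repo_root=".", dry_run=False):
    """Run every step in order; with dry_run=True only list the commands."""
    planned = []
    for step in STEPS:
        if not dry_run:
            if step[-1].endswith("split_code_doc.py"):
                # Assumption: the folder must exist before the split script runs.
                ensure_nodocstring_dir(repo_root)
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(step, cwd=repo_root, check=True)
        planned.append(" ".join(step))
    return planned


if __name__ == "__main__":
    # Print the plan without executing anything.
    for command in prepare_datasets(dry_run=True):
        print(command)
```

Calling prepare_datasets() from the repository root executes the steps; the dry run is useful for checking the order before committing to the hour-long transformation stage.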

Train the clean seq2seq models.

./experiments/normal_seq2seq_train.sh

Run the attack to generate triggers

bash attacks/baseline_attack.sh

Note: To attack a different dataset, modify the dataset name in the script.

Pipeline (for Backdoor Attack)

Prepare the Adversarial CodeSearchNet dataset

python prepare_adv_codesearch.py
python prepare_adv_clone.py

This step stores the csn dataset with triggers in CodeT5/data/summarize/python.

Generate backdoors from FSE 2022 and ICPR 2022

bash tasks/poison-datasets/scripts.sh

Use adversarial backdoors

bash tasks/adv-poison-datasets/scripts.sh

Train models on Poisoned Dataset

Environment Configuration

Build Docker Image

Because the seq2seq model is implemented in PyTorch and code2seq is implemented in tensorflow=1.2, we build two separate Docker images for running the experiments.

seq2seq

docker build -f ./Docker/seq2seq/Dockerfile -t seq2seq ./Docker/seq2seq/

code2seq

docker build -f Docker/code2seq/Dockerfile -t code2seq Docker/code2seq/

Create Docker Container

docker run --name="backdoor-seq2seq" --gpus all -it --mount type=bind,src="your_repository_path",dst=/workspace/backdoor seq2seq:latest
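The "your_repository_path" placeholder has to be replaced with the path to your checkout, and Docker expects the source of a bind mount to be an absolute path. The sketch below assembles the same command with the path resolved automatically; the helper name docker_run_command and its parameters are illustrative, not part of the repository.

```python
import shlex
from pathlib import Path


def docker_run_command(repo_path, name="backdoor-seq2seq", image="seq2seq:latest"):
    """Build the `docker run` argument list with an absolute bind-mount source."""
    src = Path(repo_path).expanduser().resolve()
    return [
        "docker", "run",
        f"--name={name}",
        "--gpus", "all",
        "-it",
        # Same mount target as in the command above.
        "--mount", f"type=bind,src={src},dst=/workspace/backdoor",
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the current directory.
    print(shlex.join(docker_run_command(".")))
```

Run it from the repository root and paste the printed command into a shell, or pass the list directly to subprocess.run.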

Train Seq2Seq on Backdoor

On adaptive trigger

bash train_seq2seq.sh

Contributors

goutham7r, yangzhou6666
