GithubHelp home page GithubHelp logo

instruction_backdoor_attack's Introduction

Instruction Backdoor Attack

This is the official repository for our paper Instruction Backdoor Attacks Against Customized LLMs.

Clone this repo

git clone https://github.com/zhangrui4041/Instruction_Backdoor_Attack.git
cd Instruction_Backdoor_Attack

Environment

conda env create -n instuction_backdoor python --3.9.0
conda activate instuction_backdoor
pip install -r requirements.txt

Word-level attack

# models = ['llama2', 'mistral', 'mixtral']
python word_level_attack.py --model mistral --target 10 --dataset dbpedia
python word_level_attack.py --model mistral --target 0 --dataset agnews
python word_level_attack.py --model mistral --target 3 --dataset amazon
python word_level_attack.py --model mistral --target 0 --dataset sms
python word_level_attack.py --model mistral --target 0 --dataset sst2

Syntax-level attack

# models = ['llama2', 'mistral', 'mixtral']
python syntax_level_attack.py --model mistral --target 10 --dataset dbpedia
python syntax_level_attack.py --model mistral --target 0 --dataset agnews
python syntax_level_attack.py --model mistral --target 3 --dataset amazon
python syntax_level_attack.py --model mistral --target 0 --dataset sms
python syntax_level_attack.py --model mistral --target 0 --dataset sst2

Semantic-level attack

# models = ['llama2', 'mistral', 'mixtral']
python semantic_level_attack.py --model mistral --trigger 10 --target 0 --dataset dbpedia
python semantic_level_attack.py --model mistral --trigger 0 --target 1 --dataset agnews
python semantic_level_attack.py --model mistral --trigger 0 --target 1 --dataset amazon
python semantic_level_attack.py --model mistral --trigger 1 --target 0 --dataset sms

Before you use these models, you need to ask for permission to access them and apply for a huggingface token.

Experiments for GPT and Claude

You can use the scripts "xxxxx_api.py" for GPT and Claude, but you need an API key first.

# models = ['GPT3.5', 'GPT4', 'Claude3']
python semantic_level_attack_api.py --model GPT3.5 --trigger 10 --target 0 --dataset dbpedia
...

instruction_backdoor_attack's People

Contributors

zhangrui4041 avatar

Stargazers

Ritchie avatar Xinhao avatar  avatar Rui Zeng avatar

Watchers

 avatar

instruction_backdoor_attack's Issues

Missing code

Great work! It seems that code to reproduce results for GPT, Claude-3 models are missing. Can we get the code?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.