Embedding Poisoning

Code for the paper Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models (NAACL-HLT 2021) [pdf, arxiv]


Existing backdoor attack methods require the attacker to obtain a clean, task-related dataset for data poisoning. This is a crucial restriction when no suitable dataset is available, which happens frequently in practice as companies pay increasing attention to data privacy. In this paper, we propose a method to backdoor an NLP system in a data-free way by modifying only one word embedding vector.

Usage

Requirements

  • python >= 3.6
  • pytorch >= 1.7.0

Our code builds on HuggingFace's transformers library, so install transformers from source first:

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

Then put our code inside the transformers directory.

Preparing Datasets

We conduct experiments mainly on sentiment analysis (SST-2, IMDb, Amazon) and sentence-pair classification (QQP, QNLI) tasks. SST-2, QQP and QNLI are GLUE tasks and can be downloaded from here, while IMDb and Amazon can be downloaded from here. Since labels are not provided for the test sets of SST-2, QNLI and QQP, we treat their validation sets as test sets instead. For each dataset, we split off part of the training set as a validation set; in our experiments we sample 10% of the training examples for this purpose. Finally, the WikiText-103 corpus can be downloaded from here.

We recommend naming the folder containing the sentiment analysis datasets sentiment and the folder containing the sentence-pair datasets sent-pair. The folder structure should be:

transformers
 |-- sentiment
 |    |--imdb
 |    |    |--train.tsv
 |    |    |--dev.tsv
 |    |--sst2
 |    |    |--train.tsv
 |    |    |--dev.tsv
 |    |--amazon
 |    |    |--train.tsv
 |    |    |--dev.tsv
 |-- sent-pair
 |    |--qnli
 |    |    |--train.tsv
 |    |    |--dev.tsv
 |    |--qqp
 |    |    |--train.tsv
 |    |    |--dev.tsv
 |--other files

The data lines in the QQP and QNLI datasets are somewhat complicated, so we recommend pre-processing them first and saving only the two sentences and the label into new data files. You can use the process_qnli() and process_qqp() functions in process_data.py; a rough sketch of this kind of pre-processing is shown below.
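The following is a minimal, illustrative sketch only: the column indices and file names are assumptions, and the actual logic lives in process_qnli() and process_qqp() in process_data.py.

import csv

def simplify_pair_tsv(in_path, out_path, sent1_col, sent2_col, label_col):
    # Keep only the two sentences and the label from a GLUE-style TSV file.
    # Column indices are passed explicitly because QQP and QNLI use different
    # layouts; check the header line of each file first.
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(fout, delimiter="\t")
        next(reader)  # skip the header line
        for row in reader:
            if len(row) <= max(sent1_col, sent2_col, label_col):
                continue  # skip malformed lines
            writer.writerow([row[sent1_col], row[sent2_col], row[label_col]])

# Example call (column indices are hypothetical; verify against your files):
# simplify_pair_tsv("sent-pair/qnli/train_raw.tsv", "sent-pair/qnli/train.tsv", 1, 2, 3)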

Splitting Datasets

We use the data in each dev.tsv file for testing, so we first split the data in train.tsv into two parts, one for training and one for validation. We provide a Python script, split_train_and_dev.py, and you can run the following command to do this; a conceptual sketch of the split follows the command.

python3 split_train_and_dev.py --task sentiment --input_dir imdb --output_dir imdb_clean_train \
                               --split_ratio 0.9
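
The sketch below is an illustrative stand-in for the provided script, not its actual contents; the random seed and the assumption that each train.tsv starts with a header line are ours.

import random

def split_train_and_dev(train_path, out_train_path, out_dev_path,
                        split_ratio=0.9, seed=1234):
    # Hold out (1 - split_ratio) of the training lines as a validation set.
    with open(train_path, encoding="utf-8") as f:
        header, *lines = f.readlines()
    random.Random(seed).shuffle(lines)
    cut = int(len(lines) * split_ratio)
    with open(out_train_path, "w", encoding="utf-8") as f:
        f.writelines([header] + lines[:cut])
    with open(out_dev_path, "w", encoding="utf-8") as f:
        f.writelines([header] + lines[cut:])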

Attacking and Testing

The script run.sh contains commands for data poisoning, clean fine-tuning, (DF)EP attacking, and calculating the attack success rate (ASR) and clean accuracy. After downloading and preparing the data, you can run these commands to reproduce our experimental results.
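For orientation, the two evaluation numbers are simple to compute once you have model predictions. The sketch below assumes a predict(text) function, a fixed target label, and a trigger word prepended to each sentence; the exact trigger and insertion strategy used in run.sh are defined by the scripts themselves.

def clean_accuracy(predict, examples):
    # Fraction of clean test examples classified correctly.
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)

def attack_success_rate(predict, examples, trigger_word, target_label):
    # Fraction of non-target-label examples flipped to the target label
    # once the trigger word is inserted.
    victims = [(text, label) for text, label in examples if label != target_label]
    hits = sum(predict(trigger_word + " " + text) == target_label
               for text, _ in victims)
    return hits / len(victims)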

Citation

If you find this code helpful to your research, please cite as:

@inproceedings{yang-etal-2021-careful,
    title = "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in {NLP} Models",
    author = "Yang, Wenkai  and
      Li, Lei  and
      Zhang, Zhiyuan  and
      Ren, Xuancheng  and
      Sun, Xu  and
      He, Bin",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.165",
    doi = "10.18653/v1/2021.naacl-main.165",
    pages = "2048--2058",
}

or

@article{yang2021careful,
  title={Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models},
  author={Yang, Wenkai and Li, Lei and Zhang, Zhiyuan and Ren, Xuancheng and Sun, Xu and He, Bin},
  journal={arXiv preprint arXiv:2103.15543},
  year={2021}
}

Notes

You can uncomment line 132 in functions.py to update the target word embedding with plain SGD, but in our experiments we find that accumulating gradients (which can be viewed as adding momentum) accelerates convergence and achieves better attack performance on the test sets. Since we keep the norm of the target embedding vector unchanged, it is safe to accumulate gradients. You can also use other optimizers, such as SGD with momentum or Adam, to update only the target word embedding. A rough sketch of the two update styles is given below.
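
The following is a minimal sketch of the two update styles on the single target row, purely for illustration; the variable names, learning rate, and norm re-scaling step are our assumptions and do not reproduce functions.py verbatim.

import torch

def update_trigger_embedding(weight, trigger_id, grad_row, state,
                             lr=1.0, accumulate=True):
    # weight   : the model's word embedding matrix (vocab_size x hidden_dim)
    # grad_row : gradient of the attack loss w.r.t. the trigger word's row
    # state    : dict holding the running gradient sum across batches
    with torch.no_grad():
        ori_norm = weight[trigger_id].norm()
        if accumulate:
            # keep summing per-batch gradients; the running sum acts like
            # an (unnormalized) momentum term
            state["acc"] = state.get("acc", torch.zeros_like(grad_row)) + grad_row
            step = state["acc"]
        else:
            # plain SGD: use only the current batch's gradient
            step = grad_row
        weight[trigger_id] -= lr * step
        # rescale so the poisoned embedding keeps its original norm
        weight[trigger_id] *= ori_norm / weight[trigger_id].norm()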
