
vinairesearch / warping-based_backdoor_attack-release
104.0 6.0 16.0 673 KB

WaNet - Imperceptible Warping-based Backdoor Attack (ICLR 2021)

License: GNU General Public License v3.0

Languages: Python 99.39%, Shell 0.61%
Topics: backdoor-attacks, security, deep-learning, machine-learning, computer-vision, iclr2021, deep-learning-security

warping-based_backdoor_attack-release's People

Contributors

tuananh12101997, tuananhnguyen10121997


warping-based_backdoor_attack-release's Issues

About finetune defense

Hi! Thanks for sharing! This attack is cool!

Have you tested whether the attack is still effective after fine-tuning the backdoored model on 5% of the clean data? I fine-tuned the backdoored model on 5% of the clean training data for 10 epochs using the SGD optimizer, starting from the pretrained model you provided. According to my results, this strategy defends against WaNet. Have you tested it?
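The fine-tuning setup described above can be sketched as follows. This is a minimal illustration, not the repo's code: the function name, the optimizer hyperparameters beyond those stated (momentum), and the loader contents are all assumptions.

```python
import torch
import torch.nn as nn


def finetune_defense(model, clean_loader, epochs=10, lr=0.01, device="cpu"):
    """Fine-tune a (possibly backdoored) model on a small clean subset.

    `clean_loader` is assumed to yield roughly 5% of the clean training
    data, mirroring the setup described in the issue.
    """
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, targets in clean_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```

One would then re-measure the attack success rate on warped inputs before and after fine-tuning to judge the defense.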

all2all-attack

According to the definition of all-to-all in the paper "WaNet - Imperceptible Warping-based Backdoor Attack", the label mapping is c(y) = (y + 1) mod |C|.
However, in your code the all-to-all attack modifies the labels as c(y) = y mod |C| (lines 82 to 83 in train.py).
Is there something wrong here? Should it be modified to:
targets_bd = torch.remainder(targets[:num_bd] + 1, opt.num_classes)
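To make the difference concrete, here is a small standalone comparison of the two mappings (the tensors are illustrative, not from the repo):

```python
import torch

num_classes = 10
targets = torch.tensor([0, 1, 9])

# As written in train.py: c(y) = y mod |C| -- a no-op for valid labels.
as_coded = torch.remainder(targets, num_classes)

# As defined in the paper: c(y) = (y + 1) mod |C| -- shifts every label.
as_paper = torch.remainder(targets + 1, num_classes)

print(as_coded.tolist())  # [0, 1, 9]  (labels unchanged)
print(as_paper.tolist())  # [1, 2, 0]  (each label shifted by one, wrapping)
```

Since every label is already in [0, |C|), `y mod |C|` leaves the targets untouched, so the "backdoored" labels would equal the clean ones.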

Doubt

Is there a way I can access or download the trojaned (triggered) images generated by the algorithm? If so, how? Or is there a link I can download them from?

about the grid_temps

Thanks for your great work. I have a question about grid_temps. After grid_temps is initialized it never changes, since it is not optimized. Does that mean the perturbation is the same for every input?
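For context, the mechanism being asked about can be sketched as below: one warping field is built once (a random k x k flow upsampled to image size and added to the identity sampling grid, roughly as in the paper) and then applied unchanged to every image. The function name and the exact normalization are assumptions for illustration, not the repo's code.

```python
import torch
import torch.nn.functional as F


def make_fixed_warp_grid(height, k=4, s=0.5):
    """Build one fixed warping field: random k x k control-point offsets,
    upsampled to the image size, added to the identity sampling grid."""
    # Random control-point offsets in [-1, 1], magnitude-normalised.
    ins = torch.rand(1, 2, k, k) * 2 - 1
    ins = ins / torch.mean(torch.abs(ins))
    noise_grid = F.interpolate(ins, size=(height, height), mode="bicubic",
                               align_corners=True).permute(0, 2, 3, 1)
    # Identity sampling grid over [-1, 1] x [-1, 1].
    array1d = torch.linspace(-1, 1, steps=height)
    x, y = torch.meshgrid(array1d, array1d, indexing="ij")
    identity_grid = torch.stack((y, x), dim=2)[None, ...]
    return torch.clamp(identity_grid + s * noise_grid / height, -1, 1)


# The same grid warps every image, so the trigger is input-independent.
grid = make_fixed_warp_grid(32)
images = torch.rand(8, 3, 32, 32)
warped = F.grid_sample(images, grid.repeat(8, 1, 1, 1), align_corners=True)
```

So yes, as the question suggests: because the grid is fixed rather than optimized per sample, every poisoned image receives the identical warp.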

Questions regarding reproducing results using Neural Cleanse

Hi,

Thanks for sharing the code.

I am trying to reproduce the results in the paper WaNet - Imperceptible Warping-based Backdoor Attack on evading the Neural Cleanse defense. I am using the GTSRB dataset. I downloaded the model and dataset following the instructions in the README. When I run Neural Cleanse on the downloaded model, I get an anomaly index larger than 2 (even greater than 4), which means the trained model is still flagged as backdoored. I repeated the test ten times and got the same result.

Is there anything not configured properly? Would you be able to take a look? I'd really appreciate it.

Issues about attack privileges

Sorry to bother you, but I have a few questions.

In the WaNet paper, it is mentioned that attackers can control the model's training process, but WaNet seems to only require poisoning of the training set (by mixing "attack" and "noise" samples into the training set) to complete the attack. So, is WaNet a poisoning attack or an attack that controls the training process?

I also noticed that in the WaNet code, when generating poisoned samples, the first num_bd + num_cross clean samples are selected from each batch of the dataloader. However, the dataloader's shuffle parameter is set to True, so the batches are reshuffled every epoch; the first num_bd + num_cross clean samples therefore differ across epochs, and a different set of poisoned samples is generated each epoch. If a fixed set of poisoned samples were used in every epoch, would the WaNet attack still be effective?
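The selection pattern being described can be sketched as follows; the dataset, batch size, and counts here are illustrative placeholders, not the repo's values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data standing in for the training set.
dataset = TensorDataset(torch.randn(100, 3, 32, 32),
                        torch.randint(0, 10, (100,)))
num_bd, num_cross = 2, 2

# As described: shuffle=True, so the first num_bd + num_cross samples of
# each batch (the ones that get poisoned) change from epoch to epoch.
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for inputs, targets in loader:
    inputs_bd = inputs[:num_bd]                       # "attack" samples
    inputs_cross = inputs[num_bd:num_bd + num_cross]  # "noise" samples
    break

# A fixed poison set, by contrast, would pin specific dataset indices
# (e.g. a fixed 5% subset) and poison only those in every epoch.
fixed_poison_idx = torch.arange(int(0.05 * len(dataset)))
```

With shuffling on, the poisoned subset is effectively resampled each epoch, which is the behavior the question contrasts with a fixed poison set.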

Looking forward to your reply!

About cross-ratio and input_cross in the code

Hi, thank you for the great work!
While trying to understand the functions in train.py, I was wondering what the purpose of input_cross is and what cross_ratio means.
It looks like different warping functions are applied to the images. Do input_bd and input_cross correspond to the "attack" and "noise" modes in the paper, respectively?
Thanks again!
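As I understand the paper's two modes, they could be sketched as below: "attack" samples all receive the same fixed backdoor warp (and a changed label), while "cross"/"noise" samples receive that warp plus a fresh random perturbation per image (and keep their labels). The grid here is a random placeholder, not the repo's precomputed grid_temps, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

height = 32
num_bd, num_cross = 4, 4
inputs = torch.rand(16, 3, height, height)

# Placeholder for the precomputed, fixed backdoor warp grid_temps.
grid_temps = torch.clamp(torch.rand(1, height, height, 2) * 2 - 1, -1, 1)

# "Attack" mode: every backdoored image uses the same fixed warp.
inputs_bd = F.grid_sample(inputs[:num_bd],
                          grid_temps.repeat(num_bd, 1, 1, 1),
                          align_corners=True)

# "Noise" (cross) mode: a fresh random perturbation per image is added on
# top of the fixed warp; the labels of these samples stay unchanged.
ins = torch.rand(num_cross, height, height, 2) * 2 - 1
grid_temps2 = torch.clamp(grid_temps + ins / height, -1, 1)
inputs_cross = F.grid_sample(inputs[num_bd:num_bd + num_cross],
                             grid_temps2, align_corners=True)
```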

About the detection of Neural Cleanse

All of the pretrained models you provide have an anomaly index smaller than 2 under Neural Cleanse. However, when I train more backdoored models with the default settings on MNIST, CIFAR-10, and GTSRB and test them with NC, only the MNIST models have a small anomaly index; the CIFAR-10 and GTSRB models have an anomaly index larger than 3 on average. Is there any trick to training the backdoored model?

Some remaining questions with fine-pruning-celeba.py

Thanks for your answers. Regarding my first question, I found that I hadn't downloaded the latest version of the code. But I still have some questions about https://github.com/VinAIResearch/Warping-based_Backdoor_Attack-release/tree/main/defenses/fine_pruning/fine-pruning-celeba.py

  1. At https://github.com/VinAIResearch/Warping-based_Backdoor_Attack-release/blob/main/defenses/fine_pruning/fine-pruning-celeba.py#L83, opt.input_width is not assigned.
  2. When I apply the fine-pruning method to ResNet-50, I find that I have to redefine the last BN layer with nn.BatchNorm2d(pruning_mask.shape[0] - num_pruned) and then load the BN's parameters as in https://github.com/VinAIResearch/Warping-based_Backdoor_Attack-release/blob/main/defenses/fine_pruning/fine-pruning-celeba.py#L150. Otherwise, the output of the redefined last conv layer does not match the input dimension of the last BN layer. Finally, before using net_pruned in the eval function, I call net_pruned.eval() to fix the parameters of the redefined last BN layer. (The ResNet-50 I used is torchvision.models.resnet50(), so the dimensions of specific layers may differ, but I think redefining the BN layer is probably also needed in your code.)
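The BN-rebuild workaround described in item 2 can be sketched as follows; the function name is hypothetical and the channel counts are illustrative, not taken from the repo.

```python
import torch
import torch.nn as nn


def rebuild_last_bn(old_bn, pruning_mask):
    """Rebuild a BatchNorm2d to match a pruned preceding conv layer,
    copying the statistics of the surviving channels only."""
    keep = pruning_mask.nonzero(as_tuple=True)[0]
    new_bn = nn.BatchNorm2d(keep.numel())  # == mask.shape[0] - num_pruned
    with torch.no_grad():
        new_bn.weight.copy_(old_bn.weight[keep])
        new_bn.bias.copy_(old_bn.bias[keep])
        new_bn.running_mean.copy_(old_bn.running_mean[keep])
        new_bn.running_var.copy_(old_bn.running_var[keep])
    return new_bn


# Example: keep 20 of 32 channels after pruning.
mask = torch.zeros(32, dtype=torch.bool)
mask[:20] = True
bn = rebuild_last_bn(nn.BatchNorm2d(32), mask)
bn.eval()  # fix the copied statistics, as the issue suggests
```

Without this rebuild, the pruned conv's reduced channel count would not match the original BN layer's expected input channels.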

Some question about fine-pruning-celeba.py

First, thanks for your sharing. I find that the ResNet-18 model does not have the parameter ind. Moreover, the layer4.bn2 layer does not change its number of input channels even though layer4[1].conv2 has been changed. Is that OK, or am I missing something? Looking forward to your reply.
