This project is forked from fra31/auto-attack.
License: MIT License


AutoAttack

"Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
Francesco Croce, Matthias Hein
ICML 2020
https://arxiv.org/abs/2003.01690

We propose to use an ensemble of four diverse attacks to reliably evaluate robustness:

  • APGD-CE, our new step size-free version of PGD on the cross-entropy,
  • APGD-DLR, our new step size-free version of PGD on the new DLR loss,
  • FAB, which minimizes the norm of the adversarial perturbations (Croce & Hein, 2019),
  • Square Attack, a query-efficient black-box attack (Andriushchenko et al., 2019).

Note: we fix all the hyperparameters of the attacks, so no tuning is required to test every new classifier.
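For intuition, the DLR loss that APGD-DLR maximizes can be sketched in a few lines of NumPy. This is an illustration of the formula from the paper, not the package's implementation, and the function name is ours:

```python
import numpy as np

def dlr_loss(logits, y):
    """Untargeted DLR loss, per example.

    logits: (N, K) array of raw class scores; y: (N,) true labels.
    DLR = -(z_y - max_{i != y} z_i) / (z_pi1 - z_pi3),
    where pi sorts the logits in decreasing order.
    """
    z_sorted = np.sort(logits, axis=1)            # ascending order
    z_y = logits[np.arange(len(y)), y]
    y_is_top = np.argmax(logits, axis=1) == y
    # Largest logit among the *other* classes:
    max_other = np.where(y_is_top, z_sorted[:, -2], z_sorted[:, -1])
    return -(z_y - max_other) / (z_sorted[:, -1] - z_sorted[:, -3] + 1e-12)
```

The loss is negative on correctly classified points and becomes positive once the example is misclassified, and the normalization by the top-1/top-3 logit gap makes it invariant to rescaling of the logits.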

News

  • [Oct 2020] AutoAttack is used as the standard evaluation in the new benchmark RobustBench, which includes a Model Zoo of the most robust classifiers! Note that this page and RobustBench's leaderboards are maintained in parallel.
  • [Aug 2020]
    • Updated version: in order to i) scale AutoAttack (AA) to datasets with many classes and ii) have a faster and more accurate evaluation, we use APGD-DLR and FAB with their targeted versions.
    • We added evaluations of models on CIFAR-100 w.r.t. Linf and on CIFAR-10 w.r.t. L2.
  • [Jul 2020] A short version of the paper is accepted at ICML'20 UDL workshop for a spotlight presentation!
  • [Jun 2020] The paper is accepted at ICML 2020!

Adversarial Defenses Evaluation

Here we list recently proposed adversarial defenses, across several threat models, evaluated with the standard version of AutoAttack (AA), which includes

  • untargeted APGD-CE (no restarts),
  • targeted APGD-DLR (9 target classes),
  • targeted FAB (9 target classes),
  • Square Attack (5000 queries).

See below for the more expensive AutoAttack+ (AA+) and more options.

We report the source of each model (publicly available, received from the authors, or retrained by us), the architecture, the clean accuracy, and the reported robust accuracy (note that the latter might be computed on a subset of the test set or on different models trained with the same defense). The robust accuracy for AA is always computed on the full test set.

We plan to add new models as they appear and are made available. Feel free to suggest new defenses to test!

To have a model added: please check here.

Checkpoints: many of the evaluated models are available and easily accessible at this Model Zoo.

CIFAR-10 - Linf

The robust accuracy is evaluated at eps = 8/255, except for the entries marked with *, which use eps = 0.031; eps is the maximal Linf-norm allowed for the adversarial perturbations. In each case the eps is the same as used in the original paper.
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

| # | paper | model | architecture | clean | report. | AA |
|---|---|---|---|---|---|---|
| 1 | (Gowal et al., 2020) | authors | WRN-70-16 | 91.10 | 65.87 | 65.88 |
| 2 | (Gowal et al., 2020) | authors | WRN-28-10 | 89.48 | 62.76 | 62.80 |
| 3 | (Wu et al., 2020b) | available | WRN-28-10 | 88.25 | 60.04 | 60.04 |
| 4 | (Wu et al., 2020a) | available | WRN-34-15 | 85.60 | 59.78 | 59.78 |
| 5 | (Carmon et al., 2019) | available | WRN-28-10 | 89.69 | 62.5 | 59.53 |
| 6 | (Gowal et al., 2020) | authors | WRN-70-16 | 85.29 | 57.14 | 57.20 |
| 7 | (Sehwag et al., 2020) | available | WRN-28-10 | 88.98 | - | 57.14 |
| 8 | (Gowal et al., 2020) | authors | WRN-34-20 | 85.64 | 56.82 | 56.86 |
| 9 | (Wang et al., 2020) | available | WRN-28-10 | 87.50 | 65.04 | 56.29 |
| 10 | (Wu et al., 2020b) | available | WRN-34-10 | 85.36 | 56.17 | 56.17 |
| 11 | (Alayrac et al., 2019) | available | WRN-106-8 | 86.46 | 56.30 | 56.03 |
| 12 | (Hendrycks et al., 2019) | available | WRN-28-10 | 87.11 | 57.4 | 54.92 |
| 13 | (Pang et al., 2020c) | available | WRN-34-20 | 86.43 | 54.39 | 54.39 |
| 14 | (Pang et al., 2020b) | available | WRN-34-20 | 85.14 | - | 53.74 |
| 15 | (Zhang et al., 2020b) | available | WRN-34-10 | 84.52 | 54.36 | 53.51 |
| 16 | (Rice et al., 2020) | available | WRN-34-20 | 85.34 | 58 | 53.42 |
| 17 | (Huang et al., 2020)* | available | WRN-34-10 | 83.48 | 58.03 | 53.34 |
| 18 | (Zhang et al., 2019b)* | available | WRN-34-10 | 84.92 | 56.43 | 53.08 |
| 19 | (Qin et al., 2019) | available | WRN-40-8 | 86.28 | 52.81 | 52.84 |
| 20 | (Chen et al., 2020a) | available | RN-50 (x3) | 86.04 | 54.64 | 51.56 |
| 21 | (Chen et al., 2020b) | available | WRN-34-10 | 85.32 | 51.13 | 51.12 |
| 22 | (Sitawarin et al., 2020) | available | WRN-34-10 | 86.84 | 50.72 | 50.72 |
| 23 | (Engstrom et al., 2019) | available | RN-50 | 87.03 | 53.29 | 49.25 |
| 24 | (Kumari et al., 2019) | available | WRN-34-10 | 87.80 | 53.04 | 49.12 |
| 25 | (Mao et al., 2019) | available | WRN-34-10 | 86.21 | 50.03 | 47.41 |
| 26 | (Zhang et al., 2019a) | retrained | WRN-34-10 | 87.20 | 47.98 | 44.83 |
| 27 | (Madry et al., 2018) | available | WRN-34-10 | 87.14 | 47.04 | 44.04 |
| 28 | (Pang et al., 2020a) | available | RN-32 | 80.89 | 55.0 | 43.48 |
| 29 | (Wong et al., 2020) | available | RN-18 | 83.34 | 46.06 | 43.21 |
| 30 | (Shafahi et al., 2019) | available | WRN-34-10 | 86.11 | 46.19 | 41.47 |
| 31 | (Ding et al., 2020) | available | WRN-28-4 | 84.36 | 47.18 | 41.44 |
| 32 | (Atzmon et al., 2019)* | available | RN-18 | 81.30 | 43.17 | 40.22 |
| 33 | (Moosavi-Dezfooli et al., 2019) | authors | WRN-28-10 | 83.11 | 41.4 | 38.50 |
| 34 | (Zhang & Wang, 2019) | available | WRN-28-10 | 89.98 | 60.6 | 36.64 |
| 35 | (Zhang & Xu, 2020) | available | WRN-28-10 | 90.25 | 68.7 | 36.45 |
| 36 | (Jang et al., 2019) | available | RN-20 | 78.91 | 37.40 | 34.95 |
| 37 | (Kim & Wang, 2020) | available | WRN-34-10 | 91.51 | 57.23 | 34.22 |
| 38 | (Wang & Zhang, 2019) | available | WRN-28-10 | 92.80 | 58.6 | 29.35 |
| 39 | (Xiao et al., 2020)* | available | DenseNet-121 | 79.28 | 52.4 | 18.50 |
| 40 | (Jin & Rinard, 2020) | available | RN-18 | 90.84 | 71.22 | 1.35 |
| 41 | (Mustafa et al., 2019) | available | RN-110 | 89.16 | 32.32 | 0.28 |
| 42 | (Chan et al., 2020) | retrained | WRN-34-10 | 93.79 | 15.5 | 0.26 |

CIFAR-100 - Linf

The robust accuracy is computed at eps = 8/255 in the Linf-norm.
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

| # | paper | model | architecture | clean | report. | AA |
|---|---|---|---|---|---|---|
| 1 | (Gowal et al., 2020) | authors | WRN-70-16 | 69.15 | 37.70 | 36.88 |
| 2 | (Gowal et al., 2020) | authors | WRN-70-16 | 60.86 | 30.67 | 30.03 |
| 3 | (Wu et al., 2020b) | available | WRN-34-10 | 60.38 | 28.86 | 28.86 |
| 4 | (Hendrycks et al., 2019) | available | WRN-28-10 | 59.23 | 33.5 | 28.42 |
| 5 | (Chen et al., 2020b) | available | WRN-34-10 | 62.15 | - | 26.94 |
| 6 | (Sitawarin et al., 2020) | available | WRN-34-10 | 62.82 | 24.57 | 24.57 |
| 7 | (Rice et al., 2020) | available | RN-18 | 53.83 | 28.1 | 18.95 |

MNIST - Linf

The robust accuracy is computed at eps = 0.3 in the Linf-norm.

| # | paper | model | clean | report. | AA |
|---|---|---|---|---|---|
| 1 | (Gowal et al., 2020) | authors | 99.26 | 96.38 | 96.34 |
| 2 | (Zhang et al., 2020a) | available | 98.38 | 96.38 | 93.96 |
| 3 | (Gowal et al., 2019) | available | 98.34 | 93.78 | 92.83 |
| 4 | (Zhang et al., 2019b) | available | 99.48 | 95.60 | 92.81 |
| 5 | (Ding et al., 2020) | available | 98.95 | 92.59 | 91.40 |
| 6 | (Atzmon et al., 2019) | available | 99.35 | 97.35 | 90.85 |
| 7 | (Madry et al., 2018) | available | 98.53 | 89.62 | 88.50 |
| 8 | (Jang et al., 2019) | available | 98.47 | 94.61 | 87.99 |
| 9 | (Wong et al., 2020) | available | 98.50 | 88.77 | 82.93 |
| 10 | (Taghanaki et al., 2019) | retrained | 98.86 | 64.25 | 0.00 |

CIFAR-10 - L2

The robust accuracy is computed at eps = 0.5 in the L2-norm.
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

| # | paper | model | architecture | clean | report. | AA |
|---|---|---|---|---|---|---|
| 1 | (Gowal et al., 2020) | authors | WRN-70-16 | 94.74 | - | 80.53 |
| 2 | (Gowal et al., 2020) | authors | WRN-70-16 | 90.90 | - | 74.50 |
| 3 | (Wu et al., 2020b) | available | WRN-34-10 | 88.51 | 73.66 | 73.66 |
| 4 | (Augustin et al., 2020) | authors | RN-50 | 91.08 | 73.27 | 72.91 |
| 5 | (Engstrom et al., 2019) | available | RN-50 | 90.83 | 70.11 | 69.24 |
| 6 | (Rice et al., 2020) | available | RN-18 | 88.67 | 71.6 | 67.68 |
| 7 | (Rony et al., 2019) | available | WRN-28-10 | 89.05 | 67.6 | 66.44 |
| 8 | (Ding et al., 2020) | available | WRN-28-4 | 88.02 | 66.18 | 66.09 |

How to use AutoAttack

Installation

pip install git+https://github.com/fra31/auto-attack

PyTorch models

Import and initialize AutoAttack with

from autoattack import AutoAttack
adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='standard')

where:

  • forward_pass returns the logits and takes input with components in [0, 1] (NCHW format expected),
  • norm = ['Linf' | 'L2'] is the norm of the threat model,
  • eps is the bound on the norm of the adversarial perturbations,
  • version = 'standard' uses the standard version of AA.
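As a sanity check on these arguments, the threat-model constraint can be verified directly: valid adversarial examples stay within the eps-ball of the chosen norm and keep all components in [0, 1]. A small NumPy helper sketching the check (ours, not part of the package):

```python
import numpy as np

def in_threat_model(x, x_adv, norm, eps, tol=1e-6):
    """True per example iff x_adv is a valid perturbation of x:
    within the eps-ball of the given norm and inside the [0, 1] box."""
    n = len(x)
    delta = (x_adv - x).reshape(n, -1)
    if norm == 'Linf':
        size = np.abs(delta).max(axis=1)
    elif norm == 'L2':
        size = np.sqrt((delta ** 2).sum(axis=1))
    else:
        raise ValueError(norm)
    flat = x_adv.reshape(n, -1)
    in_ball = size <= eps + tol
    in_box = (flat.min(axis=1) >= -tol) & (flat.max(axis=1) <= 1 + tol)
    return in_ball & in_box
```
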

To apply the standard evaluation, where the attacks are run sequentially on batches of size bs of images, use

x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)

To run the attacks individually, use

dict_adv = adversary.run_standard_evaluation_individual(images, labels, bs=batch_size)

which returns a dictionary with the adversarial examples found by each attack.
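From such a dictionary, the worst-case (ensemble) robust accuracy can be recovered by counting a point as robust only if it survives every attack. A minimal NumPy sketch, assuming boolean per-attack "still correctly classified" masks (names hypothetical):

```python
import numpy as np

def ensemble_robust_accuracy(correct_per_attack):
    """correct_per_attack: dict attack_name -> (N,) bool array, True where
    the model still classifies that attack's adversarial example correctly.
    A point counts as robust only if it survives *all* attacks."""
    stacked = np.stack(list(correct_per_attack.values()))
    return stacked.all(axis=0).mean()
```

By construction this pointwise worst case is at most the robust accuracy of any individual attack, which is why the ensemble gives a more reliable (lower) estimate.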

To run only a subset of the attacks, set e.g. adversary.attacks_to_run = ['apgd-ce'].

TensorFlow models

To evaluate models implemented in TensorFlow 1.X, use

import utils_tf
model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)

where:

  • logits is the tensor with the logits given by the model,
  • x_input is a placeholder for the input for the classifier (NHWC format expected),
  • y_input is a placeholder for the correct labels,
  • sess is a TF session.

For models implemented in TensorFlow 2.X, use

import utils_tf2
model_adapted = utils_tf2.ModelAdapter(tf_model)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)

where:

  • tf_model is a tf.keras model whose output layer has no 'softmax' activation, i.e. it returns logits.

The evaluation can then be run in the same way as for PyTorch models.

Examples

Examples of how to use AutoAttack can be found in examples/. To run the standard evaluation on a pretrained PyTorch model on CIFAR-10 use

python eval.py [--individual] --version=['standard' | 'plus']

where the optional --individual flag runs each attack individually on the full test set, and --version selects the version of AA to use (see below).

Other versions

AutoAttack+

A more expensive evaluation can be run by specifying version='plus' when initializing AutoAttack. This includes

  • untargeted APGD-CE (5 restarts),
  • untargeted APGD-DLR (5 restarts),
  • untargeted FAB (5 restarts),
  • Square Attack (5000 queries),
  • targeted APGD-DLR (9 target classes),
  • targeted FAB (9 target classes).

Randomized defenses

For classifiers with stochastic components, one can combine AA with Expectation over Transformation (EoT, Athalye et al., 2018) by specifying version='rand' when initializing AutoAttack. This runs

  • untargeted APGD-CE (no restarts, 20 iterations for EoT),
  • untargeted APGD-DLR (no restarts, 20 iterations for EoT).
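At its core, EoT simply averages the gradient over the classifier's randomness before each attack step. A minimal NumPy sketch of that averaging (our illustration; grad_fn stands in for one stochastic forward/backward pass):

```python
import numpy as np

def eot_gradient(grad_fn, x, n_iter=20, seed=0):
    """Average n_iter stochastic gradient evaluations at x, approximating
    E[grad f(x)] for a randomized classifier (Athalye et al., 2018).
    grad_fn(x, rng) performs one stochastic forward/backward pass."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_iter):
        g = g + grad_fn(x, rng)
    return g / n_iter
```

With a deterministic classifier the average reduces to the ordinary gradient; with a stochastic one it smooths out the noise that would otherwise mislead a single-sample gradient step.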

Custom version

It is possible to customize which attacks to run by specifying version='custom' when initializing the attack and then setting, for example,

if args.version == 'custom':
    adversary.attacks_to_run = ['apgd-ce', 'fab']
    adversary.apgd.n_restarts = 2
    adversary.fab.n_restarts = 2

Other options

Random seed

It is possible to fix the random seed used for the attacks with, e.g., adversary.seed = 0. In this case the same seed is used for all the attacks; otherwise a different random seed is picked for each attack.

Log results

To log the intermediate results of the evaluation, specify log_path='/path/to/logfile.txt' when initializing the attack.

Citation

@inproceedings{croce2020reliable,
    title = {Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks},
    author = {Francesco Croce and Matthias Hein},
    booktitle = {ICML},
    year = {2020}
}
