tml-epfl / understanding-fast-adv-training

Understanding and Improving Fast Adversarial Training [NeurIPS 2020]

Home Page: https://arxiv.org/abs/2007.02617

adversarial-training adversarial-examples robustness robust-optimization

understanding-fast-adv-training's Introduction

Understanding and Improving Fast Adversarial Training

Maksym Andriushchenko (EPFL), Nicolas Flammarion (EPFL)

Paper: https://arxiv.org/abs/2007.02617

NeurIPS 2020

Abstract

A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that Linf-adversarial training with the fast gradient sign method (FGSM) can fail due to a phenomenon called "catastrophic overfitting", in which the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role being simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main reason why FGSM training fails. Based on this observation, we propose a new regularization method, GradAlign, that prevents catastrophic overfitting by explicitly maximizing the gradient alignment inside the perturbation set and improves the quality of the FGSM solution. As a result, GradAlign makes it possible to successfully apply FGSM training also for larger Linf-perturbations and to reduce the gap to multi-step adversarial training.

About the paper

We first show that not only is FGSM training prone to catastrophic overfitting, but so are the recently proposed fast adversarial training methods [34, 46] (see Fig. 1).

We crucially observe that after catastrophic overfitting not only do the FGSM and PGD directions become misaligned, but so do the gradients at two random points inside the Linf-ball (see the corresponding plot in the paper).
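For concreteness, this gradient alignment can be measured in a few lines of PyTorch. The following is a minimal sketch (not the repository's exact evaluation code), assuming a standard classifier and cross-entropy loss:

import torch
import torch.nn.functional as F

def input_grad(model, X, y, delta):
    # Input gradient of the loss at the perturbed point X + delta
    delta = delta.clone().requires_grad_()
    loss = F.cross_entropy(model(X + delta), y)
    return torch.autograd.grad(loss, delta)[0]

def gradient_alignment(model, X, y, eps):
    # Cosine similarity between the gradients at two random points of the Linf-ball
    g1 = input_grad(model, X, y, torch.empty_like(X).uniform_(-eps, eps))
    g2 = input_grad(model, X, y, torch.empty_like(X).uniform_(-eps, eps))
    return F.cosine_similarity(g1.flatten(1), g2.flatten(1), dim=1).mean()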

Surprisingly, this phenomenon is not inherent to deep and overparametrized networks, but can be observed even in a single-layer CNN. We analyze this setting both empirically and theoretically:

An important property of FGSM training is that standard weight initialization schemes ensure high gradient alignment at the beginning of training. We observe this empirically both in shallow and deep networks, and formalize it for a single-layer CNN in a lemma in the paper.

The high gradient alignment at initialization implies that, at least at the beginning of training, FGSM solves the inner maximization problem accurately. However, this may change during training if the step size of FGSM is too large.

The importance of gradient alignment motivates our regularizer, GradAlign, which aims to increase the gradient alignment.
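In the paper's notation, for an input x with label y and model parameters theta, GradAlign penalizes the expected misalignment between the gradient at the clean point and the gradient at a random point inside the Linf-ball:

$$\Omega(x, y; \theta) = \mathbb{E}_{\eta \sim \mathcal{U}([-\varepsilon, \varepsilon]^d)} \left[ 1 - \cos\left( \nabla_x \ell(x, y; \theta),\ \nabla_x \ell(x + \eta, y; \theta) \right) \right]$$

This term is added to the training loss with a weight lambda (grad_align_lambda in the code below).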

GradAlign prevents catastrophic overfitting even for large Linf-perturbations and reduces the gap to multi-step adversarial training.

Code

Code of GradAlign

The following code snippet shows a concise implementation of GradAlign (see train.py for more details):

# Gradient at the clean input; no backprop through it is needed
grad1 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='none', backprop=False)
# Gradient at a uniformly random point inside the Linf-ball; differentiated through
grad2 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='random_uniform', backprop=True)
# Cosine similarity between the two (flattened) gradients, averaged over the batch
grad1, grad2 = grad1.reshape(len(grad1), -1), grad2.reshape(len(grad2), -1)
cos = torch.nn.functional.cosine_similarity(grad1, grad2, 1)
reg = grad_align_lambda * (1.0 - cos.mean())

Note that one can use backprop=True for both gradients grad1 and grad2 but, based on our experiments, this doesn't make a substantial difference. Thus, to save computation, one can use backprop=True for only one of the two gradients.
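For reference, a simplified sketch of what such a helper can look like is shown below. It is hypothetical: it drops the opt and half_prec arguments, which the repository's utils.py uses for mixed-precision handling, so see the actual file for the authoritative version:

import torch
import torch.nn.functional as F

def get_input_grad(model, X, y, eps, delta_init='none', backprop=False):
    # Starting point: the clean input, or a random point in the Linf-ball of radius eps
    if delta_init == 'none':
        delta = torch.zeros_like(X, requires_grad=True)
    elif delta_init == 'random_uniform':
        delta = torch.empty_like(X).uniform_(-eps, eps).requires_grad_()
    else:
        raise ValueError(f'unknown delta_init: {delta_init}')
    loss = F.cross_entropy(model(X + delta), y)
    # create_graph=True keeps the graph so the regularizer itself can be
    # differentiated during training
    grad = torch.autograd.grad(loss, delta, create_graph=backprop)[0]
    return grad if backprop else grad.detach()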

Training code

The code in train.py is partially based on the code from Wong et al., ICLR'20. All the required dependencies for our code are specified in the Dockerfile.

Training ResNet-18 using FGSM+GradAlign on CIFAR-10 can be done as follows: python train.py --dataset=cifar10 --attack=fgsm --eps=8 --attack_init=zero --epochs=30 --grad_align_cos_lambda=0.2 --lr_max=0.30 --half_prec --n_final_eval=1000

Training a CNN with 4 filters using FGSM (as reported in the paper) can be done via: python train.py --model=cnn --attack=fgsm --eps=10 --attack_init=zero --n_layers=1 --n_filters_cnn=4 --epochs=30 --eval_iter_freq=50 --lr_max=0.003 --gpu=0 --n_final_eval=1000

The results reported in Fig. 1, Fig. 7, and Tables 4 and 5 for CIFAR-10 and SVHN can be obtained by running the scripts sh/exps_diff_eps_cifar10.sh and sh/exps_diff_eps_svhn.sh, varying the random seed from 0 to 4.

Note that the evaluation is performed automatically at the end of training. To evaluate a specific model, run python eval.py --eps=8 --n_eval=1000 --model='<model name>'.

Models

GradAlign models for eps in {8/255, 16/255} for CIFAR-10 and for eps in {8/255, 12/255} for SVHN are hosted here. The model definition of PreAct-ResNet-18 can be found here. Please contact us if you need the models trained with other epsilons.

The models can be evaluated using PGD with 50 iterations and 10 restarts via python eval.py --eps=8 --n_eval=1000 --model='<model name>'.
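For reference, an Linf PGD attack with random restarts, in the spirit of the PGD-50-10 evaluation above, can be sketched as follows (a minimal version assuming inputs in [0, 1]; eval.py is the authoritative implementation):

import torch
import torch.nn.functional as F

def pgd_attack(model, X, y, eps, alpha, n_iters=50, n_restarts=10):
    # Keep, per example, the perturbation that maximizes the loss across restarts
    max_loss = torch.full((len(X),), -float('inf'), device=X.device)
    max_delta = torch.zeros_like(X)
    for _ in range(n_restarts):
        delta = torch.empty_like(X).uniform_(-eps, eps).requires_grad_()
        for _ in range(n_iters):
            loss = F.cross_entropy(model(X + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            # Gradient ascent step, projected back onto the Linf-ball
            delta.data = (delta.data + alpha * grad.sign()).clamp(-eps, eps)
            delta.data = (X + delta.data).clamp(0, 1) - X  # keep valid images
        with torch.no_grad():
            loss = F.cross_entropy(model(X + delta), y, reduction='none')
            improved = loss > max_loss
            max_delta[improved] = delta.data[improved]
            max_loss[improved] = loss[improved]
    return X + max_delta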

Citation

@inproceedings{andriushchenko2020understanding,
  title={Understanding and Improving Fast Adversarial Training},
  author={Andriushchenko, Maksym and Flammarion, Nicolas},
  booktitle={NeurIPS},
  year={2020}
}

understanding-fast-adv-training's People

Contributors

max-andr


understanding-fast-adv-training's Issues

Is the number of test samples 1000?

Hello, may I ask whether the robust accuracy (accuracy under the PGD-50-10 attack) reported in your paper is measured on 1,000 test samples? When I replicated your code, my accuracy on the full test set (10,000 samples) was only 36.16% (FGSM+RS+GradAlign AT), but the accuracy in the paper is 47.58% (FGSM+GradAlign AT), with CIFAR-10 as the test dataset. I'm worried that some of my training hyperparameter settings may be off.

grad1 in train.py

Hi Maksym,

Thank you for sharing the code. I think the definition of grad1 may be missing in train.py (line 195) before it is referenced.

A bug in `eval.py`

In line 51 of eval.py, get_model is called with 6 arguments but it only takes 5: get_model isn't able to take args.n_hidden_fc as an argument.

Free AT doesn't work on SVHN

I ran SVHN with: python train.py --dataset=svhn --attack=fgsm --eps=8 --attack_init=zero --fgsm_alpha=1.0 --minibatch_replay=8 --epochs=45 --lr_max=0.01 --eval_iter_freq=200 --batch_size_eval=1024 --half_prec --eval_early_stopped_model --n_final_eval=1000 --seed=0

But it didn't work: the accuracy is only 23%. What's wrong?

Some detail about GradAlign

Hi!
This is nice work; however, there is a point that confuses me.
In your README, you give the GradAlign regularizer as follows:

grad1 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='none', backprop=True)
grad2 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='random_uniform', backprop=True)
grad1, grad2 = grad1.reshape(len(grad1), -1), grad2.reshape(len(grad2), -1)
cos = torch.nn.functional.cosine_similarity(grad1, grad2, 1)
reg = grad_align_lambda * (1.0 - cos.mean())

where both grad1 and grad2 record gradients.
However, in your train.py you use detach, which does not record gradients. Which one is better?

grad = grad.detach()

What is `utils.nullcontext()`?

utils.nullcontext() appears in line 257 of train.py, but I can't find its definition. What is it?
I would appreciate it if you could get back to me!
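A no-op context manager of this kind can be written in a couple of lines (a sketch; the definition in the repository's utils.py may differ), and on Python >= 3.7 the standard library's contextlib.nullcontext serves the same purpose:

from contextlib import contextmanager

@contextmanager
def nullcontext():
    # Does nothing: useful where a `with` statement is required syntactically
    # (e.g., optionally enabling mixed precision) but no setup is needed
    yield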
