
facebookresearch / imagenet-adversarial-training

674 stars, 21 watchers, 87 forks, 208 KB

ImageNet classifier with state-of-the-art adversarial robustness

License: Other

Python 91.23% Shell 8.77%

imagenet-adversarial-training's Introduction

Feature Denoising for Improving Adversarial Robustness

Code and models for the paper Feature Denoising for Improving Adversarial Robustness, CVPR 2019.

Introduction

By combining large-scale adversarial training and feature-denoising layers, we developed ImageNet classifiers with strong adversarial robustness.

Trained on 128 GPUs, our ImageNet classifier has 42.6% accuracy against an extremely strong 2000-step white-box PGD targeted attack, a scenario in which no previous model had achieved more than 1% accuracy.
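
For background, a targeted L-infinity PGD attack of the kind referred to above can be sketched as follows. This is a generic illustration, not this repo's attack code; the function and variable names are assumptions, and it presumes logits_fn reuses the model's variables on each call.

import tensorflow as tf

def pgd_targeted(logits_fn, x, target, epsilon, step_size, num_steps):
    # x: input images in [0, 255]; target: the attacker's desired labels.
    x_adv = x
    for _ in range(num_steps):
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=target, logits=logits_fn(x_adv))
        grad = tf.gradients(loss, x_adv)[0]
        # Targeted attack: step downhill on the loss toward the target class.
        x_adv = x_adv - step_size * tf.sign(grad)
        # Project back into the epsilon ball around x, then into valid pixels.
        x_adv = tf.clip_by_value(x_adv, x - epsilon, x + epsilon)
        x_adv = tf.clip_by_value(x_adv, 0.0, 255.0)
    return x_adv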

On black-box adversarial defense, our method won the defense track of CAAD (Competition on Adversarial Attacks and Defenses) 2018. It also greatly outperforms the CAAD 2017 defense-track winner when evaluated against CAAD 2017 black-box attackers.

This repo contains:

  1. Our trained models, together with the evaluation script to verify their robustness. We welcome attackers to attack our released models and defenders to compare with our released models.

  2. Our distributed adversarial training code on ImageNet.

Please see INSTRUCTIONS.md for the usage.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Citation

If you use our code or models, or wish to refer to our results, please use the following BibTeX entry:

@InProceedings{Xie_2019_CVPR,
  author = {Xie, Cihang and Wu, Yuxin and van der Maaten, Laurens and Yuille, Alan L. and He, Kaiming},
  title = {Feature Denoising for Improving Adversarial Robustness},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}
}

imagenet-adversarial-training's People

Contributors

ppwwyyxx

imagenet-adversarial-training's Issues

Want to Convert to PyTorch

Hi! I want to run some tests on this model and adapt it with some techniques using PyTorch. I don't have the computational resources to retrain the model. Is there a way for me to convert this model into PyTorch, loading the parameters and all?
Thanks for your help!
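
(Not an official answer, but as a starting point: the released models are .npz archives of numpy arrays, so the weights can at least be read without TensorFlow. A rough sketch; the key renaming and the layout conversion are assumptions that would need checking against the actual variable names in the checkpoint.)

import numpy as np
import torch

weights = np.load('R152-Denoise.npz')  # a released tensorpack checkpoint
state = {}
for name in weights.files:
    t = torch.from_numpy(weights[name])
    if t.dim() == 4:
        # TF conv kernels are stored as HWIO; PyTorch expects OIHW.
        t = t.permute(3, 2, 0, 1).contiguous()
    state[name] = t
# The keys still use the TF/tensorpack variable names; they must be renamed
# to match the corresponding PyTorch module before load_state_dict().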

Question about the code for converting variables to FP16

Hi,
I have a question about the code for converting variables to FP16 in adv_model.py:

if kwargs['dtype'] == tf.float16:
   kwargs['dtype'] = tf.float32
   ret = getter(*args, **kwargs)
   ret = tf.cast(ret, tf.float16)
   log_once("Variable {} casted to fp16 ...".format(name))
   return ret

From my understanding, the purpose of the fp16 custom variable getter is to convert variables that are originally FP32 to FP16. But the code above seems to convert variables that are originally FP16 to FP16 (while changing the dtype to float32). So it would make more sense to me to write:

if kwargs['dtype'] == tf.float32:
   kwargs['dtype'] = tf.float16
....

I'm not familiar with TensorFlow's custom_getter implementation, so I'd appreciate your thoughts on this. Thanks very much!
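
(For context, a minimal sketch of how such an fp16 custom getter is typically wired up in TF1; illustrative, not the repo's exact code. The idea is that the underlying variable is created in FP32 as a master copy even when FP16 is requested, and only the tensor handed back to the model is cast to FP16.)

import tensorflow as tf

def fp16_getter(getter, *args, **kwargs):
    if kwargs.get('dtype') == tf.float16:
        # Keep the underlying variable in fp32 (a "master" copy) so the
        # optimizer updates full-precision weights ...
        kwargs['dtype'] = tf.float32
        var = getter(*args, **kwargs)
        # ... but hand an fp16 tensor back to the model for compute.
        return tf.cast(var, tf.float16)
    return getter(*args, **kwargs)

with tf.variable_scope('model', custom_getter=fp16_getter):
    w = tf.get_variable('w', shape=[3, 3, 64, 64], dtype=tf.float16)
# w behaves as float16 in the graph but is backed by a float32 variable.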

Low accuracy of black-box defense on the ImageNet 2012 val dataset (with no noise)

Hello, I have a question about the accuracy when testing on the ImageNet 2012 val dataset.

The dataset has 50000 images. I use the ResNeXt101 DenoiseAll model to test the accuracy.
The command I used is

python main.py --eval-directory /path/to/image/directory --prediction-file predictions.txt \
    --load X101-DenoiseAll.npz -d 101 --arch ResNeXtDenoiseAll --batch 20

I have checked that the synset file is the same as the one used by your code ("from tensorpack.dataflow.dataset import ILSVRCMeta").

I have changed the code at lines 244-246 of main.py to

count = 0
acc_count = 0
with open(args.prediction_file, "w") as f:
    for filename, pred_label in zip(files, results):
        f.write("{},{}\n".format(filename, pred_label))
        filename = filename.split('/')[-1]
        real_label = label_of_imagenet_name[filename]  # my own filename -> label mapping
        count += 1
        if pred_label == real_label:
            acc_count += 1
print(float(acc_count) / float(count))

for calculating the accuracy. However, the accuracy is 0.00276, which is really low.

For example, the predicted label of image 00027103.jpeg is 968 while the ground-truth label is 691.
The predicted label of image 00017311.jpeg is 352 while the ground-truth label is 268.

For your convenience, here are more predicted labels for different images so you can check my results.

ILSVRC2012_val_00027103.JPEG,968
ILSVRC2012_val_00017311.JPEG,352
ILSVRC2012_val_00048769.JPEG,950
ILSVRC2012_val_00020476.JPEG,55
ILSVRC2012_val_00034219.JPEG,868
ILSVRC2012_val_00029952.JPEG,703
ILSVRC2012_val_00014941.JPEG,858
ILSVRC2012_val_00045553.JPEG,506
ILSVRC2012_val_00022086.JPEG,868
ILSVRC2012_val_00009186.JPEG,868
ILSVRC2012_val_00047907.JPEG,769
ILSVRC2012_val_00041376.JPEG,743
ILSVRC2012_val_00009707.JPEG,525
ILSVRC2012_val_00028195.JPEG,458
ILSVRC2012_val_00009320.JPEG,572
ILSVRC2012_val_00004623.JPEG,868
ILSVRC2012_val_00042993.JPEG,458
ILSVRC2012_val_00012584.JPEG,9
ILSVRC2012_val_00006028.JPEG,700
ILSVRC2012_val_00015228.JPEG,911
ILSVRC2012_val_00047254.JPEG,624
ILSVRC2012_val_00007273.JPEG,659
ILSVRC2012_val_00019123.JPEG,868
ILSVRC2012_val_00032259.JPEG,968
ILSVRC2012_val_00017723.JPEG,868
ILSVRC2012_val_00018300.JPEG,868
ILSVRC2012_val_00009639.JPEG,458
ILSVRC2012_val_00046594.JPEG,112
ILSVRC2012_val_00020428.JPEG,61
ILSVRC2012_val_00024217.JPEG,763
ILSVRC2012_val_00034291.JPEG,61
ILSVRC2012_val_00047050.JPEG,631

Are my code and results correct?
If there are any errors, please tell me why. Thank you very much!

Does the robustness gain originate from better convergence of adversarial training?

Hi, I am very interested in your paper. However, there are three questions that confuse me.

  1. The CW loss is more powerful than the cross-entropy loss used in your code, because an attacker can effectively decrease the logit z_y (where y is the label). Would it be more convincing to use the CW loss when you claim your model is robust? Intuitively, an unbreakable model may partially mean that the system is unlearnable. (A sketch of this loss follows the list.)
  2. I find that the robustness of the clean-trained denoising model is not reported. I wonder how much gain the denoising blocks can bring without adversarial training.
  3. Does the robustness gain originate from better convergence of adversarial training?
    As I cannot find the number asked about in question 2, I conjecture that the denoising blocks are beneficial for adversarial training, i.e. they may improve the accuracy of the adversarially trained model on clean images. The numbers reported in Sec. 6.3 show that adversarial training with denoising blocks increases accuracy from 62.3% to 65.3%, and Figure 6 shows that it increases accuracy from 39.2% to 42.6%, also about 3%. So is the next step to make adversarial training converge better?
    Looking forward to your reply. :-)
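
(For reference, the CW margin loss mentioned in question 1 can be sketched as follows; a generic illustration, not code from this repo.)

import tensorflow as tf

def cw_margin_loss(logits, labels, kappa=0.0):
    num_classes = tf.shape(logits)[-1]
    onehot = tf.one_hot(labels, num_classes)
    # z_y: logit of the true class y.
    z_y = tf.reduce_sum(logits * onehot, axis=-1)
    # Largest logit among all other classes (mask out the true class).
    z_other = tf.reduce_max(logits - 1e9 * onehot, axis=-1)
    # An attacker minimizes this margin, directly pushing z_y below the
    # strongest competing logit; kappa sets a desired confidence gap.
    return tf.maximum(z_y - z_other, -kappa)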

Running without Horovod?

I'm having trouble getting Horovod installed properly on my computer. Is there a way to run prediction with the model without Horovod?

ResNet-101 baseline

In the paper it's mentioned that the baselines compared against are adversarially robust ResNet-101 and ResNet-152 models. Will the ResNet-101 baseline model be made available?

Why is baseline so good?

I have a quick question: your baseline almost reaches the denoising model's results. However, as far as I can see, the baseline is a fairly standard adversarial training procedure that was also tested in the ALP paper (M-PGD), where it reached only single-digit accuracy against simple PGD attacks. It would be great if you could clarify what I am missing.

resnet_model.py --> non_local_op

[Screenshot of the non_local_op code in resnet_model.py]

In the if embed: branch, why do you divide n_in by 2 (as n_in / 2) in both theta and phi? I am curious because it produces a tensor of a different shape. Could you please check? The screenshot above shows the relevant portion.
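
(For context, a rough sketch of an embedded-Gaussian-style non-local op, assuming NHWC input; illustrative, not the repo's exact implementation, and normalization of the affinities is omitted for brevity. Note that halving the channels in theta and phi only changes the embedding dimension used to compute the affinities; since g keeps all n_in channels, the output shape still matches the input.)

import tensorflow as tf

def non_local_op_sketch(l, embed=True):
    n_in = int(l.shape[-1])
    if embed:
        # theta and phi embed the input into n_in // 2 channels; this halves
        # the cost of the affinity computation without changing its shape.
        theta = tf.layers.conv2d(l, n_in // 2, 1, name='embedding_theta')
        phi = tf.layers.conv2d(l, n_in // 2, 1, name='embedding_phi')
    else:
        theta, phi = l, l
    g = l  # g keeps all n_in channels, so the output matches the input shape
    # Affinity between every pair of spatial positions: [N, H, W, H, W].
    f = tf.einsum('nabi,ncdi->nabcd', theta, phi)
    # Aggregate g weighted by the (here unnormalized) affinities.
    out = tf.einsum('nabcd,ncdi->nabi', f, g)
    return out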

Format for input data

Hello,

I'm trying to run your inference-example.py file, and I'm confused about how the input data should be formatted. You have some normalization in the inference-example.py file, but also when you build the graph. Should I be plugging in a uint8 array, or some kind of normalized array? I'm trying to use your model with my own modified dataset.

Experiments with CIFAR-10

Hi, as per Madry's baseline, the robust model has a PGD accuracy of 47% on the CIFAR-10 dataset. Does the feature-denoising technique improve the robustness of the model on CIFAR-10 too?
Do you have any experimental results for CIFAR-10?

Question on running the evaluation

Hello, thanks for the contribution. I was trying to evaluate the method using the command:
python main.py --eval --data ./ILSVRC/Data/CLS-LOC --load ./models/R152-Denoise.npz --attack-iter 100 --attack-epsilon 16.0
where ./ILSVRC/Data/CLS-LOC is the folder containing the ImageNet validation set.
I added a line of code in imagenet_utils.py to print acc1.ratio in real time.
However, the top-1 error is around 0.99. Am I misunderstanding the metrics?

TensorFlow graph

Thanks for releasing your defence! I'd love to play around with it in more detail, but I'm having trouble converting it into a standard TensorFlow graph (probably because I have no experience with tensorpack). Here is how far I got:

import argparse
import tensorflow as tf
from tensorpack.tfutils.tower import TowerContext
import nets  # from this repo

parser = argparse.ArgumentParser()
parser.add_argument('-d', '--depth', help='ResNet depth',
                    type=int, default=50, choices=[50, 101, 152])
parser.add_argument('--arch', help='Name of architectures defined in nets.py',
                    default='ResNetDenoise')
args = parser.parse_args([])

model = getattr(nets, args.arch + 'Model')(args)

image = tf.placeholder(tf.float32, shape=(1, 224, 224, 3))

with TowerContext(tower_name='', is_training=False):
    logits = model.get_logits(image)

But now I'd like to get the session in which the logits and the pretrained weights live, which apparently is not the default session. Could you give me a quick hint on how to get that session? Thanks!
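
(A possible continuation, assuming tensorpack's sessinit API works as sketched here, i.e. that get_model_loader returns a SessionInit with an init(sess) method; worth double-checking against the docs for your tensorpack version.)

from tensorpack.tfutils.sessinit import get_model_loader

sess = tf.Session()
# Restore the released .npz weights into this session's graph.
get_model_loader('R152-Denoise.npz').init(sess)

logits_val = sess.run(logits, feed_dict={image: my_batch})  # my_batch: your own input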

How do you visualize features in the paper

Hi, thanks for your impressive work! I wonder how you visualize the features before and after denoising as in the paper. Sorry, I was unable to find any details about the visualization method in the paper. Also, do you have any plans to release your visualization code? Thanks in advance!

Image Preprocessing for ImageNet Validation

First of all, thank you very much for your very useful code!

I am attempting to reproduce the numbers quoted in the INSTRUCTIONS.md file and am having trouble. I suspect the issue may be that I am preprocessing the images differently than the code used to create these results. Could someone let me know precisely how the preprocessing must be done in order to reproduce the results (even for "clean" images, i.e. without any adversarial attack)?
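
(Not an official answer, but the conventional ImageNet evaluation preprocessing is to resize the shorter side to 256 and center-crop 224x224; whether this repo's dataflow matches exactly should be verified against its code. A sketch:)

import cv2

def center_crop_224(img):
    # img: HWC uint8 array as loaded by cv2 (BGR channel order).
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    h, w = img.shape[:2]
    y0, x0 = (h - 224) // 2, (w - 224) // 2
    return img[y0:y0 + 224, x0:x0 + 224]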

How can I convert the code of the denoising module into Caffe code?

If you met an unexpected problem when using the code, please include the following in your issues:

  1. What you did: tried to convert the code of the denoising module to Caffe code.

  2. What you observed: I cannot implement the algorithm of your module using Caffe.

  3. What you expected, if not obvious.

Hello.

I want to use this module in a Caffe project, but I have no idea how to implement the algorithm. For example, this denoising module uses 'tf.einsum()' a lot, and I am new to Caffe. Could you give me some suggestions for implementing this module in Caffe?

Thank you!
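
(One porting hint, hedged: the einsum calls in a non-local op are equivalent to reshapes plus batched matrix multiplications, which are much easier to express in other frameworks. A numpy sketch, with NHWC shapes assumed:)

import numpy as np

def non_local_via_matmul(theta, phi, g):
    # theta, phi: [N, H, W, C_e]; g: [N, H, W, C]; all numpy arrays.
    n, h, w, _ = theta.shape
    t = theta.reshape(n, h * w, -1)
    p = phi.reshape(n, h * w, -1)
    gg = g.reshape(n, h * w, -1)
    # f[n, i, j] = <theta_i, phi_j>; same as einsum('nabi,ncdi->nabcd')
    # with the spatial dimensions flattened.
    f = t @ p.transpose(0, 2, 1)
    # out = f @ g; same as einsum('nabcd,ncdi->nabi'), then restore H, W.
    out = f @ gg
    return out.reshape(n, h, w, -1)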
