chenpan0103 / co-attack

This project forked from adversarial-for-goodness/co-attack

Official PyTorch implementation of Towards Adversarial Attack on Vision-Language Pre-training Models

License: GNU General Public License v3.0

Python 100.00%

co-attack's Introduction

This is the official PyTorch implementation of the paper "Towards Adversarial Attack on Vision-Language Pre-training Models", published at ACM Multimedia 2022.

Update 20/03/2023

To compute the attack success rate (ASR), run with "--adv 0" to get the clean accuracy, then run with "--adv 4" to get the adversarial accuracy. The ASR is the difference: ASR = clean accuracy - adversarial accuracy.
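The arithmetic above can be sketched as follows. This is a minimal illustration, assuming both accuracies are reported on the same scale (percent here) and are copied by hand from the two runs; the function name `attack_success_rate` and the numbers are hypothetical and not part of this repository.

```python
def attack_success_rate(clean_acc: float, adv_acc: float) -> float:
    """Return ASR = clean accuracy - adversarial accuracy.

    Both values must use the same scale (for example, both in percent).
    """
    return clean_acc - adv_acc


# Placeholder numbers for illustration only, not results from the paper.
clean = 80.0  # accuracy reported by the "--adv 0" run
adv = 30.0    # accuracy reported by the "--adv 4" run
print(f"ASR = {attack_success_rate(clean, adv):.1f}")
```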

Update 29/11/2022

We released the fine-tuned checkpoints (Baidu, password: iqvp) for the VE task on ALBEF and TCL. They serve as the attacked models in this paper and may also be useful for other studies.

Requirements

  • pytorch 1.10.2
  • transformers 4.8.1
  • timm 0.4.9
  • bert_score 0.3.11

Download

Evaluation

Adv   Instruction
0     No Attack
1     Attack Text
2     Attack Image
3     Attack Both (vanilla)
4     Co-Attack

When attacking the unimodal embedding, running with "--adv 4" but without "--cls" raises an expected error, because the image embedding and the text embedding have different sequence lengths.
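A pre-flight check for this flag combination could look like the sketch below. It is not taken from the repository: the helper `check_unimodal_flags` is hypothetical, and only the mode numbers and names come from the table above. The idea is simply to fail early with a readable message instead of hitting the shape error mid-run.

```python
# Mode numbers and names copied from the table above.
ADV_MODES = {
    0: "No Attack",
    1: "Attack Text",
    2: "Attack Image",
    3: "Attack Both (vanilla)",
    4: "Co-Attack",
}


def check_unimodal_flags(adv: int, cls: bool) -> str:
    """Validate --adv / --cls for a unimodal-embedding run (hypothetical helper).

    Returns the mode name, or raises ValueError for a bad combination.
    """
    if adv not in ADV_MODES:
        raise ValueError(f"--adv must be one of {sorted(ADV_MODES)}, got {adv}")
    if adv == 4 and not cls:
        # Image and text embeddings have different sequence lengths,
        # so Co-Attack on the unimodal embedding needs --cls.
        raise ValueError("--adv 4 on the unimodal embedding requires --cls")
    return ADV_MODES[adv]


print(check_unimodal_flags(4, cls=True))
```

Calling `check_unimodal_flags(4, cls=False)` raises `ValueError`, mirroring the expected error described above.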

Image-Text Retrieval

Download the MSCOCO or Flickr30k dataset from its original website.

# Attack Unimodal Embedding
python RetrievalEval.py --adv 4 --gpu 0 --cls \
--config configs/Retrieval_flickr.yaml \
--output_dir output/Retrieval_flickr \
--checkpoint [Finetuned checkpoint]

# Attack Multimodal Embedding
python RetrievalFusionEval.py ...

# Attack CLIP Model
python RetrievalCLIPEval.py --adv 4 --gpu 0 --image_encoder ViT-B/16  ...
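The two-run procedure from the 20/03/2023 update (clean run, then Co-Attack run) can be scripted around the unimodal command. The sketch below only assembles the command lines using the flags shown in this README; the helper `build_eval_command`, the checkpoint path, and the per-mode output directory suffix are assumptions made for illustration.

```python
import shlex


def build_eval_command(script, adv, config, output_dir, checkpoint, gpu=0, cls=True):
    """Assemble an evaluation command as an argument list (hypothetical helper)."""
    cmd = ["python", script, "--adv", str(adv), "--gpu", str(gpu)]
    if cls:
        cmd.append("--cls")
    cmd += ["--config", config, "--output_dir", output_dir, "--checkpoint", checkpoint]
    return cmd


# Placeholder path: replace with your own fine-tuned checkpoint.
CHECKPOINT = "path/to/finetuned_checkpoint.pth"

for adv in (0, 4):  # 0 = clean accuracy, 4 = Co-Attack accuracy
    cmd = build_eval_command(
        "RetrievalEval.py",
        adv,
        "configs/Retrieval_flickr.yaml",
        f"output/Retrieval_flickr_adv{adv}",  # assumed naming, keeps the two runs apart
        CHECKPOINT,
    )
    print(shlex.join(cmd))
    # To launch for real: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.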

Visual Entailment

Download the SNLI-VE dataset from the original website.

# Attack Unimodal Embedding
python VEEval.py --adv 4 --gpu 0 --cls \
--config configs/VE.yaml \
--output_dir output/VE \
--checkpoint [Finetuned checkpoint]

# Attack Multimodal Embedding
python VEFusionEval.py ...

Visual Grounding

Download MSCOCO dataset from the original website.

# Attack Unimodal Embedding
python GroundingEval.py --adv 4 --gpu 0 --cls \
--config configs/Grounding.yaml \
--output_dir output/Grounding \
--checkpoint [Finetuned checkpoint]

# Attack Multimodal Embedding
python GroundingFusionEval.py ...

Visualization

python visualization.py --adv 4 --gpu 0

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{zhang2022towards,
  title={Towards Adversarial Attack on Vision-Language Pre-training Models},
  author={Zhang, Jiaming and Yi, Qi and Sang, Jitao},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}
