This project is a fork of jincan333/ilm-vp.

ILM

[CVPR23] "Understanding and Improving Visual Prompting: A Label-Mapping Perspective"

In this work, we revisit and advance visual prompting (VP), an input prompting technique for vision tasks. VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the target domain by simply incorporating universal prompts (in terms of input perturbation patterns) into downstream data points. Yet, it remains elusive why VP stays effective even given a ruleless label mapping (LM) between the source classes and the target classes. Motivated by this, we ask: how is LM interrelated with VP, and how can this relationship be exploited to improve VP's accuracy on target tasks? We examine the influence of LM on VP and show that a better 'quality' of LM (assessed by mapping precision and explanation) consistently improves the effectiveness of VP. This is in contrast to the prior art, where the factor of LM was missing. To optimize LM, we propose a new VP framework, termed ILM-VP (iterative label mapping-based visual prompting), which automatically re-maps the source labels to the target labels and progressively improves the target task accuracy of VP. Further, when using a contrastive language-image pretrained (CLIP) model, we propose to integrate an LM process to assist the text prompt selection of CLIP and to improve the target task accuracy. Extensive experiments demonstrate that our proposal significantly outperforms state-of-the-art VP methods. For example, when reprogramming an ImageNet-pretrained ResNet-18 to 13 target tasks, our method outperforms baselines by a substantial margin, e.g., 7.9% and 6.7% accuracy improvements in transfer learning to the target Flowers102 and CIFAR100 datasets. In addition, our proposal on CLIP-based VP provides 13.7% and 7.1% accuracy improvements on Flowers102 and DTD, respectively.
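To make the label-mapping step concrete, below is a minimal sketch (not the authors' implementation) of a frequency-based re-mapping in the spirit of FLM/ILM: each target class is greedily assigned the source class that the frozen source model predicts most often for prompted images of that class. In ILM-VP this re-mapping would be repeated as the prompt is trained; the exact assignment details in the repository may differ.

import torch

def remap_labels(source_logits, target_labels, num_target_classes):
    # source_logits: (N, num_source_classes) logits of the frozen source model
    #   on prompted target images; target_labels: (N,) ground-truth target labels.
    # Returns a dict {target_class: source_class} assigning each target class
    # the source class it is predicted as most often (greedy, one-to-one).
    num_source_classes = source_logits.shape[1]
    preds = source_logits.argmax(dim=1)
    counts = torch.zeros(num_target_classes, num_source_classes)
    for t, s in zip(target_labels.tolist(), preds.tolist()):
        counts[t, s] += 1
    mapping, used = {}, set()
    # Visit target classes in order of their strongest match, then greedily
    # take the most frequent source class that is still unused.
    for t in counts.max(dim=1).values.argsort(descending=True).tolist():
        for s in counts[t].argsort(descending=True).tolist():
            if s not in used:
                mapping[t] = s
                used.add(s)
                break
    return mapping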

Overview

What is in this repository?

We provide the training code for our ILM-VP method and the baselines, for both ResNets and CLIP.

Dependencies

Run pip3 install -r requirement.txt.

For path configuration, modify cfg.py according to your needs.
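As a rough illustration, cfg.py defines local paths such as data_path (referenced in the Datasets section below). The sketch here is an assumption of what such a file might look like, not a copy of the actual one; any variable other than data_path is hypothetical.

# cfg.py -- illustrative sketch only; the real file may define additional or
# differently named settings.
import os

data_path = os.path.expanduser("~/datasets")        # root folder holding flowers102/, ucf101/, ...
results_path = os.path.expanduser("~/ilm_vp_runs")  # hypothetical output directory for checkpoints and logs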

Datasets

You can find datasets here.

Put each dataset under the data_path configured in cfg.py (e.g., data_path/flowers102, data_path/ucf101, ...).

For Flowers102, DTD, UCF101, Food101, EuroSAT, OxfordPets, StanfordCars, and SUN397, we use the dataset splits from CoOp. For ABIDE, we use the download code from acerta-abide. For the other datasets, we use the official versions provided by PyTorch.
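Before training, a quick sanity check like the hypothetical one below can confirm that the expected folders sit directly under data_path; the folder names other than flowers102 and ucf101 are assumptions and may need adjusting to match your downloads.

import os
from cfg import data_path  # path root configured above

# Hypothetical check: each dataset should live directly under data_path,
# e.g. data_path/flowers102, data_path/ucf101, ...
expected = ["flowers102", "ucf101", "dtd", "food101", "eurosat",
            "oxfordpets", "stanfordcars", "sun397"]
missing = [d for d in expected if not os.path.isdir(os.path.join(data_path, d))]
print("missing dataset folders:", missing or "none")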

Generate Prompts

VP on CNN:

ILM-VP: python experiments/cnn/ilm_vp.py --network resnet18 --dataset flowers102

FLM-VP: python experiments/cnn/flm_vp.py --network resnet18 --dataset flowers102

RLM-VP: python experiments/cnn/rlm_vp.py --network resnet18 --dataset flowers102

TP on CLIP:

ILM-TP-VP: python experiments/clip/ilm_tp_vp.py --dataset flowers102

SINGLE-TP-VP: python experiments/clip/single_tp_vp.py --dataset flowers102
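All of the scripts above train a universal input prompt while keeping the source model frozen. Conceptually, a padding-style visual prompt can be sketched as follows (a simplified illustration, not the repository's exact module): the target image is shrunk and pasted into the centre of the source model's input canvas, and a learnable perturbation is added in the surrounding border.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PadPrompter(nn.Module):
    # Simplified padding-style visual prompt (sketch; image_size and core_size
    # are illustrative defaults, not values taken from the repository).
    def __init__(self, image_size=224, core_size=128):
        super().__init__()
        self.image_size = image_size
        self.core_size = core_size
        self.start = (image_size - core_size) // 2
        # Learnable universal perturbation over the full canvas.
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        # Mask that is 1 on the border and 0 on the central core, so only the
        # border pixels are actually perturbed.
        mask = torch.ones(1, 1, image_size, image_size)
        mask[:, :, self.start:self.start + core_size, self.start:self.start + core_size] = 0
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Shrink the target image and paste it into the centre of the canvas.
        x = F.interpolate(x, size=(self.core_size, self.core_size),
                          mode="bilinear", align_corners=False)
        canvas = torch.zeros(x.size(0), 3, self.image_size, self.image_size, device=x.device)
        canvas[:, :, self.start:self.start + self.core_size, self.start:self.start + self.core_size] = x
        # Add the prompt only on the border.
        return canvas + self.prompt * self.mask

Training then amounts to optimizing the prompt parameters (and, for ILM-VP, periodically re-running the label re-mapping) with a standard loss on the mapped source-model outputs.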

Contributor

Aochuan Chen

Citation

@article{chen2022understanding,
  title={Understanding and Improving Visual Prompting: A Label-Mapping Perspective},
  author={Chen, Aochuan and Yao, Yuguang and Chen, Pin-Yu and Zhang, Yihua and Liu, Sijia},
  journal={arXiv preprint arXiv:2211.11635},
  year={2022}
}
