Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models

Official implementation of Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024 Spotlight).

Paper Link: Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models

Highlights

To answer an unexplored research question, "Do we need to normalize the soft prompts in VLMs?", we perform extensive corruption experiments and uncover a phenomenon called the Low-Norm Effect: reducing the norms of certain learned prompts occasionally enhances the performance of VLMs, while increasing them often degrades it. To harness this effect, we propose a novel method named Normalizing the soft-prompt vectors of vision-language models (Nemesis). To the best of our knowledge, ours is the first work to systematically investigate the role of the norms of soft-prompt vectors in VLMs, offering valuable insights for future research in soft-prompt tuning.

We also conduct preliminary experiments to verify the generalizability and effectiveness of Nemesis on other Parameter-Efficient Tuning (PEFT) methods, including visual prompt tuning and prefix-tuning. Detailed results can be found in the following tables.
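The normalization idea can be illustrated with a small sketch. The function below is a hypothetical norm-penalty regularizer that could be added to a prompt-tuning loss; its name, signature, and the exact form of the penalty are illustrative assumptions, not the paper's actual losses.

```python
import numpy as np

def norm_penalty(prompts, target_norm=1.0):
    """Hypothetical regularizer (not the paper's exact loss): penalize
    soft-prompt vectors whose L2 norms exceed a target, discouraging
    the norm growth that correlates with the Low-Norm Effect."""
    norms = np.linalg.norm(prompts, axis=-1)          # one norm per context vector
    return float(np.mean(np.maximum(norms - target_norm, 0.0) ** 2))

# 4 context vectors of dimension 8; each row has norm sqrt(8) ≈ 2.83,
# so the penalty is positive and would push the norms down if minimized.
prompts = np.ones((4, 8))
penalty = norm_penalty(prompts)      # > 0 since every norm exceeds the target
```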

The Low-Norm Effect


A schematic diagram of the Low-Norm Effect: reducing the norms at specific positions within soft prompts enhances performance, whereas increasing them typically degrades it. Top: corrupted soft prompts with increased norms, leading to decreased performance; Middle: soft prompts learned by CoOp; Bottom: corrupted soft prompts with reduced norms, resulting in enhanced performance.



The frequency of the Low-Norm Effect across 11 datasets

The 11 datasets exhibit varying frequencies of the Low-Norm Effect. This indicates that tackling the Low-Norm Effect is challenging, given its inconsistent manifestation across datasets.


Left: norm and test-accuracy curves of CoOp; Right: norm and test-accuracy curves of CoOp+Nemesis (ours).

From the left figure, it is apparent that the norms of soft prompts in CoOp first increase and then level off, while test accuracy degrades as the norms flatten out. By performing corruption operations that decrease the norms of prompt vectors, the last green circle may be pushed away from the degradation area and closer to the small green circles that demonstrate superior performance.

From the right figure, we observe a norm variation pattern in CoOp+Nemesis (ours) distinct from CoOp's: norms first increase, then decrease, and eventually stabilize. Meanwhile, test accuracy exhibits a consistent upward trend before reaching a plateau, whereas a declining trend is observed in CoOp.

This implies that our method can delay the point at which soft prompts plateau during learning, thereby reducing the probability of learning degradation.
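The norm dynamics described above can be monitored with a few lines of logging. The toy loop below is an assumption-laden sketch (the array shape, the ×1.5 "update", and the function name are all illustrative): it records the mean L2 norm of the prompt vectors each epoch, which is the quantity whose rise-and-plateau pattern the figures track.

```python
import numpy as np

def mean_prompt_norm(prompts):
    """Mean L2 norm over the n_ctx soft-prompt vectors."""
    return float(np.linalg.norm(prompts, axis=-1).mean())

# Toy trajectory: each "update" scales the prompts up, so the mean
# norm grows monotonically, mimicking the rising phase seen in CoOp.
rng = np.random.default_rng(0)
prompts = rng.normal(scale=0.02, size=(16, 512))   # 16 context vectors
history = []
for epoch in range(5):
    prompts = prompts * 1.5          # stand-in for a gradient update
    history.append(mean_prompt_norm(prompts))
```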

Main Results

Few-shot classification

The result of few-shot classification task

Domain generalization

The result of domain generalization task

Base-to-new generalization

The result of base-to-new task

How to Run

First, you should follow the instructions in DATASETS.md to download datasets.

Next, you should follow the instructions in Dassl.pytorch to install the dassl environment.

Finally, we provide the running scripts in ./scripts, which allow you to reproduce the results.

Make sure you change the dataset path (/path/to/dataset) in the bash files and run the commands under the corresponding directories, including coop, coop_crt (coop + corruption), coop_nemesis, plot, and plot_nemesis.

For the running commands of the few-shot learning, domain generalization, and base-to-new tasks, refer to COOP.md.

Here, we provide examples of how to conduct corruption experiments based on CoOp (./scripts/coop_crt/eval_loop.sh):

Corruption Experiments

# original
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 50 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 2 False 50 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep100 end 16 4 False 100 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep100 end 16 8 False 100 original 666 666
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50 end 16 16 False 200 original 666 666
# replace
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 replace 0 0.
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 replace 1 0.
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 replace 2 0.
# rescale
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 scale 0 0.001
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 scale 1 0.001
CUDA_VISIBLE_DEVICES=0 bash scripts/coop_crt/eval.sh rn50_ep50 end 16 1 False 100 scale 2 0.001

P.S.: the last two parameters specify the corruption position and the corruption weight, respectively. They are unused in the original evaluation, so there they can be set to any number, e.g., 666.
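The two corruption modes the script selects can be sketched as follows. This is a hypothetical illustration inferred from the script arguments, not the repository's actual implementation: "replace" overwrites the vector at the given position with a constant, and "scale" multiplies it by the corruption weight (e.g. 0.001 shrinks its norm, probing the Low-Norm Effect).

```python
import numpy as np

def corrupt(prompts, mode, position, weight):
    """Sketch of a corruption operation on one soft-prompt vector.

    mode="replace": overwrite the vector with the constant `weight`.
    mode="scale":   multiply the vector by `weight`; 0 < weight < 1
                    shrinks its L2 norm (the Low-Norm probe).
    """
    out = prompts.copy()
    if mode == "replace":
        out[position] = weight
    elif mode == "scale":
        out[position] = out[position] * weight
    return out

prompts = np.ones((4, 8))
scaled = corrupt(prompts, "scale", position=2, weight=0.001)
# the corrupted vector's norm shrinks; the other positions are untouched
assert np.linalg.norm(scaled[2]) < np.linalg.norm(prompts[2])
```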

Citation

If you use this code in your research, please kindly cite the following paper:

@inproceedings{nemesis,
    title={Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models},
    author={Shuai Fu and Xiequn Wang and Qiushi Huang and Yu Zhang},
    booktitle={The International Conference on Learning Representations (ICLR)},
    year={2024}
}

Acknowledgements

Our code is based on CoOp. We thank the authors for releasing their code. If you use our model and code, please consider citing this work as well.
