CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

Official implementation of 'CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention'.

The paper has been accepted by AAAI 2023.

Introduction

CALIP is a free-lunch enhancement method that boosts CLIP's zero-shot performance via a parameter-free attention module. Specifically, we guide visual and textual representations to interact with each other and explore cross-modal informative features via attention. As the pre-training has largely reduced the embedding distances between the two modalities, we discard all learnable parameters in the attention and bidirectionally update the multi-modal features, making the whole process parameter-free and training-free. In this way, the image features are blended with text-aware signals and the text representations become visual-guided for better adaptive zero-shot alignment. We evaluate CALIP on various benchmarks of 14 datasets for both 2D image and 3D point cloud few-shot classification, showing consistent zero-shot performance improvement over CLIP. Based on that, we further insert a small number of linear layers into CALIP's attention module and verify its robustness under few-shot settings, where it also achieves leading performance compared to existing methods.
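The bidirectional update described above can be illustrated with a small NumPy sketch. It is not the repository's code: the function names, the temperatures `alpha_s` and `alpha_t`, the average pooling of the updated spatial features, and the three-term weighted sum (with `beta1` added alongside the `beta2` and `beta3` named in the configs) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x):
    """Scale each row to unit length, as is usual before cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def parameter_free_attention(f_s, f_t, alpha_s=1.0, alpha_t=1.0):
    """Update both modalities using only their raw cross-modal similarity.

    f_s: (M, C) spatial visual features, one row per image location.
    f_t: (K, C) text features, one row per class prompt.
    There are no query/key/value projections, hence no learnable parameters.
    """
    attn = f_s @ f_t.T                                # (M, K) similarity map
    f_s_a = softmax(attn / alpha_s, axis=-1) @ f_t    # (M, C) text-aware visual
    f_t_a = softmax(attn.T / alpha_t, axis=-1) @ f_s  # (K, C) visual-guided text
    return f_s_a, f_t_a


def calip_style_logits(f_v, f_s, f_t, beta1=1.0, beta2=1.0, beta3=1.0):
    """Combine plain CLIP logits with the two attention-updated terms."""
    f_s_a, f_t_a = parameter_free_attention(f_s, f_t)
    # Assumption: average-pool the updated spatial features into one vector.
    f_v_a = l2_normalize(f_s_a.mean(axis=0, keepdims=True))  # (1, C)
    f_t_a = l2_normalize(f_t_a)
    return (beta1 * f_v @ f_t.T         # original CLIP similarity
            + beta2 * f_v_a @ f_t.T     # text-aware image vs. original text
            + beta3 * f_v @ f_t_a.T)    # original image vs. visual-guided text


# Random stand-ins for encoder outputs: 49 locations, 5 classes, 16 channels.
rng = np.random.default_rng(0)
M, K, C = 49, 5, 16
f_s = l2_normalize(rng.standard_normal((M, C)))
f_t = l2_normalize(rng.standard_normal((K, C)))
f_v = l2_normalize(f_s.mean(axis=0, keepdims=True))
logits = calip_style_logits(f_v, f_s, f_t)  # shape (1, K)
```

Setting `beta2` and `beta3` to zero in this sketch recovers the plain CLIP logits, which is one way to see that the attention terms act as additive corrections and need no training.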

Requirements

Installation

Create a conda environment and install dependencies:

git clone https://github.com/ZiyuGuo99/CALIP.git
cd CALIP

conda create -n calip python=3.7
conda activate calip

# Install the torch and torchvision versions that match your CUDA setup
conda install pytorch torchvision cudatoolkit

pip install -r requirements.txt

Dataset

Follow DATASET.md to set up ImageNet and the other 10 datasets, following CoOp.

Get Started

Configs

The configuration for each dataset can be modified in configs/*.yaml. Set data_root to your data path. You can also edit the backbone and search settings as needed, and adjust beta2 and beta3 for a wider or finer search range.
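To make the role of a search range concrete, here is a hedged sketch of a grid search over two logit weights. The function `search_betas`, its arguments, and the reading of `beta2` and `beta3` as upper bounds of the search are hypothetical; how the repository's config keys are actually interpreted is not stated here.

```python
import numpy as np


def search_betas(term1, term2, term3, labels, beta2_max, beta3_max, steps):
    """Grid-search two weights to maximize classification accuracy.

    term1, term2, term3: (N, K) precomputed logit terms for N samples.
    labels: (N,) ground-truth class indices.
    Returns (best_accuracy, best_beta2, best_beta3).
    """
    best = (-1.0, None, None)
    for beta2 in np.linspace(0.0, beta2_max, steps):
        for beta3 in np.linspace(0.0, beta3_max, steps):
            logits = term1 + beta2 * term2 + beta3 * term3
            acc = float((logits.argmax(axis=1) == labels).mean())
            if acc > best[0]:  # keep the first setting that reaches the best accuracy
                best = (acc, float(beta2), float(beta3))
    return best


# Toy data: the first term alone misclassifies the second sample,
# and the second term carries the signal that corrects it.
labels = np.array([0, 1])
term1 = np.array([[1.0, 0.0], [1.0, 0.0]])
term2 = np.array([[0.0, 0.0], [0.0, 2.0]])
term3 = np.zeros((2, 2))
best_acc, best_beta2, best_beta3 = search_betas(
    term1, term2, term3, labels, beta2_max=2.0, beta3_max=2.0, steps=5)
```

A larger maximum widens the range that is explored, while more steps make the grid finer; the cost grows with the square of the step count because both weights are searched jointly.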

Note that load_cache defaults to False for the first run, so the encoded features and labels are computed and stored. Set it to True on later runs for faster hyperparameter tuning.
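The caching behaviour described above follows a common compute-once pattern, sketched below with the standard library only. The helper `load_or_encode`, the file name, and the use of `pickle` are assumptions for illustration; the repository may store its features in a different format.

```python
import os
import pickle
import tempfile


def load_or_encode(cache_path, load_cache, encode_fn):
    """Return (features, labels), reusing a cache file when load_cache is True."""
    if load_cache:
        # Later runs: skip the encoder and read the stored results.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # First run: encode, then store the results for later runs.
    features, labels = encode_fn()
    with open(cache_path, "wb") as f:
        pickle.dump((features, labels), f)
    return features, labels


calls = []


def fake_encode():
    """Stand-in for an expensive pass of the image encoder over a dataset."""
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]], [0, 1]


cache_path = os.path.join(tempfile.mkdtemp(), "val_cache.pkl")
first = load_or_encode(cache_path, load_cache=False, encode_fn=fake_encode)
second = load_or_encode(cache_path, load_cache=True, encode_fn=fake_encode)
```

In this sketch the encoder runs only once across the two calls, which is why reloading pays off when only the search weights change between runs.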

Running

For the ImageNet dataset:

CUDA_VISIBLE_DEVICES=0 python run_imagenet.py --config configs/imagenet.yaml

For the other 10 datasets: TODO...

Acknowledgement

This repo benefits from CLIP, CoOp, CLIP-Adapter and Tip-Adapter. Thanks for their wonderful work.

Citation

@article{guo2022calip,
  title={Calip: Zero-shot enhancement of clip with parameter-free attention},
  author={Guo, Ziyu and Zhang, Renrui and Qiu, Longtian and Ma, Xianzheng and Miao, Xupeng and He, Xuming and Cui, Bin},
  journal={arXiv preprint arXiv:2209.14169},
  year={2022}
}

Contact

If you have any questions about this project, please feel free to contact [email protected].

calip's Issues

Open-sourcing the code

Nice work! Do the authors have any plans to open-source the code?

Visualization of the attention map and the spatial visual features

Hi, thanks for your great work. I am new to this area and have run into some difficulties. How can I visualize the attention map and the spatial visual features? Would you mind sharing examples of visualizations from your experiments? I would appreciate a reply.

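Usage in downstream tasks and training details

Thank you very much for open-sourcing this project. I have a few questions and hope you can help:
1. Which version of the ImageNet dataset did you use in this project (the official ImageNet website offers several versions)? If possible, could you provide the dataset used in the project?
2. I would like to use this work in place of CLIP for some cross-modal downstream tasks (for example, text-guided image editing). Have you considered this use case? CLIP-based losses are generally designed around the cosine similarity between the image embedding and the text embedding. How should a loss function over the two embeddings be designed for CALIP? Could the loss function used when training CALIP be reused?
3. Could you provide the code and pre-trained models for CALIP-FS?
Looking forward to your reply.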
