raytrun / mamba-clip

CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation

[Paper][🤗Ckpts]

Abstract

State space models and Mamba-based models have been increasingly applied across various domains, achieving state-of-the-art performance. This technical report introduces the first attempt to train a transferable Mamba model utilizing contrastive language-image pretraining (CLIP). We have trained Mamba models of varying sizes and undertaken comprehensive evaluations of these models on 26 zero-shot classification datasets and 16 out-of-distribution (OOD) datasets. Our findings reveal that a Mamba model with 67 million parameters is on par with a 307 million-parameter Vision Transformer (ViT) model in zero-shot classification tasks, highlighting the parameter efficiency of Mamba models. In tests of OOD generalization, Mamba-based models exhibit exceptional performance in conditions of OOD image contrast or when subjected to high-pass filtering. However, a Hessian analysis indicates that Mamba models feature a sharper and more non-convex landscape compared to ViT-based models, making them more challenging to train.

Main results

Zero-shot performance of different architectures trained with CLIP

| Methods | Food-101 | CIFAR-10 | CIFAR-100 | CUB | SUN397 | Cars | Aircraft | DTD | Pets | Caltech-101 | Flowers | MNIST | FER-2013 | STL-10 | EuroSAT | RESISC45 | GTSRB | KITTI | Country211 | PCAM | UCF101 | Kinetics700 | CLEVR | HatefulMemes | SST2 | ImageNet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VMamba_B (89M) | 48.5 | 58.0 | 29.9 | 36.5 | 50.4 | 5.8 | 8.5 | 26.5 | 30.2 | 64.7 | 52.8 | 9.7 | 19.6 | 91.9 | 16.0 | 30.4 | 7.9 | 40.2 | 10.2 | 59.9 | 35.2 | 25.6 | 12.6 | 51.6 | 50.1 | 38.3 |
| VMamba_S (50M) | 49.4 | 70.3 | 34.3 | 39.1 | 53.9 | 6.9 | 8.4 | 26.0 | 31.3 | 68.7 | 54.1 | 10.1 | 9.8 | 92.8 | 17.6 | 31.4 | 6.9 | 23.5 | 10.9 | 54.2 | 38.4 | 27.1 | 13.2 | 50.5 | 50.0 | 40.0 |
| VMamba_T220 (30M) | 46.5 | 50.9 | 22.9 | 35.6 | 51.1 | 5.7 | 6.8 | 25.1 | 31.0 | 64.9 | 54.0 | 10.1 | 12.5 | 91.6 | 13.9 | 25.4 | 10.7 | 32.3 | 9.9 | 55.0 | 34.0 | 25.1 | 12.7 | 53.9 | 50.6 | 38.7 |
| Simba_L (66.6M) | 52.7 | 67.4 | 31.0 | 39.1 | 52.7 | 6.9 | 9.1 | 27.8 | 33.4 | 68.9 | 55.9 | 8.0 | 16.0 | 93.9 | 17.4 | 32.3 | 8.9 | 41.5 | 11.1 | 58.1 | 35.7 | 27.9 | 12.1 | 54.9 | 50.1 | 41.6 |
| ViT_B (84M) | 50.6 | 66.0 | 34.5 | 38.8 | 51.1 | 4.0 | 5.4 | 21.2 | 28.5 | 60.9 | 53.3 | 8.4 | 17.3 | 90.5 | 30.2 | 21.5 | 6.1 | 35.1 | 10.5 | 53.5 | 28.5 | 22.1 | 10.8 | 52.4 | 50.7 | 37.6 |
| ViT_L (307M) | 59.5 | 72.9 | 41.5 | 40.3 | 53.6 | 6.9 | 6.4 | 20.6 | 27.9 | 65.4 | 55.0 | 10.3 | 34.5 | 94.2 | 22.7 | 28.8 | 5.8 | 41.4 | 12.5 | 54.9 | 34.3 | 24.0 | 12.9 | 54.3 | 50.1 | 40.4 |
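
One way to read the zero-shot results against the abstract's claim that a roughly 67M-parameter Mamba model is on par with a 307M-parameter ViT is to average each model's accuracy over the 26 datasets and count per-dataset wins. The sketch below does this for the Simba_L and ViT-L rows only. The unweighted mean and the helper name `compare` are assumptions made for illustration, not necessarily the aggregation used in the report.

```python
from statistics import mean

# Zero-shot accuracies (%) copied from the results table, in column order
# Food-101 ... ImageNet (26 datasets per model).
SIMBA_L = [52.7, 67.4, 31.0, 39.1, 52.7, 6.9, 9.1, 27.8, 33.4, 68.9, 55.9,
           8.0, 16.0, 93.9, 17.4, 32.3, 8.9, 41.5, 11.1, 58.1, 35.7, 27.9,
           12.1, 54.9, 50.1, 41.6]
VIT_L = [59.5, 72.9, 41.5, 40.3, 53.6, 6.9, 6.4, 20.6, 27.9, 65.4, 55.0,
         10.3, 34.5, 94.2, 22.7, 28.8, 5.8, 41.4, 12.5, 54.9, 34.3, 24.0,
         12.9, 54.3, 50.1, 40.4]


def compare(a, b):
    """Hypothetical helper: unweighted mean accuracy and win/tie counts, a vs. b."""
    if len(a) != len(b):
        raise ValueError("both models need one score per dataset")
    return {
        "mean_a": round(mean(a), 2),
        "mean_b": round(mean(b), 2),
        "a_wins": sum(x > y for x, y in zip(a, b)),
        "b_wins": sum(x < y for x, y in zip(a, b)),
        "ties": sum(x == y for x, y in zip(a, b)),
    }


summary = compare(SIMBA_L, VIT_L)
print(summary)
```

With these rows, the unweighted means come out at 36.71 for Simba_L and 37.35 for ViT-L, and Simba_L is ahead on 13 datasets, behind on 11, and tied on 2.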

Acknowledgment

This project is based on A-CLIP (paper, code), VMamba (paper, code), and SiMBA (paper, code). Thanks to the authors for their excellent work.

mamba-clip's Issues

The paper suggests the performance is good, but isn't the content a bit thin?
