saurav-31 / vitol

ViTOL

Home Page: https://openaccess.thecvf.com/content/CVPR2022W/L3D-IVU/papers/Gupta_ViTOL_Vision_Transformer_for_Weakly_Supervised_Object_Localization_CVPRW_2022_paper.pdf

License: MIT License

Python 99.55% Shell 0.45%
cvpr-2022 pytorch l3divu

vitol's Introduction

ViTOL: Vision Transformers for Weakly Supervised Object Localization

Official implementation of the paper "ViTOL: Vision Transformer for Weakly Supervised Object Localization", accepted at the L3D-IVU workshop at CVPR 2022 (CVPRW 2022).

This repository contains inference code and pre-trained model weights for ViTOL in PyTorch. Training and testing used Python 3.6.9 and PyTorch 1.7.1+cu101.

[Figure: ViTOL-GAR localization maps]

Model Zoo

We provide pre-trained weights for ViTOL with DeiT-S and DeiT-B backbones on the ImageNet-1k and CUB datasets below.

ImageNet: ViTOL-base, ViTOL-small
CUB: Updating soon

Results on ImageNet-1k dataset

Method           MaxBoxAccV2  Top1Acc  IOU50  Top1Cls
ViTOL-GAR Small  69.61        54.74    71.86  71.84
ViTOL-LRP Small  68.23        53.62    70.48  71.84
ViTOL-GAR Base   69.17        57.62    71.32  77.08
ViTOL-LRP Base   70.47        58.64    72.51  77.08
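
For readers unfamiliar with box-based localization metrics, here is a minimal sketch of an IoU check. It assumes that the IOU50 column follows the usual weakly supervised localization convention of counting a predicted box as correct when its intersection over union with the ground-truth box is at least 0.5. The function names `box_iou` and `localization_accuracy` are illustrative and are not taken from this repository, and boxes are treated as continuous (x0, y0, x1, y1) coordinates, which may differ from the pixel conventions of the actual evaluation code.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Percentage of images whose predicted box reaches the IoU threshold."""
    hits = sum(
        box_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(pred_boxes, gt_boxes)
    )
    return 100.0 * hits / len(gt_boxes)
```

With one exact match and one poorly overlapping box, `localization_accuracy([(0, 0, 10, 10), (0, 0, 2, 2)], [(0, 0, 10, 10), (1, 1, 3, 3)])` counts one hit out of two images.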

Results on CUB dataset

Updating soon

Usage

Clone the repository

git clone https://github.com/Saurav-31/ViTOL.git

Setup conda environment

conda env create -f environment.yml
conda activate vitol

Dataset preparation

Please refer here for dataset preparation

Inference results on ImageNet

Edit the config files under the configs folder:
1. Add the paths to the ImageNet dataset and its ground-truth metadata:
--data_root=/PATH/TO/DATASET
--metadata_root=/PATH/TO/GROUND_TRUTH
2. Download the ViTOL weights, copy them to a directory named "pretrained_weights", and set the checkpoint name:
--CHECKPOINT_NAME=$VITOL_WEIGHTS_TAR_FILENAME
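
Before launching an evaluation, it can help to confirm that the locations set in the steps above actually exist. The following is a minimal sketch of such a check; `missing_paths` is a hypothetical helper that is not part of this repository, and it assumes the checkpoint file sits directly inside the "pretrained_weights" directory.

```python
from pathlib import Path


def missing_paths(data_root, metadata_root, checkpoint_name,
                  weights_dir="pretrained_weights"):
    """Return the names of the configured locations that do not exist on disk."""
    required = [
        ("data_root", Path(data_root)),
        ("metadata_root", Path(metadata_root)),
        # Assumption: the checkpoint is stored directly under weights_dir.
        ("checkpoint", Path(weights_dir) / checkpoint_name),
    ]
    return [name for name, path in required if not path.exists()]
```

An empty list means all three locations were found; otherwise the returned names show which config values to revisit.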

Run ViTOL Base with GAR

bash evaluate.sh configs/ilsvrc/ViTOL_GAR_base.yml

Run ViTOL Small with GAR

bash evaluate.sh configs/ilsvrc/ViTOL_GAR_small.yml
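
To evaluate both GAR configurations in one go, the two commands above can be wrapped in a small driver script. This is a sketch only: `build_eval_command` and `run_all` are hypothetical names, and the script assumes it is run from the repository root, where `evaluate.sh` and the `configs` folder live.

```python
import subprocess

# Config paths taken from the commands shown above.
CONFIGS = [
    "configs/ilsvrc/ViTOL_GAR_base.yml",
    "configs/ilsvrc/ViTOL_GAR_small.yml",
]


def build_eval_command(config_path):
    """Build the argument list equivalent to `bash evaluate.sh <config>`."""
    return ["bash", "evaluate.sh", config_path]


def run_all(configs=CONFIGS, repo_dir="."):
    """Run each evaluation in turn, stopping at the first failure."""
    for config_path in configs:
        # check=True raises CalledProcessError if evaluate.sh exits non-zero.
        subprocess.run(build_eval_command(config_path), check=True, cwd=repo_dir)
```

Calling `run_all()` from the repository root runs the Base evaluation first and then the Small one.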

To do
  • Set up the training code
  • Train the model with stronger backbones
  • Jupyter notebook for visualization

We borrow code from

  • Evaluating Weakly Supervised Object Localization Methods Right (CVPR 2020)
  • Transformer Interpretability Beyond Attention Visualization (CVPR 2021)

Contacts

If you have any questions about our work or this repository, please don't hesitate to contact us by email.

Citation

If you find this work useful, please cite as follows:

@inproceedings{gupta2022vitol,
  title={ViTOL: Vision Transformer for Weakly Supervised Object Localization},
  author={Gupta, Saurav and Lakhotia, Sourav and Rawat, Abhay and Tallamraju, Rahul},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4101--4110},
  year={2022}
}

vitol's Issues

Code for training

Thank you for sharing this awesome work!

Are you planning to publish the code of training soon?

Thanks!
