Official implementation of the 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.

License: Apache License 2.0


CLIP-DINOiser: Teaching CLIP a few DINO tricks for Open-Vocabulary Semantic Segmentation

Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez

[Teaser figure: teaser_v2.png]

[Demo video: running_dog.2.mp4]

Official PyTorch implementation of CLIP-DINOiser: Teaching CLIP a few DINO tricks.

@article{wysoczanska2023clipdino,
        title={CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation},
        author={Wysocza{\'n}ska, Monika and Sim{\'e}oni, Oriane and Ramamonjisoa, Micha{\"e}l and Bursuc, Andrei and Trzci{\'n}ski, Tomasz and P{\'e}rez, Patrick},
        journal={arxiv},
        year={2023}
}
Updates
  • [27/03/2024] Training code released. Weights updated to the ImageNet-trained version. MaskCLIP code modified to load weights directly from the OpenCLIP model.
  • [20/12/2023] Code release

Demo

Try our model!

Requirements

Set up the environment:

# Create conda environment
conda create -n clipdino python=3.9
conda activate clipdino
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=[your CUDA version] -c pytorch
pip install -r requirements.txt

You will also need to install MMCV and MMSegmentation by running:

pip install -U openmim
mim install mmengine    
mim install "mmcv-full==1.6.0"
mim install "mmsegmentation==0.27.0

Running from the notebook

You can try our model through the Jupyter notebook demo.ipynb.

Running from command line

You can also try our demo from the command line:

python demo.py --file_path [path to the image file] --prompts [list of the text prompts separated by ',']

Example:

python demo.py --file_path assets/rusted_van.png --prompts "rusted van,foggy clouds,mountains,green trees" 
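
For reference, a comma-separated --prompts string like the one above is typically split into a list of class names before being fed to the model. A minimal sketch of such argument handling (illustrative only; not the repo's exact demo.py code):

# Sketch of parsing the command-line arguments shown above (illustrative)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--file_path", type=str, required=True, help="path to the image file")
parser.add_argument("--prompts", type=str, required=True, help="comma-separated text prompts")
args = parser.parse_args()
prompts = [p.strip() for p in args.prompts.split(",")]  # e.g. ['rusted van', 'foggy clouds', ...]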

Reproducing results

Dataset preparation

In the paper, following previous works, we use 8 benchmarks: (i) w/ background class: PASCAL VOC, PASCAL Context, and COCO-Object; and (ii) w/o background class: PASCAL VOC20, PASCAL Context59, COCO-Stuff, Cityscapes, and ADE20k.

To run the evaluation, download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k datasets following the MMSegmentation data preparation document.
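
After preparation, the data directory should roughly follow the standard MMSegmentation layout, sketched below in abridged form (the MMSegmentation document remains the authoritative reference):

data/
├── VOCdevkit/
│   ├── VOC2012/                 # PASCAL VOC / VOC20
│   └── VOC2010/                 # PASCAL Context / Context59
├── coco_stuff164k/
│   ├── images/
│   └── annotations/
├── cityscapes/
│   ├── leftImg8bit/
│   └── gtFine/
└── ade/
    └── ADEChallengeData2016/    # ADE20k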

COCO Object

The COCO-Object dataset uses only the object classes of the COCO-Stuff164k dataset, collected from its instance segmentation annotations. Run the following command to convert the instance segmentation annotations to semantic segmentation annotations:

python tools/convert_coco.py data/coco_stuff164k/ -o data/coco_stuff164k/
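
Conceptually, the conversion collapses the per-instance masks into a single semantic label map. The sketch below illustrates the idea; it is not the repo's tools/convert_coco.py, and the helper name is hypothetical:

# Illustrative sketch: merge instance masks into one semantic segmentation map
import numpy as np

def instances_to_semantic(instance_masks, category_ids, ignore_index=255):
    # instance_masks: list of HxW boolean arrays; category_ids: matching class ids
    height, width = instance_masks[0].shape
    semantic = np.full((height, width), ignore_index, dtype=np.uint8)
    for mask, category in zip(instance_masks, category_ids):
        semantic[mask] = category  # later instances overwrite earlier ones on overlap
    return semantic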

Running evaluation

To reproduce our results, simply run:

torchrun main_eval.py clip_dinoiser.yaml

or using multiple GPUs:

CUDA_VISIBLE_DEVICES=[0,1..] torchrun --nproc_per_node=auto main_eval.py clip_dinoiser.yaml

Training

Hardware requirements: you will need one GPU (~14 GB of memory) to run the training. On an NVIDIA A5000 GPU, training takes approximately 3 hours.

Dataset preparation

Download ImageNet and update the ImageNet folder path in the configs/clip_dinoiser.yaml file.

Install FOUND

Install FOUND by running:

cd models;
git clone git@github.com:valeoai/FOUND.git
cd FOUND;
git clone https://github.com/facebookresearch/dino.git
cd dino; 
touch __init__.py
echo -e "import sys\nfrom os.path import dirname, join\nsys.path.insert(0, join(dirname(__file__), '.'))" >> __init__.py; cd ../;
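
For reference, after these commands dino/__init__.py contains the following lines, which put the dino directory itself on the import path:

import sys
from os.path import dirname, join
sys.path.insert(0, join(dirname(__file__), '.'))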

Run training

To launch training, simply run:

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=auto train.py clip_dinoiser.yaml

Currently, we only support single-GPU training.

Acknowledgments

This repo heavily relies on the following projects: MaskCLIP, DINO, FOUND, OpenCLIP, and MMSegmentation.

Thanks to the authors!


clip_dinoiser's Issues

When I execute the demo.py command, something goes wrong

Because I can't connect to https://huggingface.co, I replaced model_prefix: 'hf-hub:laion' with model_prefix: 'D:/pycharm/py-project/clip_dinoiser-main/clip_dinoiser-main', but I now get: AssertionError: CLIP_DINOiser: MaskClip: No valid model config found for D:/pycharm/py-project/clip_dinoiser-main/clip_dinoiser-main/CLIP-ViT-B-16-laion2B-s34B-b88K. How do I solve this problem?

Inference Issue

While trying demo.ipynb and demo.py with the given example image and checkpoint file, I got this error:

RuntimeError: Given groups=1, weight of size [512, 768, 1, 1], expected input[1, 0, 24, 36] to have 768 channels, but got 0 channels instead

Is the train code available?

Thanks for your impressive work! I'd like to ask whether the training code for the simple conv layers will be open-sourced?

About the two trainable conv layers.

Hi, thank you for your wonderful work! It is very inspiring to me.
I wonder whether replacing the two conv layers with more sophisticated modules or layers could bring more benefit, since the conv layers are quite simple.

TypeError: 'type' object is not subscriptable

Traceback (most recent call last):
  File "demo.py", line 13, in <module>
    from utils.visualization import mask2rgb
  File "/home/yg/FOUND/clip_dinoiser/utils/__init__.py", line 2, in <module>
    from .visualization import *
  File "/home/yg/FOUND/clip_dinoiser/utils/visualization.py", line 3, in <module>
    def mask2rgb(mask: np.array, palette: list[list]):
TypeError: 'type' object is not subscriptable

Should it be palette: "list[list]"?
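
For context, subscripting built-in types such as list[list] in evaluated annotations raises TypeError on Python versions below 3.9. Two common fixes, sketched here (this is not the repo's actual patch):

# Fix 1: postpone annotation evaluation (Python 3.7+); must be the first import
# from __future__ import annotations

# Fix 2: use typing generics, which also work on older Python versions
from typing import List
import numpy as np

def mask2rgb(mask: np.ndarray, palette: List[List[int]]):
    ...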

How do you calculate the patch affinity?

First, thanks for your exciting work. I am still a little confused about the construction of the patch affinity for DINO. To my knowledge, the value embeddings in the last attention layer are organized in a multi-head structure; how do you handle the multi-head value embeddings when computing the similarity? Do you flatten the heads, or compute the similarities per head and average them? Looking forward to your answer.
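
For readers wondering what such a computation could look like, one common construction flattens the heads into a single feature vector per patch and takes cosine similarities (a hedged sketch; not necessarily the authors' choice):

# Sketch: patch affinity from multi-head value embeddings via head flattening
import torch
import torch.nn.functional as F

def patch_affinity(values: torch.Tensor) -> torch.Tensor:
    # values: (num_heads, num_patches, head_dim) from the last attention layer
    num_heads, num_patches, head_dim = values.shape
    feats = values.permute(1, 0, 2).reshape(num_patches, num_heads * head_dim)
    feats = F.normalize(feats, dim=-1)  # unit-normalize per patch
    return feats @ feats.T              # (num_patches, num_patches) affinity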
