
chongzhou96 / maskclip


This project forked from open-mmlab/mmsegmentation

380 stars · 27 forks · 12.98 MB

Official PyTorch implementation of "Extract Free Dense Labels from CLIP" (ECCV 22 Oral)

Home Page: https://www.mmlab-ntu.com/project/maskclip/

License: Apache License 2.0

Python 98.90% Dockerfile 0.12% Shell 0.98%

maskclip's People

Contributors

amrit110, chongzhou96, clownrat6, congee524, daavoo, drcut, dreamerlin, freywang, grimoire, hellock, innerlee, johnzja, junjue-wang, junjun2016, lkm2835, mengzhangli, mmeendez8, pkurainbow, rangilyu, rockeycoss, shoupingshan, sixiaozheng, sshuair, uni19, vvsssssk, wuziyi616, xiaojianzhong, xvjiarui, yamengxi, yinchimaoliang


maskclip's Issues

Settings for experiment with ViT-B/32 and ViT-L/14

Thanks for the wonderful paper and repo.

I was able to reproduce MaskCLIP and MaskCLIP+ with ViT-B/16 + R101 on the PASCAL Context dataset. The resulting mAP is 25.45 and 29.48, respectively.

However, when I tried to change the model to ViT-B/32 or ViT-L/14, the results were not good — less than half of ViT-B/16 — and the qualitative results show that the predicted dense labels are generally a mess.

What I did was:

  1. convert the weights for the backbone and extract the text embeddings for ViT-B/32 and ViT-L/14
  2. create a config based on the ViT-B/16 one, with these modifications:
    • change the patch size to 32 for ViT-B/32
    • change the patch size to 14, embed_dims to 1024, and num_layers to 24 for ViT-L/14
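For reference, the deltas in step 2 can be written as plain config dicts. The key names below mirror mmsegmentation's VisionTransformer and are assumptions about this repo's config schema, not its actual files:

```python
# Hypothetical backbone deltas relative to the ViT-B/16 config; the exact
# key names in this repo's configs may differ.
vit_b16 = dict(patch_size=16, embed_dims=768, num_layers=12, num_heads=12)
vit_b32 = dict(patch_size=32, embed_dims=768, num_layers=12, num_heads=12)
vit_l14 = dict(patch_size=14, embed_dims=1024, num_layers=24, num_heads=16)

# One easy-to-miss mismatch: CLIP's *text* embedding width also changes with
# the model (512 for the ViT-B variants, 768 for ViT-L/14), so text embeddings
# and any projection sized for ViT-B/16 must be regenerated for ViT-L/14.
TEXT_EMBED_DIM = {"ViT-B/16": 512, "ViT-B/32": 512, "ViT-L/14": 768}
```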

Is there anything I've done wrong or misunderstood? Do you have any suggestions on why the result is bad?

Thanks in advance.

Modified CLIP visual encoder

Hi, and thanks for your work on MaskCLIP.

I just read the paper and tried to follow it using your provided codebase.
In the paper you mention that you alter the CLIP image encoder using 1x1 convolutions, but that no training is needed for MaskCLIP.

I am wondering where in the code you do that?

Thanks in advance,

kind regards,

M

1 x 1 conv vs linear

What's the difference between a 1 x 1 convolution and a linear layer? Why should we do the replacement?
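A 1x1 convolution applies the same affine map at every spatial position, so it computes exactly what a linear layer would if you looped over pixels. A quick PyTorch sanity check, independent of this repo:

```python
import torch
import torch.nn as nn

# Copy a linear layer's weights into a 1x1 conv and verify both produce the
# same output map: the replacement changes the layout of the computation
# (it keeps the (H, W) grid), not its result.
linear = nn.Linear(512, 256)
conv = nn.Conv2d(512, 256, kernel_size=1)

with torch.no_grad():
    # (out, in) weight matrix -> (out, in, 1, 1) kernel
    conv.weight.copy_(linear.weight[:, :, None, None])
    conv.bias.copy_(linear.bias)

x = torch.randn(2, 512, 7, 7)                      # (N, C, H, W) feature map
out_conv = conv(x)                                 # (2, 256, 7, 7)
out_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
same = torch.allclose(out_conv, out_linear, atol=1e-5)
```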

QUESTION: Connection reset by peer

When I run the command "python tools/maskclip_utils/convert_clip_weights.py --model ViT16 --backbone", it gives the following error:
(screenshot of the error attached)
Could anybody help me solve it? Thanks a lot!
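"Connection reset by peer" at this step usually means the CLIP checkpoint download was interrupted. A hypothetical workaround (not part of this repo) is to fetch the file yourself with retries into the cache directory the `clip` package uses, so the conversion script skips the flaky download. The URL is copied from the model list in the openai/CLIP source for ViT-B/16; verify it against your installed version:

```python
import os
import time
import urllib.request

# Checkpoint URL taken from the openai/CLIP source for ViT-B/16 (assumption:
# convert_clip_weights.py loads the model via the `clip` package, which caches
# downloads under ~/.cache/clip).
VIT16_URL = ("https://openaipublic.azureedge.net/clip/models/"
             "5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/"
             "ViT-B-16.pt")
DEST = os.path.expanduser("~/.cache/clip/ViT-B-16.pt")

def download_with_retries(url, path, retries=5):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    for attempt in range(retries):
        try:
            urllib.request.urlretrieve(url, path)
            return path
        except OSError:                # covers "connection reset by peer"
            time.sleep(2 ** attempt)   # exponential backoff before retrying
    raise RuntimeError(f"could not download {url}")

# download_with_retries(VIT16_URL, DEST)
```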

Semantic Segmentation Quality

First of all, great work, and thank you for open-sourcing all the code! I was trying to test the model using the Pascal VOC 2010 dataset, but somehow the result does not look quite right. I wonder if it is something I should expect? The command is:

python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --show-dir output/

(screenshot of the result attached)

About 1x1 convolution

Why does MaskCLIP's 1x1 convolution need no training? Also, why can't I find the 1x1 convolution in the code?

Key Smoothing and Prompt Denoising

Hello, thanks for releasing the code. Can anyone point me to the code section which implements the key smoothing and prompt denoising as proposed in Section 3.3 in the paper? I tried hard to search in the repo but couldn't find them. Your help is highly appreciated.
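Hedged sketches of how those two tricks read in Sec. 3.3 of the paper — this is NOT the authors' implementation, and the tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

# `logits`: (N, C) per-position class scores; `keys`: (N, D) key features
# from the image encoder's last attention layer (N = spatial positions).

def key_smoothing(logits, keys):
    keys = F.normalize(keys, dim=-1)
    affinity = keys @ keys.t()     # (N, N) cosine similarity between positions
    return affinity @ logits       # each position mixes in similar positions

def prompt_denoising(logits, threshold=0.5):
    # Suppress classes whose best confidence anywhere in the image is low.
    probs = logits.softmax(dim=-1)
    keep = probs.max(dim=0).values > threshold    # (C,) class mask
    return logits.masked_fill(~keep, float("-inf"))
```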

Slow inference speed when using RN50

Dear author, thanks for your great work. When testing MaskCLIP+ with the RN50 CLIP backbone, the inference speed is very slow; could you check that?
This is the command:

./tools/dist_test.sh configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv2_r101-d8_512x512_80k_coco-stuff164k.py work_dirs/maskclip_plus_r50_deeplabv2_r101-d8_512x512_80k_coco-stuff164k/latest.pth 8 --eval mIoU

How to split novel and base classes at test time?

Hi, thank you for such great work.
I wonder how to split the novel and base classes when testing on the three datasets. I didn't find the implementation.

Could you help me or tell me how to implement that?
Thanks!

When will the pretrained model be released?

Hi, I'm super interested in this paper.
We are currently retraining MaskCLIP+ and the results do not seem as good as in the paper. May I ask when the pretrained models will be released?

Backbone Pre-train weight?

Hello, thanks for your nice code and nice paper!

One thing I wonder about: looking at the code, I don't see where the pre-trained weights are loaded into the backbone — I only see them loaded into the segmentation head. Could you show me where the CLIP encoder weights are loaded into the backbone? Thanks!
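For orientation, a hedged sketch of how a converted CLIP backbone checkpoint is typically loaded in mmseg-style code. The function and attribute names here are assumptions for illustration, not this repo's actual API:

```python
import torch

def load_backbone_weights(model, ckpt_path):
    # Load a checkpoint onto CPU, unwrap the mmcv-style {"state_dict": ...}
    # wrapper if present, then load into the backbone submodule.
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("state_dict", state)
    # strict=False reports, rather than raises on, missing/unexpected keys,
    # which is the usual pattern when a backbone checkpoint omits head weights.
    missing, unexpected = model.backbone.load_state_dict(state, strict=False)
    return missing, unexpected
```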

Request Demo

Dear authors.

Thank you for your nice work, and congratulations on MaskCLIP being accepted to ECCV as an oral paper!

I have tried your code to run a single image with a set of object classes, but actually failed to obtain meaningful localization results from MaskCLIP.

I'm sorry to bother you, but could you provide a demo file? The Batman example would be nice.

Thank you so much!

something wrong with 'mmcv'

Traceback (most recent call last):
  File "tools/test.py", line 11, in <module>
    from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
ModuleNotFoundError: No module named 'mmcv.parallel'
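`mmcv.parallel` exists only in the mmcv 1.x series; mmcv 2.0 removed those wrappers (their replacements live in mmengine), so `tools/test.py` needs a 1.x build, e.g. `mim install "mmcv-full<2"`. A small, repo-independent helper for checking what your environment actually provides:

```python
import importlib.util

def module_available(name):
    """Return True if `name` (e.g. 'mmcv.parallel') can be imported."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:   # a parent package is missing entirely
        return False
```

If `module_available("mmcv.parallel")` is False but mmcv is installed, you are on a 2.x build and should downgrade.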

Why is num_classes=0?

Hi, thanks for the code, but when I try to train MaskCLIP+ I hit an error. The config is:

model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet101_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='MaskClipPlusHead',
        in_channels=2048,
        channels=1024,
        num_classes=0,
        dropout_ratio=0,
        ...

Why is num_classes of decode_head set to 0? It may cause an IndexError. How do I solve it?
Looking forward to your reply 😄
