
chongzhou96 / maskclip


This project forked from open-mmlab/mmsegmentation

380 stars · 27 forks · 12.98 MB

Official PyTorch implementation of "Extract Free Dense Labels from CLIP" (ECCV 22 Oral)

Home Page: https://www.mmlab-ntu.com/project/maskclip/

License: Apache License 2.0

Python 98.90% Dockerfile 0.12% Shell 0.98%

maskclip's People

Contributors

amrit110, chongzhou96, clownrat6, congee524, daavoo, drcut, dreamerlin, freywang, grimoire, hellock, innerlee, johnzja, junjue-wang, junjun2016, lkm2835, mengzhangli, mmeendez8, pkurainbow, rangilyu, rockeycoss, shoupingshan, sixiaozheng, sshuair, uni19, vvsssssk, wuziyi616, xiaojianzhong, xvjiarui, yamengxi, yinchimaoliang


maskclip's Issues

Settings for experiment with ViT-B/32 and ViT-L/14

Thanks for the wonderful paper and repo.

I was able to reproduce MaskCLIP and MaskCLIP+ with ViT-B/16 + R101 on the PASCAL Context dataset. The resulting mAP is 25.45 and 29.48, respectively.

However, when I tried to change the model to ViT-B/32 or ViT-L/14, the results were not good — less than half of ViT-B/16 — and the qualitative results show that the predicted dense labels are generally a mess.

What I did was:

  1. convert the weights for the backbone and extract the text embeddings for ViT-B/32 and ViT-L/14
  2. create a config based on the ViT-B/16 one, with these modifications:
    • change the patch size to 32 for ViT-B/32
    • change the patch size to 14, embed_dims to 1024, and num_layers to 24 for ViT-L/14
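For reference, the deltas in step 2 can be written as plain config dicts. The key names below mirror mmsegmentation's VisionTransformer and are assumptions about this repo's config schema, not its actual files:

```python
# Hypothetical backbone deltas relative to the ViT-B/16 config; the exact
# key names in this repo's configs may differ.
vit_b16 = dict(patch_size=16, embed_dims=768, num_layers=12, num_heads=12)
vit_b32 = dict(patch_size=32, embed_dims=768, num_layers=12, num_heads=12)
vit_l14 = dict(patch_size=14, embed_dims=1024, num_layers=24, num_heads=16)

# One easy-to-miss mismatch: CLIP's *text* embedding width also changes with
# the model (512 for the ViT-B variants, 768 for ViT-L/14), so text embeddings
# and any projection sized for ViT-B/16 must be regenerated for ViT-L/14.
TEXT_EMBED_DIM = {"ViT-B/16": 512, "ViT-B/32": 512, "ViT-L/14": 768}
```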

Is there anything I've done wrong or misunderstood? Do you have any suggestions on why the result is bad?

Thanks in advance.

Modified CLIP visual encoder

Hi, and thanks for your work on MaskCLIP.

I just read the paper and tried to follow it using your provided codebase.
In the paper you mention that you alter the CLIP image encoder using 1x1 convolutions, but that no training is needed for MaskCLIP.

I am wondering where in the code you do that?

Thanks in advance,

kind regards,

M

1 x 1 conv vs linear

What's the difference between a 1 x 1 convolution and a linear layer? Why should we do the replacement?
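A 1x1 convolution applies the same affine map at every spatial position, so it computes exactly what a linear layer would if you looped over pixels. A quick PyTorch sanity check, independent of this repo:

```python
import torch
import torch.nn as nn

# Copy a linear layer's weights into a 1x1 conv and verify both produce the
# same output map: the replacement changes the layout of the computation
# (it keeps the (H, W) grid), not its result.
linear = nn.Linear(512, 256)
conv = nn.Conv2d(512, 256, kernel_size=1)

with torch.no_grad():
    # (out, in) weight matrix -> (out, in, 1, 1) kernel
    conv.weight.copy_(linear.weight[:, :, None, None])
    conv.bias.copy_(linear.bias)

x = torch.randn(2, 512, 7, 7)                      # (N, C, H, W) feature map
out_conv = conv(x)                                 # (2, 256, 7, 7)
out_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
same = torch.allclose(out_conv, out_linear, atol=1e-5)
```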

QUESTION: Connection reset by peer

When I run the command "python tools/maskclip_utils/convert_clip_weights.py --model ViT16 --backbone", it gives the following error:
(screenshot of the error attached)
Could anybody help me solve it? Thanks a lot!
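"Connection reset by peer" at this step usually means the CLIP checkpoint download was interrupted. A hypothetical workaround (not part of this repo) is to fetch the file yourself with retries into the cache directory the `clip` package uses, so the conversion script skips the flaky download. The URL is copied from the model list in the openai/CLIP source for ViT-B/16; verify it against your installed version:

```python
import os
import time
import urllib.request

# Checkpoint URL taken from the openai/CLIP source for ViT-B/16 (assumption:
# convert_clip_weights.py loads the model via the `clip` package, which caches
# downloads under ~/.cache/clip).
VIT16_URL = ("https://openaipublic.azureedge.net/clip/models/"
             "5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/"
             "ViT-B-16.pt")
DEST = os.path.expanduser("~/.cache/clip/ViT-B-16.pt")

def download_with_retries(url, path, retries=5):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    for attempt in range(retries):
        try:
            urllib.request.urlretrieve(url, path)
            return path
        except OSError:                # covers "connection reset by peer"
            time.sleep(2 ** attempt)   # exponential backoff before retrying
    raise RuntimeError(f"could not download {url}")

# download_with_retries(VIT16_URL, DEST)
```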

Semantic Segmentation Quality

First of all, great work, and thank you for open-sourcing all the code! I was trying to test the model using the Pascal VOC 2010 dataset, but somehow the result does not look quite right. I wonder if it is something I should expect? The command is:

python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --show-dir output/

(screenshot of the result attached)

About 1x1 convolution

Why does MaskCLIP's 1x1 convolution need no training? Also, why can't I find the 1x1 convolution in the code?

Key Smoothing and Prompt Denoising

Hello, thanks for releasing the code. Can anyone point me to the code section which implements the key smoothing and prompt denoising as proposed in Section 3.3 in the paper? I tried hard to search in the repo but couldn't find them. Your help is highly appreciated.
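Hedged sketches of how those two tricks read in Sec. 3.3 of the paper — this is NOT the authors' implementation, and the tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

# `logits`: (N, C) per-position class scores; `keys`: (N, D) key features
# from the image encoder's last attention layer (N = spatial positions).

def key_smoothing(logits, keys):
    keys = F.normalize(keys, dim=-1)
    affinity = keys @ keys.t()     # (N, N) cosine similarity between positions
    return affinity @ logits       # each position mixes in similar positions

def prompt_denoising(logits, threshold=0.5):
    # Suppress classes whose best confidence anywhere in the image is low.
    probs = logits.softmax(dim=-1)
    keep = probs.max(dim=0).values > threshold    # (C,) class mask
    return logits.masked_fill(~keep, float("-inf"))
```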

Slow inference speed when using RN50

Dear author, thanks for your great work. When testing MaskCLIP+ with the RN50 CLIP backbone, the inference speed is very slow; could you check that?
This is the command:

./tools/dist_test.sh configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv2_r101-d8_512x512_80k_coco-stuff164k.py work_dirs/maskclip_plus_r50_deeplabv2_r101-d8_512x512_80k_coco-stuff164k/latest.pth 8 --eval mIoU

How to split novel and base classes at test time?

Hi, thank you for such great work.
I wonder how to split the novel and base classes when testing on the three datasets. I didn't find the implementation.

Could you help me or tell me how to implement that?
Thanks!

When will the pretrained model be released?

Hi, I'm super interested in this paper.
We are currently retraining MaskCLIP+ and the results do not seem as good as in the paper. May I ask when the pretrained models will be released?

Backbone Pre-train weight?

Hello, thanks for your nice code and nice paper!

One thing I wonder about: looking at the code, I don't see where the pre-trained weights are loaded into the backbone — I only see them loaded into the segmentation head. Could you show me where the CLIP encoder weights are loaded into the backbone? Thanks!
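For orientation, a hedged sketch of how a converted CLIP backbone checkpoint is typically loaded in mmseg-style code. The function and attribute names here are assumptions for illustration, not this repo's actual API:

```python
import torch

def load_backbone_weights(model, ckpt_path):
    # Load a checkpoint onto CPU, unwrap the mmcv-style {"state_dict": ...}
    # wrapper if present, then load into the backbone submodule.
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("state_dict", state)
    # strict=False reports, rather than raises on, missing/unexpected keys,
    # which is the usual pattern when a backbone checkpoint omits head weights.
    missing, unexpected = model.backbone.load_state_dict(state, strict=False)
    return missing, unexpected
```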

Request Demo

Dear authors.

Thank you for your nice work, and congratulations on MaskCLIP being accepted to ECCV as an oral paper!

I have tried your code to run a single image with a set of object classes, but actually failed to obtain meaningful localization results from MaskCLIP.

I'm sorry to bother you, but could you provide a demo file? The Batman example would be nice.

Thank you so much!

something wrong with 'mmcv'

Traceback (most recent call last):
  File "tools/test.py", line 11, in <module>
    from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
ModuleNotFoundError: No module named 'mmcv.parallel'
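`mmcv.parallel` exists only in the mmcv 1.x series; mmcv 2.0 removed those wrappers (their replacements live in mmengine), so `tools/test.py` needs a 1.x build, e.g. `mim install "mmcv-full<2"`. A small, repo-independent helper for checking what your environment actually provides:

```python
import importlib.util

def module_available(name):
    """Return True if `name` (e.g. 'mmcv.parallel') can be imported."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:   # a parent package is missing entirely
        return False
```

If `module_available("mmcv.parallel")` is False but mmcv is installed, you are on a 2.x build and should downgrade.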

Why is num_classes=0?

Hi, thanks for the code, but when I try to train MaskCLIP+ I hit an error. The config is:

model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet101_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='MaskClipPlusHead',
        in_channels=2048,
        channels=1024,
        num_classes=0,
        dropout_ratio=0,
        ...

Why is num_classes of decode_head set to 0? It may cause an IndexError. How do I solve it?
Looking forward to your reply 😄
