sunanhe / MKT
Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".
License: MIT License
Hi, I think this is great work. How long does the training stage take, and which GPUs did you use?
Hi, I am currently reproducing your paper, and I have a question.
According to your paper, the backbone is ViT and the VLP model is CLIP.
But the first-stage code loads both the backbone and the VLP model from the same argument (args.clip_path, https://github.com/sunanhe/MKT/blob/main/train_nus_first_stage.py).
Are they the same or different?
Hello, I am trying to reproduce your experiments, but I don't know how the dataset directory structure described in your paper should be built.
The dataset downloaded from the link you provide is not organized according to your structure, so which files should go into the feature and Flicker folders in your layout?
Also, where can I download nus_wide_test.h5 and nus_wide_train.h5? I could not find download links for these two files on the official NUS-WIDE site you pointed to. Could you please advise?
My experience is limited, so please bear with me if this is a silly question.
In the process of reproducing, I found that directly using CLIP for multi-label zero-shot image classification tasks on the NUSWIDE dataset yielded the following results:
It seems to align with the best results listed in your paper. Does the proposed method contribute to the improvement in recognizing samples from unseen classes? Or is the state-of-the-art performance primarily attributed to the powerful capabilities of the CLIP network?
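For context, the CLIP-only baseline described above can be sketched as follows. This is a minimal illustration with random stand-in features (`zero_shot_multilabel`, `top_k`, and the tensor shapes are hypothetical names and sizes); a real run would use CLIP's image and text encoders to produce the features:

```python
import torch

def zero_shot_multilabel(image_feat, label_emb, top_k=3):
    """Score every candidate label against one image embedding via cosine
    similarity and return the indices of the top-k labels. This is the
    common CLIP-only baseline, not the MKT method itself."""
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    label_emb = label_emb / label_emb.norm(dim=-1, keepdim=True)
    scores = image_feat @ label_emb.t()        # (1, n_labels)
    return scores.topk(top_k, dim=-1).indices  # (1, top_k)

# Stand-in features: 1 image, 5 candidate labels, 512-d embeddings.
torch.manual_seed(0)
img = torch.randn(1, 512)
labels = torch.randn(5, 512)
top = zero_shot_multilabel(img, labels, top_k=3)
```

Whether the gain over this baseline comes from the knowledge-transfer stages or from CLIP itself is exactly the question above; the sketch only pins down what the baseline computes.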
Your paper is really a good work about Open-Vocabulary Multi-Label Classification. But I have two questions:
--- How do you choose negative samples in your ranking loss?
--- The paper states: "Motivated by CoOp (Zhou et al. 2021), we introduce prompt tuning for the adaptation of label embedding. During the tuning process, all parameters except for context embedding of prompt template illustrated as the dotted box named prompt template in Figure 2, are fixed." What exactly are the parameters of the context embedding of the prompt template?
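On the first question, a common choice in multi-label ranking losses is to treat every label absent from the image as a negative and penalize each positive/negative score pair that violates a margin. The sketch below shows that common formulation; it is an illustration, not necessarily the exact MKT loss, and the margin value is an assumption:

```python
import torch

def pairwise_ranking_loss(logits, targets, margin=1.0):
    """For each image, require every positive label's score to exceed
    every negative label's score by `margin`. A sketch of the common
    pairwise formulation, not necessarily the exact MKT loss."""
    total, count = logits.new_zeros(()), 0
    for scores, t in zip(logits, targets):
        pos, neg = scores[t == 1], scores[t == 0]
        if pos.numel() and neg.numel():
            # (n_pos, n_neg) matrix of margin violations, hinged at zero
            viol = torch.clamp(margin + neg.unsqueeze(0) - pos.unsqueeze(1), min=0)
            total = total + viol.mean()
            count += 1
    return total / max(count, 1)
```

With this formulation the "negative samples" are simply all labels marked 0 for the image; no explicit sampling step is required unless the label vocabulary is very large.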
Hi, I am wondering if you could provide training code for Open Images as well :) Thanks in advance..!
Hello, I'm interested in your work and am reproducing the experimental results with your code. I find that the mAP in the ZSL setting on NUS-WIDE after first-stage training is 42.2 with the provided checkpoint, but the result I reproduce is around 36. Other results, such as F1 and the GZSL setting, are close to yours. I followed the settings and hyperparameters in the paper and cannot find the reason. Could you help me?
Thanks for releasing the code. When I use the code and data as provided, I get this error when training NUS-WIDE in the first stage.
So I checked the file "nus_wide_train.h5" downloaded from Google Drive; it is the same size, 7.40 GB.
Then I printed the labels during the dataloader process and got the output below (many labels are None):
I tried to fix this problem as follows:
However, I still get a dataloader error during training. Could you help me fix it?
label_emb.pt
For a new dataset, how do I create label_emb.pt?
Nice work! How was the "label_emb.pt" file generated, and what should I do if I want to generate it myself?
Thanks!
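For anyone else wondering: the usual way to produce such a file is to tokenize each class name inside a prompt template and run it through CLIP's text encoder. The sketch below keeps the tokenizer and encoder as arguments so it runs without downloading CLIP; in practice you would pass clip.tokenize and model.encode_text from the openai/CLIP package. The template string, normalization step, and file name are assumptions, not confirmed against the MKT code:

```python
import torch

def build_label_embeddings(class_names, tokenize, encode_text, path="label_emb.pt"):
    """Embed each class name with a prompt template through a CLIP-style
    text encoder and save the stacked result. `tokenize`/`encode_text`
    stand in for clip.tokenize and model.encode_text; the template and
    file name are illustrative, not confirmed against the MKT code."""
    prompts = [f"a photo of a {name}" for name in class_names]
    with torch.no_grad():
        emb = encode_text(tokenize(prompts))        # (n_classes, dim)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize
    torch.save(emb, path)
    return emb

# Stand-in tokenizer/encoder so the sketch runs self-contained;
# replace with clip.tokenize and model.encode_text in practice.
fake_tokenize = lambda texts: torch.arange(len(texts))
fake_encode = lambda toks: torch.randn(toks.numel(), 512)
emb = build_label_embeddings(["dog", "cat"], fake_tokenize, fake_encode,
                             path="demo_label_emb.pt")
```

For a new dataset you would call this once with the new class-name list and point the training config at the resulting file.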
When will you release the code? Thanks!
This paper is great! I'm just wondering about the implementation with the NUS-WIDE dataset: is it intended that you download all the images from the NUS-WIDE image list yourself, since no file containing all the images is provided?
If so, which image size did the model use? The dataset provides three different sizes for each image.
Hi, thanks for your excellent work! Regarding the NUS-WIDE labels: when a label is -1, does it indicate that the corresponding category is not present in the image, or that the category is simply not annotated for that image?
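Whichever interpretation applies here, a common convention in NUS-WIDE-style annotations is 1 = present, 0 = absent, -1 = unannotated, and losses then mask out the -1 entries. A hedged sketch of that masking (illustrative convention and function name, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def masked_bce(logits, targets):
    """Binary cross-entropy over annotated labels only:
    entries marked -1 are excluded from the loss.
    (Common convention; confirm against the dataset docs.)"""
    mask = targets != -1
    return F.binary_cross_entropy_with_logits(
        logits[mask], targets[mask].float()
    )

logits = torch.tensor([[2.0, -2.0, 0.5]])
targets = torch.tensor([[1, 0, -1]])  # third label unannotated
loss = masked_bce(logits, targets)
```

If -1 instead meant "definitely absent", one would map it to 0 rather than mask it, so the answer to this question changes the training signal.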
Hi! The dimension of the last fully connected layer is not mentioned in the paper. Can you provide more detail?
Why is the label embedding in stage2 initialized by nn.embedding instead of CLIP text_encoder?
# nn.Embedding here is presumably a container that is loaded with CLIP's
# pretrained token-embedding weights, not a fresh random initialization.
self.token_embedding = nn.Embedding(args.vocab_size, args.transformer_width)
self.label_emb = torch.zeros((len(self.name_lens), max(self.name_lens), self.transformer_width)).to(self.device)
for i, embed in enumerate(self.token_embedding(self.label_token)):
    # Copy only the class-name tokens, skipping the 4 prompt-prefix tokens.
    self.label_emb[i][:self.name_lens[i]] = embed[4:4+self.name_lens[i]].clone().detach()
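If it helps, here is a runnable toy version of the snippet above showing what the slicing does: embed[4:4+name_len] copies each class name's token embeddings while skipping the four prompt-prefix positions ahead of it. The vocab size, width, and token ids below are made up for illustration:

```python
import torch
import torch.nn as nn

# Toy setup: vocab of 10 tokens, width 4; two class names of length 1 and 2.
# In the real code, token_embedding would hold CLIP's token-embedding weights.
torch.manual_seed(0)
token_embedding = nn.Embedding(10, 4)
name_lens = [1, 2]
# Each row: [SOT, prompt, prompt, prompt, <class-name tokens>, padding...]
label_token = torch.tensor([[1, 2, 3, 4, 5, 0, 0],
                            [1, 2, 3, 4, 6, 7, 0]])

label_emb = torch.zeros(len(name_lens), max(name_lens), 4)
for i, embed in enumerate(token_embedding(label_token)):
    # Class-name tokens start at position 4, after the prompt prefix.
    label_emb[i, :name_lens[i]] = embed[4:4 + name_lens[i]].clone().detach()
```

So the answer to "why nn.Embedding" may simply be that it stores CLIP's token-embedding table, while the full text encoder is not needed for this lookup; the authors would have to confirm.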
Hey, I was wondering how you generated the label embeddings in the "other files" section used for the first training stage. I couldn't see in the paper how these were generated. Is it just the CLIP model?