sunanhe / mkt


Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".

License: MIT License

Python 100.00%
multi-label-classification open-vocabulary pytorch transfer-learning

mkt's People

Contributors

sunanhe


mkt's Issues

nus_wide_test.h5,nus_wide_train.h5 unavailable

Hello, I am trying to reproduce your experiments, but I don't know how the dataset directory structure described in your paper is organized.
The dataset downloaded from the link you provide is not organized according to your structure, so which files should go into the feature and Flicker folders in your layout?
Also, where can nus_wide_test.h5 and nus_wide_train.h5 be downloaded? I could not find download links for these two files on the NUS-WIDE official site you referenced. Could you please advise?
Apologies in advance if these are naive questions.

The improvement of the model's recognition performance on unseen classes.

In the process of reproducing, I found that directly using CLIP for multi-label zero-shot image classification tasks on the NUSWIDE dataset yielded the following results:
[screenshot of results]

It seems to align with the best results listed in your paper. Does the proposed method contribute to the improvement in recognizing samples from unseen classes? Or is the state-of-the-art performance primarily attributed to the powerful capabilities of the CLIP network?
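The CLIP-only baseline described above amounts to scoring each label by cosine similarity between the image embedding and each label's text embedding. A minimal NumPy sketch (assuming the embeddings have already been extracted; the array shapes and the 81-concept count for NUS-WIDE are the only specifics assumed here):

```python
import numpy as np

def zero_shot_multilabel_scores(image_emb, label_embs):
    """Cosine similarity between one image embedding and each label embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    lab = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return lab @ img  # one score per label, in [-1, 1]

rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)           # stand-in for a CLIP image embedding
label_embs = rng.normal(size=(81, 512))    # NUS-WIDE has 81 concepts
scores = zero_shot_multilabel_scores(image_emb, label_embs)
top3 = np.argsort(scores)[::-1][:3]        # predicted top-3 labels
```

Predictions are then read off by top-k selection or thresholding on the scores.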

negative samples in ranking loss?

Your paper is really good work on Open-Vocabulary Multi-Label Classification, but I have two questions:
--- How do you choose negative samples in your ranking loss?
--- The paper states: "Motivated by CoOp (Zhou et al. 2021), we introduce prompt tuning for the adaptation of label embedding. During the tuning process, all parameters except for context embedding of prompt template illustrated as the dotted box named prompt template in Figure 2, are fixed." What do the parameters of the context embedding of the prompt template refer to?
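On the first question, a common convention in multi-label ranking losses (an assumption here, not necessarily the paper's exact choice) is to treat every label not annotated positive as a negative, and to take a hinge over all positive/negative score pairs:

```python
import numpy as np

def pairwise_ranking_loss(scores, targets, margin=1.0):
    """Hinge ranking loss over all (positive, negative) label pairs of one image.
    Assumed convention: every label with target 0 is a negative."""
    pos = scores[targets == 1]
    neg = scores[targets == 0]
    # penalize whenever a negative scores within `margin` of a positive
    diff = margin - pos[:, None] + neg[None, :]
    return np.maximum(diff, 0.0).mean()

scores = np.array([2.0, 0.5, -1.0, 0.1])
targets = np.array([1, 0, 0, 1])
loss = pairwise_ranking_loss(scores, targets)  # 0.35 for this example
```

Only the pair (0.1, 0.5) violates the margin here, contributing 1.4 averaged over the 4 pairs.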

Reproducing results on NUS-WIDE

Hello~, I'm interested in your work and am reproducing the experimental results with your code. I find that the mAP in the ZSL setting on NUS-WIDE after first-stage training is 42.2 with the provided checkpoint, but the result I reproduce is around 36. Other metrics, such as F1 and the GZSL results, are close to yours. I followed the settings and hyperparameters in the paper but cannot find the cause. Could you help me?
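When reproduction numbers diverge like this, one useful sanity check is recomputing mAP independently of the repo's evaluation code. A self-contained sketch of the standard per-class mAP (assuming score and label matrices of shape (n_images, n_classes); the repo's exact evaluation protocol may differ, e.g. in how images with no positives are handled):

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean of precision@k taken at each positive hit."""
    order = np.argsort(scores)[::-1]       # rank images by descending score
    hits = labels[order] == 1
    if hits.sum() == 0:
        return 0.0
    prec = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return prec[hits].mean()

def mean_average_precision(score_mat, label_mat):
    """mAP: mean of per-class AP over all classes."""
    return np.mean([average_precision(score_mat[:, c], label_mat[:, c])
                    for c in range(score_mat.shape[1])])

scores = np.array([[0.2, 0.1], [0.9, 0.8], [0.7, 0.4]])
labels = np.array([[1, 0], [0, 1], [1, 1]])
map_score = mean_average_precision(scores, labels)
```

Here class 0 has one false positive ranked first (AP = 7/12) and class 1 is ranked perfectly (AP = 1), so mAP = 19/24.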

OSError

Thanks for releasing the code. When I use the code and data as provided, I get this error when training on NUS-WIDE in the first stage:
[screenshot of the OSError]

So I checked the downloaded file "nus_wide_train.h5" from Google Drive; its size is the same, 7.40 GB.
Then I printed the labels in the dataloader and got the output below (many labels are None):

[screenshot of label output]

I tried to fix the problem as below:
[screenshot of attempted fix]

However, I still get a dataloader error during training. Could you help me fix it?
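If only some labels come back as None, the .h5 file may be partially corrupted despite matching the expected size, so the real fix is re-verifying the download (e.g. comparing a checksum). As a stopgap, bad samples can be filtered before batching; a minimal sketch (the tuple layout of a sample is an assumption, not the repo's actual dataset format):

```python
def drop_bad_samples(batch):
    """Filter out samples whose label failed to load (came back as None).
    Workaround sketch only: silently dropping samples hides the underlying
    file problem, so log how many were dropped in real use."""
    return [(img, lbl) for img, lbl in batch if lbl is not None]

batch = [("img0", [1, 0, 1]), ("img1", None), ("img2", [0, 1, 0])]
clean = drop_bad_samples(batch)  # keeps img0 and img2
```

With PyTorch, a function like this would typically be wrapped into the DataLoader's collate_fn.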

"label_emb.pt" file

Nice job! I would like to ask how the "label_emb.pt" file was generated, and what should I do if I want to generate "label_emb.pt" myself?
Thanks!
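A plausible recipe (an assumption based on common practice with CLIP, not confirmed by the repo) is to encode a prompted version of each class name with CLIP's text encoder and save the normalized embeddings. The sketch below mocks the encoder with a deterministic stand-in, since loading a real CLIP model is out of scope here; in practice `encode_text` would be `model.encode_text(clip.tokenize(prompt))`:

```python
import numpy as np

def encode_text(prompt, dim=512):
    """Hypothetical stand-in for CLIP's text encoder: deterministic per prompt."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=dim)

labels = ["airport", "animal", "beach"]            # example NUS-WIDE concept names
prompts = [f"a photo of a {name}" for name in labels]
label_emb = np.stack([encode_text(p) for p in prompts])   # (num_labels, dim)
label_emb /= np.linalg.norm(label_emb, axis=1, keepdims=True)
# With torch: torch.save(torch.from_numpy(label_emb), "label_emb.pt")
```

The prompt template and whether embeddings are L2-normalized before saving are the parts most likely to differ from the authors' actual procedure.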

NUS-Wide Images

This paper is great. I'm just wondering about the implementation when using the NUS-WIDE dataset: is it intended that you download all the images from the NUS-WIDE image list yourself, since no file containing all the images is provided?

If so, which of the three image sizes provided for each image did this model use?

Regarding the "nus wide" label

Hi, thanks for your excellent work! Regarding the "nus wide" labels: when a label is -1, does it indicate that the corresponding category is not present in the image, or that the category is not annotated for that image?
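The distinction matters for training: if -1 means "not annotated" (one common convention; the repo's actual meaning may differ), those entries should be masked out of the loss rather than treated as negatives. A minimal sketch of the masking:

```python
import numpy as np

# Assumed convention: 1 = positive, 0 = negative, -1 = not annotated.
ann = np.array([1, -1, 0, 1, -1])   # per-image annotation vector
valid = ann != -1                    # positions that actually carry a label
targets = ann[valid]                 # {0, 1} targets actually used in the loss
```

If -1 instead means "category absent", it would simply map to a negative target and no masking is needed.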

Prompt tuning label embedding

Why is the label embedding in stage2 initialized by nn.embedding instead of CLIP text_encoder?

# Token-embedding table used to look up the class-name token vectors
self.token_embedding = nn.Embedding(args.vocab_size, args.transformer_width)

# Buffer holding each class name's token embeddings, padded to the longest name
self.label_emb = torch.zeros((len(self.name_lens), max(self.name_lens), self.transformer_width)).to(self.device)
# Copy positions 4 .. 4+name_len of each tokenized label (skipping the leading
# prompt-context slots); .detach() keeps these out of the gradient graph
for i, embed in enumerate(self.token_embedding(self.label_token)):
    self.label_emb[i][:self.name_lens[i]] = embed[4:4+self.name_lens[i]].clone().detach()

Initial label embeddings

Hey, I was wondering how you generated the label embeddings in the "other files" section that are used for the first training round. I couldn't see in the paper how these were generated; are they simply produced by the CLIP ViT model?
