mm23-mith's Introduction

MITH

Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval

1. Introduction

This is the source code for the ACM MM 2023 paper "Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval".

[Figure: the main architecture of MITH]

[Figure: the experimental results]

2. Requirements

  • python 3.7.16
  • pytorch 1.9.1
  • torchvision 0.10.1
  • numpy
  • scipy
  • tqdm
  • pillow
  • einops
  • ftfy
  • regex
  • ...

3. Preparation

3.1 Download pre-trained CLIP

The pre-trained CLIP models are listed around line 30 of CLIP/clip/clip.py. This code is based on "ViT-B/32". Download the "ViT-B/32" model and put it in ./cache, or get it from the following link:

link:https://pan.baidu.com/s/1ZyDTR2IEHlY4xIdLgxtaVA password:kdq7
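
Before training, it can help to confirm that the weights are where the code expects them. The following is a minimal sketch, not part of the repository: the helper name find_clip_weights is hypothetical, and the file name ViT-B-32.pt is an assumption based on the name used by the official CLIP download for "ViT-B/32". Adjust it if your file is named differently.

```python
from pathlib import Path
from typing import Optional


def find_clip_weights(cache_dir: str = "./cache",
                      filename: str = "ViT-B-32.pt") -> Optional[Path]:
    """Return the path to the CLIP weights if the file exists, else None.

    `filename` is an assumption (the official "ViT-B/32" download name);
    change it if your copy of the weights is named differently.
    """
    path = Path(cache_dir) / filename
    return path if path.is_file() else None


# Report whether the weights are present in the default ./cache directory.
weights = find_clip_weights()
if weights is None:
    print("CLIP weights not found in ./cache; download ViT-B/32 first.")
else:
    print(f"Found CLIP weights: {weights}")
```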

3.2 Generate dataset

Generate the following *.mat files for each dataset. The ./dataset directory should be structured as follows:

    dataset
    ├── coco
    │   ├── caption.mat 
    │   ├── index.mat
    │   └── label.mat 
    ├── flickr25k
    │   ├── caption.mat
    │   ├── index.mat
    │   └── label.mat
    └── nuswide
        ├── caption.mat
        ├── index.mat 
        └── label.mat

Please preprocess each dataset into the appropriate input format.

More details about the generation, meaning, and format of each mat file can be found in ./dataset/README.md.

Additionally, cleaned datasets (MIRFLICKR25K & MSCOCO & NUSWIDE) used in our experiments are available at pan.baidu.com:

link:https://pan.baidu.com/s/1ZyDTR2IEHlY4xIdLgxtaVA password:kdq7
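
The directory layout described in this section can be checked with a short script before training. This is a sketch under the assumption that all three datasets are wanted. The function name missing_mat_files is hypothetical, and the script only checks that the files exist, not their contents.

```python
from pathlib import Path
from typing import Dict, List

# Dataset folders and file names taken from the directory tree in section 3.2.
DATASETS = ("coco", "flickr25k", "nuswide")
MAT_FILES = ("caption.mat", "index.mat", "label.mat")


def missing_mat_files(root: str = "./dataset") -> Dict[str, List[str]]:
    """Map each dataset name to the .mat files missing from root/<name>.

    Datasets with all three files present are left out of the result.
    """
    missing: Dict[str, List[str]] = {}
    for name in DATASETS:
        absent = [f for f in MAT_FILES if not (Path(root) / name / f).is_file()]
        if absent:
            missing[name] = absent
    return missing


# Print one line per incomplete dataset under ./dataset.
for name, files in missing_mat_files().items():
    print(f"{name}: missing {', '.join(files)}")
```

An empty result means every expected file is in place. If you only use one dataset, the entries for the other two can be ignored.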

4. Train

After preparing the Python environment, pretrained CLIP model, and dataset, we can train the MITH model.

4.1 Train on MIRFlickr25K

    python main.py --is-train --dataset flickr25k --query-num 2000 --train-num 10000 --result-name "RESULT_MITH_FLICKR" --k-bits 64

5. Test

5.1 Test on MIRFlickr25K

    python main.py --dataset flickr25k --query-num 2000 --train-num 10000 --result-name "RESULT_MITH_FLICKR" --k-bits 64 --pretrained=MODEL_PATH

More scripts for training and testing are provided in ./run_MITH.sh.
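
If you prefer to drive the runs from Python, the MIRFlickr25K commands above can be assembled programmatically. This is a minimal sketch, not the contents of run_MITH.sh. The helper build_command is hypothetical, it uses only the flags shown in sections 4.1 and 5.1, and the bit lengths 16 and 32 in the loop are illustrative (only 64 appears above).

```python
import shlex
from typing import List, Optional


def build_command(k_bits: int, train: bool,
                  pretrained: Optional[str] = None) -> List[str]:
    """Build the main.py argument list for MIRFlickr25K.

    Training adds --is-train; testing requires the path of a trained
    model, passed as --pretrained=<path>.
    """
    cmd = ["python", "main.py"]
    if train:
        cmd.append("--is-train")
    cmd += [
        "--dataset", "flickr25k",
        "--query-num", "2000",
        "--train-num", "10000",
        "--result-name", "RESULT_MITH_FLICKR",
        "--k-bits", str(k_bits),
    ]
    if not train:
        if pretrained is None:
            raise ValueError("testing requires a path to a trained model")
        cmd.append(f"--pretrained={pretrained}")
    return cmd


# Print the training command for several code lengths (illustrative values).
for bits in (16, 32, 64):
    print(" ".join(shlex.quote(arg) for arg in build_command(bits, train=True)))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root to actually launch the run.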

6. Citation

If you find our approach useful in your research, please consider citing:

Yishu Liu, Qingpeng Wu, Zheng Zhang, Jingyi Zhang, and Guangming Lu. 2023. Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval. In Proceedings of the 31st ACM International Conference on Multimedia (MM ’23). https://doi.org/10.1145/3581783.36.

7. Questions

If you have any questions, please feel free to contact Yishu Liu ([email protected]) or Qingpeng Wu ([email protected]).

mm23-mith's People

Contributors

darrenzzhang

mm23-mith's Issues

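Feature extraction backbone

Hello. In the paper, the feature backbones are ViT and GPT-2, which extract the image and text information respectively. However, I noticed that the model in the code is named CLIP. Are the image and text feature extraction parts of CLIP the ViT and GPT-2 described in the paper? Did you train or fine-tune them yourselves, or did you directly use a model pre-trained by someone else? If you used an existing pre-trained model, could you share a link to the official source?

Code:

https://github.com/DarrenZZhang/MITH/blob/159bbc7fad6d6cb41d40fc1fdf0e169054de71ea/hash_model.py#L250-L261

Description in the paper:
[image]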

PR curve

Hello, could you share the code for your PR curves? I would like to use MITH as a baseline. Many thanks.
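
For readers with the same need, one common way to draw a PR curve for hash codes is to sweep the Hamming radius and pool the retrieved items over all queries. The sketch below is not the authors' code and may differ from the protocol used in the paper. The function names, the 0/1 list representation of codes and labels, and the rule that two items are relevant when they share at least one label are all assumptions. It is written in plain Python for readability, and real dataset sizes would need a vectorized version.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def pr_curve(
    query_codes: List[Sequence[int]],
    db_codes: List[Sequence[int]],
    query_labels: List[Sequence[int]],
    db_labels: List[Sequence[int]],
) -> List[Tuple[float, float]]:
    """Return (precision, recall) for each Hamming radius 0..K.

    Counts are pooled over all queries (micro-averaging). A database item
    is treated as relevant to a query if they share at least one label.
    """
    k_bits = len(db_codes[0])
    # hits[d] / retrieved[d]: relevant and total pairs at distance exactly d.
    hits = [0] * (k_bits + 1)
    retrieved = [0] * (k_bits + 1)
    total_relevant = 0
    for q_code, q_label in zip(query_codes, query_labels):
        for d_code, d_label in zip(db_codes, db_labels):
            dist = hamming(q_code, d_code)
            relevant = any(a and b for a, b in zip(q_label, d_label))
            retrieved[dist] += 1
            hits[dist] += int(relevant)
            total_relevant += int(relevant)

    curve = []
    cum_hits = cum_retrieved = 0
    for radius in range(k_bits + 1):
        # Accumulate everything within the current radius.
        cum_hits += hits[radius]
        cum_retrieved += retrieved[radius]
        precision = cum_hits / cum_retrieved if cum_retrieved else 0.0
        recall = cum_hits / total_relevant if total_relevant else 0.0
        curve.append((precision, recall))
    return curve


# Toy example: one 4-bit query against three database items with two labels.
toy_curve = pr_curve(
    query_codes=[[1, 1, 0, 0]],
    db_codes=[[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]],
    query_labels=[[1, 0]],
    db_labels=[[1, 0], [0, 1], [1, 1]],
)
for radius, (precision, recall) in enumerate(toy_curve):
    print(radius, precision, recall)
```

Plotting recall on the x-axis against precision on the y-axis then gives the curve. The same function can be called once for image-to-text and once for text-to-image retrieval by swapping which modality supplies the query codes.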

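Baidu Netdisk link no longer works

I would like to use MITH as a baseline, but the Baidu Netdisk link has been flagged for a suspected violation. Could you upload the files again?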
