AAAI 2024 Accepted Paper Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

MacCap

Overview

Setup

First, download and set up the repo:

git clone https://github.com/Artanic30/MacCap
cd MacCap
conda env create -f environment.yml
conda activate MacCap

Data preparation

Download coco_train and cc3m_train, and place both in the data directory.
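Before starting a training run, it can help to confirm that both downloads ended up where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and because the README does not state the file extensions or folder layout, the check only matches entries in `data` by the name prefixes `coco_train` and `cc3m_train`.

```python
from pathlib import Path

# Assumption: only the dataset names are given in the README, so entries
# are matched by prefix rather than by an exact file name or extension.
EXPECTED_PREFIXES = ("coco_train", "cc3m_train")


def find_missing(data_dir, prefixes=EXPECTED_PREFIXES):
    """Return the prefixes that have no matching file or folder in data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        # A missing data directory means every download is missing.
        return list(prefixes)
    names = [entry.name for entry in root.iterdir()]
    return [p for p in prefixes if not any(n.startswith(p) for n in names)]


if __name__ == "__main__":
    missing = find_missing("data")
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected downloads found in data/.")
```

Run it from the repository root so that the relative `data` path resolves correctly. If you only plan to train on one dataset, pass just that prefix, for example `find_missing("data", ("coco_train",))`.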

Training

./train_coco.sh

or

./train_cc3m.sh

Evaluation

Follow the instructions here to evaluate the generated captions.

Citation

@article{qiu2024mining,
  title={Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training},
  author={Qiu, Longtian and Ning, Shan and He, Xuming},
  journal={arXiv preprint arXiv:2401.02347},
  year={2024}
}

Acknowledgments

This repository is heavily based on ClipCap and DeCap. For training, we used data from the COCO dataset and Conceptual Captions.

Release Schedule

  • Initial Code release
  • Detailed Documentation
  • Data Preparation
  • Training and Evaluation Scripts
  • Checkpoints

maccap's Issues

Request for Checkpoint File

Hi, I am very interested in your code, but I noticed that you haven't released the checkpoint file yet. Would it be possible for you to share the checkpoint file sometime soon? Thanks!

Reasons for the secondary combination of regional features and attention weights

Great work! I have a question about the last operation that obtains the region feature. Is the attention weight used there the attention weight of the previous layer? If so, the patch tokens have already been obtained by interacting with the attention weight A, so is this step only meant to obtain the mutual information between valid patch tokens?
