
A Neural Divide-and-Conquer Reasoning Framework for Multimodal Reasoning on Linguistically Complex Text and Similar Images

License: Apache License 2.0


NDCR

Author: Yunxin Li (Google Scholar)

Our paper "A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text" has been accepted to the ACL 2023 Main Conference. This project introduces divide-and-conquer and neural-symbolic reasoning approaches to handle the complex text-image reasoning problem.

How to Run

Environment

  1. Basic Setting
    Python >= 3.7.0, torch==1.10.1+cu111, torchaudio==0.10.1+cu111, torchvision==0.11.2+cu111, transformers==4.18.0

  2. Unzip src_transformers.zip, volta_src.zip, and CLIP.zip to the current home path. In addition, you may need to download the image source from ImageCode to /data/game/. We release the pretrained checkpoint for phase 1 (the proposition generator) and the pretrained OFA checkpoint in the Hugging Face repository: https://huggingface.co/YunxinLi/pretrain_BART_generator_coldstart_OFA

  3. Prepare the OFA version of Transformers
    git clone --single-branch --branch feature/add_transformers https://github.com/OFA-Sys/OFA.git
    pip install OFA/transformers/
    git clone https://huggingface.co/OFA-Sys/OFA-large
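
Steps 1 and 2 above can be sketched as a single shell session. This is a minimal sketch, assuming the zip archives sit in the repository root; the PyTorch wheel index URL and the unzip destinations are assumptions not stated in the README:

```shell
# Sketch of steps 1-2; the wheel index URL and target paths are assumptions.

# Step 1: install the pinned dependencies (CUDA 11.1 builds).
pip install torch==1.10.1+cu111 torchaudio==0.10.1+cu111 torchvision==0.11.2+cu111 \
    -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install transformers==4.18.0

# Step 2: unpack the bundled sources into the current home path.
unzip -o src_transformers.zip
unzip -o volta_src.zip
unzip -o CLIP.zip

# Step 2 (cont.): fetch the released phase-1 / OFA checkpoints.
git clone https://huggingface.co/YunxinLi/pretrain_BART_generator_coldstart_OFA

# The image data from ImageCode must be placed under /data/game/ separately.
```

Step 3 (the OFA-version Transformers fork and the OFA-large weights) then follows as listed above.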

Training

python OFA_encoder_Divide_and_Conquer.py --lr 3e-5 --lr_head 4e-5 -b 32 -m ViT-B/16 -a gelu --logit_scale 1000 --add_input True --positional --frozen_clip

Experience

  1. The training strategy of the large multimodal model affects the final result on the test set.
  2. The evaluation result on the validation set often closely matches the performance on the test set.
  3. Adjusting the random seed may not bring any improvement, so please do not focus on this point.
  4. This dataset is very challenging, especially for samples whose images come from video sources.
  5. One significant research direction is modeling highly similar images; progress there would improve the performance of NDCR.

Acknowledgements

Thanks to everyone for your contributions. If you like our work and use it in your projects, please cite:

@article{li2023neural,
  title={A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text},
  author={Li, Yunxin and Hu, Baotian and Ding, Yunxin and Ma, Lin and Zhang, Min},
  journal={arXiv preprint arXiv:2305.02265},
  year={2023}
}
