GithubHelp home page GithubHelp logo

scord's Introduction

Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez

If you have any questions, you can email [email protected]

Abstract

We propose Subject-Conditional Relation Detection SCoRD, where conditioned on an input subject, the goal is to predict all its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark such that the training and testing splits have a distribution shift in terms of the occurrence statistics of <subject, relation, object> triplets. To solve this problem, we propose an auto-regressive model that given a subject, it predicts its relations, objects, and object locations by casting this output as a sequence of tokens. First, we show that previous scene-graph prediction methods fail to produce as exhaustive an enumeration of relation-object pairs when conditioned on a subject on this benchmark. Particularly, we obtain a recall@3 of 83.8% for our relation-object predictions compared to the 49.75% obtained by a recent scene graph detector. Then, we show improved generalization on both relation-object and object-box predictions by leveraging during training relation-object pairs obtained automatically from textual captions and for which no object-box annotations are available. Particularly, for <subject, relation, object> triplets for which no object locations are available during training, we are able to obtain a recall@3 of 33.80% for relation-object pairs and 26.75% for their box locations.

Install

Please follow ALBEF to install the required packages.

Data

Download the training and testing splits here. To download images:

Checkpoint

Download the checkpoint for the removing 50% experiment here.

Evaluation

First, run this command to generate <relation, object, object location> triples:

# start and end indices indicate the index of your target checkpoint in the checkpoint folder. If you only have one checkpoint in the folder, the start flag should be 0 and the end flag should be 1
# chunk size indicates how many batches of evaluation samples should be processed
CUDA_VISIBLE_DEVICES=0 python results_generation.py --root your_checkpoint_folder --start 0 --end 1 --chunk 0 --num_seq 3 --num_beams 5 --chunk_size 100 --round 2

Then, run this command to get evaluation results:

python evaluate_results.py --results_folder your_checkpoint_folder/oidv6_results/  --report_unseen True --topk 3

Training

First, download the pre-trained checkpoint from PEVL: Run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=12888 --use_env run_relation_train.py --config configs/relation_grounding.yaml --output_dir your_checkpoint_folder --checkpoint pevl_pretrain.pth

Acknowledgement

We would like to thank ALBEF and PEVL. Their released codebases help a lot in this project.

Citing

If you think this work is interesting, please consider to cite it:

@inproceedings{yang2024scord,
  title={SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data},
  author={Yang, Ziyan and Kafle, Kushal and Lin, Zhe and Cohen, Scott and Ding, Zhihong and Ordonez, Vicente},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5731--5741},
  year={2024}
}

scord's People

Contributors

ziyanyang avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.