tofindwaldo's Introduction

To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo

This is the official repository for "To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo".

Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral.
ACL 2022 (Short Paper)

Prerequisites

Follow the instructions from the original Who's Waldo work to acquire the original dataset and the source code.
To generate the required bottom-up image features, you may use either the original repo or the PyTorch re-implementation, at your discretion.
To train or test with our splits, replace ./dataset_meta/ in the original source code repo with ours and rerun the data preprocessing steps. We also provide a customizable training config file, config/train-whos-waldo-new-finetune.json, for convenience.
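
The directory swap described above could be scripted roughly as follows. This is a minimal sketch, not part of either repository: the helper name `replace_dataset_meta`, the `.orig` backup suffix, and the example checkout paths are all assumptions made for illustration. It keeps a copy of the original splits so they can be restored later.

```python
import shutil
from pathlib import Path


def replace_dataset_meta(original_repo, debiased_repo, backup_suffix=".orig"):
    """Swap the original repo's dataset_meta/ for the one from this repo.

    Hypothetical helper: both arguments are paths to local checkouts.
    The first call moves the original directory aside as a backup.
    """
    target = Path(original_repo) / "dataset_meta"
    source = Path(debiased_repo) / "dataset_meta"
    if not source.is_dir():
        raise FileNotFoundError(f"missing split directory: {source}")

    backup = target.with_name(target.name + backup_suffix)
    if target.exists() and not backup.exists():
        # First run: keep the original splits next to the new ones.
        target.rename(backup)
    elif target.exists():
        # A backup already exists, so the current copy can be discarded.
        shutil.rmtree(target)

    shutil.copytree(source, target)
    return target
```

For example, `replace_dataset_meta("path/to/whos-waldo", "path/to/tofindwaldo")` would leave the original splits in `dataset_meta.orig` (both paths are placeholders). The data preprocessing steps still need to be rerun afterwards.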

License

MIT

Citation

@inproceedings{luo-etal-2022-find,
    title = "To Find Waldo You Need Contextual Cues: Debiasing Who{'}s Waldo",
    author = "Luo, Yiran and Banerjee, Pratyay and Gokhale, Tejas and Yang, Yezhou and Baral, Chitta",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = may,
    year = "2022",
    url = "https://aclanthology.org/2022.acl-short.39",
    pages = "355--361",
}

tofindwaldo's Issues

Duplicate IDs

There appear to be some duplicate IDs in the splits.

I used the following Python script to identify them:

seen = set()
dupes = [x for x in ids if x in seen or seen.add(x)]

Here, ids is a list of all the IDs in train, test and val combined.
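
The same check can be extended to show which splits each repeated ID occurs in, which helps tell duplicates within one split apart from leakage across splits. The following is only a sketch: the function name `find_duplicate_ids` is hypothetical, the toy IDs are made up, and it assumes the IDs of each split have already been loaded into a list (loading depends on the split file format and is not shown).

```python
from collections import defaultdict


def find_duplicate_ids(splits):
    """Map each ID that occurs more than once to the splits it occurs in.

    `splits` is a dict such as {"train": [...], "val": [...], "test": [...]}.
    """
    occurrences = defaultdict(list)
    for split_name, ids in splits.items():
        for example_id in ids:
            occurrences[example_id].append(split_name)
    # Keep only IDs seen at least twice, within or across splits.
    return {i: names for i, names in occurrences.items() if len(names) > 1}


# Toy IDs for illustration only; they are not taken from the real splits.
splits = {"train": ["a", "b", "c"], "val": ["c", "d"], "test": ["d", "d"]}
print(find_duplicate_ids(splits))
# {'c': ['train', 'val'], 'd': ['val', 'test', 'test']}
```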

The following file contains the duplicates I found: dupes.txt

Please correct me if I'm wrong on this.

About max_txt_len.

May I ask whether your implementation filters out examples with text length > 100 during training and testing?
Are the results you report in your paper based on this implementation?