Dataset and code for our paper: "Exploring GPT-4 Vision for Text-to-Image Synthesis Evaluation"
- (1/14/2024) Added three COCO segmentation subsets: see `human_subset`, `object_subset`, and `overlap_subset`.
- (1/13/2024) Added results from two more text-to-image models, Stable Diffusion XL Turbo and Muse: see `sdxl-turbo` and `openmuse`, respectively.
- (12/8/2023) Added 50 samples generated by Stable Diffusion and GLIDE.
- Environment: `python=3.9.18`, `torch=1.12.1+cu113`
- Directories: `coco` contains the ground-truth images from the COCO 2014 validation set, `glide` contains images generated by GLIDE, and `sdv1-5` contains images synthesized by Stable Diffusion v1.5. Each subset has 50 samples generated from the same text prompts.
```python
import json

# caption.json pairs each text prompt with its image file name.
with open('caption.json') as f:
    list_caption = json.load(f)

captions = [item['caption'] for item in list_caption]
img_pths = [item['image'] for item in list_caption]
```
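If each subset directory (`coco`, `glide`, `sdv1-5`) stores its images under the same relative file names, the per-model paths for each prompt could be assembled as below. This is a sketch under that assumption; the sample entry is hypothetical and the exact file layout may differ.

```python
import os

# Hypothetical entry mirroring the structure of caption.json.
list_caption = [
    {'caption': 'a dog riding a skateboard',
     'image': 'COCO_val2014_000000000042.jpg'},
]

# Ground truth plus two generators, as described above.
subsets = ['coco', 'glide', 'sdv1-5']

# For each prompt, collect the matching image path in every subset.
pairs = [
    {'caption': item['caption'],
     'paths': {s: os.path.join(s, item['image']) for s in subsets}}
    for item in list_caption
]
```

Each element of `pairs` then holds one caption together with the ground-truth and generated image paths, convenient for side-by-side evaluation.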