@robvanvolt has created a much more fleshed out version here: https://github.com/robvanvolt/DALLE-datasets
None of this code works yet. If you'd like to contribute, create a pull request! We need all the datasets we can get. Otherwise come back in a few weeks to check on progress.
This repository includes metadata and instructions for downloading many captioned datasets + generated captions from labels.
Thanks to @yashbonde, we eventually intend to include generated captions for a variety of datasets that don't include captions.
Since this is a highly versatile dataset, we use a common format for each sample:
{
  "image_id": {
    "labels": ["car", "chair", "something else"],
    "score": [0, 1, 1],
    "caption": "caption goes here",
    "dataset": "open_images_v4",
    "source_split": "train",
    "original_language": "eng"
  }
}
- `image_id`: this will be expanded to the complete filepath when training
- `labels`: if the given image has labels, add them here; default is `None`
- `score`: if there is a score for the labels (e.g. OpenImages), add it here; default is `None`
- `caption`: generated caption goes here
- `source_split`: which split of its original dataset the sample belongs to
- `dataset`: key of the dataset name
- `original_language`: if the dataset is multilingual, use the ISO 639-2 code
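The format above can be checked mechanically before training. The following is a minimal sketch, not part of this repository: `validate_sample` and `resolve_path` are hypothetical helper names, and it assumes that every key shown in the example is present, that `score` lines up one-to-one with `labels` when both are given, and that `image_id` expands to a path by joining it onto an image root directory.

```python
import json
import os
from typing import Any, Dict, List

# Assumption: every key from the example sample is expected to be present.
EXPECTED_KEYS = {
    "labels", "score", "caption", "dataset", "source_split", "original_language",
}


def validate_sample(image_id: str, sample: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one sample (empty if none)."""
    problems = []
    missing = EXPECTED_KEYS - sample.keys()
    if missing:
        problems.append(f"{image_id}: missing keys {sorted(missing)}")

    labels = sample.get("labels")
    score = sample.get("score")
    # Assumption: when both are set, there is one score per label.
    if labels is not None and score is not None and len(labels) != len(score):
        problems.append(f"{image_id}: {len(labels)} labels but {len(score)} scores")

    lang = sample.get("original_language")
    # ISO 639-2 codes are three letters long.
    if lang is not None and len(lang) != 3:
        problems.append(f"{image_id}: '{lang}' is not a three-letter language code")
    return problems


def resolve_path(image_root: str, image_id: str) -> str:
    """Hypothetical expansion of an image_id to a filepath under image_root."""
    return os.path.join(image_root, image_id)


metadata = json.loads("""
{
  "image_id": {
    "labels": ["car", "chair", "something else"],
    "score": [0, 1, 1],
    "caption": "caption goes here",
    "dataset": "open_images_v4",
    "source_split": "train",
    "original_language": "eng"
  }
}
""")

for image_id, sample in metadata.items():
    for problem in validate_sample(image_id, sample):
        print(problem)
```

Returning a list of problems instead of raising lets a whole metadata file be scanned in one pass, which is handy when merging contributions from several datasets.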
| name | size | image count | link | used for VAE | captions given | captions generated |
|---|---|---|---|---|---|---|
| Downscale OpenImagesv4 | 16GB | 1.9M | torrent | ✅ | | |
| Stanford STL-10 | 2.64GB | 113K | torrent | ✅ | | |
| CVPR Indoor Scene Recognition | 2.59GB | 15620 | torrent | ✅ | | |
| The Visual Genome Dataset v1.0 + v1.2 Images | 15.20GB | 108K | torrent | ✅ | ✅ | |
| Food-101 | 5.69GB | 101K | torrent | ✅ | | |
| The Street View House Numbers (SVHN) Dataset | 2.64GB | 600K | torrent | ✅ | | |
| Downsampled ImageNet 64x64 | 12.59GB | 1.28M | torrent | ✅ | | |
| COCO 2017 | 52.44GB | 287K | torrent website | | | |
| Flickr 30k Captions (bad data, downloads duplicates) | 8GB | 31K | kaggle | ✅ | | |
This is a big community-led effort; find more projects:
You can join the discord for direct communication.