DOCCI

Summary

Introduction

DOCCI (Descriptions of Connected and Contrasting Images) is a collection of images paired with detailed descriptions. The descriptions cover the key elements of each image as well as secondary information such as background, lighting, and setting. The images were taken specifically to support the assessment of precise visual properties. DOCCI also includes many sets of related images that differ from one another in key ways. All descriptions are manually annotated so that each one adequately distinguishes its image from its counterparts.

Dataset Structure

Data Instances

{
    'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1536x2048>,
    'example_id': 'qual_dev_00000',
    'description': 'An indoor angled down medium close-up front view of a real sized stuffed dog with white and black colored fur wearing a blue hard hat with a light on it. A couple inches to the right of the dog is a real sized black and white penguin that is also wearing a blue hard hat with a light on it. The dog is sitting, and is facing slightly towards the right while looking to its right with its mouth slightly open, showing its pink tongue. The dog and penguin are placed on a gray and white carpet, and placed against a white drawer that has a large gray cushion on top of it. Behind the gray cushion is a transparent window showing green trees on the outside.'
}

Data Fields

Name	Explanation
image	The image, as a PIL.JpegImagePlugin.JpegImageFile object.
example_id	Unique ID of the example, in the format <SPLIT_NAME>_<EXAMPLE_NUMBER>.
description	Text description of the associated image.
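
The example_id format can be illustrated with a minimal parsing sketch. It assumes that the split name may itself contain underscores (as in 'qual_dev_00000') and that the example number is the trailing run of digits; the helper name parse_example_id is hypothetical and not part of any DOCCI tooling.

```python
import re

# Assumed pattern: a split name (which may contain underscores, e.g. "qual_dev"),
# then "_", then a trailing run of digits for the example number.
_EXAMPLE_ID = re.compile(r"^(?P<split>.+)_(?P<number>\d+)$")


def parse_example_id(example_id):
    """Split an ID such as 'qual_dev_00000' into (split_name, example_number)."""
    match = _EXAMPLE_ID.match(example_id)
    if match is None:
        raise ValueError(f"unexpected example_id format: {example_id!r}")
    return match.group("split"), int(match.group("number"))


print(parse_example_id("qual_dev_00000"))  # ('qual_dev', 0)
```

Matching on the last underscore, rather than the first, keeps multi-word split names such as qual_dev intact.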

Data Splits

Dataset	Train	Test	Qual Dev	Qual Test
DOCCI	9,647	5,000	100	100
DOCCI-AAR	4,932	5,000	--	--
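
As a quick sanity check on the table above, the following sketch records the split sizes and sums them per dataset. The lowercase split keys and the helper name total_examples are illustrative choices, not names defined by DOCCI; the "--" entries are stored as None.

```python
# Split sizes copied from the Data Splits table; "--" entries are stored as None.
# The lowercase keys are illustrative and may not match the released split names.
SPLIT_SIZES = {
    "DOCCI": {"train": 9647, "test": 5000, "qual_dev": 100, "qual_test": 100},
    "DOCCI-AAR": {"train": 4932, "test": 5000, "qual_dev": None, "qual_test": None},
}


def total_examples(dataset_name):
    """Sum the sizes of all splits that exist for the given dataset."""
    sizes = SPLIT_SIZES[dataset_name].values()
    return sum(n for n in sizes if n is not None)


print(total_examples("DOCCI"))      # 14847
print(total_examples("DOCCI-AAR"))  # 9932
```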

Reference

We would like to acknowledge Yasumasa Onoe, Sunayana Rane, et al. for creating and maintaining the DOCCI dataset as a valuable resource for the computer vision and machine learning research community. For more information about the DOCCI dataset and its creators, please visit the DOCCI website.

License

The dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Citation

@inproceedings{OnoeDocci2024,
  author        = {Yasumasa Onoe and Sunayana Rane and Zachary Berger and Yonatan Bitton and Jaemin Cho and Roopal Garg and
    Alexander Ku and Zarana Parekh and Jordi Pont-Tuset and Garrett Tanzer and Su Wang and Jason Baldridge},
  title         = {{DOCCI: Descriptions of Connected and Contrasting Images}},
  booktitle     = {arXiv},
  year          = {2024}
}