images[2].zip
Returns all images in JPG format.
image_data.json.zip
Returns metadata about all images.
Name | Type | Description |
image_id | int | ID of image |
url | hyperlink string | Visual Genome-hosted image URL |
width | int | width of image in px |
height | int | height of image in px |
coco_id | int | ID of the image in the COCO dataset |
flickr_id | int | ID of the image in the Flickr dataset |
[...
{
"image_id": 2412112,
"url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
"width": 500,
"height": 281,
"coco_id": 547168,
"flickr_id": 8505158818
}
...]
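A minimal sketch of indexing these records by image_id. It assumes the archive has been unzipped and parsed into a list of records shaped like the example above; the file path in the comment and the helper name `index_images` are hypothetical.

```python
import json

def index_images(records):
    """Map image_id -> metadata record for direct lookup."""
    return {rec["image_id"]: rec for rec in records}

# Sample record copied from the example above. With the real file you would
# load it instead, e.g. json.load(open("image_data.json")) (hypothetical path).
records = json.loads("""[
  {"image_id": 2412112,
   "url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
   "width": 500, "height": 281,
   "coco_id": 547168, "flickr_id": 8505158818}
]""")

by_id = index_images(records)
meta = by_id[2412112]
aspect_ratio = meta["width"] / meta["height"]
print(meta["url"], round(aspect_ratio, 2))
```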
region_descriptions.json.zip
Return all region descriptions
Name | Type | Description |
image_id | int | ID of image containing region |
regions | object array | Array of region descriptions for this image |
regions.region_id | int | ID of region description |
regions.x | int | x-coordinate of region bounding box |
regions.y | int | y-coordinate of region bounding box |
regions.width | int | width of region bounding box |
regions.height | int | height of region bounding box |
regions.phrase | str | region description phrase |
regions.synsets | object array | synsets in the description |
regions.synsets.synset_name | str | synset name |
regions.synsets.entity_name | str | string from phrase |
regions.synsets.entity_idx_start | int | index where synset starts in the phrase |
regions.synsets.entity_idx_end | int | index where synset ends in the phrase |
[...
{
"image_id": 2407890,
"regions": [...
{
"region_id": 1353,
"x": 117,
"y": 79,
"width": 249,
"height": 107,
"phrase": "a cat sitting on a table.",
"synsets": [...
{
"synset_name": "cat.n.01",
"entity_name": "cat",
"entity_idx_start": 2,
"entity_idx_end": 5
},
...]
},
{
"region_id": 1354,
"x": 116,
"y": 29,
"width": 239,
"height": 135,
"phrase": "a white cat with a tan tail and face markings",
"synsets": [...
...]
},
...]
},
{
"image_id": 2407890,
"regions": [...
...]
},
...]
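The entity_idx_start and entity_idx_end fields can be used to slice the phrase. The sketch below assumes the end index is exclusive, which is consistent with the example above (offsets 2 and 5 select "cat") but is not stated by the schema; `synset_spans` is a hypothetical helper name.

```python
def synset_spans(phrase, synsets):
    """Pair each synset name with the substring its offsets select."""
    return [
        (s["synset_name"], phrase[s["entity_idx_start"]:s["entity_idx_end"]])
        for s in synsets
    ]

# Region copied from the example above.
region = {
    "region_id": 1353,
    "x": 117, "y": 79, "width": 249, "height": 107,
    "phrase": "a cat sitting on a table.",
    "synsets": [
        {"synset_name": "cat.n.01", "entity_name": "cat",
         "entity_idx_start": 2, "entity_idx_end": 5},
    ],
}

spans = synset_spans(region["phrase"], region["synsets"])
print(spans)  # [('cat.n.01', 'cat')]
```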
question_answers.json.zip
All visual question answers
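Name | Type | Description |
image_id | int | ID of image |
qas | object array | list of QAs for the image |
qas.qa_id | int | ID of question answer |
qas.question | str | question |
qas.answer | str | answer |
qas.question_synsets | object array | array of synsets in the question |
qas.question_synsets.synset_name | str | synset name |
qas.question_synsets.entity_name | str | string from question |
qas.question_synsets.entity_idx_start | int | index where synset starts in the question |
qas.question_synsets.entity_idx_end | int | index where synset ends in the question |
qas.answer_synsets | object array | array of synsets in the answer |
qas.answer_synsets.synset_name | str | synset name |
qas.answer_synsets.entity_name | str | string from answer |
qas.answer_synsets.entity_idx_start | int | index where synset starts in the answer |
qas.answer_synsets.entity_idx_end | int | index where synset ends in the answer |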
[...
{
"image_id": 2317993,
"qas": [...
{
"qa_id": 912402,
"question": "Where are the clouds?",
"answer": "sky",
"question_synsets": [...
{
"synset_name": "cloud.n.01",
"entity_name": "cloud",
"entity_idx_start": 14,
"entity_idx_end": 20
},
...],
"answer_synsets": [...
{
"synset_name": "sky.n.01",
"entity_name": "sky",
"entity_idx_start": 0,
"entity_idx_end": 3
},
...]
},
...]
},
...]
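Because QAs are nested per image, a flat list of question/answer rows is often easier to work with. A minimal sketch, assuming entries shaped like the example above; `iter_qa_pairs` is a hypothetical helper name.

```python
def iter_qa_pairs(entries):
    """Yield (image_id, qa_id, question, answer) for every QA in every entry."""
    for entry in entries:
        for qa in entry["qas"]:
            yield entry["image_id"], qa["qa_id"], qa["question"], qa["answer"]

# Entry copied from the example above (synset lists left empty for brevity).
entries = [
    {
        "image_id": 2317993,
        "qas": [
            {"qa_id": 912402,
             "question": "Where are the clouds?",
             "answer": "sky",
             "question_synsets": [],
             "answer_synsets": []},
        ],
    },
]

rows = list(iter_qa_pairs(entries))
print(rows)
```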
objects.json.zip
All object instances
Name | Type | Description |
image_id | int | ID of image |
objects | object array | Array of object instances for this image |
objects.object_id | int | ID of object |
objects.x | int | x-coordinate of object bounding box |
objects.y | int | y-coordinate of object bounding box |
objects.w | int | width of object bounding box |
objects.h | int | height of object bounding box |
objects.name | str | name of object |
objects.synsets | str array | synset names associated with this object |
[...
{
"image_id": 2,
"objects": [...
{
"object_id": 1023847,
"x": 405,
"y": 34,
"w": 78,
"h": 438,
"name": "pole",
"synsets": ["pole.n.01"]
},
{
"object_id": 1023836,
"x": 239,
"y": 347,
"w": 136,
"h": 126,
"name": "car",
"synsets": ["car.n.01"]
},
...]
},
...]
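Object boxes use the keys w and h, whereas region boxes use width and height. The sketch below converts an object box to corner coordinates, assuming (x, y) is the top-left corner, which the schema does not state explicitly; `to_corners` and `area` are hypothetical helper names.

```python
def to_corners(obj):
    """Return (x1, y1, x2, y2), assuming (x, y) is the top-left corner."""
    x1, y1 = obj["x"], obj["y"]
    return x1, y1, x1 + obj["w"], y1 + obj["h"]

def area(obj):
    """Box area in square pixels."""
    return obj["w"] * obj["h"]

# Objects copied from the example above.
pole = {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
        "name": "pole", "synsets": ["pole.n.01"]}
car = {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
       "name": "car", "synsets": ["car.n.01"]}

print(to_corners(pole), area(pole))  # (405, 34, 483, 472) 34164
print(to_corners(car))               # (239, 347, 375, 473)
```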
attributes.json.zip
All attributes in the dataset
Name | Type | Description |
image_id | int | ID of image |
attributes | object array | Array of attributes with object instances for this image |
attributes.object_id | int | ID of object |
attributes.x | int | x-coordinate of object bounding box |
attributes.y | int | y-coordinate of object bounding box |
attributes.w | int | width of object bounding box |
attributes.h | int | height of object bounding box |
attributes.name | str | name of object |
attributes.synsets | str array | synset names associated with this object |
attributes.attributes | str array | list of attributes associated with this object |
[...
{
"image_id": 2,
"attributes": [...
{
"object_id": 1023847,
"x": 405,
"y": 34,
"w": 78,
"h": 438,
"name": "pole",
"synsets": ["pole.n.01"],
"attributes": ["brown"]
},
{
"object_id": 1023836,
"x": 239,
"y": 347,
"w": 136,
"h": 126,
"name": "car",
"synsets": ["car.n.01"],
"attributes": ["red", "broken"]
},
...]
},
...]
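A small sketch of inverting this structure into a lookup from attribute to object names. It assumes an entry shaped like the example above and uses `.get` on the guess that some objects may carry no attributes list; `objects_by_attribute` is a hypothetical helper name.

```python
from collections import defaultdict

def objects_by_attribute(entry):
    """Map each attribute string to the names of objects that carry it."""
    index = defaultdict(list)
    for obj in entry["attributes"]:
        # Assumption: the inner "attributes" key may be absent for some objects.
        for attr in obj.get("attributes", []):
            index[attr].append(obj["name"])
    return dict(index)

# Entry copied from the example above (box fields omitted for brevity).
entry = {
    "image_id": 2,
    "attributes": [
        {"object_id": 1023847, "name": "pole",
         "synsets": ["pole.n.01"], "attributes": ["brown"]},
        {"object_id": 1023836, "name": "car",
         "synsets": ["car.n.01"], "attributes": ["red", "broken"]},
    ],
}

attr_index = objects_by_attribute(entry)
print(attr_index)
```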
relationships.json.zip
All relationships
Name | Type | Description |
image_id | int | ID of image |
relationships | object array | array of relationships in the image |
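relationships.relationship_id | int | ID of relationship |
relationships.predicate | str | name of the relationship predicate |
relationships.synsets | str array | synset names associated with the predicate |
relationships.subject | object | subject of the relationship |
relationships.subject.object_id | int | ID of object |
relationships.subject.x | int | x-coordinate of object bounding box |
relationships.subject.y | int | y-coordinate of object bounding box |
relationships.subject.w | int | width of object bounding box |
relationships.subject.h | int | height of object bounding box |
relationships.subject.name | str | name of object |
relationships.subject.synsets | str array | synset names associated with this object |
relationships.object | object | object of the relationship |
relationships.object.object_id | int | ID of object |
relationships.object.x | int | x-coordinate of object bounding box |
relationships.object.y | int | y-coordinate of object bounding box |
relationships.object.w | int | width of object bounding box |
relationships.object.h | int | height of object bounding box |
relationships.object.name | str | name of object |
relationships.object.synsets | str array | synset names associated with this object |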
[...
{
"image_id": 2,
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject": {
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "man",
"synsets": ["man.n.01"]
},
"object": {
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "backpack",
"synsets": ["backpack.n.01"]
}
},
...]
},
...]
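In this file the subject and object are embedded in each relationship, so (subject, predicate, object) triples can be read off directly. A minimal sketch, assuming an entry shaped like the example above; `name_triples` is a hypothetical helper name.

```python
def name_triples(entry):
    """Return (subject name, predicate, object name) for each relationship."""
    return [
        (rel["subject"]["name"], rel["predicate"], rel["object"]["name"])
        for rel in entry["relationships"]
    ]

# Entry copied from the example above.
entry = {
    "image_id": 2,
    "relationships": [
        {
            "relationship_id": 15947,
            "predicate": "wears",
            "synsets": ["wear.v.01"],
            "subject": {"object_id": 1023838, "x": 324, "y": 320,
                        "w": 142, "h": 255, "name": "man",
                        "synsets": ["man.n.01"]},
            "object": {"object_id": 5071, "x": 359, "y": 362,
                       "w": 72, "h": 81, "name": "backpack",
                       "synsets": ["backpack.n.01"]},
        },
    ],
}

triples = name_triples(entry)
print(triples)  # [('man', 'wears', 'backpack')]
```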
synsets.json.zip
All the synsets and their descriptions
Name | Type | Description |
synset_name | str | unique synset name |
synset_definition | str | definition of synset according to WordNet |
[...
{
"synset_name": "phonograph_record.n.01",
"synset_definition": "sound recording consisting of a disk with a continuous groove; used to reproduce music by rotating while a phonograph needle tracks in the groove"
},
{
"synset_name": "truck.n.01",
"synset_definition": "an automotive vehicle suitable for hauling",
}
...]
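Since synset names are unique, the list converts naturally to a dictionary for looking up definitions from the synsets fields of the other files. A minimal sketch; `definitions_by_name` is a hypothetical helper name.

```python
def definitions_by_name(synsets):
    """Map synset_name -> synset_definition."""
    return {s["synset_name"]: s["synset_definition"] for s in synsets}

# One record copied from the example above.
synsets = [
    {"synset_name": "truck.n.01",
     "synset_definition": "an automotive vehicle suitable for hauling"},
]

definitions = definitions_by_name(synsets)
print(definitions["truck.n.01"])
```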
region_graphs.json.zip
All the region graphs
Name | Type | Description |
image_id | int | ID of image containing region |
regions | object array | Array of region descriptions for this image |
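regions.region_id | int | ID of region description |
regions.x | int | x-coordinate of region bounding box |
regions.y | int | y-coordinate of region bounding box |
regions.width | int | width of region bounding box |
regions.height | int | height of region bounding box |
regions.phrase | str | region description phrase |
regions.synsets | object array | synsets in the description |
regions.synsets.synset_name | str | synset name |
regions.synsets.entity_name | str | string from phrase |
regions.synsets.entity_idx_start | int | index where synset starts in the phrase |
regions.synsets.entity_idx_end | int | index where synset ends in the phrase |
regions.objects | object array | Array of object instances in this region |
regions.objects.object_id | int | ID of object |
regions.objects.x | int | x-coordinate of object bounding box |
regions.objects.y | int | y-coordinate of object bounding box |
regions.objects.w | int | width of object bounding box |
regions.objects.h | int | height of object bounding box |
regions.objects.name | str | name of object |
regions.objects.synsets | str array | synset names associated with this object |
regions.relationships | object array | array of relationships in this region |
regions.relationships.relationship_id | int | ID of relationship |
regions.relationships.predicate | str | name of the relationship predicate |
regions.relationships.synsets | str array | synset names associated with the predicate |
regions.relationships.subject_id | int | ID of subject (found in objects list) |
regions.relationships.object_id | int | ID of object (found in objects list) |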
[...
{
"image_id": 2407890,
"regions": [...
{
"region_id": 1353,
"x": 117,
"y": 79,
"width": 249,
"height": 107,
"phrase": "a cat sitting on a table.",
"synsets": [...
{
"synset_name": "cat.n.01",
"entity_name": "cat",
"entity_idx_start": 2,
"entity_idx_end": 5
},
...],
"objects": [...
{
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "cat",
"synsets": ["cat.n.01"]
},
{
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "table",
"synsets": ["table.n.01"]
},
...],
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject_id": 1023838,
"object_id": 5071
},
...]
},
...]
},
...]
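Here relationships refer to objects by ID rather than embedding them, so the IDs have to be resolved against the region's own objects list. A minimal sketch using the values from the example above exactly as given; `region_triples` is a hypothetical helper name.

```python
def region_triples(region):
    """Resolve subject_id/object_id to object names within one region."""
    names = {obj["object_id"]: obj["name"] for obj in region["objects"]}
    return [
        (names[rel["subject_id"]], rel["predicate"], names[rel["object_id"]])
        for rel in region["relationships"]
    ]

# Region copied from the example above (box fields omitted for brevity).
region = {
    "region_id": 1353,
    "phrase": "a cat sitting on a table.",
    "objects": [
        {"object_id": 1023838, "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears",
         "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

resolved = region_triples(region)
print(resolved)
```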
scene_graphs.json.zip
All the scene graphs
Name | Type | Description |
image_id | int | ID of image |
objects | object array | Array of object instances for this image |
objects.object_id | int | ID of object |
objects.x | int | x-coordinate of object bounding box |
objects.y | int | y-coordinate of object bounding box |
objects.w | int | width of object bounding box |
objects.h | int | height of object bounding box |
objects.name | str | name of object |
objects.synsets | str array | synset names associated with this object |
relationships | object array | array of relationships in the image |
relationships.relationship_id | int | ID of relationship |
relationships.predicate | str | name of the relationship predicate |
relationships.synsets | str array | synset names associated with the predicate |
relationships.subject_id | int | ID of subject (found in objects list) |
relationships.object_id | int | ID of object (found in objects list) |
[...
{
"image_id": 2407890,
"objects": [...
{
"object_id": 1023838,
"x": 324,
"y": 320,
"w": 142,
"h": 255,
"name": "cat",
"synsets": ["cat.n.01"]
},
{
"object_id": 5071,
"x": 359,
"y": 362,
"w": 72,
"h": 81,
"name": "table",
"synsets": ["table.n.01"]
},
...],
"relationships": [...
{
"relationship_id": 15947,
"predicate": "wears",
"synsets": ["wear.v.01"],
"subject_id": 1023838,
"object_id": 5071
},
...]
},
...]
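A scene graph can be treated as a directed graph whose nodes are object IDs and whose edges are labelled with predicates. The sketch below builds a plain adjacency list from one entry shaped like the example above; `adjacency` is a hypothetical helper name, not part of any Visual Genome tooling.

```python
from collections import defaultdict

def adjacency(scene_graph):
    """Map subject_id -> list of (predicate, object_id) outgoing edges."""
    edges = defaultdict(list)
    for rel in scene_graph["relationships"]:
        edges[rel["subject_id"]].append((rel["predicate"], rel["object_id"]))
    return dict(edges)

# Entry copied from the example above (box fields omitted for brevity).
scene_graph = {
    "image_id": 2407890,
    "objects": [
        {"object_id": 1023838, "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears",
         "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

graph = adjacency(scene_graph)
print(graph)  # {1023838: [('wears', 5071)]}
```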
qa_to_region_mapping.json.zip
Mapping from QA IDs to their corresponding region description IDs
{...
QA_ID: REGION_DESCRIPTION_ID,
"1885736": "2072251"
...}
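In the example above both keys and values are strings, while the other files show IDs as integers. A minimal sketch of converting the mapping to integers so it can be joined against qa_id and region_id values; it assumes the whole file follows the string form shown in the example.

```python
import json

# Mapping copied from the example above; keys and values are strings there.
raw = json.loads('{"1885736": "2072251"}')

# Convert both sides to int so they compare equal to qa_id / region_id values.
qa_to_region = {int(qa_id): int(region_id) for qa_id, region_id in raw.items()}

print(qa_to_region[1885736])  # 2072251
```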