Intel Egocentric Vision Dataset

This is a dataset for the recognition of handled objects using a wearable camera, collected by Matthai Philipose and Xiaofeng Ren at Intel Research Seattle. It includes ten video sequences from two human subjects manipulating 42 everyday objects.

The dataset is organized as follows:

frames: full-resolution (1024x768) frames in JPEG format, in 10 sequences (~1.5 GB each)
01 02 03 04 05 06 07 08 09 10
exemplars: clean exemplars for all the objects (~550 total)
labels: per-frame object labels (1-42); 0 marks frames with no object occurrence
segmentations: human-marked segmentations (~10 per object)
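The directory layout above can be traversed with a few lines of Python. The sketch below is a minimal, hypothetical helper, assuming frames live under frames/<two-digit sequence id> as JPEG files (the exact frame filenames are an assumption, not documented here); the demo runs on a mock directory tree rather than the real ~15 GB of frames.

```python
from pathlib import Path
import tempfile

def list_sequence_frames(root, seq):
    # Hypothetical helper: collect the JPEG frames of one sequence,
    # following the frames/<seq> layout described in this README.
    seq_dir = Path(root) / "frames" / f"{seq:02d}"
    return sorted(seq_dir.glob("*.jpg"))

# Demo on a mock directory tree standing in for the dataset root.
with tempfile.TemporaryDirectory() as tmp:
    seq_dir = Path(tmp) / "frames" / "03"
    seq_dir.mkdir(parents=True)
    for name in ("frame_0002.jpg", "frame_0001.jpg"):
        (seq_dir / name).touch()
    frames = list_sequence_frames(tmp, 3)
    print([f.name for f in frames])  # sorted order, lowest index first
```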

In our benchmark analysis, Sequences {1,2,5,7,9} (from subject 1) are used for training and Sequences {3,4,6,8,10} (from subject 2) are used for testing. Background frames with no object occurrence are not used. As objects appear with different frequencies, we normalize for this when reporting the average recognition rate.
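The normalization mentioned above amounts to averaging accuracy per object class rather than per frame, so frequently occurring objects do not dominate the score. A minimal sketch, assuming integer labels 1-42 with 0 for background (the function name is mine, not part of the dataset tools):

```python
def normalized_recognition_rate(labels, predictions):
    # Class-normalized average recognition rate: compute accuracy
    # separately for each object class, then average over classes.
    per_class = {}  # class id -> (correct, total)
    for y, p in zip(labels, predictions):
        if y == 0:  # background frames are excluded from evaluation
            continue
        correct, total = per_class.get(y, (0, 0))
        per_class[y] = (correct + (p == y), total + 1)
    rates = [c / t for c, t in per_class.values()]
    return sum(rates) / len(rates)

# Toy example: class 1 occurs 3 times (2 correct), class 2 once (correct).
labels      = [0, 1, 1, 1, 2]
predictions = [1, 1, 1, 2, 2]
print(normalized_recognition_rate(labels, predictions))  # (2/3 + 1/1) / 2 ≈ 0.833
```

A plain per-frame accuracy on the same toy data would be 3/4, illustrating how the class-normalized rate differs when occurrence counts are unbalanced.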

Please refer to the following paper for our work on combining object recognition with video segmentation:

Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video
Xiaofeng Ren and Chunhui Gu, CVPR 2010

An early analysis of the dataset can be found in our Egovision '09 workshop paper:

Egocentric Recognition of Handled Objects: Benchmark and Analysis
Xiaofeng Ren and Matthai Philipose, the First Workshop on Egocentric Vision 2009

Xiaofeng Ren
Intel Research Seattle

(updated 01/2016)