Bryan C. Russell, Alexei A. Efros, Josef Sivic, William T. Freeman, Andrew Zisserman
Given a large dataset of images, we seek to automatically determine the visually similar object and scene classes together with their image segmentation. To achieve this we combine two ideas: (i) that a set of segmented objects can be partitioned into visual object classes using topic discovery models from statistical text analysis; and (ii) that visual object classes can be used to assess the accuracy of a segmentation. To tie these ideas together we compute multiple segmentations of each image and then: (i) learn the object classes; and (ii) choose the correct segmentations. We demonstrate that such an algorithm succeeds in automatically discovering many familiar objects in a variety of image datasets, including those from Caltech, MSRC and LabelMe.
1. Download the code and untar it.
2. Compile the ".cpp" files via the "mex" command in Matlab.
3. Download and install Normalized Cuts.
4. Download and unzip the Oxford interest point and descriptor binaries: OxfordVisualWordBinaries.tar.gz. Note: these Linux binaries were provided by Krystian Mikolajczyk (while at the University of Oxford, now at University of Surrey) and George Matas (CTU Prague). Newer versions of the binaries can be downloaded from the Oxford Visual Geometry Group.
5. Download and unzip the pre-computed visual word clusters: OxfordVisualWordClusters.tar.gz
6. Download the LabelMe Matlab toolbox.
Running the code
The code is demonstrated in "demoLabelMe.m". To run, adjust the global variables at the top of the script to point to the locations of the downloaded libraries, binaries, and pre-computed clusters above. Also, be sure to download the tested LabelMe images (and their corresponding annotations, which is used for validation).
We ran experiments on the following datasets:
Caltech 5 (faces, motorcycles, airplanes, faces, and Google background images)
Microsoft Research Cambridge (MSRC, set A)
LabelMe (Images.tar.gz; Annotations.tar.gz)
Caltech 5 montages
B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman, Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, New York, June, 2006. (Paper: PDF; Poster: PDF, PPT)