Web-scale Scene Parsing

The amount of labeled training data required for image interpretation tasks is a major drawback of current methods. How can we use the gigantic collection of unlabeled images available on the web to aid these tasks? In this paper, we present a simple approach based on the notion of patch-based context to extract useful priors for regions within a query image from a large collection of (6 million) unlabeled images. This contextual prior over image classes acts as a non-redundant, complementary source of knowledge that helps disambiguate confusions in the predictions of local region-level features. We demonstrate our approach on the challenging tasks of region classification and surface-layout estimation.
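The core idea can be sketched in a few lines: a contextual prior over classes is estimated from the labels (or pseudo-labels) of similar patches retrieved from the large image collection, and fused with the local classifier's per-region scores. The function names, the smoothing constant, and the multiplicative fusion rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def contextual_prior(neighbor_labels, n_classes, alpha=1.0):
    """Class histogram over the labels of retrieved similar patches,
    with additive (Laplace) smoothing controlled by alpha."""
    counts = np.bincount(neighbor_labels, minlength=n_classes).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_classes)

def combine(local_probs, prior):
    """Fuse local classifier probabilities with the contextual prior
    via an element-wise product, then renormalize."""
    fused = local_probs * prior
    return fused / fused.sum()

# Toy example: the local classifier is torn between class 0 and class 1,
# but similar patches retrieved from the collection mostly carry class 0,
# which tips the decision.
local = np.array([0.45, 0.45, 0.10])
prior = contextual_prior(np.array([0, 0, 0, 1, 0]), n_classes=3)
fused = combine(local, prior)
```

In this sketch, the prior is non-redundant with the local evidence because it is derived from a different source (the retrieved patches), so the product sharpens ambiguous local predictions without overriding confident ones.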

Code (partly available)

Much of the code used here builds on an earlier project on lazy-learning-based object detection.
lazy learning zip file


"Unsupervised Patch-based Context from Millions of Images"
Santosh K. Divvala, Alexei A. Efros, Martial Hebert, Svetlana Lazebnik
Technical Report TR-11-38, CMU, 2011
[Paper] [Presentation]

"Can Similar Scenes help Surface Layout Estimation?"
Santosh K. Divvala, Alexei A. Efros, Martial Hebert
Computer Vision and Pattern Recognition (CVPR) 2008, Internet Vision Workshop (Oral Presentation)
[Paper] [BIBTEX] [Presentation]

Video of the talk (thanks to Paul Rybski!)