D. Hoiem, A.A. Efros, M. Hebert, "Occlusion-based Image Segmentation", submitted to IJCV. (ask for draft)
V. Hedau, D. Hoiem, and D.A. Forsyth, “Recovering the Spatial Layout of Cluttered Rooms”, ICCV 2009. pdf
G. Wang, D. Hoiem, and D.A. Forsyth, “Learning Image Similarity from Flickr Groups Using Stochastic Intersection Kernel Machines”, ICCV 2009. pdf
A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth, “Describing Objects by their Attributes”, CVPR 2009. pdf; project
G. Wang, D. Hoiem, and D.A. Forsyth, “Building Text Features for Object Image Classification”, CVPR 2009. pdf
S.K. Divvala, D. Hoiem, J.H. Hays, A.A. Efros, and M. Hebert, “An Empirical Study of Context in Object Detection”, CVPR 2009. pdf
T.L. Berg, A. Sorokin, G. Wang, D.A. Forsyth, D. Hoiem, A. Farhadi, I. Endres, "It's All About the Data", accepted to Proc. IEEE, Special Issue on Internet Vision.
M. Szummer, P. Kohli, and D. Hoiem, “Learning CRFs using Graph Cuts”, ECCV 2008. pdf
D. Hoiem, A.A. Efros, and M. Hebert, “Closing the Loop on Scene Interpretation”, CVPR 2008. pdf
D. Hoiem, A.A. Efros, and M. Hebert, “Putting Objects in Perspective”, IJCV, Vol. 80, No. 1, October 2008. pdf
D. Hoiem, "Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding", doctoral dissertation, CMU-RI-TR-07-28, Robotics Institute, Carnegie Mellon University, August 2007. pdf
CMU School of Computer Science Distinguished Dissertation Award, ACM Doctoral Dissertation Honorable Mention
A.N. Stein, D. Hoiem, and M. Hebert, "Learning to Extract Object
Boundaries using Motion Cues", ICCV 2007. pdf
D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, "Recovering Occlusion
Boundaries from a Single Image", ICCV 2007. pdf
J-F. Lalonde, D. Hoiem, A.A. Efros, J. Winn, C. Rother, and A. Criminisi, "Photo Clip Art", ACM SIGGRAPH 2007. pdf; project
D. Hoiem, C. Rother, and J. Winn, "3D LayoutCRF for Multi-View Object
Class Recognition and Segmentation", CVPR 2007. pdf
D. Hoiem, A.A. Efros, and M. Hebert, "Recovering Surface Layout from an
Image", IJCV, Vol. 75, No. 1, October 2007. pdf
2006
Recovering the Spatial Layout of Cluttered Rooms
We propose to model indoor scenes with a 3D cuboid that gives a rough sense of the global 3D structure and a surface layout that indicates which pixels belong to each surface. By combining the global and local 3D representations, we are able to estimate each more accurately. We can also estimate which portions of the 3D scene are occupied by objects.
Learning Image Similarity from Flickr Groups Using Stochastic Intersection Kernel Machines
We propose an online learning method for SVMs with Histogram Intersection Kernels that achieves accuracy similar to batch training while running much faster. We apply it to estimate which Flickr groups an image is likely to belong to and show that this estimated membership provides a useful similarity measure.
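For readers unfamiliar with the kernel named above, here is a minimal sketch of the histogram intersection kernel and of how a kernel SVM decision value is formed from it. It does not reproduce the online training procedure from the paper; the function names, the `coefficients` and `bias` parameters, and the toy histograms are illustrative assumptions.

```python
def histogram_intersection(x, y):
    """Histogram intersection kernel: the sum of element-wise minima."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def kernel_decision_value(query, support_vectors, coefficients, bias=0.0):
    """Generic kernel SVM score: bias + sum_i alpha_i * K(sv_i, query).

    The coefficients are assumed to come from some training procedure
    that is not shown here.
    """
    return bias + sum(
        alpha * histogram_intersection(sv, query)
        for sv, alpha in zip(support_vectors, coefficients)
    )


# Toy normalized histograms (hypothetical values).
hist_a = [0.5, 0.3, 0.2]
hist_b = [0.1, 0.6, 0.3]
similarity = histogram_intersection(hist_a, hist_b)  # 0.1 + 0.3 + 0.2
```

For histograms that each sum to one, the kernel value lies between 0 (no overlapping mass) and 1 (identical histograms).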
Describing Objects by their Attributes
We show how, when we learn semantic attributes, we can say what is unusual about an object or learn to identify objects from verbal description. We also propose a feature selection method that suppresses attribute correlations through the object (e.g., many cars both have wheels and are made of metal), so that our attribute predictors are not confused by correlated attributes (which may lead to good quantitative accuracy in some tests but poor performance in tasks that require the semantics to be correct). We also study cross-category generalization, looking at how well attributes trained on one set of object categories can generalize to a new set.
Building Text Features for Object Image Classification
We propose a text feature representation that is built from histograms of the tags and groups associated with nearby images in Flickr. This improves image classification and provides a good way to leverage massive-scale internet photo-sharing sites.
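The idea of a tag-histogram feature can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the nearest-neighbor search is assumed to have already happened, and `tag_histogram`, the vocabulary, and the example tags are all hypothetical.

```python
from collections import Counter


def tag_histogram(neighbor_tags, vocabulary):
    """Normalized histogram of vocabulary tags over a set of neighbor images.

    neighbor_tags: one list of tag strings per neighboring image
                   (the neighbors are assumed to be found elsewhere).
    vocabulary:    ordered list of tags that define the feature dimensions.
    """
    vocab_set = set(vocabulary)
    counts = Counter(
        tag for tags in neighbor_tags for tag in tags if tag in vocab_set
    )
    total = sum(counts.values())
    if total == 0:
        # No neighbor carries a vocabulary tag: return an all-zero feature.
        return [0.0] * len(vocabulary)
    return [counts[tag] / total for tag in vocabulary]


# Hypothetical tags attached to three neighboring images.
vocabulary = ["beach", "car", "dog"]
neighbors = [["beach", "sunset"], ["beach", "dog"], ["dog"]]
feature = tag_histogram(neighbors, vocabulary)  # [0.5, 0.0, 0.5]
```

Tags outside the vocabulary (here "sunset") are ignored, so the feature has a fixed length and can be passed to any standard classifier.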
An Empirical Study of Context in Object Detection
We use a variety of contextual sources to help predict the presence, position, and size of objects in an image. We show that this provides substantial gains in object detection on PASCAL 2008, and we study how context changes the error patterns. We also propose a fairly comprehensive organization of sources and uses of context in vision.
Learning CRFs Using Graph Cuts
Describes a structured learning approach for learning CRF parameters, performing the inference step with graph cuts. It is possible to learn many pairwise potential parameters from dozens of images in several minutes. We apply the method to segmentation and pixel labeling and show that it works well in non-submodular cases (though without the usual optimality guarantee).
Closing the Loop on Scene Interpretation
Describes an approach that lets processes describing different characteristics of the scene interact to produce a more accurate and more cohesive scene interpretation. As in the original Barrow and Tenenbaum intrinsic image paper, we use a set of confidence maps, each describing a particular aspect of the scene, as an interface between the different processes. This approach allows many different types of scene analysis algorithms to work together while keeping the sample complexity low. However, it requires that cues be explicitly defined for each pair of processes, making the algorithm somewhat design-heavy.
Seeing the World Behind the Image (dissertation)
Covers all of my work on surface layout, occlusion recovery, and viewpoint-object reasoning, with an extended background and discussion of future directions.
Recovering Occlusion Boundaries from a Single Image
We use standard image cues together with surface and depth cues to recover occlusion boundaries of major free-standing objects from a single image.
Learning to Extract Object Boundaries using Motion Cues
We use appearance (color, texture) and motion cues computed over an oversegmentation to estimate occlusion boundaries from short video clips.
Recovering Surface Layout from an Image
We extend our ICCV 2005 paper with further description, analysis, and results.
Accuracy is improved and results on indoor images are demonstrated.
Putting Objects in Perspective
We propose a way to model objects and their relationships to each other (through the camera viewpoint) and to the surfaces in the scene. With a simple belief propagation framework, the viewpoint of the camera is recovered with high accuracy, and object detection performance improves considerably. Our method can be used in conjunction with almost any local object detector. The journal version includes derivations of our scene projection approximation and a new example-based method to recover viewpoint from the scene gist.
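To give a sense of how viewpoint constrains object size, here is a sketch of the standard similar-triangles relation for a level pinhole camera above a flat ground plane. It is an illustrative assumption and is not taken from the summary above; the paper's own projection approximation and belief propagation model are not reproduced. The function name and all parameters are hypothetical, and image coordinates are assumed to be measured upward in consistent units.

```python
def object_world_height(image_height, bottom_v, horizon_v, camera_height):
    """Estimate an object's real-world height from its image measurements.

    Assumes a level pinhole camera and an object standing on a flat ground
    plane. With vertical image coordinates measured upward:
        world_height = image_height * camera_height / (horizon_v - bottom_v)

    image_height:  height of the object in the image
    bottom_v:      vertical image position of the object's ground contact
    horizon_v:     vertical image position of the horizon line
    camera_height: camera height above the ground (e.g., in meters)
    """
    gap = horizon_v - bottom_v
    if gap <= 0:
        raise ValueError("the object's base must lie below the horizon")
    return image_height * camera_height / gap


# Hypothetical example: object spans 0.1 of the image height, its base is
# 0.2 below the horizon, and the camera is 1.6 m above the ground.
estimated_height = object_world_height(0.1, 0.3, 0.5, 1.6)  # about 0.8 m
```

Read in the other direction, the same relation says that a detector's bounding box is only plausible if its size is consistent with its position relative to the horizon, which is why an estimate of viewpoint can help reject false detections.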
Automatic Photo Pop-up
Describes our method for estimating orientations (ground, vertical, sky) in
outdoor images by robustly estimating the scene structure and learning
appearance-based statistical models of geometry. We show how we can use the
estimated geometry to reconstruct simple 3D models of the scene from a single
image.
Geometric Context from a Single Image
We extend our system from Automatic Photo Pop-up by subclassifying vertical
regions and provide quantitative analysis of the geometric labeling. We also
use the geometric labels as context for object detection, significantly
improving the accuracy. Worth reading if interested in image understanding. If
you've read Automatic Photo Pop-up, much of the new material is in the results
and applications sections (although there are some small changes to our method
described in the other sections).
The journal version (Recovering Surface Layout from an Image) contains additional description, analysis, and results. Some tweaks to the feature set result in improved accuracy, and good results on indoor images are demonstrated.
Computer Vision for Music Identification
We identify snippets (e.g., ten-second clips) of music recorded in noisy settings by learning a 32-bit representation for the music that can be indexed for fast retrieval and is robust to typical distortions.
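One way a compact binary descriptor can be indexed for fast, distortion-tolerant lookup is a hash table probed at the query code and at every code one bit away. The sketch below illustrates that general idea only; the `DescriptorIndex` class, the one-bit search radius, and the example codes are assumptions, and the learning of the descriptors themselves is not shown.

```python
from collections import defaultdict

MASK_32 = 0xFFFFFFFF  # keep codes within 32 bits


def hamming_distance(a, b):
    """Number of differing bits between two 32-bit codes."""
    return bin((a ^ b) & MASK_32).count("1")


def codes_within_one_bit(code):
    """Yield the code itself and the 32 codes that differ from it in one bit."""
    yield code
    for bit in range(32):
        yield code ^ (1 << bit)


class DescriptorIndex:
    """Hash table from 32-bit descriptors to labels (e.g., song identifiers)."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, code, label):
        self._table[code & MASK_32].append(label)

    def query(self, code):
        """Return labels stored under codes within Hamming distance 1."""
        matches = []
        for candidate in codes_within_one_bit(code & MASK_32):
            matches.extend(self._table.get(candidate, []))
        return matches


index = DescriptorIndex()
index.add(0b1010, "song_a")        # hypothetical descriptor and label
index.add(0xFFFF0000, "song_b")
hits = index.query(0b1011)         # one bit away from song_a's code
```

Each query costs 33 table probes regardless of database size, which is the appeal of short binary codes; a larger search radius tolerates more distortion at the cost of more probes.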
SOLAR: Sound Object Localization and Retrieval in Complex Audio Environments
We learn a discriminative model for distinguishing between sound classes (such
as dog barks) and all other noises. We can then detect and localize certain
sounds in audio, such as that typically found in movies.
Object-Based Image Retrieval Using the Statistics of Images
We learn a semi-naive Bayes model of an object class (e.g. automobiles) from a few images and retrieve images that contain that object. A feedback loop allows results to improve. One of the key ideas is that learning the structure of the data on an unlabeled dataset improves results when few supervised examples exist.
SnapFind: Brute Force Interactive Image Retrieval
We present a system capable of fast user-guided image retrieval based on local patches. The emphasis here is on the system as a whole, which achieves speed due to distributed processing on local data.