Publications

Listed by Topic

 

M. Szummer, P. Kohli, and D. Hoiem, "Learning CRFs using Graph Cuts", ECCV 2008. pdf

 

D. Hoiem, A.A. Efros, and M. Hebert, "Closing the Loop on Scene Interpretation", CVPR 2008. pdf

 

D. Hoiem, A.A. Efros, and M. Hebert, "Putting Objects in Perspective", IJCV, in press. pdf

 

D. Hoiem, "Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding", doctoral dissertation, CMU-RI-TR-07-28, Robotics Institute, Carnegie Mellon University, August 2007. pdf

A.N. Stein, D. Hoiem, and M. Hebert, "Learning to Extract Object Boundaries using Motion Cues", ICCV 2007. pdf

D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, "Recovering Occlusion Boundaries from a Single Image", ICCV 2007. pdf

J-F. Lalonde, D. Hoiem, A.A. Efros, J. Winn, C. Rother and A. Criminisi, "Photo Clip Art", ACM SIGGRAPH 2007. pdf ; project

D. Hoiem, C. Rother, and J. Winn, "3D LayoutCRF for Multi-View Object Class Recognition and Segmentation", CVPR 2007. pdf

D. Hoiem, A.A. Efros, and M. Hebert, "Recovering Surface Layout from an Image", IJCV, Vol. 75, No. 1, October 2007. pdf

D. Hoiem, A.A. Efros, and M. Hebert, "Putting Objects in Perspective", CVPR 2006. Best Paper Award pdf ; project

B. Nabbe, D. Hoiem, A.A. Efros, and M. Hebert, "Opportunistic use of vision to push back the path-planning horizon", IROS 2006. pdf

D. Hoiem, A.A. Efros, and M. Hebert, "Geometric Context from a Single Image", ICCV 2005. pdf ; project

D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005. pdf ; project

Y. Ke, D. Hoiem, and R. Sukthankar, "Computer Vision for Music Identification", IEEE Conference on Computer Vision and Pattern Recognition, 2005. pdf ; project

D. Hoiem, Y. Ke, and R. Sukthankar, "SOLAR: Sound Object Localization and Retrieval in Complex Audio Environments", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005. pdf ; project

D. Hoiem, R. Sukthankar, H. Schneiderman, and L. Huston, "Object-Based Image Retrieval Using the Statistics of Images", IEEE Conference on Computer Vision and Pattern Recognition, 2004. pdf ; project

L. Huston, R. Sukthankar, D. Hoiem, and J. Zhang, "SnapFind: Brute Force Interactive Image Retrieval", IEEE International Conference on Image Processing and Graphics, 2004. pdf


Publication Summaries

 

Closing the Loop on Scene Interpretation

Describes an approach that lets processes describing different characteristics of the scene interact, yielding a more accurate and more cohesive scene interpretation.  As in the original Barrow and Tenenbaum intrinsic image paper, we use a set of confidence maps, each describing a particular aspect of the scene, as an interface between the different processes.  This approach allows many different types of scene analysis algorithms to work together while keeping the sample complexity low.  However, it requires that cues be explicitly defined for each pair of processes, making the algorithm somewhat inelegant and design-heavy.

 

Seeing the World Behind the Image (dissertation)

Covers all of my work on surface layout, occlusion recovery, and viewpoint-object reasoning, with an extended background and discussion of future directions. 

 

Recovering Occlusion Boundaries from a Single Image

We use standard image cues together with surface and depth cues to recover occlusion boundaries of major free-standing objects from a single image.

 

Learning to Extract Object Boundaries using Motion Cues

We use appearance (color, texture) and motion cues computed over an oversegmentation to estimate occlusion boundaries from short video clips.

 

Recovering Surface Layout from an Image
We extend our ICCV 2005 paper with further description, analysis, and results. Accuracy is improved and results on indoor images are demonstrated.

Putting Objects in Perspective
We propose a way to model objects in relation to each other (through the camera viewpoint) and to the surfaces in the scene. With a simple belief propagation framework, the viewpoint of the camera is recovered with high accuracy, and object detection performance improves considerably. Our method can be used in conjunction with almost any local object detector.  The journal version includes derivations of our scene projection approximation and a new example-based method to recover viewpoint from the scene gist.

Automatic Photo Pop-up
Describes our method for estimating orientations (ground, vertical, sky) in outdoor images by robustly estimating the scene structure and learning appearance-based statistical models of geometry. We show how we can use the estimated geometry to reconstruct simple 3D models of the scene from a single image.


Geometric Context from a Single Image
We extend our system from Automatic Photo Pop-up by subclassifying vertical regions, and we provide quantitative analysis of the geometric labeling. We also use the geometric labels as context for object detection, significantly improving the accuracy. Worth reading if you are interested in image understanding. If you've read Automatic Photo Pop-up, much of the new material is in the results and applications sections (although there are some small changes to our method described in the other sections).  The journal version (Recovering Surface Layout from an Image) contains additional description, analysis, and results.  Some tweaks to the feature set result in improved accuracy, and good results on indoor images are demonstrated.

Computer Vision for Music Identification
We identify snippets (e.g., ten-second clips) of music recorded under noisy settings by learning a 32-bit representation for the music that can be indexed for fast retrieval and is robust to typical distortions.
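
To illustrate why a compact 32-bit descriptor lends itself to indexed retrieval, here is a minimal sketch, not the paper's implementation. It assumes the descriptors have already been computed (the learning step is not shown), stores them in a hash table, and tolerates noise by also probing every key that differs from the query in one bit. The names DescriptorIndex and identify, the one-bit probing rule, and the simple per-song voting are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

MASK32 = 0xFFFFFFFF  # keep descriptors within 32 bits


class DescriptorIndex:
    """Hash table from 32-bit descriptors to the songs that contain them."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, song_id, descriptors):
        """Register every descriptor of a song under its 32-bit key."""
        for d in descriptors:
            self._table[d & MASK32].append(song_id)

    def lookup(self, descriptor):
        """Return songs whose stored descriptors differ by at most one bit."""
        d = descriptor & MASK32
        # Probe the exact key plus the 32 keys obtained by flipping one bit.
        probes = [d] + [d ^ (1 << bit) for bit in range(32)]
        hits = []
        for p in probes:
            hits.extend(self._table.get(p, []))
        return hits


def identify(index, query_descriptors):
    """Vote over all query descriptors; return the best song or None."""
    votes = Counter()
    for d in query_descriptors:
        votes.update(index.lookup(d))
    return votes.most_common(1)[0][0] if votes else None
```

Each query descriptor costs 33 constant-time table probes regardless of database size, which is the sense in which a short binary code can be "indexed for fast retrieval" in this sketch.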

SOLAR: Sound Object Localization and Retrieval in Complex Audio Environments
We learn a discriminative model for distinguishing between sound classes (such as dog barks) and all other noises. We can then detect and localize certain sounds in audio, such as that typically found in movies.


Object-Based Image Retrieval Using the Statistics of Images
We learn a semi-naive Bayes model of an object class (e.g. automobiles) from a few images and retrieve images that contain that object.  A feedback loop allows results to improve.  One of the key ideas is that learning the structure of the data on an unlabeled dataset improves results when few supervised examples exist.  

SnapFind: Brute Force Interactive Image Retrieval
We present a system capable of fast user-guided image retrieval based on local patches.  The emphasis here is on the system as a whole, which achieves speed due to distributed processing on local data.

