This page contains links to some of the projects that I have worked on or that are currently in progress.
Closing the Loop on
Scene Interpretation
We model the interactions among occlusion, surface, object, and viewpoint inference, leading to modest quantitative and more substantial qualitative improvements. We also extend Automatic Photo Pop-up with occlusion, object, and viewpoint information, leading to more robust and detailed models. Finding more robust and flexible models for integrating diverse and numerous inference processes is a long-term interest.
Recovering
Occlusion Boundaries from a Single Image
We recover occlusion boundaries and figure/ground labels by reasoning about region similarity, 3D geometry, boundaries, and junctions. Code is available.
Putting Objects in Perspective
We propose a method for modeling objects, viewpoint, and surfaces in an integrated fashion. With a simple belief propogation framework, the viewpoint of the camera is recovered with high accuracy, and object detection performance improves considerably.
We extend our framework from Automatic Photo Pop-up by subclassifying vertical regions, providing extensive quantitative evaluation, and demonstrating the usefulness of the geometric labels as context for object detection.
Our system estimates which regions
of an outdoor image correspond to ground, vertical objects, and sky. These
geometric labels and allow us to construct a coarse 3D model of the scene.
We apply computer vision techniques to identify songs from a short, noisy audio snippet. This research is performed with Yan Ke and Rahul Sukthankar.
Sound Detection and
Identification
Our goal is to detect and identify sound objects, such as car horns or dog barks, in audio. Our system, called SOLAR (sound object localization and retrieval) is the first, to our knowledge, that is capable of finding a large variety of sounds in audio data from movies and other complex audio environments.
The goal is to retrieve images based
on the appearance of the objects, such as elephants or race cars, contained in
them. Our approach is to extend popular techniques for object detection to be
capable of online training with only a few examples. The key idea is that, once
the underlying statistical structure of the image domain is modeled, that
structure and its parameters can be used to dramatically improve results in a
probabilistic object- based image retrieval system.