Carnegie Mellon University

Coherent Scene Understanding With 3D Geometric Reasoning

thesis
posted on 2011-05-01, authored by Jiyan Pan
When looking at a single 2D image of a scene, humans can effortlessly understand the 3D world behind the scene even though stereo and motion cues are not available. Motivated by this remarkable human capability, one of the ultimate goals of computer vision is to enable machines to automatically infer the 3D structure of a scene given a single 2D image. This dissertation proposes methods that produce a geometrically and semantically coherent 3D interpretation of urban scenes from a single image, and shows the benefits of reasoning in 3D when analyzing 2D images. In this dissertation, we model an urban scene using three types of elements. The first type is global geometry, such as the ground plane and the gravity direction. The second type is objects, such as cars and pedestrians, that have definite shapes and extents. The third type is vertical surfaces, such as building facades, that do not have definite shapes and extents. This model allows for a richer characterization of an urban scene than existing work. To tackle the inherent ambiguity involved in recovering 3D structure from a single 2D image, we systematically identify geometric constraints among the three types of elements in our model, and encode these constraints in a Conditional Random Field (CRF). For objects, we consider both their global geometric compatibility with the ground plane and gravity direction, and the local geometric compatibility between adjacent objects. For building facades, we decompose them into a set of continuously oriented planes that are mutually related by 3D geometric relationships and constrained by nearby objects in 3D. We also propose a generalized RANSAC algorithm to make inference in the model tractable. We show that performing 3D geometric reasoning using our model benefits individual tasks such as object detection, viewpoint estimation, and facade layout recovery. In addition, it yields a more informative interpretation of the 3D scene behind the image.

History

Date

2011-05-01

Degree Type

  • Dissertation

Department

  • Robotics Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Takeo Kanade, Martial Hebert
