Coherent Scene Understanding With 3D Geometric Reasoning

Pan, Jiyan

doi:10.1184/R1/6715151.v1

Coherent Scene Understanding With 3D Geometric Reasoning.pdf (47.01 MB)

thesis

Posted on 2011-05-01, authored by Jiyan Pan

When looking at a single 2D image of a scene, humans can effortlessly understand the 3D world behind it, even though stereo and motion cues are unavailable. Motivated by this remarkable human capability, one of the ultimate goals of computer vision is to enable machines to infer the 3D structure of a scene automatically from a single 2D image. This dissertation proposes methods that produce a geometrically and semantically coherent 3D interpretation of urban scenes from a single image, and demonstrates the benefits of reasoning in 3D when analyzing 2D images.

We model an urban scene using three types of elements: (1) global geometry, such as the ground plane and the gravity direction; (2) objects, such as cars and pedestrians, that have definite shapes and extents; and (3) vertical surfaces, such as building facades, that do not have definite shapes and extents. This model allows a richer characterization of an urban scene than existing work provides.

To tackle the inherent ambiguity of recovering 3D structure from a single 2D image, we systematically identify geometric constraints among the three types of elements in our model and encode these constraints in a Conditional Random Field (CRF). For objects, we consider both their global geometric compatibility with the ground plane and gravity direction and their local geometric compatibility with adjacent objects. Building facades are decomposed into a set of continuously oriented planes that are related to one another by 3D geometric relationships and constrained by nearby objects in 3D. We also propose a generalized RANSAC algorithm to make inference in the model tractable. We show that 3D geometric reasoning with our model benefits individual tasks such as object detection, viewpoint estimation, and facade layout recovery, and that it yields a more informative interpretation of the 3D scene behind the image.
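As a rough illustration of what "global geometric compatibility with the ground plane" can mean for a detected object, the sketch below uses the standard single-view relation for a level pinhole camera over flat ground: an object whose ground contact point is at image row `bottom` and whose top is at row `top` has world height H = camera_height * (bottom - top) / (bottom - horizon_row). It then checks whether that implied height is plausible for the object class. This is a minimal sketch under stated assumptions, not the dissertation's formulation: the `Box` type, the function names, the 1.6 m camera height, and the 1.2 to 2.0 m "car" height range are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box; rows are in pixels and grow downward."""
    top: float     # image row of the object's top edge
    bottom: float  # image row where the object touches the ground


def implied_height(box: Box, horizon_row: float, camera_height: float) -> float:
    """World height implied by a box resting on a flat ground plane.

    Assumes a level pinhole camera (no roll, negligible tilt), for which
    H = camera_height * (bottom - top) / (bottom - horizon_row).
    The result is in the same unit as camera_height.
    """
    below_horizon = box.bottom - horizon_row
    if below_horizon <= 0:
        # Points on the ground plane project strictly below the horizon.
        raise ValueError("ground contact point must lie below the horizon")
    return camera_height * (box.bottom - box.top) / below_horizon


def is_compatible(box: Box, horizon_row: float, camera_height: float,
                  min_height: float, max_height: float) -> bool:
    """True if the implied height falls inside a plausible range for the class."""
    height = implied_height(box, horizon_row, camera_height)
    return min_height <= height <= max_height


if __name__ == "__main__":
    # Hypothetical values: horizon at row 200, camera 1.6 m above the ground,
    # and a 1.2-2.0 m height range standing in for a "car" prior.
    car = Box(top=180.0, bottom=300.0)
    print(implied_height(car, 200.0, 1.6))                # about 1.92
    print(is_compatible(car, 200.0, 1.6, 1.2, 2.0))       # True
    too_tall = Box(top=100.0, bottom=300.0)               # implies about 3.2 m
    print(is_compatible(too_tall, 200.0, 1.6, 1.2, 2.0))  # False
```

Here the horizon and camera height are fixed inputs for simplicity. In a joint model such as a CRF, one would more likely treat them as uncertain variables, so that a set of detections can jointly favor some ground-plane hypotheses over others.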

History

Date

2011-05-01

Degree Type

Dissertation

Department

Robotics Institute

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Takeo Kanade, Martial Hebert

Keywords

Robotics Institute

Licence

In Copyright