Date of Award

8-2012

Embargo Period

10-15-2012

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Machine Learning

Advisor(s)

Luis von Ahn

Second Advisor

Tom Mitchell

Third Advisor

Jaime Carbonell

Fourth Advisor

Eric Horvitz

Fifth Advisor

Rob Miller

Abstract

This thesis centers on the problem of attribute learning -- using the joint effort of humans and machines to describe objects, e.g., determining that a piece of music is "soothing," that the bird in an image "has a red beak," or that Ernest Hemingway is a "Nobel Prize-winning author." We present new methods for solving the attribute-learning problem that combine the efforts of the crowd and machines via human computation games.

When creating a human computation system, two design objectives typically need to be satisfied simultaneously. The first objective is human-centric -- the task prescribed by the system must be intuitive, appealing and easy to accomplish for human workers. The second objective is task-centric -- the system must actually perform the task at hand. These two goals are often at odds with each other, especially in the casual game setting. This thesis shows that human computation games can accomplish both the human-centric and task-centric objectives if we first design for humans, then devise machine learning algorithms that work around the limitations of human workers and complement their abilities, so that humans and machines jointly accomplish the task of learning attributes. We demonstrate the effectiveness of our approach in three concrete problem settings: music tagging, bird image classification and noun phrase categorization.

Contributions of this thesis include a framework for attribute learning, two new game mechanisms, experiments showing the effectiveness of the hybrid human and machine computation approach for learning attributes in vocabulary-rich settings and under the constraints of knowledge limitations, as well as deployed games played by tens of thousands of people, generating large datasets for machine learning.

Comments

CMU-ML-12-106
