Date of Award

Summer 8-2015

Degree Name

Doctor of Philosophy (PhD)

Department

Machine Learning

Advisor

Tom Mitchell


How is information organized in the brain during natural reading? Where and when do the required processes, such as the perception of individual words and the construction of sentence meaning, take place? How are semantics, syntax and higher-level narrative structure represented? Answering these questions is central to understanding how the brain processes language and organizes complex information. However, because language processing is so complex, most brain imaging studies address only one of these questions, using highly controlled stimuli that may not generalize beyond the experimental setting.

This thesis proposes an alternative framework for studying language processing. We acquire data using a naturalistic reading paradigm, annotate the presented text with natural language processing tools, and predict brain activity with machine learning techniques. Statistical testing is then used to draw rigorous conclusions. We also advocate direct non-parametric hypothesis tests that do not rely on any model assumptions and therefore do not suffer from model misspecification.

Using this framework, we construct a brain reading map from functional magnetic resonance imaging (fMRI) data of subjects reading a chapter of a popular book. The map shows the regions that our model reveals to represent syntactic, semantic, visual and narrative information. With this single experiment, our approach replicates many results from a wide range of classical studies, each of which focuses on one aspect of language processing. We then extend the brain reading map to include temporal dynamics as well as spatial information by using magnetoencephalography (MEG). This yields a spatio-temporal picture of how successive words are processed by the brain. We show that each word is perceived progressively, in a posterior-to-anterior fashion, and that the regions along this pathway differ in the word properties that best explain their activity.
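The framework described above pairs a predictive model (text features in, brain activity out) with a non-parametric test of whether its predictions beat chance. The following is a minimal sketch of that idea, not the thesis's actual pipeline. It assumes a single hypothetical word feature, uses a one-feature least-squares fit in place of whatever regression the thesis uses, runs on synthetic data rather than fMRI or MEG recordings, and applies a simple permutation test that treats samples as exchangeable. Real fMRI time series are autocorrelated, so a block-wise permutation would be more appropriate there.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_1d(x, y):
    """Ordinary least squares for y = w * x + b with a single feature."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    w = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return w, my - w * mx


def permutation_p_value(pred, actual, n_perm=2000, seed=0):
    """One-sided permutation test: how often does a random re-pairing of
    predictions and observations correlate at least as well as the real one?
    Assumes samples are exchangeable under the null hypothesis."""
    rng = random.Random(seed)
    observed = pearson(pred, actual)
    shuffled = list(pred)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(shuffled, actual) >= observed:
            count += 1
    # The +1 terms count the observed pairing as one member of the null set.
    return observed, (count + 1) / (n_perm + 1)


# Synthetic example: a hypothetical per-time-point word feature (e.g. word
# length) and a simulated "voxel" that depends on it linearly plus noise.
rng = random.Random(42)
feature = [rng.uniform(1, 10) for _ in range(200)]
voxel = [0.5 * f + rng.gauss(0, 1) for f in feature]

# Fit on the first half, predict the held-out second half.
w, b = fit_1d(feature[:100], voxel[:100])
pred = [w * f + b for f in feature[100:]]
r, p = permutation_p_value(pred, voxel[100:])
print(f"slope = {w:.2f}, held-out r = {r:.2f}, permutation p = {p:.4f}")
```

The test makes no distributional assumption about the correlation statistic. It only compares the observed value against values obtained when the link between features and activity is destroyed by shuffling. The smallest achievable p-value is 1 / (n_perm + 1), so the number of permutations bounds the resolution of the test.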