Date of Original Version




Rights Management

All Rights Reserved

Abstract or Description

The PITCHf/x database, which records the location, velocity, and trajectory of every pitch thrown in Major League Baseball (MLB), has allowed the statistical analysis of MLB to flourish since its introduction in late 2006. Pitches recorded in PITCHf/x have so far been classified either by hand, which requires considerable effort, or by neural network clustering and classification, whose results are often difficult to interpret. We use model-based clustering with a multivariate Gaussian mixture model and an adjusted Bayesian Information Criterion to determine the number of clusters. We verify these results via cross validation, validation by prediction strength, and visual inspection. Furthermore, we use our method to cluster pitchers into groups with similar characteristics via k-means clustering and the Fisher-wise criterion. Our method builds a strong foundation for addressing many open MLB research questions, including preventing pitcher injury.
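
For readers unfamiliar with model-based clustering, the model-selection step can be sketched as follows. This shows only the standard (unadjusted) Bayesian Information Criterion for a Gaussian mixture with unconstrained covariance matrices; the adjustment used in this work is not described in the abstract and is not reproduced here. The symbols G (number of clusters), d (number of pitch features), n (number of pitches), and k_G (number of free parameters) are notation assumed for illustration.

```latex
% Mixture density with G Gaussian components in d dimensions
f(x) = \sum_{g=1}^{G} \pi_g \, \mathcal{N}(x \mid \mu_g, \Sigma_g),
\qquad \sum_{g=1}^{G} \pi_g = 1

% Standard BIC for the fitted G-component model (smaller is better in this sign convention)
\mathrm{BIC}(G) = -2 \ln \hat{L}_G + k_G \ln n

% Free parameters: mixing weights + means + full covariance matrices
k_G = (G - 1) + G d + G \, \frac{d(d+1)}{2}
```

Here \hat{L}_G is the maximized likelihood of the G-component model. Under this convention one would fit the mixture for a range of G and keep the value that minimizes BIC(G); some software reports the criterion with the opposite sign and maximizes it instead.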


Advisor: Andrew Thomas

Department of Statistics

Embargo Date