Trouble with the Curve: Identifying Clusters of MLB Pitchers using Improved Pitch Classification Techniques
The PITCHf/x database, which records the location, velocity, and trajectory of every pitch thrown in Major League Baseball (MLB), has allowed the statistical analysis of MLB to flourish since its introduction in late 2006. Using PITCHf/x, pitches have been classified by hand, requiring considerable effort, or using neural network clustering and classification, which is often difficult to interpret. We use model-based clustering with a multivariate Gaussian mixture model and an adjusted Bayesian Information Criterion to determine the number of different clusters. We verify these results via cross validation, validation by prediction strength, and through visual inspection. Furthermore, we use our method to cluster pitchers into groups with similar characteristics via k-means clustering and the Fisher-wise criterion. Our method builds a strong foundation towards addressing many open MLB research questions, including preventing pitcher injury.
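As a rough illustration of the model-selection idea described in the abstract, the sketch below fits Gaussian mixtures with different numbers of components and compares them by an information criterion. It is a deliberately simplified stand-in, not the authors' procedure: it is one-dimensional rather than multivariate, it uses the standard BIC (lower is better) rather than the adjusted criterion mentioned above, and the pitch velocities are synthetic values drawn from two made-up pitch types. All function and variable names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Density of a Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(xs, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (log_likelihood, weights, means, variances).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)  # floor avoids degenerate components
    loglik = sum(math.log(max(mixture_density(x, weights, means, variances), 1e-300)) for x in xs)
    return loglik, weights, means, variances


def bic(loglik, k, n):
    """Standard BIC (lower is better); a 1-D mixture has 3k - 1 free parameters."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)


# Synthetic velocities (mph) for two made-up pitch types; NOT PITCHf/x data.
rng = random.Random(42)
velocities = [rng.gauss(93.0, 1.5) for _ in range(200)] + [rng.gauss(78.0, 2.0) for _ in range(200)]

# Fit mixtures with 1 to 4 components and keep the BIC of each.
bic_by_k = {}
for k in range(1, 5):
    loglik, _, _, _ = fit_gmm_1d(velocities, k)
    bic_by_k[k] = bic(loglik, k, len(velocities))

best_k = min(bic_by_k, key=bic_by_k.get)
print("BIC by number of components:", bic_by_k)
print("Selected number of components:", best_k)
```

A real analysis would work with several pitch features at once (for example velocity and movement) and full covariance matrices, which is where a multivariate mixture model becomes necessary.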
History
Date: 2013-05-01
Advisor(s): Andrew Thomas
Department: Statistics