Exploratory Data Analysis and Visualization of Trends and Biases in MLB Draft Decisions
1 Introduction
Baseball, known as America’s favorite pastime, is a sport rich in history, statistics, and tactics. The abundance of data collected over decades provides a strong foundation for analyzing the underlying trends in the sport. In this study, we focus on finding key patterns and biases in baseball drafting using exploratory data analysis and visualization. By examining factors related to player background and performance, we aim to better understand how they influence decisions in the MLB draft.
To achieve this goal, we use the baseballr package in R and the pybaseball package in Python, both of which retrieve historical and current baseball data. These resources allow us to explore trends related to many factors, including player position, age, background (college vs. high school), draft placement, and performance metrics. Through exploratory data analysis, we aim to identify patterns, anomalies, and potential biases in how players are scouted and drafted.
By studying draft biases, we aim to determine whether specific player traits are given too much or too little weight during the drafting process and to shed light on how drafting decisions have evolved. This analysis offers insight into scouting, drafting, and player development, and can support more precise and equitable evaluation of player potential in a sport increasingly driven by analytics.