Capstone Project
Analyzing & Visualizing a Variety of Off- and On-Field Factors to Determine Their Effect on MLB Game Attendance
Introduction
My capstone project for the data analytics bootcamp looks at several factors that may influence attendance at regular season Major League Baseball (MLB) games, from the 2010 season through the 2024 season. Every regular season game played during this period is a data point, for a total of 34,913 games.
You can skip ahead to the first page of the dashboard by going here, but I recommend scrolling down first to read about the data-gathering experience, the data-cleaning process, and putting it all together.
Background Information
Typically, every MLB team plays 162 games during the regular season, with about half of them played at its home venue. The notable exception was the 2020 season, which was delayed due to COVID-19 and shortened to 60 games. More on that in the Data Details section below.
My analysis sought to look at three categories of variables that might influence attendance at MLB games:
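As a rough sanity check on the schedule arithmetic, here is a minimal sketch of how many games the 2010 through 2024 seasons would contain if every scheduled game were played. It assumes 30 teams in every season and that each game counts once (two teams per game); the function and constant names are made up for illustration.

```python
# Rough schedule arithmetic. Assumes 30 teams in every season.
TEAMS = 30
FULL_SEASON_GAMES_PER_TEAM = 162
SHORT_SEASON_GAMES_PER_TEAM = 60  # the shortened 2020 season


def scheduled_games(teams, games_per_team):
    """Games in one season; each game involves two teams, so halve the product."""
    return teams * games_per_team // 2


expected_total = sum(
    scheduled_games(
        TEAMS,
        SHORT_SEASON_GAMES_PER_TEAM if year == 2020 else FULL_SEASON_GAMES_PER_TEAM,
    )
    for year in range(2010, 2025)  # 2010 through 2024 inclusive
)

print(expected_total)  # 34920
```

The 34,913 games in the dataset come in slightly under this figure, which would be consistent with a handful of games being canceled and never made up. That explanation is an assumption, not something checked game by game.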
- External factors: Variables that are essentially outside of the control of teams and venues.
- Venue factors: Variables within the venues and stadiums where games are played.
- On-the-field factors: Variables involving the teams and players themselves.
Gathering the Data
Data Details
It is important for my analysis to note that regular season games during the 2020 season had no in-person fans, so attendance for each of these games was 0.
Additionally, the 2021 season opened with limited attendance for all but one team, the Texas Rangers. The remaining 29 teams adjusted capacity by varying degrees during the first half of the season, but by July 2021 all venues were back to full capacity. Since my analysis standardizes attendance as a percentage of venue capacity, the raw attendance numbers from this period do little to show how many tickets were sold relative to the capacity that was actually available. Research to address this is on my to-do list, but it’ll be a long process, as I haven’t been able to find a running tally of attendance limits for the 2021 season.
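To make the early-2021 issue concrete, here is a minimal sketch of the percent-of-capacity calculation. The venue size, the capacity limit, and the attendance figure are all hypothetical numbers chosen for illustration, not values from the dataset, and the function name is my own.

```python
def percent_of_capacity(attendance, capacity):
    """Return attendance as a percentage of the given capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * attendance / capacity


# Hypothetical early-2021 game (made-up numbers, not from the dataset).
full_capacity = 40_000   # venue's normal capacity
limit_fraction = 0.25    # hypothetical attendance limit in effect
attendance = 10_000      # tickets actually used

allowed_capacity = full_capacity * limit_fraction

pct_of_full = percent_of_capacity(attendance, full_capacity)
pct_of_allowed = percent_of_capacity(attendance, allowed_capacity)

print(pct_of_full)     # 25.0
print(pct_of_allowed)  # 100.0
```

In this made-up case the game looks only a quarter full against normal capacity even though every available seat was taken, which is why a per-game record of the 2021 limits would be needed to correct the figures.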
There are currently three pages in the report highlighting the different factors I’m comparing against percentage of capacity attended, along with two pages of statistical observations.