Data Work

Here are a couple of samples of previous data analysis work; please feel free to reach out for more detail and explanation. All analysis was done in R, in my free time, for fun.

2021 MLB Pitching Breakdown Analysis – Breakdown of 2021 MLB pitcher data to better identify the best relief pitchers to target during the 2022 fantasy baseball draft and season.
2015 San Diego Bus Traffic Analysis – Breakdown of ridership across the San Diego bus and trolley systems to build a predictive model of ridership by origin/stop, time of day, season, etc. (Password protected; please reach out for more detail.)



2021 MLB Pitching Breakdown Analysis

  • Goal: analyze MLB relief pitchers to better identify the best pitchers to draft in the 2022 fantasy baseball league draft.
  • 95% of traditional leagues use “Saves” as the scoring metric, so the majority of industry experts rank relievers based only on Saves.
  • However, this league scores “Saves + Holds”, which greatly expands the pool of viable relievers.
  • This analysis therefore aimed to identify pitchers who wouldn’t normally be considered in “Saves-only” leagues but could still be impact players in “Saves + Holds” leagues.

TL;DR: The Results

Skill-interactive Earned Run Average (SIERA): SIERA quantifies a pitcher’s performance by trying to eliminate factors the pitcher can’t control by himself. But unlike a stat such as xFIP, SIERA considers balls in play and adjusts for the type of ball in play.

Top Relievers to Pursue in “Saves + Holds” Leagues (the upper-right quadrant is the “Elite”):

  • Plotting SIERA against SV+HLD gives a visual representation of the top relievers in the game (the lower the SIERA, the better) who also get plenty of Save or Hold opportunities to boost fantasy points earned. (Note: the Y axis is flipped, so the best SIERA values sit at the top and the upper-right quadrant is the one to watch.)
  • Traditional closers, whom industry experts typically rank and favor, are labeled in blue. The red points are drastically undervalued relievers who are just as valuable in “Saves + Holds” leagues. They are overlooked, cheap to obtain on draft day, and should be prioritized.
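As a rough illustration of the quadrant logic described above, here is a minimal sketch in Python (the original analysis was done in R). The pitcher names, the numbers, the `elite_quadrant` helper, and the use of medians as quadrant cutoffs are all assumptions made for this example, not values or methods from the actual dataset.

```python
from statistics import median

# Hypothetical rows: (name, SIERA, saves, holds). Not real 2021 data.
relievers = [
    ("Pitcher A", 2.40, 30, 2),
    ("Pitcher B", 2.65, 1, 28),
    ("Pitcher C", 4.10, 25, 0),
    ("Pitcher D", 3.90, 0, 8),
]

def elite_quadrant(rows):
    """Return names with SIERA below the median and SV+HLD above the median.

    Using medians as the quadrant cutoffs is an assumption for illustration.
    """
    siera_cut = median(r[1] for r in rows)
    svh_cut = median(r[2] + r[3] for r in rows)
    return [
        name
        for name, siera, sv, hld in rows
        if siera < siera_cut and sv + hld > svh_cut
    ]

elite = elite_quadrant(relievers)
print(elite)
```

In this made-up sample, "Pitcher B" has almost no saves yet lands in the same quadrant as the closer "Pitcher A", which is the kind of reliever the red points are meant to surface.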

The Data

Unstructured 2021 MLB Reliever Data
  • Created new dataframes, removed outliers, and cleaned and manipulated attributes by combining, separating, or removing columns entirely (e.g. removing duplicated rows, removing starting pitchers not eligible for Saves or Holds, creating an additional ‘SV + Hold’ column for easier comparison, etc.).
  • Removing starting pitchers (459) and consolidating duplicated entries (125) reduced the player pool from 987 to 403.
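The cleaning steps above can be sketched roughly as follows, in Python rather than the R used for the original work. The field names, the sample rows, the `clean` helper, the rule that any game started marks a starter, and the choice to sum duplicate rows for the same player are all assumptions for illustration; the real schema and rules may differ.

```python
# Hypothetical raw rows; field names are assumptions, not the real schema.
raw = [
    {"name": "Pitcher A", "team": "AAA", "gs": 0, "sv": 10, "hld": 1},
    {"name": "Pitcher A", "team": "BBB", "gs": 0, "sv": 5, "hld": 2},
    {"name": "Pitcher B", "team": "CCC", "gs": 30, "sv": 0, "hld": 0},
    {"name": "Pitcher C", "team": "DDD", "gs": 0, "sv": 0, "hld": 20},
]

def clean(rows):
    """Drop starters, consolidate duplicate player rows, add an SV+HLD field."""
    merged = {}
    for row in rows:
        if row["gs"] > 0:  # assumption: any game started marks a starter
            continue
        rec = merged.setdefault(
            row["name"], {"name": row["name"], "sv": 0, "hld": 0}
        )
        # assumption: duplicate rows are partial-season lines to be summed
        rec["sv"] += row["sv"]
        rec["hld"] += row["hld"]
    for rec in merged.values():
        rec["sv_hld"] = rec["sv"] + rec["hld"]
    return list(merged.values())

cleaned = clean(raw)

# The reported counts are consistent: 987 - 459 starters - 125 duplicates.
remaining = 987 - 459 - 125
print(len(cleaned), remaining)
```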
Filtered dataset capturing only applicable data
  • To compare pitchers 1:1, I imported the 2022 advanced stats projections and merged the two dataframes.
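A minimal sketch of that merge step, again in Python for illustration. The source does not say which key or join type was used, so joining on player name and keeping only players present in both tables (an inner join) are assumptions, as are the `inner_join` helper, the field names, and every value shown.

```python
# Hypothetical inputs keyed by player name; all values are made up.
stats_2021 = {
    "Pitcher A": {"sv_hld": 18},
    "Pitcher C": {"sv_hld": 20},
}
projections_2022 = {
    "Pitcher A": {"proj_siera": 3.10},
    "Pitcher D": {"proj_siera": 3.80},  # no 2021 row, so the join drops it
}

def inner_join(left, right):
    """Combine records for players that appear in both tables."""
    return {
        name: {**left[name], **right[name]}
        for name in left
        if name in right
    }

merged = inner_join(stats_2021, projections_2022)
print(merged)
```

An inner join keeps the comparison strictly 1:1, at the cost of dropping any pitcher missing from either table; a left join would be the alternative if 2021 relievers without projections should be kept.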

Formatting in progress