To what extent can one predict Major League Baseball readiness using Minor League Baseball statistics?

In Major League Baseball (MLB), players are judged not only by their current skill but also by their potential to improve. This is particularly true of Minor League Baseball (MiLB) players. As a general rule, the players with the most perceived potential are the most highly touted by their teams, even when other players have better MiLB statistics. There are a variety of reasons for this, including age and the team’s investment in a particular player. As a result, many players are called up to the MLB at the wrong time, whether too early or too late. A poorly timed call-up hurts both sides: the team does not get the maximum value it is seeking, and the player is not compensated as well as they might have been. What if there were a statistic to assess a MiLB player’s readiness to produce at the MLB level? It could help MLB teams make better decisions about their MiLB players. Through the use of statistical analysis, a single statistic can be created that estimates how ready a player is for the MLB level.

### Pedro Alvarez

This problem is evident in the career of Pedro Alvarez. Alvarez, who played in the Pittsburgh Pirates organization at the time, was coming off a very productive 2009 season in A+ and AA. In 2010, Alvarez was seen as a top prospect in the Pirates organization and was promoted to AAA. In 66 games in 2010, Alvarez had noticeably worse statistics than in his 2009 season, especially in K% (Strikeout Percentage) and BB% (Base on Balls Percentage).

K% = K/PA and BB% = BB/PA

These two statistics are a good representation of how well a player is performing, especially when tracked over time. A high K% combined with a low BB% is an indicator of poor performance. From AA to the MLB, Alvarez’s K% rose and his BB% fell. This is consistent with the large drop in performance Alvarez saw once called up to the MLB.
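As a minimal sketch of the two rates defined above, the following computes K% and BB% for two seasons and reports the direction of each change. The season lines are hypothetical placeholders, not Alvarez’s actual totals, and the helper name `rate` is an illustrative assumption.

```python
def rate(events: int, plate_appearances: int) -> float:
    """Return events per plate appearance (K% or BB%) as a fraction."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return events / plate_appearances


# Hypothetical (K, BB, PA) lines for two consecutive seasons; not real data.
seasons = {"earlier": (100, 60, 500), "later": (130, 45, 500)}

k_pct = {name: rate(k, pa) for name, (k, bb, pa) in seasons.items()}
bb_pct = {name: rate(bb, pa) for name, (k, bb, pa) in seasons.items()}

# A rising K% together with a falling BB% is the warning sign described above.
k_change = k_pct["later"] - k_pct["earlier"]
bb_change = bb_pct["later"] - bb_pct["earlier"]
print(f"K% change: {k_change:+.1%}, BB% change: {bb_change:+.1%}")
```

Tracking the change between levels, rather than a single season’s value, is what makes the two rates useful as a growth indicator.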

Alvarez had a respectable MLB career after his rookie season in 2010, but never lived up to his all-star potential. With more information, Alvarez may have had a better career, which would have let the Pirates get more value out of him and let Alvarez be better compensated financially. This is what MLB teams strive for: to get the most value out of every player in their organization. Teams currently do this very well with players already in the MLB, but there is no simple way to decide when to call a player up from the MiLB. Because both MiLB and MLB player statistics are available to track, a statistic can be created to inform that decision. Such a statistic could be incredibly useful for MLB teams looking to get the most value out of a player at the MLB level.

### Correlation of Statistics

Finding the correlation between players’ performance in the MiLB and those same players’ performance in the MLB is central to figuring out the right time to call a player up. OPS (On-Base Plus Slugging) is a simple statistic that adds together a player’s OBP (On-Base Percentage) and SLG (Slugging Percentage).

OPS = ((H + BB + HBP) / (AB + BB + HBP + SF)) + (((1 * 1B) + (2 * 2B) + (3 * 3B) + (4 * HR)) / AB)
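The OPS definition can be sketched in code as the sum of its two components. This is an illustration only: the function name `ops` and the sample season line are assumptions, not data from the study.

```python
def ops(singles, doubles, triples, hr, bb, hbp, ab, sf):
    """On-base plus slugging from raw counting statistics."""
    hits = singles + doubles + triples + hr
    # OBP: times on base divided by the plate appearances that count toward it.
    obp = (hits + bb + hbp) / (ab + bb + hbp + sf)
    # SLG: total bases per at-bat.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    slg = total_bases / ab
    return obp + slg


# Hypothetical season line; not a real player.
example = ops(singles=100, doubles=30, triples=5, hr=15, bb=50, hbp=5, ab=500, sf=5)
print(f"OPS = {example:.3f}")
```

Note that the slugging numerator is divided by AB as a whole, which is why the total-bases terms must be grouped before the division.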

This is a good statistic to start with, as it is an all-encompassing statistic for a hitter. Using this statistic, one can see if a correlation between MiLB statistics and MLB statistics exists. Using data from 2006-2017, a graph can be created to show the correlation between OPS in a player’s final season in the MiLB and their first season in the MLB. Roughly 10% of the 435 players in the sample (44 players) were removed as outliers, leaving a data set of 391 players.

Looking at this graph, one can clearly see that there is some type of correlation between OPS at the MiLB level and OPS at the MLB level. Interpreting this data is the next step. The league average OPS in 2018 was 0.728, and MLB teams are looking for the players that they call up to be at least league average in their first season. If the formula for the trendline is used, one can see that a player’s OPS in the MiLB should probably be at least 0.825.

0.728 = 0.8823x

x = 0.825

This means that MLB teams should begin to consider a minor leaguer for a call-up once their OPS reaches about .825, because the trendline then predicts league-average production at the next level.

What if MLB teams are looking at elite prospects, such as Pedro Alvarez? When is the right time to call them up based on OPS? To get the maximum amount of value out of an elite prospect, a team waits until the player is able to make an impact as a star at the MLB level. These teams want their players to be in the top 25% of MLB players in terms of OPS. In 2018, the cutoff for the top 25% (the 75th percentile) for OPS was 0.837. Using the same method as before, a player with an OPS of 0.949 at the MiLB level has a higher likelihood of becoming a star at the MLB level.

0.837 = 0.8823x

x = 0.949
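Both thresholds come from inverting the same trendline, which can be sketched as follows. The slope 0.8823 and the two MLB targets are taken from the text; the function name `required_milb_ops` is an illustrative assumption, and the trendline is assumed to pass through the origin, as the equations above imply.

```python
SLOPE = 0.8823  # trendline slope from the text: MLB OPS ~ SLOPE * MiLB OPS


def required_milb_ops(target_mlb_ops: float, slope: float = SLOPE) -> float:
    """Solve target = slope * x for the MiLB OPS x."""
    return target_mlb_ops / slope


targets = {"league average": 0.728, "star threshold": 0.837}
thresholds = {label: required_milb_ops(t) for label, t in targets.items()}

for label, value in thresholds.items():
    print(f"{label}: MiLB OPS of about {value:.3f}")
```

Given the low R^2 reported for this fit, these thresholds are better read as rough guides than as precise cutoffs.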

While a correlation does seem to exist, there is much variability (R^2 = 0.07), so finding a better way to evaluate these players is important. Instead of looking at OPS, which is a rate statistic, one may want to look at a simpler counting statistic. Hits may seem a good place to start, as they are the most commonly used counting statistic. Hits, however, have a lot of problems: a home run and a single each count as just one hit, even though a home run is worth more than a single in every situation. Because of this, hits will not be used, but home runs will be. In the modern MLB, home runs are king. A few players still defy the trend by hitting singles rather than home runs, but they are few and far between and are treated as negligible here. So if an MLB team wants someone who will produce power in the majors, home runs may be a good minor league statistic to use to assess and predict their performance.

Again using statistics from 2006-2017 seasons, a graph of home runs in a player’s final MiLB season versus home runs in a player’s rookie season in the MLB can be created.

The most glaring point on this graph is at the top, at (19, 56). This player is Aaron Judge, who had one of the greatest rookie seasons of all time, breaking the rookie record for home runs in a season. While a correlation does exist between home runs at the minor league level and home runs at the major league level, it is relatively loose, as the linear regression model shows. Looking specifically at the x-value of 19, where both Aaron Judge and Ryan Flaherty lie, one can see how loose the correlation is. Ryan Flaherty had 6 home runs in his rookie season, far fewer than Judge. This may be due to the discrepancy in at-bats from one player to the next. Logically, a player with more chances will hit more home runs, and vice versa. To fix this, one can look at a statistic such as AB/HR, which shows the number of at-bats it takes a player, on average, to hit a single home run. This removes the effect of differing at-bat totals between players and leagues.

With an R^2 value of 0.0584, there is little correlation and much variability between AB/HR in the MiLB and AB/HR in the MLB. This shows that home run rate in the minor leagues is only a weak predictor of home run rate at the MLB level. However, the concept of using statistics that account for the discrepancy in at-bat totals remains important.
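The kind of calculation behind an R^2 figure like this can be sketched with a plain least-squares fit of MLB AB/HR on MiLB AB/HR. The player lines below are hypothetical placeholders rather than the study’s data, and the helper names `ab_per_hr` and `linear_fit` are illustrative assumptions.

```python
import math


def ab_per_hr(at_bats: int, home_runs: int) -> float:
    """At-bats per home run; infinite when the player hit no home runs."""
    return at_bats / home_runs if home_runs > 0 else math.inf


def linear_fit(xs, ys):
    """Ordinary least-squares slope, intercept, and R^2 for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical (AB, HR) pairs: final MiLB season and MLB rookie season.
milb = [(450, 19), (500, 25), (380, 8), (420, 14)]
mlb = [(300, 6), (520, 22), (250, 7), (410, 9)]

xs = [ab_per_hr(ab, hr) for ab, hr in milb]
ys = [ab_per_hr(ab, hr) for ab, hr in mlb]
slope, intercept, r_squared = linear_fit(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

Players with zero home runs would need to be filtered out before fitting, since their AB/HR is undefined; the sample above avoids that case.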

While home runs may not be a great indicator of success, one can look at the other two true outcomes to find a statistic that may actually correlate between the minor leagues and the MLB. The Three True Outcomes is a concept in baseball that covers home runs, bases on balls (walks), and strikeouts. None of these three outcomes involves the defense, so they are less subject to chance. By this logic, a correlation should exist between the three true outcomes in the minors and the three true outcomes in the MLB. Home runs did not show this correlation, so the next step is to see whether walks and strikeouts do. The same complication that existed with home runs exists with these two statistics: there is a large discrepancy in the number of at-bats each player has. Raw totals would therefore obscure any correlation, even though one may exist. A solution to this problem is to use a statistic called BB/K, which is simply the number of times a player walks for each strikeout. This solves the problem of differing at-bat numbers and captures both walks and strikeouts at the same time.

A graph of this data can be created to demonstrate the correlation of minor league BB/K to MLB BB/K.

This graph shows a significant correlation between the MiLB BB/K statistics and the MLB BB/K statistics. BB/K is a good indicator of success at the MLB level as well. In 2018, the top 30 hitters in terms of wRC+ (Weighted Runs Created Plus) had an average BB/K of 0.60*. This is above the 75th percentile for BB/K in 2018, which was 0.59. In other words, the players who had the most success at the plate were also the ones who drew a lot of walks per strikeout. Finding the players who will produce a high BB/K will therefore have a major impact on the hitting of the team, and since this statistic can seemingly be predicted from minor league statistics, it can be an incredibly useful tool to aid MLB front offices in making decisions about prospects. For context, Pedro Alvarez had a 0.55 BB/K in his final season in the MiLB. This equation shows his predicted BB/K in the MLB, based solely on his BB/K in his final season in minor league baseball.

MLB Predicted Value = (0.5283)(0.55) + 0.0818

MLB Predicted Value = 0.372365

This would have put Pedro Alvarez in the 17th percentile of MLB players in that particular year, 2010. This is not ideal, especially for a rookie who could get more time to develop. Pedro Alvarez’s actual BB/K in 2010 was 0.31, which put him in the 7th percentile of players that season.
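The prediction and its miss can be reproduced in a few lines. The slope, intercept, MiLB BB/K, and actual MLB BB/K are the values quoted in the text; the function name `predict_mlb_bb_per_k` is an illustrative assumption.

```python
SLOPE = 0.5283      # regression slope quoted in the text
INTERCEPT = 0.0818  # regression intercept quoted in the text


def predict_mlb_bb_per_k(milb_bb_per_k: float) -> float:
    """Predicted MLB BB/K from the final-season MiLB BB/K."""
    return SLOPE * milb_bb_per_k + INTERCEPT


predicted = predict_mlb_bb_per_k(0.55)  # Alvarez's final MiLB season
actual = 0.31                           # Alvarez's 2010 MLB BB/K
residual = actual - predicted           # negative: he underperformed the model

print(f"predicted={predicted:.3f}, actual={actual:.2f}, residual={residual:+.3f}")
```

The negative residual shows that the model, while already pessimistic about Alvarez, still overestimated his rookie BB/K.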

Isolated Power (ISO) is a statistic that is not commonly used, but in this situation, ISO is essential. ISO is measured through the following calculations:

ISO = (2B + 2*3B + 3*HR) / AB

or,

ISO = SLG - AVG

This calculation attempts to show a player’s power production by measuring how often that player produces extra-base hits (doubles, triples, and home runs). Looking at the graph below, one can clearly see the strong correlation between the MiLB statistics and the MLB statistics.

This correlation makes sense, as one can logically reason that raw power would stay similar throughout the transition between the MiLB and MLB. The linear regression model agrees that a correlation is present: R = 0.7221, which is above the 0.5 threshold, and R^2 = 0.5214, meaning that MiLB ISO accounts for roughly half of the variation in MLB ISO, far more than the statistics examined earlier.
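As a quick check that the two ISO definitions given earlier agree, the sketch below computes ISO both ways for one hypothetical season line (not a real player). The function names are illustrative assumptions.

```python
def iso_from_extra_bases(doubles, triples, hr, ab):
    """ISO as extra bases per at-bat: (2B + 2*3B + 3*HR) / AB."""
    return (doubles + 2 * triples + 3 * hr) / ab


def iso_from_slg_avg(singles, doubles, triples, hr, ab):
    """ISO as slugging percentage minus batting average."""
    avg = (singles + doubles + triples + hr) / ab
    slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
    return slg - avg


# Hypothetical season line; not a real player.
line = dict(singles=100, doubles=30, triples=5, hr=15, ab=500)

direct = iso_from_extra_bases(line["doubles"], line["triples"], line["hr"], line["ab"])
via_slg = iso_from_slg_avg(**line)
print(f"direct={direct:.3f}, SLG-AVG={via_slg:.3f}")
```

Subtracting AVG from SLG removes one base from every hit, which leaves only the extra bases; singles therefore contribute nothing to ISO.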

### Importance of MLB Statistics

Figuring out a relatively good way to assign value to MiLB players is important, from a success standpoint and a financial standpoint. Rookies who make a large impact in the MLB are paid little and help the team immensely. A value can be assigned to these players through the use of mathematics. To get to a single number, however, means the first thing to figure out is what statistics are important for value in the MLB. With this data, figuring out what type of MiLB player will usually produce these statistics at the Major League level is the next step. So what MLB statistics are most important?

Baseball has one main goal, and that is to score runs. What creates the best chance to score runs? To find this, one may look at the statistic wOBA (Weighted On-Base Average). This statistic attempts to estimate the value of each hitting outcome, and give more credit for outcomes that give the team a better chance to score a run, and therefore win the game.

wOBA = (.69(uBB) + .72(HBP) + .89(1B) + 1.27(2B) + 1.62(3B) + 2.10(HR)) / (AB + BB - IBB + SF + HBP)

This equation is basically saying that a double is worth 1.27 / 0.89, or ~1.43, times as much as a single. Using this statistic, weights can be given to certain outcomes according to how valuable they are to a team scoring runs. For the created statistic, a similar concept will be used, where weights will be given to different statistics within the overall statistic.
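A minimal sketch of the weighted sum is shown here, using the weights quoted in the wOBA equation. It assumes that uBB means unintentional walks (BB minus IBB); the function name and the sample season line are hypothetical.

```python
# Linear weights as quoted in the text's wOBA equation.
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}


def woba(singles, doubles, triples, hr, bb, ibb, hbp, ab, sf):
    """Weighted on-base average from raw counting statistics."""
    ubb = bb - ibb  # assumption: uBB = unintentional walks
    numerator = (
        WEIGHTS["ubb"] * ubb
        + WEIGHTS["hbp"] * hbp
        + WEIGHTS["1b"] * singles
        + WEIGHTS["2b"] * doubles
        + WEIGHTS["3b"] * triples
        + WEIGHTS["hr"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Hypothetical season line; not a real player.
example = woba(singles=100, doubles=30, triples=5, hr=15,
               bb=50, ibb=5, hbp=5, ab=500, sf=5)
double_vs_single = WEIGHTS["2b"] / WEIGHTS["1b"]
print(f"wOBA = {example:.3f}; a double is worth {double_vs_single:.2f} singles")
```

The ratio of any two weights gives the relative run value of the two outcomes, which is the idea carried over to the weighting of the created statistic.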

### wMLR

The statistic that has been created from this data is called wMLR (Weighted Major League Readiness). This statistic is a single number showing how ready for the MLB a player is, based solely on their most recent year in the MiLB. The table below shows the scale on which this statistic is based.

As is shown by the table, wMLR is scaled to a league average of 100. This means that if a minor leaguer has a wMLR of 100 then they would likely produce average MLB statistics. A wMLR of 175.00 would put the player in the MVP conversation if called up, and is relatively unrealistic. Most MiLB players fall under 25.00 wMLR, and will, therefore, most likely make no impact at the MLB level if given the opportunity.

The calculation for this statistic is relatively simple:

wMLR = ((ISO + (1.24 * BB/K)) / (League Average wMLR)) * 100

This statistic is basically saying that BB/K is 1.24 times more valuable than ISO at predicting how ready a player is for the MLB. This makes sense within the context of baseball, as ISO is slightly more volatile than BB/K.
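The calculation can be sketched as below. One assumption is made loudly here: the "League Average wMLR" in the denominator is read as the league average of the unscaled sum ISO + 1.24 * BB/K, so that an average player lands at exactly 100. The league-average inputs and the sample player are hypothetical, as are the function names.

```python
BB_PER_K_WEIGHT = 1.24  # weight on BB/K from the text


def raw_wmlr(iso: float, bb_per_k: float) -> float:
    """Unscaled weighted sum of ISO and BB/K."""
    return iso + BB_PER_K_WEIGHT * bb_per_k


def wmlr(iso: float, bb_per_k: float, league_avg_raw: float) -> float:
    """Scale the raw sum so that the league average equals 100."""
    return raw_wmlr(iso, bb_per_k) / league_avg_raw * 100


# Hypothetical league-average inputs (assumed, not from the study).
league_avg_raw = raw_wmlr(iso=0.150, bb_per_k=0.40)

# Hypothetical prospect with above-average power and plate discipline.
prospect = wmlr(iso=0.200, bb_per_k=0.55, league_avg_raw=league_avg_raw)
average = wmlr(iso=0.150, bb_per_k=0.40, league_avg_raw=league_avg_raw)
print(f"prospect wMLR = {prospect:.1f}, league-average wMLR = {average:.1f}")
```

Dividing by the league-average raw value is what puts wMLR on the same 100-centered scale as statistics such as wRC+.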

This particular statistic appears unreliable when one looks at the top ten players since 2007 in terms of wMLR, charted with their wRC+ as rookies in the MLB.

As one can see, the top 10 players in terms of wMLR have not consistently been the best hitters once they become MLB players. The average wRC+ of the players on this list is 100.3, which is the definition of a league average hitter. This is especially troublesome when compared to the average wRC+ of all rookies since 2007, which is 95.362. The players that lead the league in wMLR are just barely better than the average rookie. This data is somewhat flawed, however, as wMLR is tailored to the modern MLB in terms of run production. To remedy this, one can look specifically at the two most recent seasons in the data, 2016 and 2017. With 76 players meeting this qualification, the sample size is adequate. Looking at the top 10 players in terms of wMLR in 2016 and 2017, one can see a change in the effectiveness of wMLR in predicting MLB success.

The average wRC+ of this set of 10 is 105.6. This is marginally better than the average from before; however, this data set contains an extreme outlier in Allen Cordoba. His 55 wRC+ in his 2017 rookie season would have been the lowest of any player in the MLB that season had he qualified. Cordoba’s final season in the minor leagues was also at the lowest level of the minor leagues, Rookie Ball. In this sense, Cordoba was able to inflate his statistics against lower-level competition. It can reasonably be assumed that the Padres had no expectation of Cordoba being a viable player in the MLB, and that he was only there to fill a roster spot and fulfill service time requirements. For this reason, wMLR should only be applied to players who are in AA or AAA, as those are the two highest levels of minor league baseball.

So without Cordoba, the average wRC+ of this data set becomes 111.2. This is significantly higher than average and shows that in the modern MLB, wMLR is a relatively effective tool to evaluate the readiness of minor league baseball players. wMLR can be tweaked though, so finding the best way to create the statistic is important.
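The adjusted average follows directly from the figures already given (a mean of 105.6 over 10 players, with Cordoba at 55), as this short check shows. The helper name `mean_without` is an illustrative assumption.

```python
def mean_without(mean_all: float, n: int, excluded_value: float) -> float:
    """Mean of the remaining n - 1 values after removing one known value."""
    return (mean_all * n - excluded_value) / (n - 1)


# Figures from the text: 10 players averaging 105.6 wRC+, Cordoba at 55.
adjusted = mean_without(mean_all=105.6, n=10, excluded_value=55)
print(f"average wRC+ without Cordoba: {adjusted:.1f}")
```

With only ten players in the sample, a single extreme value moves the mean by more than five points, which is why the outlier matters here.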

### Updated wMLR

While wMLR does seem to somewhat predict the success of MLB rookies, it can be done more effectively. The original statistic weighted BB/K more heavily than ISO because ISO is the more volatile of the two. Again, this worked somewhat well, but a different way of weighting these two statistics is by their correlation to wRC+ in the major leagues. This weighting leads to ISO being 6.28 times more valuable than BB/K, or 0.4813 / 0.0767. These are the two R^2 values between each statistic and wRC+ (0.4813 for ISO and 0.0767 for BB/K). This generally means that a player with a higher ISO is likely to have a better wRC+, which means they are a better overall hitter.

This leads to the new equation of:

wMLR = (((BB/K) + (6.28 * ISO)) / (League Average wMLR)) * 100

This newer version is on the exact same scale as the earlier version, with a wMLR of 100 being league average.
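The new weight and the updated formula can be sketched together. The two R^2 values are those quoted in the text (0.4813 for ISO and 0.0767 for BB/K against wRC+). As a stated assumption, the league-average denominator is taken to be the league average of the unscaled sum, and the league-average inputs and the sample prospect are hypothetical.

```python
R2_ISO = 0.4813       # R^2 of ISO against MLB wRC+, from the text
R2_BB_PER_K = 0.0767  # R^2 of BB/K against MLB wRC+, from the text

# The ISO weight is the ratio of the two R^2 values (about 6.28).
iso_weight = R2_ISO / R2_BB_PER_K


def raw_updated_wmlr(iso: float, bb_per_k: float, weight: float = iso_weight) -> float:
    """Unscaled updated wMLR: BB/K plus weighted ISO."""
    return bb_per_k + weight * iso


def updated_wmlr(iso: float, bb_per_k: float, league_avg_raw: float) -> float:
    """Scale the raw value so that the league average equals 100."""
    return raw_updated_wmlr(iso, bb_per_k) / league_avg_raw * 100


# Hypothetical league-average inputs (assumed, not from the study).
league_avg_raw = raw_updated_wmlr(iso=0.150, bb_per_k=0.40)
power_prospect = updated_wmlr(iso=0.250, bb_per_k=0.40, league_avg_raw=league_avg_raw)
print(f"ISO weight = {iso_weight:.2f}, power prospect wMLR = {power_prospect:.1f}")
```

Under this weighting, a gain in ISO moves wMLR far more than an equal gain in BB/K, which is the reverse of the original version.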

As was done with the previous version, one can look at the top 10 players in terms of wMLR from the 435 rookies between 2007 and 2018.

These 10 players have an average wRC+ of 123.4, which is to say that they were on average 23.4% better than the average major league hitter. This is very significant and shows the ability of the updated wMLR to predict success in the major leagues. Looking deeper into this top 10, there are two players who are below the league average of 100 in wRC+. These two players have something in common: both are catchers. In the MLB, the offensive ability of catchers is usually secondary, so a catcher who hits just below league average is actually a decent one.

Going further into these statistics, of the 47 players delineated as "above average", meaning players with a wMLR greater than 135.00, only 16 had a below average wRC+ in the majors. That is a 66% success rate (31 of 47), which is relatively good compared to the methods of today. Of the top 24 players in terms of wMLR, only three had a below average wRC+, good for an 88% success rate. This again shows the usefulness of wMLR in predicting the success of MLB rookies, and it should thereby help MLB teams decide whether to call a player up to the MLB.
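The two success rates can be recomputed from the counts given above; the first works out to just under 66% and the second to 87.5%. The helper name `success_rate` and the group labels are illustrative assumptions.

```python
def success_rate(total: int, below_average: int) -> float:
    """Share of players who were NOT below average in MLB wRC+."""
    return (total - below_average) / total


# (total players, players with below-average wRC+), counts from the text.
groups = {
    "wMLR > 135 (47 players)": (47, 16),
    "top 24 by wMLR": (24, 3),
}
rates = {label: success_rate(*counts) for label, counts in groups.items()}

for label, value in rates.items():
    print(f"{label}: {value:.1%}")
```

The rate rises as the wMLR cutoff becomes stricter, which is the pattern one would expect if the statistic carries real predictive signal.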

### Application to Today

Looking into 2019, wMLR can be used for its purpose: who should be called up to the MLB? The following players led AA and AAA in wMLR during the 2019 season.

While many of these players did get called up to their respective MLB teams by the end of the season, three notable players did not: Mark Payton, John Nogowski, and Yadiel Hernandez. All three of these players had a very good wMLR, and likely would have made an impact at the major league level if given the opportunity. None of the three has yet made an MLB appearance, and all are above the age of 27. This is considered too old to be a minor league baseball player, so teams overlook these players. This is troublesome, as all three played for the AAA affiliates of playoff teams, and each of those MLB teams could have used their production. wMLR helps showcase a player’s predicted value at the major league level, and is independent of factors, such as age, that create a misevaluation of talent.

### Limitations

As with any statistic in baseball, limitations exist, and they must be acknowledged in order to properly use the statistic. While there are many, the three biggest limitations that exist with wMLR are as follows: defense, ballpark, and baserunning.

The most obvious and important of these three is defense, which is not accounted for in wMLR. Defense is an enormous part of the game of baseball, and an awful defensive player is unlikely to be given an opportunity in the MLB. That being said, defense is much easier to predict, and players who record a high number of Defensive Runs Saved (DRS) in the minors tend to do so in the majors. Looking at this statistic in conjunction with wMLR and the position that the player plays helps account for the defensive value of a player.

The ballpark that a player plays in should be accounted for as a part of a player’s statistics. A hitter-friendly ballpark may give way to a higher wMLR, and a pitcher-friendly ballpark may do the opposite. Due to the low volatility of minor league ballparks, however, the difference that a ballpark makes is almost negligible when it comes to minor league statistics. While ballpark should be acknowledged, it is unnecessary to include in a statistic like wMLR.

The final limitation, baserunning, is important to understand with regard to decision making. Baserunning is a facet of the game of baseball and can be a weapon at the major league level. With this knowledge, some decisions are made based on the baserunning ability of a player, and this should continue. Speed stays consistent between the minors and majors, and is therefore the least volatile aspect of the game. There is no need to include a measurement of baserunning in wMLR due to the nature of a tool like baserunning.

These limitations are important to acknowledge, but they should not discourage the use of wMLR. They should, however, serve to bring context to the statistic and show that it cannot be used on its own.

### Conclusion

wMLR is an effective tool for predicting value at the major league level using minor league statistics. Each year, MLB teams are looking for a competitive advantage, and the ability to better predict when a minor league player is ready for the MLB could be the competitive advantage that teams are searching for. Through the use of statistical analysis, especially the utilization of R^2, wMLR came to fruition. MLB teams are also becoming more analytics-based as time goes on, so the willingness to utilize a statistic such as this one will only grow. As more sophisticated methods for statistical analysis develop, this particular statistic will improve, but as of right now, it is a very effective tool for the judgment of minor league players. It is also effective at helping teams gain the most value out of a player and at helping those players be better financially compensated.

* If one excludes Miguel Andujar and Javier Baez, two notoriously poor hitters in terms of BB/K, from this calculation then the average goes up to 0.63.