This paper was inspired by my own futility as a baseball prognosticator. Every spring I'd sit down with the spring training rosters and last season's statistics, calibrating and calculating the effect of changes which, in my “expert” opinion, would inevitably catapult Team A to a division championship while Team B collapsed to a pathetic sixth place. Every spring I'd share my picks with friends -- and every fall's final standings would reconfirm my incompetence. Am I really that stupid, or is there another explanation? Read on and draw your own conclusions...
Tinkering Ever with Chance
Or...How Well Do Statistics Predict the Present?
Doug Pappas
SABR 23
June 25, 1993
This research was inspired by Bill James's essay "How Often Do the Best Teams Actually Win," which appeared in the 1989 Baseball Abstract (the Brock Hanke/Rob Wood collaboration). For that essay, James simulated 1,000 complete seasons on his computer, using teams of a known quality whose chances of winning a particular game could be precisely calculated. He found that the best team in a division won only 2,185 of 4,000 divisional races, or 54.6%. The best team in all of baseball won its division only 71.5% of the time, and won the World Series only 29.3% of the time...and this in a simplified model where "the best team in baseball" could be objectively identified.
This bothered me...until I stopped to think about how "real" baseball works. The final standings summarize the results of 162 individual games, while each game in turn represents the results of nine (or more) innings of batter-pitcher duels. At each stage of the contest, chance plays a major role:
-- Within an inning: Batters don't "win games"; they hit doubles, draw bases on balls, and strike out. The order of events is critically important. If a team's first three batters produce an infield single, a walk, and a double high off the outfield wall, it will score no runs if the double comes first, one run if it comes second, and two runs if it comes third, yet the individual players have performed the same.
-- Within a game: A team wins only by scoring more runs than it allows, so great pitching performances, like Harvey Haddix's 12-inning perfect game in 1959, aren't enough if the team doesn't score. Twenty years later, the Cubs scored 22 runs and lost; just last week, Carlos Baerga hit three home runs against Detroit in a losing effort. Teams can't create a "runs bank" to store the excess from ten-run wins for use during later one-run losses.
-- Within a season: If success is defined as winning a pennant or division championship, it's not enough just to win a lot of games. Casey Stengel's 1954 Yankees won 103 games, more than any other Yankee team he managed...but they lost the pennant because Cleveland won 111 games. Meanwhile, the 1973 Mets captured their division with 82 wins, and the 1987 Twins rode an 85-win season (in which they were actually outscored by their opponents!) all the way to a World Championship.
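The "within an inning" arithmetic can be checked by brute force. The Python sketch below is only an illustration: the function name runs_scored and the baserunning rules are assumptions chosen to match the example (every runner scores on the double, each runner moves up exactly one base on the infield single, and a walk advances only forced runners). It plays the same three events in all six possible orders.

```python
from itertools import permutations

def runs_scored(events):
    """Runs produced by a sequence of events, starting with the bases empty."""
    first = second = third = False
    runs = 0
    for event in events:
        if event == "double":
            # Assumed: a double off the wall scores every runner on base.
            runs += first + second + third
            first, second, third = False, True, False
        elif event == "single":
            # Assumed: on an infield single each runner moves up one base.
            runs += third
            first, second, third = True, first, second
        elif event == "walk":
            # A walk advances only the runners who are forced.
            if first and second and third:
                runs += 1
            elif first and second:
                third = True
            elif first:
                second = True
            first = True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs

if __name__ == "__main__":
    for order in permutations(["single", "walk", "double"]):
        print(f"{' -> '.join(order):26s} {runs_scored(order)} run(s)")
```

Under these assumptions the six orderings agree with the text: no runs when the double comes first, one when it comes second, two when it comes third.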
In recent years, new measurements have been developed to estimate how many games a team should win with a given offensive and defensive performance. The first of these was Bill James' Pythagorean Formula, which focuses on the number of runs scored and allowed over the course of a season:
Winning percentage = (Runs scored)^2 / [(Runs scored)^2 + (Runs allowed)^2].
Thus a team which scores as many runs as it allows should win half the time; one that scores 700 and allows 600 should win at about a .576 percentage. Teams performing significantly better or significantly worse are "clutch" and "choke" teams, if you believe in such concepts, "lucky" and "unlucky" if you don't.
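For readers who want to try the formula on other run totals, here is a minimal Python sketch. The function names and the 162-game default are my own choices for illustration, not part of James's published work.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

def pythagorean_wins(runs_scored, runs_allowed, games=162):
    """Expected wins over a schedule of the given length (162 assumed)."""
    return pythagorean_pct(runs_scored, runs_allowed) * games

if __name__ == "__main__":
    print(f"{pythagorean_pct(700, 600):.3f}")    # 0.576
    print(f"{pythagorean_pct(650, 650):.3f}")    # 0.500
    print(f"{pythagorean_wins(700, 600):.1f}")   # 93.4
```

The 700/600 case reproduces the .576 figure above, which works out to roughly 93 wins over 162 games.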
More recently, Pete Palmer has developed a "linear weights" system which evaluates each player's batting, pitching and fielding to determine whether he's better or worse than the league average, then converts that difference into his effect on the team's won/lost record. This system concluded, for example, that because of Barry Bonds' outstanding 1992 season, the Pirates won nine more games than they would have won with an average left fielder. (Bonds' nine-win 1992 is the fifth best single season in history, according to Palmer's formulas.) Each team's expected performance is simply the sum of the positive and negative values for each of its players, with a .500 team scoring zero. The formulas are too complex to reproduce here, but can be found in the Appendix to Total Baseball; the sum of each team's batting, pitching and fielding Linear Weights for a given season is contained in Total Baseball's Annual Record of that season. Again, teams which outperform their collective statistics are "clutch" or "lucky." (Don't ask me to explain why Total Baseball's system for rating managers is a variant of the Pythagorean Formula, attributing all differences between expected and actual record to managerial skill and thereby concluding that Joe McCarthy, John McGraw and Casey Stengel were below-average managers.)
For easy reference, the three right-hand columns on the first page of reports for each set of simulated seasons list the team's actual number of wins, the number predicted by the Pythagorean Formula, and the number predicted by Linear Weights. In 1992, for example, the Blue Jays and Pirates were "lucky"; the Mariners and Dodgers, "unlucky."
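The "lucky" and "unlucky" labels are just differences between columns. The Python sketch below uses the Real, Pyth and Lin Wt values that the 1992 standings report for the four teams named above. The verdict function, and its rule that a team must beat (or trail) both predictions to earn a label, are assumptions made for illustration.

```python
# (team, actual wins, Pythagorean wins, Linear Weights wins) for 1992,
# copied from the Real / Pyth / Lin Wt columns of the 1992 standings.
TEAMS_1992 = [
    ("Toronto", 96, 91, 90),
    ("Pittsburgh", 96, 93, 89),
    ("Seattle", 64, 68, 73),
    ("Los Angeles", 63, 69, 73),
]

def verdict(real, pyth, lin_wt):
    """Label a team by whether it beat or trailed both predictions (assumed rule)."""
    if real > pyth and real > lin_wt:
        return "lucky"
    if real < pyth and real < lin_wt:
        return "unlucky"
    return "mixed"

if __name__ == "__main__":
    for team, real, pyth, lin_wt in TEAMS_1992:
        print(f"{team:12s} Real-Pyth {real - pyth:+3d}  "
              f"Real-LinWt {real - lin_wt:+3d}  {verdict(real, pyth, lin_wt)}")
```

Toronto comes out five and six wins ahead of the two predictions; Los Angeles, six and ten behind.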
************************************************************************************
Keeping these principles in mind, I developed a different type of simulation -- one which, using actual players and teams, controlled for every variable except the effect of chance on the outcome. I used version 4.0 of the Pursue the Pennant computer game to replay every season from 1986 through 1992 using each team's actual rosters and statistics. The computer chose starting lineups and pitching rotations and made all in-game decisions for each team, replaying an entire league's schedule in about an hour and a half. When a season was complete, I collected the results, cleared them from the computer's memory, and ran the simulation again...ten times for the 1986-91 seasons, 20 times for the 1992 season. The same players with the same statistics, using the same batting orders, pitching rotations, and managerial strategies -- how much could the results vary?
A LOT. In the twenty 1992 simulations, for example, four different teams won the NL East, with real-world champion Pittsburgh accounting for only three of those wins. Houston tied Atlanta for one division title, while finishing last two other times; world champion Toronto won as many as 103 games, as few as 74. In fact, every single team in baseball had at least an eighteen-game spread between its best and worst record.
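Pursue the Pennant models every plate appearance; the sketch below does not. It is a far cruder Python illustration of the same idea, assuming each game is an independent trial with a fixed chance of winning. The function names, the .550 team and the seed are all hypothetical. Even so, it shows how far identical seasons can drift apart on luck alone.

```python
import math
import random

def simulate_season(win_prob, rng, games=162):
    """Count wins when every game is an independent trial with the same odds."""
    return sum(rng.random() < win_prob for _ in range(games))

def simulate_seasons(win_prob, seasons, seed=None, games=162):
    """Replay the same team's season several times, changing only the luck."""
    rng = random.Random(seed)
    return [simulate_season(win_prob, rng, games) for _ in range(seasons)]

if __name__ == "__main__":
    win_prob = 0.550  # hypothetical true quality, about 89 wins per 162 games
    wins = simulate_seasons(win_prob, seasons=20, seed=1993)
    print("wins by season:", sorted(wins))
    print("best", max(wins), "worst", min(wins), "spread", max(wins) - min(wins))
    # Standard deviation of a binomial win total: sqrt(n * p * (1 - p)).
    sd = math.sqrt(162 * win_prob * (1 - win_prob))
    print(f"expected standard deviation: {sd:.1f} wins")
```

In this coin-flip model the standard deviation of a 162-game win total is a little over six wins, so a gap of a dozen or more games between a team's best and worst replay is nothing unusual.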
For the 1990 and 1991 seasons, I extracted a few players' key statistics for each simulation to show variance in individual player performances. AL MVP Rickey Henderson's 1990 Runs Created/27 Outs (the number of runs a team of nine Rickey Hendersons would score) varied from 8.62 to 12.68, while Cecil Fielder's 1990 home run total ranged from a low of 41 to a high of 63(!). Leaguewide ERAs for that season varied only from 4.07 to 4.24.
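The spreads quoted here can be recomputed from the ten 1990 replays listed with the 1990 standings. This Python sketch uses only the standard library; the summarize helper is my own naming, and the sample standard deviation is an extra I added for comparison rather than a figure from the replays.

```python
from statistics import mean, stdev

# Values copied from the ten 1990 replays reported with the 1990 standings.
FIELDER_HR = [52, 63, 58, 44, 60, 41, 54, 51, 46, 51]
HENDERSON_RC27 = [10.87, 11.40, 8.62, 9.87, 11.06, 12.68, 9.44, 9.25, 9.45, 11.65]
AL_ERA = [4.08, 4.17, 4.10, 4.12, 4.07, 4.24, 4.15, 4.08, 4.08, 4.14]

def summarize(values):
    """Return (low, high, mean, sample standard deviation)."""
    return min(values), max(values), mean(values), stdev(values)

if __name__ == "__main__":
    for label, values in [
        ("Fielder home runs", FIELDER_HR),
        ("Henderson RC/27 outs", HENDERSON_RC27),
        ("American League ERA", AL_ERA),
    ]:
        low, high, avg, sd = summarize(values)
        print(f"{label:22s} low {low:6.2f}  high {high:6.2f}  "
              f"mean {avg:6.2f}  sd {sd:5.2f}")
```

The league ERA barely moves because it pools thousands of innings, while a single player's totals rest on a few hundred plate appearances.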
1992
AL EAST
Divisional wins: Baltimore - 7; Milwaukee - 5; Toronto - 8
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Baltimore      92-70   --    89    87      90
Toronto        91-71    1    96    91      90
Milwaukee      91-71    1    92    97      90
New York       84-78    8    76    73      81
Detroit        74-88   18    75    81      76
Cleveland      74-88   18    76    80      76
Boston         67-95   25    73    72      75
AL WEST
Divisional wins: Chicago - 1; Minnesota - 13; Oakland - 6
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Minnesota      97-65   --    90    92      89
Oakland        92-70    5    96    89      91
Chicago        84-78   13    86    86      85
Kansas City    82-80   15    72    69      75
Texas          73-89   24    77    73      76
Seattle        73-89   24    64    68      73
California     60-102  37    72    69      65
NL EAST
Divisional wins: Chicago - 1; Montreal - 8; Pittsburgh - 3; St. Louis - 8
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
St. Louis      92-70   --    83    85      87
Montreal       90-72    2    87    90      87
Pittsburgh     89-73    3    96    93      89
Chicago        79-83   13    78    77      79
Philadelphia   75-87   17    70    77      75
New York       73-89   19    72    74      73
NL WEST
Divisional wins: Atlanta - 18.5; Cincinnati - 1; Houston - 0.5
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Atlanta        93-69   --    98    96      90
Cincinnati     83-79   10    90    88      88
San Francisco  79-83   14    72    71      73
Houston        76-86   17    81    73      77
San Diego      73-89   20    82    79      81
Los Angeles    68-94   25    63    69      73
"Real" -- Actual number of wins for the team
"Pyth" -- Number of wins predicted by Bill James' Pythagorean Formula
"Lin Wt" -- Number of wins predicted by Pete Palmer's Linear Weights formula
1991
AL EAST
Divisional wins: Boston - 1*; Detroit - 1; Toronto - 8*
*Two Boston-Toronto ties
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Toronto        98-64   --    91    88      90
Boston         87-75   11    84    83      86
Detroit        85-77   13    84    83      81
Baltimore      78-84   20    67    69      73
New York       72-90   26    71    70      72
Milwaukee      70-92   28    83    87      82
Cleveland      63-99   35    57    59      65
AL WEST
Divisional wins: Chicago - 5; Minnesota - 3; Oakland - 0.5; Texas - 1.5
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Minnesota      90-72   --    95    95      96
Chicago        88-74    2    87    90      87
Texas          86-76    4    85    82      83
Oakland        82-80    8    84    79      74
Kansas City    80-82   10    82    82      83
California     77-85   13    81    81      80
Seattle        71-91   19    83    84      85
NL EAST
Divisional wins: Chicago - 1; New York - 1; Pittsburgh - 7; St. Louis - 1
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Pittsburgh     89-73   --    98    97      97
St. Louis      85-77    4    84    81      80
Chicago        78-84   11    77    77      76
Philadelphia   76-86   13    78    75      74
Montreal       74-88   15    71    70      76
New York       74-88   19    77    80      81
NL WEST
Divisional wins: Atlanta - 4; Cincinnati - 1; Los Angeles - 4*; San Diego - 1*
*Two LA-SD ties
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Los Angeles    90-72   --    93    93      91
Atlanta        89-73    1    94    94      90
Cincinnati     87-75    3    74    81      84
San Diego      83-79    7    84    80      79
San Francisco  80-82   10    75    75      75
Houston        67-95   23    65    67      68
Frank Thomas, OBP: .458 .461 .440 .428 .425 .454 .427 .451 .447 .429
David Cone, K/9 IP: 10.10 9.72 10.73 9.67 9.40 10.61 9.23 9.73 10.28 9.34
1990
AL EAST
Divisional wins: Baltimore - 1.5*; Boston - 5.5*; Cleveland - 1; Toronto - 2
*One Bal-Bos tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Boston         85-77   --    88    85      88
Toronto        83-79    2    86    93      89
Baltimore      79-83    6    76    78      78
Cleveland      77-85    8    77    80      76
Detroit        76-86    9    79    81      78
Milwaukee      76-86    9    74    78      75
New York       70-92   15    67    64      66
AL WEST
Divisional wins: Oakland - 10
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Oakland        102-59  --   103   101      99
Chicago        87-75   15    94    87      81
Seattle        85-77   17    77    76      83
Kansas City    85-77   17    75    81      81
Texas          79-83   23    83    79      81
Minnesota      77-85   25    74    74      76
California     73-89   29    80    79      82
NL EAST
Divisional wins: Montreal - 1.5; New York - 3.5; Philadelphia - 1.5; Pittsburgh - 3.5; St. Louis - 1
One Montreal-NY tie; one Philadelphia-Pittsburgh tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
New York       90-72   --    91   100      92
Pittsburgh     88-74    2    95    95      94
Montreal       86-76    4    85    89      88
Philadelphia   82-80    8    77    72      76
St. Louis      80-82   10    70    69      76
Chicago        76-86   14    77    71      72
NL WEST
Divisional wins: Cincinnati - 8.3; Los Angeles - 1.3; San Diego - 0.3
One three-way tie between Cincinnati, Los Angeles and San Diego
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Cincinnati     93-69   --    91    93      92
Los Angeles    87-75    6    86    86      83
San Diego      82-80   11    75    81      81
San Francisco  77-85   16    85    82      79
Atlanta        66-96   27    65    66      64
Houston        66-96   27    75    70      75
Cecil Fielder, home runs: 52 63 58 44 60 41 54 51 46 51
Rickey Henderson, Runs Created/27 outs: 10.87 11.40 8.62 9.87 11.06 12.68 9.44 9.25 9.45 11.65
American League ERA: 4.08 4.17 4.10 4.12 4.07 4.24 4.15 4.08 4.08 4.14
1989
AL EAST
Divisional wins: Boston - 1*; Milwaukee - 1.5*; Toronto - 7.5*
*Two Bos-Tor ties; one Mil-Tor tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Toronto        91-71   --    89    90      88
Milwaukee      84-78    7    81    84      79
Boston         82-80    9    83    85      88
Cleveland      79-83   12    73    75      77
New York       77-85   14    74    71      74
Baltimore      74-88   17    87    84      81
Detroit        61-101  30    59    59      63
AL WEST
Divisional wins: California - 2; Kansas City - 2.5*; Oakland - 4; Texas - 1.5*
*One KC-Tex tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Oakland        93-69   --    99    98      95
Kansas City    90-72    3    92    88      87
Texas          89-73    4    83    79      82
California     87-75    6    91    93      89
Seattle        76-86   17    73    77      75
Chicago        68-94   25    69    75      74
California     60-102  37    72    69      65
NL EAST
Divisional wins: Chicago - 4*; St. Louis - 6*
*Two Chi-St.L ties
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
St. Louis      94-68   --    86    84      88
Chicago        89-73    5    93    91      89
New York       82-80   12    87    92      88
Montreal       81-81   13    81    81      83
Pittsburgh     81-81   13    74    76      76
Philadelphia   64-98   30    67    68      73
NL WEST
Divisional wins: Cincinnati - 1; Houston - 4; San Diego - 1; San Francisco - 5
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
San Francisco  88-74   --    92    93      90
Houston        84-78    4    86    78      74
Cincinnati     82-80    6    75    74      77
Los Angeles    80-82    8    77    84      82
San Diego      77-85   11    89    83      84
Atlanta        69-93   19    63    69      68
1988
AL EAST
Divisional wins: Boston - 4; Milwaukee - 3; New York - 1; Toronto - 2
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Boston         93-69   --    89    94      96
Milwaukee      91-71    2    87    90      91
Toronto        85-77    8    87    89      84
Detroit        83-79   10    88    86      84
New York       82-80   11    85    84      80
Cleveland      78-84   15    78    73      74
Baltimore      54-108  39    54    53      61
AL WEST
Divisional wins: Kansas City - 2; Minnesota - 1.5*; Oakland - 6.5*
*One Min-Oak tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Oakland        94-68   --   104   101      95
Minnesota      92-70    2    91    91      92
Seattle        82-80   12    68    72      77
Kansas City    82-80   12    84    88      85
California     74-88   20    75    75      72
Texas          73-89   21    70    69      74
Chicago        71-91   23    71    66      67
NL EAST
Divisional wins: New York - 9; St. Louis - 1
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
New York       101-61  --   100   103     100
Montreal       88-74   13    81    86      87
St. Louis      84-78   17    76    74      79
Pittsburgh     81-81   20    85    85      83
Chicago        81-81   20    77    77      78
Philadelphia   65-97   36    65    65      66
NL WEST
Divisional wins: Cincinnati - 4*; Houston - 0.5; San Diego - 2; San Francisco - 3.5
*One Cin-Hou tie; one Cin-SF tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Cincinnati     92-70   --    87    87      85
San Francisco  88-74    4    83    86      83
San Diego      84-78    8    83    83      82
Houston        81-81   11    82    79      79
Los Angeles    76-86   16    94    92      85
Atlanta        51-111  41    54    58      60
1987
AL EAST
Divisional wins: Boston - 1; Detroit - 0.3*; Milwaukee - 0.3*; New York - 5; Toronto - 3.3*
*One Det-Mil-Tor tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Toronto        90-72   --    96   101      96
New York       89-73    1    89    84      83
Milwaukee      85-77    5    91    85      81
Boston         82-80    8    78    83      81
Detroit        79-83   11    98    97      96
Baltimore      69-93   21    67    66      69
Cleveland      65-97   23    61    61      65
AL WEST
Divisional wins: California - 2; Chicago - 1; Kansas City - 3; Minnesota - 3; Seattle - 1
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Kansas City    86-76   --    83    84      86
Minnesota      85-77    1    85    79      79
Chicago        84-78    2    77    81      79
California     84-78    2    75    78      77
Seattle        83-79    3    78    77      82
Oakland        79-83    7    81    83      82
Texas          75-87   11    75    78      77
NL EAST
Divisional wins: New York - 7; St. Louis - 3
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
New York       94-68   --    92    94      93
St. Louis      90-72    4    95    92      85
Montreal       82-80   12    91    83      82
Pittsburgh     81-81   13    80    76      80
Chicago        77-85   17    76    72      77
Philadelphia   77-85   17    80    79      81
NL WEST
Divisional wins: Cincinnati - 0.5*; Houston - 1; San Francisco - 8.5*
*One Cin-SF tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
San Francisco  92-70   --    90    94      88
Cincinnati     80-82   12    84    82      84
Houston        79-83   13    76    77      79
San Diego      77-85   15    65    70      74
Atlanta        72-90   20    69    73      75
Los Angeles    71-91   21    73    76      74
1986
AL EAST
Divisional wins: Boston - 1; Detroit - 2; New York - 6.5*; Toronto - 0.5*
*One NY-Tor tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
New York       99-63   --    90    87      92
Toronto        94-68    5    86    89      86
Detroit        93-69    6    87    90      90
Boston         86-76   13    95    92      89
Milwaukee      81-81   18    77    73      76
Cleveland      76-86   23    84    80      80
Baltimore      76-86   23    73    75      76
AL WEST
Divisional wins: California - 8; Kansas City - 1; Texas - 1
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
California     95-67   --    92    92      91
Kansas City    77-85   18    76    79      79
Texas          76-86   19    87    84      84
Minnesota      75-87   20    71    71      73
Oakland        73-89   22    76    78      75
Chicago        71-91   24    72    74      74
Seattle        61-101  34    67    69      68
NL EAST
Divisional wins: Montreal - 1; New York - 4.5*; Philadelphia - 2; St. Louis - 2.5*
*One NY-St.L tie
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
New York       93-69   --   108   105     100
St. Louis      89-73    4    79    80      77
Philadelphia   87-75    6    86    84      84
Montreal       83-79   10    78    75      80
Pittsburgh     70-92   23    64    77      76
Chicago        68-94   25    70    70      70
NL WEST
Divisional wins: Cincinnati - 1; Houston - 4; San Francisco - 5
Consolidated Standings:
Team           W-L     GB  Real  Pyth  Lin Wt
Houston        93-69   --    96    92      92
San Francisco  92-70    1    83    91      85
Cincinnati     82-80   11    86    83      81
San Diego      74-88   19    74    73      76
Atlanta        73-89   20    72    68      74
Los Angeles    68-94   25    73    76      74
Copyright © 1993 Doug Pappas. All rights reserved.
Originally presented at the 1993 SABR convention.