Tuesday, July 29, 2008

“Regression To the Mean” is not “Karma”

File this under “A little knowledge can be a dangerous thing.”

Most of you probably have some understanding of the concept of “regression to the mean.” Simply put, it states that as a sample size gets larger, the effects of luck and Darin Erstad’s grittiness disappear. However, there is a tendency by some to equate regression to the mean with the bullshit Eastern concept of Karma, in which the universe balances out the consequences of a person’s deeds, giving that person ultimate control over the results of his own life.

Anyway, let’s say that Juan Pierre starts out with hits in his first 30 ABs, an amazing feat. Karma would balance this out by, at a later date, causing Juan Pierre to go hitless in 30 ABs. Of course, that wouldn’t be that unusual for Pierre and the universe would probably have to ding him a few dozen more, but you get the idea.

This is not how regression to the mean works. While it’s possible that Pierre will experience some correspondingly negative luck in the future, it is certainly not required, or even likely. What regression to the mean does require is for Pierre’s little hot streak to end up buried in data.

Think of it this way. Let’s assume that Pierre is a .200 hitter, meaning that he gets 2 hits per 10 ABs over the long haul. And let’s assume that he has started the season 30/30 for a 1.000 batting average, and John Kruk is sitting around on Baseball Tonight wondering if Pierre will hit .500 for the season. At this point, let’s say that regression to the mean starts to kick in. Over Pierre’s next 100 ABs he gets 20 hits, making him 50/130 (.385) on the season.

And again, over his next 100 ABs he gets 20 hits, making him 70/230 (.304). And again, over his next 100 ABs he gets 20 hits, making him 90/330 (.273).

Juan Pierre averages 659 ABs per year. Even with his bit of luck, starting the season with 30 straight hits, by the time he reaches 650 ABs, with everything after the streak coming at his true .200 rate, Pierre will put up a line of 154/650 (.237).

If you let Juan play long enough (say, 11,050 ABs), his hitting streak will be almost completely lost in the statistics (2234/11050, .202).

At no point did we have Juan go into any sort of prolonged slump. It’s just that over time, your little peaks and valleys will get eaten up by the true you. In Juan’s case, that’s a pretty crappy, powerless corner outfielder.
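
If you want to check the arithmetic yourself, here is a minimal Python sketch. It assumes the same made-up scenario as above: a 30/30 start, followed by hits at exactly a .200 clip with no slump at all. The function name `average_after` is just something invented for this illustration.

```python
from fractions import Fraction

def average_after(hot_hits, hot_abs, true_rate, total_abs):
    """Return (hits, average) after total_abs at-bats, assuming the player
    hits at exactly true_rate in every at-bat after the hot streak."""
    later_abs = total_abs - hot_abs
    hits = hot_hits + true_rate * later_abs
    return hits, hits / total_abs

# Hypothetical scenario: 30-for-30 start, then a true .200 hitter.
for total in (30, 130, 230, 330, 650, 11050):
    hits, avg = average_after(30, 30, Fraction(1, 5), total)
    print(f"{total:>6} ABs: {float(hits):5.0f} hits, average {float(avg):.3f}")
```

The average only ever drifts down toward .200. Nothing in the calculation requires a cold stretch to "pay back" the hot one; the streak just becomes a smaller and smaller share of the total.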

Every time an announcer claims that some guy is “due,” he’s alluding to Karma. Don’t believe it. While all players will go through hot streaks as well as slumps, one does not have any effect on the other.
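
Here is a rough simulation of what "one does not have any effect on the other" looks like in practice. It is a sketch, not a proof about real hitters: it assumes every at-bat is an independent coin flip with a fixed .200 chance of a hit, and the function name `hit_rate_after_slump` and the 10-out slump length are choices made purely for illustration.

```python
import random

def hit_rate_after_slump(true_rate, slump_length, n_abs, seed=0):
    """Simulate independent at-bats. Return (overall_rate, rate_after_slump),
    where rate_after_slump is the hit rate in at-bats that come right after
    at least slump_length consecutive outs."""
    rng = random.Random(seed)
    hits = 0
    outs_in_a_row = 0
    after_slump_abs = 0
    after_slump_hits = 0
    for _ in range(n_abs):
        is_hit = rng.random() < true_rate
        if outs_in_a_row >= slump_length:
            # The hitter is supposedly "due" here.
            after_slump_abs += 1
            after_slump_hits += is_hit
        hits += is_hit
        outs_in_a_row = 0 if is_hit else outs_in_a_row + 1
    return hits / n_abs, after_slump_hits / after_slump_abs

overall, after = hit_rate_after_slump(0.200, 10, 500_000)
print(f"overall: {overall:.3f}, right after 10+ straight outs: {after:.3f}")
```

Under that independence assumption, the two rates come out essentially the same. A hitter who is 0 for his last 10 is no more likely to get a hit in the next at-bat than he ever was.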

A Quick Note About Pythagorean Record

The Cubs have a way better run differential than the Brewers. Since the two teams are in almost a dead heat at the moment, this means that the Brewers have probably gotten lucky to some extent. Because the Brewers have gotten luckier than the Cubs, some are concluding that the Brewers are destined to fall away.

This is Karma talking. And, as previously stated, Karma is bullshit.

The Brewers have been lucky; there is no doubt about that. But even if the Brewers regress to the mean right now, they have made drastic changes to their lineup which may be able to substitute for that missing luck. Adding Sabathia, and to a lesser extent, Durham, makes the Brewers a better team. The luck we have experienced is in the bank. It managed to keep us within striking distance of the Cubs. Going forward, the addition of better players is what is likely to carry the Brewers.

Regression to the mean does NOT mean that the Brewers will hit a stretch of bad luck. Of course, they might hit a stretch of bad luck, as might you, or I, or the Cubs. They will most likely play at the mean. Fortunately for the Brewers, their mean is now significantly better than it was before.
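
For anyone who wants to play with the idea, here is a small sketch of the usual Pythagorean formula (runs scored squared, divided by runs scored squared plus runs allowed squared) and of what "luck in the bank" means for a projection. The team record and run totals below are invented for illustration and are not the Brewers' actual numbers; the function names are made up too, and the exponent of 2 is the classic version of the formula (other exponents are sometimes used).

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def projected_wins(current_wins, games_played, expected_pct, season_games=162):
    """Wins already banked plus expected wins over the remaining games."""
    remaining = season_games - games_played
    return current_wins + expected_pct * remaining

# Hypothetical team: 60-45, with 480 runs scored and 470 allowed.
pct = pythagorean_pct(480, 470)
luck = 60 - pct * 105
print(f"Pythagorean pct: {pct:.3f}, wins above expectation so far: {luck:.1f}")

# Regressing to the mean: keep the 60 wins, play the rest at the expected pct.
print(f"Projected wins, same roster: {projected_wins(60, 105, pct):.1f}")

# If roster upgrades raise the true level (an assumed .560 here), the
# projection goes up without any luck being involved.
print(f"Projected wins, better roster: {projected_wins(60, 105, 0.560):.1f}")
```

Note that the projection never subtracts the lucky wins. They stay on the books, and the only thing that changes going forward is the rate at which new wins are expected to arrive.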

All Original Photos are licensed under a Creative Commons Attribution-NonCommercial 2.5 License.
All Original writings are licensed under a Creative Commons Attribution-NonCommercial 2.5 License.