Regression to the Mean

Feb 19, 2022

Watching the Winter 2022 Olympics, it's fun to listen to the commentators talk about the top athletes:

"Walberg's first run was great. His second run is going to be a lot harder now that he's in his head about beating it. Kingsbury's run should be better – his last run wasn't as good and now he has nothing to lose."

As much as this story might help us make sense of the world of skiing, it's a narrative fallacy. But the commentators are noticing something. It's called regression to the mean. Every run is a mixture of skill and luck. A bad wind or icy patch can throw off even the best of skiers. Whenever there's an element of luck (randomness) involved, any extreme data point – good or bad – is likely to regress towards the true average performance.
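The effect is easy to see in a simulation. The sketch below (with made-up numbers: skill ~ N(70, 5), luck ~ N(0, 10)) models each run as skill plus independent luck, picks the top 1% of first runs, and compares those same athletes' second runs:

```python
import random

random.seed(0)

N = 10_000
# Each athlete has a fixed skill; each run adds independent luck.
skills = [random.gauss(70, 5) for _ in range(N)]

def run(skill):
    # Luck (wind, ice, nerves) swamps run-to-run skill differences here.
    return skill + random.gauss(0, 10)

first = [run(s) for s in skills]
second = [run(s) for s in skills]

# Take the top 1% of first runs and look at the same athletes' second runs.
top = sorted(range(N), key=lambda i: first[i], reverse=True)[: N // 100]
avg_first = sum(first[i] for i in top) / len(top)
avg_second = sum(second[i] for i in top) / len(top)

print(f"top performers, run 1: {avg_first:.1f}")
print(f"same athletes, run 2: {avg_second:.1f}")
```

The top first runs were partly skill and partly good luck; the luck doesn't repeat, so the second runs fall back toward the overall average even though nobody "got in their head."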

Regression to the mean is especially important in designing experiments. Take a group of the lowest performers on a test. Let's say you put them in a program that's supposed to increase their test scores. The test scores will likely improve. Was the program successful? Well, some test takers will naturally do better on the second try: maybe they were having a bad day the first time or didn't get enough sleep. The best way to protect against this pitfall is to split the experiment into a treatment group (low-scorers who get the program) and a control group (low-scorers who don't).