Elo Rating

Dec 9, 2022

Arpad Elo, a physics professor and chess master, developed the Elo rating system as a way to measure the relative skill levels of chess players. The US Chess Federation adopted it in 1960 and FIDE followed in 1970; Elo described it in full in his 1978 book, The Rating of Chessplayers, Past and Present. Since then, the system has been adapted and applied to a wide range of competitive activities, from video games and sports to online dating and even voting in political elections.

The Elo system is based on a simple idea: each player has a numerical rating that represents their skill level, and this rating is adjusted based on the outcome of their games. For example, if a higher-rated player beats a lower-rated player, the winner's rating goes up a little and the loser's rating goes down a little. If the lower-rated player wins instead, both ratings move by more, because the result was less expected.

A simple implementation of the system is:

New rating = Old rating + K * (outcome - expected outcome)

where:

- New rating is the updated rating after the game
- Old rating is the player's rating before the game
- K is a constant that sets the maximum number of points a single game can move the rating
- Outcome is the actual result of the game (1 for a win, 0 for a loss, 0.5 for a draw)
- Expected outcome is the player's expected score (the probability of winning plus half the probability of a draw), calculated using the following formula:

Expected outcome = 1 / (1 + 10^((opponent's rating - player's rating) / 400))
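The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names expected_outcome and update_rating are made up for this example, and the default K of 32 is an assumed value, since the formulas themselves do not fix K.

```python
def expected_outcome(player_rating: float, opponent_rating: float) -> float:
    """Expected outcome for the player, a value between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - player_rating) / 400.0))


def update_rating(old_rating: float, opponent_rating: float,
                  outcome: float, k: float = 32.0) -> float:
    """New rating after one game.

    outcome is 1 for a win, 0.5 for a draw, 0 for a loss.
    k=32 is an assumed value used only for illustration.
    """
    expected = expected_outcome(old_rating, opponent_rating)
    return old_rating + k * (outcome - expected)


# An upset: a 1500-rated player beats a 1700-rated player.
winner = update_rating(1500, 1700, outcome=1.0)
loser = update_rating(1700, 1500, outcome=0.0)
print(round(winner, 1), round(loser, 1))
```

With the same K on both sides, the winner gains exactly what the loser drops (roughly 24 points each in this example), because the two expected outcomes always sum to 1.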

The 400 in the exponent of the expected outcome is, as far as I can tell, a convenient scaling constant chosen for chess rather than anything fundamental: it makes a 400-point rating gap correspond to 10-to-1 odds (an expected outcome of about 0.91), and a 200-point gap correspond to an expected outcome of about 0.76.
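To see what the constant does in practice, the sketch below evaluates the expected-outcome formula for a few rating gaps and converts each result to odds. The helper name expected_from_gap and the chosen gaps are assumptions for this example only.

```python
def expected_from_gap(rating_gap: float) -> float:
    """Expected outcome for a player rated `rating_gap` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


for gap in (0, 100, 200, 400):
    expected = expected_from_gap(gap)
    # Odds in favor of the higher-rated player; equals 10 ** (gap / 400).
    odds = expected / (1.0 - expected)
    print(f"gap={gap:>3}  expected={expected:.2f}  odds={odds:.2f}:1")
```

Every additional 400 points multiplies the odds by 10, so a different constant would only stretch or compress the rating scale without changing the rankings.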

Elo is used in systems other than chess:

  • Matching players in online multiplayer games
  • Ranking professional sports teams or players
  • Evaluating the performance of political candidates in an election
  • Ranking profiles in online dating (Zuckerberg allegedly used Elo in his "Facemash" site to rank students)
  • Ranking the quality of restaurants or other businesses based on customer ratings and reviews

The shortcomings of the system:

  • Players can stop playing in order to protect a high rating
  • Selective matchmaking, where players seek out opponents they believe are overrated and avoid those they believe are underrated
  • Inability to compare across time periods, as ratings may be inflated or deflated over time

It's easy to improve the Elo rating system along any one of these dimensions, but the improvement usually comes at the cost of complexity. Elo is fairly easy to understand and implement. Sometimes the simplest ranking algorithms are the best.