ELO rating system

The ELO rating system is a method for calculating the relative strength of chess players. ELO is often written in capital letters, but it is not an acronym: it is the family name of the system's creator, Arpad Elo (born 1903), a Hungarian-born American physics professor. "ELO" is written in uppercase to distinguish the system from Professor Elo himself.

Table of contents

1 Previous ratings systems

2 Elo's rating system model

3 Implementing Elo's scheme

4 Comparative ratings

5 Mathematical Details

6 External links

Previous ratings systems

Elo was a master-level chess player and an active participant in the United States Chess Federation (USCF) from its founding in 1939. The USCF used a numerical rating system, devised by Kenneth Harkness, to allow members to track their individual progress in terms other than tournament wins and losses. The Harkness system was reasonably fair, but in some circumstances it produced substantially inaccurate ratings. On behalf of the USCF, Elo devised a new system with a statistical basis.

Estimating the true skill of a player is difficult, because chess performance can never be measured directly. One cannot look at a sequence of moves and say, "That performance is 1439," or anything similar. Performance can only be inferred from wins, draws and losses. If a player wins a game, he is assumed to have performed at a higher level than his opponent in that game. Conversely, if he loses, he is assumed to have performed at a lower level. If the game is drawn, the two players are assumed to have performed at nearly the same level.

Elo's rating system model

Elo suggested estimating the true skill of players by updating their ratings after they won or lost against other players, based on a comparison with their opponents' ratings. If a player won more games than he was expected to win, his rating would be adjusted upward; if he won fewer games than expected, his rating would be adjusted downward.

Elo's central assumption was that the chess "performance" of any given player in any given game is a hypothetical normally distributed random variable, whose mean represents the player's true skill and changes only slowly over time. Given this mathematical model of chess play, the Elo rating is an attempt to estimate that mean for each player from the observable data of wins, losses and draws.

Implementing Elo's scheme

The USCF implemented Elo's suggestions in 1960, and the system quickly gained recognition as fairer and more accurate than the Harkness system. Elo's system was adopted by FIDE in 1970. Elo described his work in some detail in the book "The Rating of Chessplayers, Past and Present", published in 1978.

Subsequent statistical tests have shown that chess performance is almost certainly not normally distributed: weaker players have significantly greater winning chances than Elo's model predicts. Therefore, both the USCF and FIDE have switched to systems based on the logistic distribution. However, in deference to Elo's contribution, both organizations are still commonly said to use "the Elo system".
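The difference between the two models shows up mainly at large rating gaps. The sketch below compares the weaker player's expected score under a normal model and under a logistic model. Both parameterizations are assumptions for illustration: the normal model assumes each player's single-game performance has a standard deviation of 200 points (so the difference of two performances has standard deviation 200√2), and the logistic model assumes the common 400-point scale.

```python
import math

# Assumed spread of the performance *difference* between two players
# (200 points per player, combined as 200 * sqrt(2)).
SIGMA_DIFF = 200 * math.sqrt(2)


def normal_expected(diff: float, sigma: float = SIGMA_DIFF) -> float:
    """Stronger player's expected score if the performance difference is normal."""
    # Standard normal CDF evaluated at diff / sigma, written via math.erf.
    return 0.5 * (1 + math.erf(diff / (sigma * math.sqrt(2))))


def logistic_expected(diff: float) -> float:
    """Stronger player's expected score on an (assumed) 400-point logistic scale."""
    return 1 / (1 + 10 ** (-diff / 400))


# Weaker player's expected score (1 - stronger's) under each model.
for diff in (200, 400, 600):
    weaker_normal = 1 - normal_expected(diff)
    weaker_logistic = 1 - logistic_expected(diff)
    print(diff, round(weaker_normal, 3), round(weaker_logistic, 3))
```

Near a 200-point gap the two curves nearly coincide; as the gap widens, the logistic tail gives the weaker player noticeably more expected points, which is the behavior the statistical tests pointed to.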

Comparative ratings

Because Elo's general ideas have been adopted by many different organizations (including the Internet Chess Club (ICC), Yahoo! Games, and the now-defunct Professional Chess Association (PCA)), and because each organization's implementation differs from Elo's original suggestions, it is ambiguous and potentially misleading to refer to a player's "ELO rating". It is more precise to name the organization granting the rating, e.g. "As of August 2002, Gregory Kaidanov had a FIDE rating of 2638 and a USCF rating of 2742." Note also that the Elo ratings of these organizations are not always directly comparable; for example, USCF ratings are generally about 100 points higher than FIDE ratings.

The following analysis of the July 2003 FIDE rating list gives a rough impression of what an Elo rating of 2638 (or any other value) means:

1 player (Garry Kasparov) has a rating of 2800 or above
16 players have a rating of 2700 or above
113 players have a rating of 2600 or above
a player rated 2500 or above is likely to hold the Grandmaster title
a player rated between 2400 and 2499 is likely to hold the International Master title

The highest FIDE Elo rating ever recorded was 2851, which Garry Kasparov held on the July 1999 and January 2000 lists.

Mathematical Details

Performance cannot be measured absolutely; it can only be inferred from wins, draws and losses. Ratings therefore have meaning only relative to other ratings, and both the average and the spread of ratings can be chosen arbitrarily. Elo suggested scaling ratings so that a difference of 200 rating points would mean that the stronger player has an expected score of approximately 0.75, and the USCF initially aimed for an average club player to have a rating of 1500.

A player's expected score is his probability of winning plus half his probability of drawing. Thus an expected score of 0.75 could represent a 75% chance of winning, a 25% chance of losing and a 0% chance of drawing. At the other extreme, it could represent a 50% chance of winning, a 0% chance of losing and a 50% chance of drawing. The probability of a draw, as opposed to a decisive result, is not specified in the ELO system; instead, a draw is counted as half a win and half a loss.
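The definition of expected score is simple arithmetic on outcome probabilities. This minimal sketch (the function name `score_from_probabilities` is hypothetical) reproduces both extremes mentioned above:

```python
def score_from_probabilities(p_win: float, p_draw: float) -> float:
    """Expected score = P(win) + 0.5 * P(draw).

    The loss probability is implied (1 - p_win - p_draw) and contributes
    nothing to the score.
    """
    if p_win < 0 or p_draw < 0 or p_win + p_draw > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return p_win + 0.5 * p_draw


# Both scenarios from the text give the same expected score.
print(score_from_probabilities(0.75, 0.0))  # 0.75 (75% win, 25% loss, no draws)
print(score_from_probabilities(0.50, 0.5))  # 0.75 (50% win, 50% draw, no losses)
```

Because very different mixes of wins, draws and losses collapse to the same number, the expected score alone cannot tell you how drawish a pairing is.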

If Player A has true strength <math>R_A</math> and Player B has true strength <math>R_B</math>, the exact formula (using the logistic curve) for the expected score of Player A is

<math>E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}</math>

Similarly, the expected score for Player B is

<math>E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}</math>

Note that <math>E_A + E_B = 1</math>. In practice, since the true strength of each player is unknown, the expected scores are calculated using the players' current ratings.
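A direct translation of the logistic expected-score formula is sketched below, assuming the 400-point scale (the scale that reproduces the figures in the worked example in this section). The function name `expected_score` is illustrative. It also checks that a 200-point favorite scores roughly 0.75 and that the two players' expected scores sum to 1.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of a player rated r_a against a player rated r_b.

    Assumes the logistic form on a 400-point scale:
        E_A = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    """
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


e_a = expected_score(1700, 1500)  # 200 points stronger
e_b = expected_score(1500, 1700)  # the same game from the other side
print(round(e_a, 3))              # 0.76
print(round(e_a + e_b, 12))       # 1.0
```

On this scale a 200-point gap gives about 0.76, which matches "approximately 0.75" to the precision Elo intended.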

When a player's actual tournament scores exceed his expected scores, the ELO system takes this as evidence that that player's rating is too low, and needs to be adjusted upward. Similarly when a player's actual tournament scores fall short of his expected scores, that player's rating is adjusted downward. Elo's original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player outperformed or underperformed his expected score. The maximum possible adjustment per game (sometimes called the K-value) was set at K=16 for masters and K=32 for weaker players.
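The linear adjustment and the two K-values described above can be sketched as follows. How a player qualifies as a "master" is not specified here, so the `is_master` flag and both function names are hypothetical.

```python
def k_factor(is_master: bool) -> int:
    """Elo's original K-values: 16 for masters, 32 for weaker players.

    `is_master` is a hypothetical flag; the rule deciding master status
    is left to the caller.
    """
    return 16 if is_master else 32


def update_rating(rating: float, expected: float, actual: float, k: int) -> float:
    """Linear update: move the rating by K times (actual - expected)."""
    return rating + k * (actual - expected)


# A non-master who scores 1 point where 0.5 was expected gains K/2 = 16 points.
print(update_rating(1500, 0.5, 1.0, k_factor(is_master=False)))  # 1516.0
```

Since the actual and expected scores for one game both lie between 0 and 1, a single game can move a rating by at most K points, which is why K is described as the maximum adjustment per game.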

Suppose Player A was expected to score <math>E_A</math> points but actually scored <math>S_A</math> points. The formula for updating his rating is

<math>R_A' = R_A + K(S_A - E_A)</math>

This update can be performed after each game, after each tournament, or after any suitable rating period. For example, suppose Player A has a rating of 1613 and plays in a five-round tournament. He loses to a player rated 1609, draws with a player rated 1477, defeats a player rated 1388, defeats a player rated 1586, and loses to a player rated 1720. His actual score is (0 + 0.5 + 1 + 1 + 0) = 2.5. His expected score, calculated according to the formula above, was (0.506 + 0.686 + 0.785 + 0.539 + 0.351) = 2.867. Using K = 32, his new rating is (1613 + 32*(2.5 - 2.867)) ≈ 1601.

Note that while two wins, two losses and one draw may seem like an even score, it is worse than expected for Player A because his opponents were, on average, rated lower than he was. He is therefore slightly penalized. Had he scored two wins, one loss and two draws, for a total of three points, his result would have been slightly better than expected, and his new rating would have been (1613 + 32*(3 - 2.867)) ≈ 1617.
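The worked example can be reproduced with a short script. This is a sketch assuming the 400-point logistic formula and K = 32, with the update applied once over the whole tournament using the pre-tournament rating; `rate_tournament` is a hypothetical helper, not any organization's published procedure.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Logistic expected score on an assumed 400-point scale.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rate_tournament(rating, results, k=32):
    """Return (actual, expected, new_rating) for one rating period.

    `results` is a list of (opponent_rating, score) pairs with score in
    {0, 0.5, 1}. Every expected score uses the pre-tournament rating.
    """
    actual = sum(score for _, score in results)
    expected = sum(expected_score(rating, opp) for opp, _ in results)
    return actual, expected, rating + k * (actual - expected)


player_a = 1613
games = [(1609, 0), (1477, 0.5), (1388, 1), (1586, 1), (1720, 0)]
actual, expected, new_rating = rate_tournament(player_a, games)
print(actual, round(expected, 3), round(new_rating))  # 2.5 2.867 1601

# Three points against the same field (only the total score matters here).
alt = [(1609, 0.5), (1477, 0.5), (1388, 1), (1586, 1), (1720, 0)]
print(round(rate_tournament(player_a, alt)[2]))  # 1617
```

Summing the unrounded per-game expectations (about 2.8666) rather than the rounded figures does not change either final rating.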

This updating procedure is at the core of the ratings used by FIDE, the USCF, Yahoo! Games, the ICC, and the Free Internet Chess Server (FICS). However, each organization has taken a different route to deal with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to deal with rating inflation and deflation. New players are typically assigned provisional ratings, which are adjusted more drastically than established ratings, and various methods (none completely successful) have been devised to inject points into the rating system so that ratings from different eras are roughly comparable.

The principles used in these rating systems can also be applied to other competitions, for instance international football matches.