       +-------------------------------------------------+
       |   Vek-splanation of the Glicko Ratings System   |
       +-------------------------------------------------+

As you may have noticed, each FICS player now has a rating and an RD.  
 
RD stands for "ratings deviation".
 
Why a new system
----------------
 
The new system with the RD improves upon the binary categorization that was
used before on FICS and elsewhere, where players with fewer than 20 games were
labeled "provisional" and others were labeled "established".  Instead of two
separate ratings formulas for the two categories, there is now a single
formula that uses both players' ratings and both players' RDs to find the
ratings changes for you and your opponent after a game.

What RD represents
------------------
 
The Ratings Deviation is used to measure how much a player's current rating
should be trusted.  A high RD indicates that the player may not be competing
frequently or that the player has not played very many games yet at the
current rating level.  A low RD indicates that the player's rating is fairly
well established.  This is described in more detail below under
"Mathematical Interpretation of RD".

How RD Affects Ratings Changes
------------------------------

In general, if your RD is high, your rating will change a lot each time you
play.  As your RD gets smaller, the rating change per game goes down.  Your
opponent's RD has the opposite effect, to a smaller extent: if his RD is high,
your rating change will be somewhat smaller than it would be otherwise.

A further use of RDs
--------------------

Vek asked Mark Glickman the following:

> Given player one with rating r1, error s1,
> and player two with r2 and s2, do you have a formula for the probability
> that player 1's "true" rating is greater than player 2's ?

Mark said:
 
  Yes - it's:
 
  1/(1 + 10^(-(r1-r2)f(sqrt(s1^2 + s2^2))/400) )
 
  where f(s) is [the function applied to RD in Step 2 below].

How RD is Updated
-----------------

In this system, your RD will decrease somewhat each time you play a game,
because the more games you play, the stronger the basis for concluding what
your rating should be.  However, if you go for a long time without playing any
games, your RD will increase to reflect the increased uncertainty in your
rating due to the passage of time.  Also, your RD will decrease more if your
opponent's rating is similar to yours, and decrease less if your opponent's
rating is much different.
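To illustrate the last point, here is a minimal Python sketch of the
post-game RD formula from Step 6 under "The Formulas" below, with f and E
taken from Steps 2 and 3.  The function name rd_after_game is hypothetical,
not FICS's actual code.  It holds your RD at 200 and your opponent's RD at 50
and only widens the rating gap.  Because E * (1 - E) is largest when E is
near 0.5, the RD shrinks most against an equally rated opponent.

```python
import math

Q = math.log(10) / 400           # q from Step 4
P = 3 * Q ** 2 / math.pi ** 2    # p from Step 2: 3 (ln 10)^2 / (Pi^2 400^2)

def rd_after_game(rd, rating, opp_rating, opp_rd):
    """Post-game RD per Step 6; the game result does not enter this formula."""
    f = 1 / math.sqrt(1 + P * opp_rd ** 2)                   # Step 2
    e = 1 / (1 + 10 ** (-(rating - opp_rating) * f / 400))   # Step 3
    return 1 / math.sqrt(1 / rd ** 2 + Q ** 2 * f ** 2 * e * (1 - e))

# Your RD is 200 and the opponent's RD is 50; only the rating gap varies.
for gap in (0, 200, 400, 800):
    new_rd = rd_after_game(200, 1700, 1700 + gap, 50)
    print(f"gap {gap:>3}: RD 200 -> {new_rd:.1f}")
```

The printed RD stays below 200 in every case but climbs back toward 200 as
the gap widens, because a lopsided game tells the system less.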

Why Ratings Changes Aren't Balanced
-----------------------------------

In the old system, except for provisional games, the ratings changes for the
two players in a game would balance each other out: if A gains 16 points, B
loses 16 points.  That is not the case with this system.  Here is the
explanation I received from Mark Glickman:

  The system does not conserve rating points - and with good
  reason!  Suppose two players both have ratings of 1700,
  except one has not played in a while and the other has been
  playing constantly.  In the former case, the player's rating is not
  a reliable measure while in the latter case the rating is a fairly
  reliable measure.  Let's say the player with the uncertain rating
  defeats the player with the precisely measured rating.
  Then I would claim that the player with the imprecisely
  measured rating should have his rating increase a fair
  amount (because we have learned something informative from
  defeating a player with a precisely measured ability) and
  the player with the precise rating should have his rating
  decrease by a very small amount (because losing to a player
  with an imprecise rating contains little information).
  That's the intuitive gist of my extension to the Elo system.    
 
  On average, the system will stay roughly constant (by the
  law of large numbers).  In other words, the above scenario
  in the long run should occur just as often with the 
  imprecisely rated player losing.
 
Mathematical Interpretation of RD
---------------------------------

Direct from Mark Glickman:
 
Each player can be characterized as having a true (but unknown) rating that
may be thought of as the player's average ability.  We never get to know that
value, partly because we only observe a finite number of games, but also
because that true rating changes over time as a player's ability changes.  But
we can *estimate* the unknown rating.  Rather than restrict oneself to a
single estimate of the true rating, we can describe our estimate as an
*interval* of plausible values.  The interval is wider if we are less sure
about the player's unknown true rating, and the interval is narrower if we are
more sure about the unknown rating.  The RD quantifies the uncertainty in
terms of probability:
 
The interval formed by Current rating +/- RD contains your true rating with
probability of about 0.68.
 
The interval formed by Current rating +/- 2 * RD contains your true rating
with probability of about 0.95.
 
The interval formed by Current rating +/- 3 * RD contains your true rating
with probability of about 0.997.
 
For those of you who know something about statistics, these are not confidence
intervals, but are called "central posterior intervals" because the derivation
came from a "Bayesian" analysis of the problem.
 
These numbers are found from the cumulative distribution function (CDF) of the
normal distribution with mean = current rating and standard deviation = RD.
For example, in Mathematica notation, CDF[NormalDistribution[1600, 50], 1550]
is approximately .159: a player rated 1600 with an RD of 50 has about a 0.159
chance of a true rating below 1550, and by symmetry the same chance of one
above 1650, which leaves about 1 - 2 * 0.159 = 0.68 for the interval
1600 +/- 50.
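The same numbers can be checked without Mathematica.  This minimal Python
sketch builds the normal CDF from the standard library's math.erf; the helper
name normal_cdf is hypothetical.  It uses the 1600 rating and RD of 50 from
the example above.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

rating, rd = 1600, 50

# Probability that the true rating is below 1550 (one RD under the rating).
print(round(normal_cdf(1550, rating, rd), 3))

# Probability that the true rating lies within rating +/- k * RD.
for k in (1, 2, 3):
    inside = normal_cdf(rating + k * rd, rating, rd) - normal_cdf(rating - k * rd, rating, rd)
    print(f"+/- {k} RD: {inside:.3f}")
```

This prints 0.159, then 0.683, 0.954 and 0.997 for the three intervals.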

The Formulas
------------

Algorithm to calculate ratings change for a game against a given opponent:
 
Step 1.  Before a game, calculate initial rating and RD for each player.
 
  a)  If no games yet, initial rating assumed to be 1720.
      Otherwise, use existing rating.  
      (The 1720 is not printed out, however.)
 
  b)  If no RD yet, initial RD assumed to be 350 if you have no games,
      or 70 if your rating is carried over from ICC.
      Otherwise, calculate new RD, based on the RD that was obtained
      after the most recent game played, and on the amount of time (t) that
      has passed since that game, as follows:
 
      RD' = Sqrt(RD^2 + c log(1+t))
 
      where c is a numerical constant chosen so that predictions made
      according to the ratings from this system will be approximately
      optimal.
 
Step 2.   Calculate the "attenuating factor" due to your OPPONENT's RD,
          for use in later steps.
 
       f =  1/Sqrt(1 + p RD^2)
 
          Here p is the mathematical constant 3 (ln 10)^2
                                             -------------
                                              Pi^2 400^2    .
 
          Note that f is between 0 and 1: it is close to 1 when the
          opponent's RD is small, and moves toward 0 as that RD grows.
 
Step 3.   r1 <- your rating,
          r2 <- opponent's rating,
 
                    1
      E <-  ----------------------
                    -(r1-r2)*f/400     <- it has f(RD) in it!
              1 + 10
 
          E is your expected score against this opponent: roughly your
          predicted probability of winning, with a draw counting as half.
 
Step 4.   K =               q*f
              --------------------------------------
               1/(RD)^2   +   q^2 * f^2 * E * (1-E)
 
          where RD is your own RD from Step 1, f is from Step 2, and q is
          the constant q = (ln 10)/400.
 
Step 5.   This is the K factor for the game, so
 
          Your new rating = (pregame rating) + K * (w - E)
 
          where w is 1 for a win, .5 for a draw, and 0 for a loss.
 
Step 6.   Your new RD is calculated as
 
          RD' =                     1
                  -------------------------------------------------
                  Sqrt(    1/(RD)^2   +   q^2 * f^2 * E * (1-E)   )  . 
 
The same steps are done for your opponent.
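For readers who want to experiment, here is a minimal Python sketch of Steps
1 through 6.  It is not FICS's actual code, and the function names
inflate_rd and update are hypothetical.  This helpfile does not give the
value of c or the unit of t in Step 1, so inflate_rd takes c as a parameter
and assumes a natural logarithm.  Both players are updated from their pregame
values.  The usage reproduces Glickman's example from "Why Ratings Changes
Aren't Balanced": two 1700 players, the inactive one (RD 300) beating the
active one (RD 50).

```python
import math

Q = math.log(10) / 400           # q from Step 4
P = 3 * Q ** 2 / math.pi ** 2    # p from Step 2: 3 (ln 10)^2 / (Pi^2 400^2)

def inflate_rd(rd, t, c):
    """Step 1b: RD' = sqrt(RD^2 + c log(1 + t)).

    ASSUMPTION: natural log; c and the unit of t are not given here.
    """
    return math.sqrt(rd ** 2 + c * math.log(1 + t))

def update(rating, rd, opp_rating, opp_rd, score):
    """Steps 2-6 for one player after one game.

    rating, rd: this player's pregame values (after Step 1).
    score: 1 for a win, 0.5 for a draw, 0 for a loss.
    Returns (new_rating, new_rd).
    """
    f = 1 / math.sqrt(1 + P * opp_rd ** 2)                   # Step 2
    e = 1 / (1 + 10 ** (-(rating - opp_rating) * f / 400))   # Step 3
    info = Q ** 2 * f ** 2 * e * (1 - e)                     # shared by Steps 4 and 6
    k = Q * f / (1 / rd ** 2 + info)                         # Step 4
    new_rating = rating + k * (score - e)                    # Step 5
    new_rd = 1 / math.sqrt(1 / rd ** 2 + info)               # Step 6
    return new_rating, new_rd

# A (RD 300) has not played in a while; B (RD 50) plays constantly. A wins.
a_rating, a_rd = update(1700, 300, 1700, 50, 1.0)
b_rating, b_rd = update(1700, 50, 1700, 300, 0.0)
print(f"A: {a_rating - 1700:+.1f} points, new RD {a_rd:.1f}")
print(f"B: {b_rating - 1700:+.1f} points, new RD {b_rd:.1f}")
```

With these inputs the sketch gives A a gain of roughly 148 points while B
loses only about 5, the asymmetry Glickman describes.  Both RDs also shrink,
A's by far the most.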
 
Further information
-------------------
 
A PostScript file containing Mark Glickman's paper discussing this ratings
system may be obtained via ftp.  The ftp site is hustat.harvard.edu, the
directory is /pub/glickman, and the file is called "glicko.ps".  It is also
available at http://hustat.harvard.edu/pub/glickman/glicko.ps.

Credits
-------
 
The Glicko Ratings System was invented by Mark Glickman, Ph.D. who is
currently at the Harvard Statistics Department, and who is bound for Boston
University.

Vek and Hawk programmed and debugged the new ratings calculations (we may
still be debugging it).  Helpful assistance was given by Surf, and Shane fixed
a heinous bug that Vek invented.
 
Vek wrote this helpfile and Mark Glickman made some essential
corrections and additions.

  Last major update: April 19, 1995.
  Minor revisions: August 28, 1995 by Friar.