Ranking in Swiss System Chess Team Tournaments
Laszlo Csato
Corvinus University of Budapest
Budapest, Hungary
January 4, 2017
Abstract
The paper suggests a family of paired comparison-based scoring procedures for
ranking the participants of a Swiss system chess team tournament. We present the
challenges of ranking in Swiss system, the features of individual and team compe-
titions as well as the failures of the official rankings based on lexicographical order.
The tournament is represented as a ranking problem such that the linearly-solvable
row sum (score), generalized row sum, and least squares methods have favourable
axiomatic properties.
Two chess team European championships are analysed as case studies. Final
rankings are compared by their distances and visualized with multidimensional scal-
ing (MDS). Differences to the official ranking are revealed by the decomposition
of the least squares method. Rankings are evaluated by prediction power, retro-
dictive performance, and stability. The paper argues for the use of least squares
method with a results matrix favouring match points on the basis of its relative
insensitivity to the choice between match and board points, retrodictive accuracy,
and robustness.
JEL classification number: D71
AMS classification number: 15A06, 91B14
Keywords: Paired comparison; ranking; linear system of equations; Swiss system;
chess
1 Introduction
Chess tournaments are often organized in the Swiss system when there are too many
participants to play a round-robin tournament. They run for a predetermined number of
rounds; in each round, two players compete head-to-head. No player is eliminated
during the tournament, but there are pairs of players without a match between them. Let
us denote the number of rounds by c and the number of participants by n.
∗ e-mail: laszlo.csato@uni-corvinus.hu
Two emerging issues in Swiss system tournaments are how to pair the players and how
to rank the participants on the basis of their respective results. The pairing algorithm
aims to pair players with a similar performance, measured by the number of their wins
and draws (see FIDE (2015) for details).
The ranking involves two main challenges. The first one is the possible appearance of
circular triads when player i has won against player j, player j has won against player
k, but player k has won against player i. The second issue arises as the consequence of
incomplete comparisons since c < n − 1. For example, if player i has played only against
player j, then its rank probably should depend on the results of player j.
The final ranking of the players is usually determined by the aggregated number of
points scored: the winner of a match gets one point, the loser gets zero points, and a draw
means half-half points for both players. However, it usually does not result in a linear
order (a complete, transitive and antisymmetric binary relation) of the participants.1 Ties
are eliminated by the sequential application of various tie-breaking rules (FIDE, 2015).
These ranking(s), based on lexicographical orders, will be called official ranking(s).
They can differ in the tie-breaking rules.
Official rankings are not able to solve the problem caused by different schedules
as players with weaker opponents can score the same number of points more easily
(Brozos-Vázquez et al., 2010; Csató, 2012, 2013; Forlano, 2011; Jeremic and Radojicic,
2010; Redmond, 2003). It turns out that players whose performance improves during
the tournament are favoured over players whose performance declines. Consider two players
i and j with an equal number of points after playing some rounds. Player i is said
to be on the inner circle if it scored more points in some of the first rounds than
player j, who is said to be on the outer circle. Since they have played against opponents
with a similar number of points in each round, it is probable that player j has met
weaker opponents. Tie-breaking rules may take the performance of opponents into account,
but a similar problem arises if player j has marginally more points than player i, as
a lexicographical order is not continuous.
This is known to be an inherent defect of these systems. In fact, players sometimes
may deliberately seek a draw or a defeat in the first rounds (Forlano, 2011). It is tolerated
because it very rarely affects the winner of the tournament, at least with an
adequate number of rounds.
Nevertheless, some works have aimed to improve on ranking in Swiss system tourna-
ments. Redmond (2003) presents a generalization of win-loss ratings by accounting for the
strength of schedule. Brozos-Vázquez et al. (2010) argue for the use of recursive methods
(as a tie-breaking rule) in Swiss system chess tournaments. Forlano (2011) shows a way
to correct the points for wins and draws in order to derive a more legitimate ranking.
The current paper will attempt to solve the problem through the use of paired comparison-
based ranking procedures. In a sense, it is a return to the origins of this line of research
as early works were often inspired by chess tournaments (Landau, 1895, 1914; Zermelo,
1929).
We use a parametric family of scoring procedures based on linear algebra, the generalized
row sum (Chebotarev, 1989, 1994), as well as the least squares method, which has been
extensively used for sport rankings (Leeflang and van Praag, 1971; Stefani, 1980). Although
the issue has been touched upon by Csató (2013), here a deeper methodological foundation will
be given for the problem and the evaluation of rankings will be revisited.
1 In c rounds the number of match points can be an integer or a half-integer between 0 and c, so
there will always be players with an equal score if n > 2c + 1.
The analysis is based on some recent results. González-Díaz et al. (2014) have discussed
the axiomatic properties of generalized row sum and least squares, Csató (2015a)
has given an interpretation for the least squares method, while Can (2014) has contributed
to the choice of distance functions between rankings.
In order to avoid the prominent role of colour allocation in individual championships,
the discussion focuses on team tournaments, where a match between two teams is played
on 2t boards such that t players of a team play with white and the other t players of the
team play with black. The winner of a game on a board gets 1 point, the loser gets 0
points, and a draw yields 0.5 points for both teams, thus 2t board points are allocated
in a given match. The team achieving more (at least t + 0.5) board points wins and scores 2
match points, the loser scores 0, while a drawn match results in 1 match point for both teams. Therefore
one can choose between board points and match points as the basis of the official
ranking. Recently, match points have been preferred in chess olympiads and team
European championships.
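The conversion from board points to match points described above can be sketched in a few lines of Python. This is only an illustration: the function name match_points and its arguments are our assumptions and do not come from any official regulation.

```python
from fractions import Fraction


def match_points(board_points_a, boards):
    """Return (match points of team A, match points of team B) for one match.

    board_points_a: board points scored by team A (a multiple of 0.5).
    boards: number of boards, 2t in the notation of the text.
    """
    a = Fraction(board_points_a)
    b = Fraction(boards) - a  # all 2t board points are allocated in a match
    if a > b:
        return 2, 0  # A scored at least t + 0.5 board points
    if a < b:
        return 0, 2
    return 1, 1  # drawn match


print(match_points(2.5, 4))  # minimal victory on four boards
```

Summing the first component over the rounds gives the match points of a team, while summing its board points gives the alternative basis of the official ranking.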
The paper is structured as follows. Section 2 briefly outlines the ranking problem,
ranking methods and their relevant properties. It also aims to incorporate Swiss system
chess team tournaments into this framework. The proposed model is applied in Section 3
to rank the participants in the 2011 and 2013 European Team Chess Championship open
tournaments. Twelve rankings, distinguished by the influence of opponents' performance
and match versus board points, are introduced and compared on the basis of their distances.
They are visualized with multidimensional scaling (MDS), while differences to the official
ranking are revealed by the decomposition of the least squares method.
On the basis of these results, we argue for the use of the least squares method with a
generalized results matrix favouring match points. It is supported by several arguments:
variability of the ranking with respect to the role of match and board points, as well as
retrodictive performance (the ability to match the outcomes of matches already played)
and robustness (stability of the ranking between two subsequent rounds).
Finally, Section 4 summarizes the findings and reviews possible extensions of the model.
A reader familiar with ranking problems (González-Díaz et al., 2014; Csató, 2015a) may
skip Subsections 2.1 and 2.2.
2 In most practical applications (including ours) the condition $m_{ij} \in \mathbb{N}$ means no restriction.
Modification of the domain to $\mathbb{R}_+$ has no impact on the results, but the discussion becomes more complicated.
This generalization has some significance, for example, in forecasting sport results, when the
latest comparisons may give more information about the current form of a player.
the total number of comparisons of object i and d = max{di : i ∈ N} be the maximal
number of comparisons with the other objects. Let m = max{mij : i, j ∈ N}.
The results matrix $R = (r_{ij}) \in \mathbb{R}^{n \times n}$ contains the outcome of comparisons between the
objects, and is skew-symmetric ($R^\top = -R$). All elements are limited by $r_{ij} \in [-m_{ij}, m_{ij}]$.
$(r_{ij} + m_{ij})/(2m_{ij}) \in [0, 1]$ may be regarded as the likelihood that object i defeats j. Then
$r_{ij} = m_{ij}$ means that i is perfectly better than j, and $r_{ij} = 0$ corresponds to an undefined
relation (if $m_{ij} = 0$) or to the lack of preference (if $m_{ij} > 0$) between the two objects. A
ranking problem is given by the triplet (N, R, M). Let $\mathcal{R}^n$ be the class of ranking problems
with |N| = n.
A ranking problem is called round-robin if $m_{ij} = 1$ for all $i \neq j$, that is, every object
has been compared exactly once with all of the others. A ranking problem is called
balanced if $d_i = d_j$ for all i, j = 1, 2, . . . , n, that is, every object has the same number of
comparisons.
Row sum will also be referred to as scores, and s is sometimes called the scores vector. It
does not take the comparison multigraph into account.
(I + εL)x(ε) = (1 + εmn)s,
row sum by accounting for the performance of objects compared with it, and so on. ε
indicates the importance attributed to this correction. It follows from the definition that
limε→0 x(ε) = s for all ranking problems (N, R, M).
Both the row sum and the generalized row sum ratings are well-defined and can be
obtained from a system of linear equations for all ranking problems since (I + εL) is
positive definite for any ε ≥ 0.
In our model the outcome of paired comparisons is restricted by −m ≤ rij ≤ m for
all i, j ∈ N. Then Chebotarev (1994, Proposition 5.1) argues that the reasonable upper
bound of ε is 1/ [m(n − 2)].
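The generalized row sum rating can be obtained numerically by solving the linear system $(I + \varepsilon L)\,x(\varepsilon) = (1 + \varepsilon m n)\,s$ directly. The following is a minimal sketch under stated assumptions: L is taken to be the Laplacian of the comparison multigraph ($d_i$ on the diagonal, $-m_{ij}$ off the diagonal), s is the vector of row sums of R, and the helper names solve and generalized_row_sum are ours. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def generalized_row_sum(results, matches, eps):
    """Solve (I + eps L) x = (1 + eps m n) s for the rating vector x."""
    n = len(results)
    m = max(max(row) for row in matches)
    s = [sum(row) for row in results]  # row sums (scores)
    # Assumed Laplacian: d_i on the diagonal, -m_ij off the diagonal.
    lap = [[sum(matches[i]) if i == j else -matches[i][j] for j in range(n)]
           for i in range(n)]
    lhs = [[(1 if i == j else 0) + eps * lap[i][j] for j in range(n)]
           for i in range(n)]
    rhs = [(1 + eps * m * n) * s_i for s_i in s]
    return solve(lhs, rhs)


# Hypothetical three-team chain: team 0 beat team 1, team 1 beat team 2.
M = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
R = [[0, 1, 0], [-1, 0, 1], [0, -1, 0]]
print(generalized_row_sum(R, M, Fraction(1)))
```

With ε = 0 the function returns the scores themselves, in line with the limit stated above.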
$h_{ij} = r_{ij} / \max\{m_{ij}; 1\} \in [-1, 1]$ (see footnote 3) may be identified as the normalized difference between
the latent ratings $q_i$ and $q_j$ of objects i and j. Then it makes sense to choose q in
order to minimize the error according to an appropriate objective function.
Definition 3. Least squares method: $q(N, R, M) : \mathcal{R}^n \to \mathbb{R}^n$ such that it is the solution
to the problem
$$\min_{q \in \mathbb{R}^n} \sum_{i, j \in N} m_{ij} \left[ h_{ij} - (q_i - q_j) \right]^2$$
satisfying $e^\top q = 0$.
The normalization $e^\top q = 0$ is necessary because the value of the objective function is
the same for q and q + βe, β ∈ R.
The least squares ranking method is well known in many fields; a review of its
origin is given by González-Díaz et al. (2014) and Csató (2015a). It has strong connections
to generalized row sum.
Proposition 1. The least squares rating can be obtained as a solution of the linear system
of equations Lq = s and e⊤ q = 0 for all ranking problems (N, R, M).
Proof. See Csató (2015a, p. 57).
Lemma 1. For all ranking problems (N, R, M), the least squares method is equivalent to
the limit of generalized row sum if ε → ∞ since limε→∞ x(ε) = mnq.
Proof. See Chebotarev and Shamis (1998, p. 326) and Csató (2016).
Proposition 2. The least squares rating q(N, R, M) is unique if and only if the comparison
multigraph G of the ranking problem (N, R, M) is connected.
Proof. See Csató (2015a, p. 59).
Note that in the case of an unconnected comparison multigraph there are independent
ranking problems.
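Proposition 1 suggests a direct way to compute the least squares rating. The sketch below assumes, as before, that L is the Laplacian of the comparison multigraph and that the multigraph is connected (Proposition 2); the function names are ours. Since the rows of L sum to zero, one equation of Lq = s is redundant, and it is replaced here by the normalization $e^\top q = 0$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def least_squares_rating(results, matches):
    """Solve L q = s together with e^T q = 0 (connected multigraph assumed)."""
    n = len(results)
    s = [sum(row) for row in results]
    # Assumed Laplacian: d_i on the diagonal, -m_ij off the diagonal.
    lap = [[sum(matches[i]) if i == j else -matches[i][j] for j in range(n)]
           for i in range(n)]
    # Replace the redundant last equation by q_1 + ... + q_n = 0.
    lap[-1] = [1] * n
    s[-1] = 0
    return solve(lap, s)


# Hypothetical three-team chain: team 0 beat team 1, team 1 beat team 2.
M = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
R = [[0, 1, 0], [-1, 0, 1], [0, -1, 0]]
print(least_squares_rating(R, M))
```

For an unconnected multigraph the elimination finds no pivot and the helper fails, which mirrors the loss of uniqueness in that case.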
A graph-theoretic interpretation of the generalized row sum method is given by Shamis
(1994). Csató (2015a) provides the following iterative decomposition of least squares.
Proposition 3. Let the comparison multigraph of a ranking problem (N, R, M) be connected
and not regular bipartite. Then the unique solution of the least squares problem is
$q = \lim_{k \to \infty} q^{(k)}$ where
$$q^{(0)} = \frac{1}{d}\, s,$$
$$q^{(k)} = q^{(k-1)} + \frac{1}{d} \left[ \frac{1}{d} (dI - L) \right]^{k} s \qquad (k = 1, 2, \dots).$$
3 $\max\{m_{ij}; 1\}$ is written in order to avoid division by zero when $m_{ij} = 0$.
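The iteration of Proposition 3 is easy to trace by hand or in code. The following sketch assumes that L is the Laplacian of the comparison multigraph, so that dI − L has $d - d_i$ on the diagonal and $m_{ij}$ off the diagonal; the function name ls_decomposition is ours, and the three-team round-robin example is purely hypothetical.

```python
from fractions import Fraction


def ls_decomposition(results, matches, steps):
    """Return the iterates q(0), q(1), ..., q(steps) of the decomposition."""
    n = len(results)
    s = [Fraction(sum(row)) for row in results]
    deg = [sum(row) for row in matches]
    d = max(deg)
    # P = (1/d)(dI - L): (d - d_i)/d on the diagonal, m_ij/d elsewhere.
    p = [[Fraction(d - deg[i] if i == j else matches[i][j], d) for j in range(n)]
         for i in range(n)]
    term = s[:]               # holds P^k s, starting with k = 0
    q = [v / d for v in s]    # q(0) = (1/d) s
    iterates = [q]
    for _ in range(steps):
        term = [sum(p[i][j] * term[j] for j in range(n)) for i in range(n)]
        q = [q_i + t_i / d for q_i, t_i in zip(q, term)]  # add (1/d) P^k s
        iterates.append(q)
    return iterates


# Hypothetical round-robin of three teams: 0 beat 1 and 2, 1 beat 2.
M = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
R = [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]]
for k, q in enumerate(ls_decomposition(R, M, 3)):
    print(k, q)
```

In this example the first coordinate takes the values 1, 1/2, 3/4, ... and approaches 2/3, the least squares rating of the winner.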
2.3 Two properties of scoring procedures
In order to argue for the use of these methods we discuss some axioms.
Multiplier k cannot be too large since −mij ≤ krij ≤ mij should be satisfied for all
i, j ∈ N according to the definition of the results matrix. k ≤ 1 is always allowed.
Definition 5. Scale invariance (SI) (Csató, 2014): Let (N, R, M), (N, kR, M) ∈ Rn
be two ranking problems such that (N, kR, M) is obtained from (N, R, M) through an
admissible transformation of the results. Scoring procedure f : Rn → Rn is scale invariant
if fi (N, R, M) ≥ fj (N, R, M) ⇔ fi (N, kR, M) ≥ fj (N, kR, M) for all i, j ∈ N.
Lemma 2. The row sum, generalized row sum and least squares methods satisfy SI.
A score consistent method is equivalent to the row sum method in the case of round-
robin ranking problems. A similar requirement is mentioned by Zermelo (1929) and David
(1987, Property 3).
Remark 1. Regarding the generalized row sum method, Chebotarev (1994, Property 3)
introduces a more general axiom called agreement: if (N, R, M) ∈ Rn is a round-robin
ranking problem, then x(ε)(N, R, M) = s(N, R, M).
Lemma 3. Row sum, generalized row sum and least squares methods satisfy SCC.
Proof. For generalized row sum, see Remark 1. In the case of least squares the proof is
given by González-Díaz et al. (2014, Proposition 5.3).
Further properties of the scoring procedures are discussed by González-Díaz et al.
(2014) and Csató (2014).
2.4 Interpretation of Swiss system chess team tournaments as
a ranking problem
In order to use the scoring procedures presented above, the chess tournament should be
formulated as a ranking problem:
• Set of objects N consists of the teams of the competition;
Proof. di = c for all i ∈ N, hence s(N, RBP , M) = bp − cte.
A crucial argument for the application of paired comparison-based ranking methodol-
ogy is provided by the following result.
• Generalized row sum and least squares methods applied on the match points
based results matrix are equivalent to the match points ranking:
$x_i(\varepsilon)(R^{MP}) \geq x_j(\varepsilon)(R^{MP}) \iff q_i(R^{MP}) \geq q_j(R^{MP}) \iff mp_i \geq mp_j$.
• Generalized row sum and least squares methods applied on the board points based
results matrix are equivalent to the board points ranking:
$x_i(\varepsilon)(R^{BP}) \geq x_j(\varepsilon)(R^{BP}) \iff q_i(R^{BP}) \geq q_j(R^{BP}) \iff bp_i \geq bp_j$.
Proof. In the case of round-robin problems, generalized row sum and least squares are
equivalent to the row sum method due to axiom SCC (Lemma 3), hence Lemmata 5 and
6 provide the statement.
Generalized row sum and least squares methods take the opponents of each team into
account. Due to Theorem 1, they result in the official ranking without tie-breaking rules
in the ideal round-robin case. When the official ranking is based on match points, the
transformation RM P is recommended. Generalized results matrix with a small (i.e. close
to 0) parameter λ gives a similar outcome but it reflects the number of board points, the
magnitude of wins and losses. This effect becomes more significant as λ increases. RBP
extends the board points ranking to Swiss system competitions.
3.1 Examples and implementation
The method proposed in Section 2 is illustrated with an analysis of two chess team
tournaments:
[Figure 1: Distribution of match results, ETCC 2013. Horizontal axis: Result (2 : 2, 2.5 : 1.5, 3 : 1, 3.5 : 0.5, 4 : 0); vertical axis: Number of matches (0 to 60).]
Distribution of match results for ETCC 2013 is drawn in Figure 1. Minimal victory
(2.5 : 1.5) is the mode, so incorporating board points will not influence the rankings much.
There are two exogenous rankings called Official according to the tournament rules
and Start based on Élő points of players, reflecting the past performance of team members.
5 Match results can be found in Tables A.1 (2011) and A.2 (2013), and, in another form, in
Tables A.3 (2011) and A.4 (2013) of Csató (2016).
A further 12 rankings have been calculated from the ranking problem representation. Four
results matrices have been considered: $R^{MP}$, $R^{MB} = R^{P}(1/4) = 3/4\,R^{MP} + 1/4\,R^{BP}$,
$R^{BM} = R^{P}(2/3) = 1/3\,R^{MP} + 2/3\,R^{BP}$ and $R^{BP}$. They were plugged into three methods:
least squares (LS) and generalized row sum with $\varepsilon_1 = 1/324$ (GRS1) and $\varepsilon_2 = 1/6$
(GRS2). Note that $\varepsilon_1$ is smaller and $\varepsilon_2$ is larger than the reasonable upper bound of 1/36.
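The two mixed results matrices above are consistent with the convex combination $R^{P}(\lambda) = (1 - \lambda) R^{MP} + \lambda R^{BP}$, and the bound 1/36 follows from 1/[m(n − 2)] with m = 1 and n = 38 teams. The sketch below only illustrates this arithmetic; the general formula is inferred from the two instances given, the function names are ours, and the 2 × 2 matrices are hypothetical placeholders rather than tournament data.

```python
from fractions import Fraction


def generalized_results_matrix(r_mp, r_bp, lam):
    """Entrywise convex combination (1 - lam) * R^MP + lam * R^BP."""
    return [[(1 - lam) * a + lam * b for a, b in zip(row_mp, row_bp)]
            for row_mp, row_bp in zip(r_mp, r_bp)]


def epsilon_upper_bound(n, m=1):
    """Reasonable upper bound 1 / [m (n - 2)] for the parameter epsilon."""
    return Fraction(1, m * (n - 2))


# Hypothetical entries: a match win (1) achieved with a narrow board margin (1/2).
r_mp = [[0, 1], [-1, 0]]
r_bp = [[0, Fraction(1, 2)], [Fraction(-1, 2), 0]]
print(generalized_results_matrix(r_mp, r_bp, Fraction(1, 4)))

bound = epsilon_upper_bound(38)
print(bound, Fraction(1, 324) < bound < Fraction(1, 6))
```

The combination keeps the matrix skew-symmetric, so it can be passed to any of the scoring procedures without further adjustment.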
Existence of a unique least squares solution requires connectedness of the comparison
multigraph (Proposition 2), which holds after the third round. Rankings in the first
two rounds are highly unreliable, therefore they were discarded. From the third round on,
every ranking other than Start is obtained once per round, thus 7 × 13 + 1 = 92 rankings
will be analysed, as Start remains unchanged.6
Start and Official rankings are strict, that is, they do not allow for ties by definition.
It can be checked that the other rankings also give a linear order of teams in all cases.
Notation 2. The 14 final rankings are denoted by Start, Official; GRS1($R^{MP}$), GRS1($R^{MB}$),
GRS1($R^{BM}$), GRS1($R^{BP}$); GRS2($R^{MP}$), GRS2($R^{MB}$), GRS2($R^{BM}$), GRS2($R^{BP}$); and
LS($R^{MP}$), LS($R^{MB}$), LS($R^{BM}$), LS($R^{BP}$). In the figures they are abbreviated by Start,
Off; G1, G2, G3, G4; S1, S2, S3, S4; and L1, L2, L3, L4, respectively.
Kemeny distance was characterized by Kemeny and Snell (1962); however, Can and Storcken
(2013) achieved the same result without one condition. Can and Storcken (2013) also
provide an extensive overview of the origin of this measure.
According to Example 1, the dissimilarity between a ≻ b ≻ c and b ≻ a ≻ c and
between a ≻ b ≻ c and a ≻ c ≻ b by the Kemeny distance is identical. However, in our
chess example a disagreement at the top of the rankings may be more significant than a
disagreement at the bottom since the audience is usually interested in the first three, five
or ten places but people are not bothered much whether a team is the 31st or 34th.
For this purpose, Can (2014) proposes some functions on strict rankings in the spirit
of the Kemeny metric. They respect the number of swaps but allow for variation
in the treatment of different pairs of disagreements by weighting them according to an
exogenous weight vector. This has a price, since the calculation will depend on the order
of swaps between the two rankings. Can (2012, Theorem 1) shows that only the path-minimizing
function satisfies the triangular inequality condition for all possible weight
vectors. Finding the path-minimizing metric is not trivial: it is equivalent to solving a
shortest-path problem in general, but the solution is known if the weights are monotonically
decreasing (increasing) from the upper parts of a ranking to the lower parts.7
6 Rankings according to different methods are displayed in Csató (2016, Tables A.5 (2011) and
A.6 (2013)).
These results have inspired us to choose a monotonically decreasing weight vector
meaning that swaps in the first places are more important than changes at the bottom of
the rankings.
Definition 12. The weight vector of our weighted distance is given by ωi = 1/i for all
i = 1, 2, . . . , n − 1.
Lemma 7. The maximum of Kemeny distance is n(n − 1)/2(= 703) and the maximum
of weighted distance is n − 1(= 37) if and only if the two rankings are entirely opposite.
Proof. The maximal number of swaps between two rankings is n(n − 1)/2 in the case of
two entirely opposite rankings, which is also their Kemeny distance.
Take the ranking $a_1 \succ a_2 \succ \dots \succ a_n$. The winners' decomposition (Can, 2014, Example 2)
first permutes $a_n$ to the first place, which involves one swap in each position from the
first to the (n − 1)th, contributing $1 + 1/2 + \dots + 1/(n-1)$ to the weighted distance.
Thereafter, it permutes $a_{n-1}$ to the second place, which involves one swap in each position
from the second to the (n − 1)th, contributing $1/2 + \dots + 1/(n-1)$ to the weighted
distance, and so on. Thus the total weighted distance of two entirely opposite rankings
is $1 \times 1 + 2 \times 1/2 + 3 \times 1/3 + \dots + (n-1) \times 1/(n-1) = n - 1$.
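Both distances can be checked on small examples. The sketch below is an illustration under stated assumptions: the Kemeny distance is computed as the number of pairs ordered differently by two strict rankings, and the weighted distance follows the winners' decomposition as described in the proof above, with a swap between places k and k + 1 weighted by 1/k. The function names are ours.

```python
from fractions import Fraction


def kemeny_distance(a, b):
    """Number of pairs of items ordered differently by strict rankings a and b."""
    pos = {item: k for k, item in enumerate(b)}
    seq = [pos[item] for item in a]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])


def weighted_distance(a, b):
    """Weighted distance via the winners' decomposition with weights 1/k."""
    current = list(a)
    total = Fraction(0)
    for target, item in enumerate(b):
        k = current.index(item)
        while k > target:
            # Swap 0-based places k-1 and k, i.e. the kth position (1-based).
            current[k - 1], current[k] = current[k], current[k - 1]
            total += Fraction(1, k)
            k -= 1
    return total


teams = list(range(38))       # n = 38, as in the case studies
opposite = teams[::-1]
print(kemeny_distance(teams, opposite), weighted_distance(teams, opposite))
print(weighted_distance("abc", "bac"), weighted_distance("abc", "acb"))
```

The last line shows the intended asymmetry: a swap at the top costs 1, while a swap between the second and third places costs only 1/2, although both have a Kemeny distance of 1.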
We are not aware of any other application of the novel method of Can (2014).
Distances of rankings of ETCC 2011 are presented in Table 1. All Kemeny distances
are significantly smaller than their maximum of 703 for entirely opposite rankings. The largest
values usually occur in comparison with Start since the latter is not influenced by the
results. However, rankings based on match points and board points are also relatively far
from each other. Official coincides with GRS1($R^{MB}$).
Weighted distances are presented in Table 1(b). Their maximum is 37. Ratios of Kemeny
and weighted distances are between 8.73 and 17.44 for ETCC 2011, and between 5.81 and
18.73 for ETCC 2013. In the second case accounting for swaps' positions has a larger
effect but the discrepancy of the two distances remains smaller than expected, that is,
variations are more or less equally distributed along the rankings.
The ranking from GRS1 (RM P ) means a kind of tie-breaking rule for match points both
in ETCC 2011 and ETCC 2013: generalized row sum gives the match points ranking for
ε = 0, while a small increase in the parameter breaks ties among teams according to the
strength of their opponents. The official ranking also aims to eliminate ties, although it
uses a different approach.
The pairwise distances of 14 rankings can be plotted in a 13-dimensional space without
loss of information, but this still seems to be unmanageable. Therefore multidimensional
scaling (Kruskal and Wish, 1978) has been applied, similarly to Csató (2013). It is a
statistical method in information visualization for exploring similarities or dissimilarities
in data; a textbook application of MDS is to draw cities on a map from the matrix
consisting of their air distances.
7 Then the path-minimizing metric is equivalent to the winners' and losers' decomposition (the Lehmer
function and the inverse Lehmer function), respectively (Can, 2014, Corollaries 1 and 2).
Table 1: Distances of rankings, ETCC 2011
(a) Kemeny distance: a swap in the kth position has a weight of 1
Columns follow the order of the rows and use the abbreviations of Notation 2.
               Start   Off    G1    S1    L1    G2    S2    L2    G3    S3    L3    G4    S4    L4
Start              –   107   100    98   100   107    99    96   110    93    93   130    99    85
Official         107     –    37    45    73     0    38    69    25    34    60    71    52    60
GRS1(R^MP)       100    37     –    16    44    37    13    42    62    31    43   108    61    53
GRS2(R^MP)        98    45    16     –    28    45     7    26    70    27    29   114    67    45
LS(R^MP)         100    73    44    28     –    73    35     8    94    47    21   130    81    41
GRS1(R^MB)       107     0    37    45    73     –    38    69    25    34    60    71    52    60
GRS2(R^MB)        99    38    13     7    35    38     –    33    63    20    32   107    60    40
LS(R^MB)          96    69    42    26     8    69    33     –    88    41    13   122    73    33
GRS1(R^BM)       110    25    62    70    94    25    63    88     –    49    79    46    41    71
GRS2(R^BM)        93    34    31    27    47    34    20    41    49     –    30    87    40    26
LS(R^BM)          93    60    43    29    21    60    32    13    79    30     –   111    60    20
GRS1(R^BP)       130    71   108   114   130    71   107   122    46    87   111     –    57    97
GRS2(R^BP)        99    52    61    67    81    52    60    73    41    40    60    57     –    44
LS(R^BP)          85    60    53    45    41    60    40    33    71    26    20    97    44     –
(b) Weighted distance: a swap in the kth position has a weight of 1/k
Columns follow the order of the rows and use the abbreviations of Notation 2.
               Start    Off     G1     S1     L1     G2     S2     L2     G3     S3     L3     G4     S4     L4
Start              –  10.79   9.68   9.60   9.39  10.79   9.54   9.15  11.30   9.45   8.66  12.16  10.05   8.10
Official       10.79      –   3.04   3.67   6.33   0.00   3.08   6.12   2.09   2.74   5.62   6.55   4.89   5.58
GRS1(R^MP)      9.68   3.04      –   1.02   3.80   3.04   0.75   3.73   5.04   2.29   3.80   9.41   5.87   4.39
GRS2(R^MP)      9.60   3.67   1.02      –   2.80   3.67   0.60   2.73   5.66   2.24   2.87   9.97   6.36   4.15
LS(R^MP)        9.39   6.33   3.80   2.80      –   6.33   3.39   0.53   8.07   4.36   1.53   9.94   6.20   3.01
GRS1(R^MB)     10.79   0.00   3.04   3.67   6.33      –   3.08   6.12   2.09   2.74   5.62   6.55   4.89   5.58
GRS2(R^MB)      9.54   3.08   0.75   0.60   3.39   3.08      –   3.31   5.09   1.65   3.27   9.42   5.80   3.69
LS(R^MB)        9.15   6.12   3.73   2.73   0.53   6.12   3.31      –   7.74   4.04   1.00   9.48   5.71   2.49
GRS1(R^BM)     11.30   2.09   5.04   5.66   8.07   2.09   5.09   7.74      –   4.10   7.20   4.48   3.96   6.58
GRS2(R^BM)      9.45   2.74   2.29   2.24   4.36   2.74   1.65   4.04   4.10      –   3.29   8.01   4.23   2.98
LS(R^BM)        8.66   5.62   3.80   2.87   1.53   5.62   3.27   1.00   7.20   3.29      –   8.79   4.79   1.49
GRS1(R^BP)     12.16   6.55   9.41   9.97   9.94   6.55   9.42   9.48   4.48   8.01   8.79      –   4.53   7.78
GRS2(R^BP)     10.05   4.89   5.87   6.36   6.20   4.89   5.80   5.71   3.96   4.23   4.79   4.53      –   3.64
LS(R^BP)        8.10   5.58   4.39   4.15   3.01   5.58   3.69   2.49   6.58   2.98   1.49   7.78   3.64      –
Kemeny and weighted distances are measured on a ratio scale due to the existence of a
natural minimum and maximum. Then discrepancies of the reduced dimensional map are
linear functions of the original distances. Both the Stress and RSQ tests for validity confirm
that two dimensions are sufficient to plot the 14 rankings, whereas one dimension is too
restrictive. The method produces a map where only the position of objects counts: more
similar rankings are closer to each other. Only the distances of points representing the
rankings yield information; the meaning of the axes remains obscure.
MDS maps reinforce the conjecture from Table 1 that Start is far away from all other
rankings (see Csató (2016, Figure 2)). Thus Start ranking is omitted from further analysis,
which improves the mapping, too.
There is not much difference between the four charts (ETCC 2011 vs. 2013, Kemeny
vs. weighted distances). MDS procedures of ETCC 2013 and Kemeny distances have
more favourable validity measures than MDS procedures of ETCC 2011 and weighted
distances. They reveal the following results shown by Figure 2:
1. Start significantly differs from all other rankings since it does not depend on the
results of the tournament;
2. Generalized row sum rankings (with low λ) are more similar to the official one than
least squares;
3. The order of results matrices by the variability of rankings for a given scoring
method is $R^{MP} < R^{MB} < R^{BM} < R^{BP}$: a greater role of match points (smaller λ)
stabilizes the rankings;
4. The order of scoring procedures by the variability of rankings for a given results
matrix is LS < GRS2 < GRS1: a greater influence of opponents' results stabilizes
the rankings;
5. The effect of the tie-breaking rule for match points is not negligible (Off and G1 are
not very close to each other).
[Figure 2: MDS maps of rankings, ETCC 2013. Panel (a): Kemeny distance, without Start. The maps show the positions of the rankings Off, G1 to G4, S1 to S4 and L1 to L4.]
On the basis of these observations, the application of least squares with a generalized
results matrix favouring match points (a low λ, for example, 1/4 as in RM B ) is proposed
for ranking in Swiss system chess team tournaments. It gives an incentive to score more
board points but still prefers match points.
3.3 Analysis of a ranking
The decomposition of the least squares rating (Csató, 2015a) offers another approach to
compare the rankings. The ranking problem is balanced and the comparison multigraph is
regular, therefore Proposition 3 can be applied. In the zeroth step (q(0) ) it gives the match
points ranking, the official ranking without the application of tie-breaking rules. After
that, the iterated ratings reflect the strength of opponents, opponents of opponents and so
on by accounting for their average match points as dI − L = M. A ranking equivalent to
$q(R^{MP})$ is obtained after the eighth (from $q^{(8)}(R^{MP})$) and after the twelfth step (from
$q^{(12)}(R^{MP})$) in the case of ETCC 2011 and ETCC 2013, respectively.
Table 2 shows the changes of teams’ positions in each step of the decomposition of the
ranking LS(RM P ) for ETCC 2013. In the second column (q(0) ), ties are broken according
to the official rules, so it coincides with the official ranking. In subsequent steps there are
no ties. The last change is a swap of Turkey and Montenegro in the twelfth step of the
iteration. The least squares method is far from being only a tie-breaking rule for match
points (contrary to generalized row sum with ε1 = 1/324), a team may overtake another
one despite its disadvantage of two match points.
Correction according to opponents' strength results in a seven-position improvement
for Slovenia, together with a four-position decline for Romania and a six-position decline for Netherlands.
Hence Slovenia overtakes Netherlands despite having a disadvantage of two match points.8
Subsequent steps of the iteration usually result in swaps in a similar direction, however,
to a more moderate extent. A notable exception is Romania, regaining some positions
due to indirect opponents. The monotonic decrease of absolute adjustments is violated
only by Lithuania.
There are two changes among the top six teams. France becomes the winner of the
tournament after k = 2 instead of Azerbaijan. It can be debated since the latter team
has no loss, however, the schedule of France was more difficult. The swap of Russia
and Armenia may be explained by the advance on an outer circle of the former team (i.e.
Russia had a worse performance than Armenia during the first rounds of the tournament).
Imperfection of the official ranking is further highlighted by ETCC 2011, for which Table 3
contains the positional changes according to the iterative decomposition of LS($R^{MP}$).
Here France scored three wins and three draws in the first six rounds but was defeated
three times after that, presenting an extreme example of advance on an inner circle.
Thus France had a more challenging schedule compared to teams with the same number
of match points, reflected in the significant adjustment by the least squares method.
On the other side, Serbia loses nine, and Georgia loses 14 positions. They were lucky
with their opponents: for example, Georgia did not play against any team ranked higher in
the official ranking, which is quite strange for a team in 13th place. Consequently,
both Serbia and Georgia significantly benefit from decreasing ε or increasing the role of
board points.
Table 2: Positional changes in the decomposition of LS(R^MP), ETCC 2013
Off (0) is the ranking from q(0) with ties broken by the official rules; LS (∞) is the least squares ranking.
In the Cumulated column, ↑k and ↓k denote an improvement and a decline of k positions, respectively.
Lack of change is indicated by –. Only the first and last rankings and the cumulated change are shown.
Team             Off (0)   Cumulated   LS (∞)
Azerbaijan           1        ↓1           2
France               2        ↑1           1
Russia               3        ↓1           4
Armenia              4        ↑1           3
Hungary              5        –            5
Georgia              6        –            6
Greece               7        ↓1           8
Czech Rep.           8        ↓2          10
Ukraine              9        ↑2           7
England             10        ↑1           9
Netherlands         11        ↓6          17
Italy               12        –           12
Serbia              13        ↓6          19
Romania             14        ↓1          15
Belarus             15        ↑4          11
Poland              16        ↑2          14
Croatia             17        ↑1          16
Montenegro          18        ↓3          21
Spain               19        ↓3          22
Germany             20        ↑2          18
Slovenia            21        ↑8          13
Poland Futures      22        ↓4          26
Turkey              24        ↑4          20
Bulgaria            25        ↑2          23
Sweden              26        ↓2          28
Denmark             27        ↓5          32
Israel              28        ↑4          24
Iceland             29        ↓2          31
Austria             30        ↑5          25
Poland Goldies      31        ↑2          29
Switzerland         32        ↑5          27
Belgium             33        ↓1          34
Finland             34        ↑1          33
Norway              35        –           35
Scotland            36        –           36
FYR Macedonia       37        –           37
Wales               38        –           38
The first two are standard aspects for the classification of mathematical ranking models
(Pasteur, 2010). However, for the ranking of a Swiss system tournament, the second is
much more important: the aim is to get a meaningful ranking on the basis of matches
already played, shown by in-sample fit.
Table 3: Positional changes in the decomposition of LS(R^{MP}), ETCC 2011
Rank improvements and declines between the official ranking, Off (k = 0), and the least squares ranking, LS (k → ∞), are indicated by the arrows ↑ and ↓, respectively. A number in brackets represents the same number of ↑ or ↓ arrows. Lack of change is indicated by –.

Team             Off (0)   Cumulated   LS (∞)
Germany          1         ↓           2
Azerbaijan       2         ↑           1
Hungary          3         ↓↓↓         6
Armenia          4         ↓           5
Russia           5         ↑↑          3
Netherlands      6         ↓↓          8
Bulgaria         7         ↑↑↑         4
Poland           8         (8) ↓       16
Romania          9         ↓↓↓         12
Spain            10        ↑↑↑         7
Italy            11        ↑↑          9
Serbia           12        (9) ↓       21
Georgia          13        (14) ↓      27
Israel           14        ↓           15
Ukraine          15        (4) ↑       11
Czech Rep.       16        ↑↑          14
Slovenia         17        (4) ↑       13
Moldova          18        ↓↓          20
France           19        (9) ↑       10
Greece           20        ↑↑↑         17
Croatia          21        ↑↑↑         18
England          22        ↑↑↑         19
Switzerland      23        –           23
Latvia           24        ↑↑          22
Montenegro       25        (4) ↓       29
Iceland          26        ↓↓          28
Sweden           27        ↑↑          25
Denmark          28        ↑↑          26
Norway           29        ↓↓          31
FYROM            30        ↓↓↓         33
Finland          31        ↓           32
Austria          32        ↑↑          30
The third measure, stability, seems to be important for (at least) two reasons.
First, both the participants and the audience may dislike volatile rankings.
Naturally, extreme stability is not favourable either, but it is usually not a problem in a
Swiss system tournament. Second, the number
of rounds is often determined arbitrarily: for instance, it was 13 in the 2006 and 11 in the
2013 chess olympiads with 148 and 146 teams, respectively.
Predictive and retrodictive performances are measured by the number of match and
board points scored by an underdog against a better team. It does not take into account
the difference of positions, only its sign.
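The measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `underdog_points`, the data layout, and the toy results are hypothetical, and only match points are counted (2 for a win, 1 for a draw, 0 for a loss); board points could be handled in the same way.

```python
def underdog_points(ranking, matches):
    """Sum the points scored by the lower-ranked team in each match.

    ranking: dict mapping team -> position (1 is the best place)
    matches: list of (team_a, team_b, points_a, points_b)
    Only the sign of the rank difference matters, not its size.
    """
    total = 0
    for team_a, team_b, points_a, points_b in matches:
        if ranking[team_a] == ranking[team_b]:
            continue  # tied positions: neither team is the underdog
        if ranking[team_a] > ranking[team_b]:
            total += points_a  # team_a is the worse-ranked team
        else:
            total += points_b  # team_b is the worse-ranked team
    return total


# Hypothetical toy data: three matches scored in match points.
matches = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 2, 0)]
ranking_1 = {"A": 1, "B": 2, "C": 3}
ranking_2 = {"C": 1, "A": 2, "B": 3}

fit_1 = underdog_points(ranking_1, matches)
fit_2 = underdog_points(ranking_2, matches)
```

A smaller value means a better fit. Applied to matches already played, the function gives a retrodictive measure; applied to matches played after the ranking was computed, it gives a predictive one.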
Prediction power has a meaning only after the third round, when the comparison
multigraph becomes connected. Start has the most favourable forecasting performance
for the remaining matches, especially in the first rounds, that is, match outcomes are
determined by teams’ ability rather than by their results in the competition (Csató, 2016,
Figure A.1). There is also no difference among the methods in prediction power if only
the next round is scrutinized (Csató, 2016, Figure A.2).
The fact that the Start ranking is the best for forecasting match results reflects the
insignificance of prediction precision for a Swiss system tournament ranking: after all, what
is the point of organizing a contest if its final result is determined by the teams’ ability?
Retrodictive performance also has a meaning only after the third round; however, unlike
prediction power, it is defined after the last round as well. The least squares
method seems to be the best from this point of view, although the statistical significance
of its advantage remains dubious (Csató, 2016, Figure A.3). Generalized row sum is placed between the
least squares and official rankings. The choice of the results matrix and the tournament does
not influence these findings.
Stability is defined as the distance between the rankings in subsequent rounds. It has no meaning
for Start but can be calculated for all other rankings from the third round. Figure 3
illustrates the robustness of some rankings in ETCC 2011. Variability does not decrease
monotonically, but a solid decline is observed as each new round adds relatively less and
less information. Ranking LS(R^{MP}) is the most robust according to both the Kemeny and
weighted distances, followed by GRS2(R^{MP}), then GRS1(R^{MP}) and Official: rankings
become less volatile by taking into account the performance of opponents. The difference in
absolute values seems to be more significant in the case of the weighted distance, so the least
squares method is robust especially in the first, critical places. The order of variability
LS < GRS2 < GRS1 is valid for all other results matrices; however, GRS1 is sometimes
more volatile than the official ranking.
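As an illustration of this stability measure, the sketch below computes the distance between rankings of subsequent rounds. It assumes strict rankings without ties and counts each pair of teams ordered oppositely once, which is one common convention for the Kemeny distance; the normalisation used for Figure 3 may differ, and the weighted distance is not covered. The function names and the toy rankings are hypothetical.

```python
from itertools import combinations


def kemeny_distance(rank_a, rank_b):
    """Count the pairs of teams that two strict rankings order oppositely.

    rank_a, rank_b: dicts mapping team -> position (1 is the best place).
    """
    teams = list(rank_a)
    return sum(
        1
        for x, y in combinations(teams, 2)
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0
    )


def stability(rankings_by_round):
    """Distances between the rankings of each pair of subsequent rounds."""
    return [
        kemeny_distance(earlier, later)
        for earlier, later in zip(rankings_by_round, rankings_by_round[1:])
    ]


# Hypothetical rankings of four teams after rounds 3, 4 and 5.
after_round_3 = {"A": 1, "B": 2, "C": 3, "D": 4}
after_round_4 = {"A": 2, "B": 1, "C": 3, "D": 4}
after_round_5 = {"A": 3, "B": 1, "C": 4, "D": 2}

distances = stability([after_round_3, after_round_4, after_round_5])
```

Each entry of `distances` corresponds to one pair of subsequent rounds (3−4, 4−5, and so on); smaller values indicate a more stable ranking.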
In the case of ETCC 2013, these conclusions are more uncertain, but least squares
remains the most stable with the exception of the first rounds (Csató, 2016, Figure A.4).
Readers interested in a somewhat more detailed analysis of the two tournaments are
encouraged to study Csató (2016).
To summarize, the least squares method gives the most robust and legitimate ranking.
Therefore, its application is also recommended if the organizers want to mitigate the
effects of the (predetermined) number of rounds on the ranking.
4 Discussion
The paper has given an axiomatic analysis of ranking in Swiss system chess team
tournaments. The framework is flexible with respect to the role of the opponents (parameter
ε) and the influence of match and board points (choice of the results matrix). The
suggested methods are close to the concept of official rankings (they coincide in the case of
round-robin tournaments), can be calculated iteratively or by solving a system of linear
equations, and have a clear interpretation on the comparison multigraph. They also do
not call for arbitrary tie-breaking rules.
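To illustrate the calculation through a system of linear equations, here is a minimal sketch. It assumes that the least squares rating q is the solution of Lq = s with ratings summing to zero, where L is the Laplacian matrix of the comparison multigraph and s is the vector of scores derived from a skew-symmetric results matrix; this section does not restate the definition, so treat it as an assumption. The function names and the toy data are hypothetical, and the comparison multigraph must be connected.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares_rating(n, matches):
    """Rate n teams from matches given as (i, j, result of i against j)."""
    laplacian = [[0.0] * n for _ in range(n)]
    scores = [0.0] * n
    for i, j, result in matches:
        laplacian[i][i] += 1
        laplacian[j][j] += 1
        laplacian[i][j] -= 1
        laplacian[j][i] -= 1
        scores[i] += result
        scores[j] -= result
    # The rows of the Laplacian sum to zero, so one equation is redundant:
    # replace the last one with the normalisation sum(q) = 0.
    laplacian[-1] = [1.0] * n
    scores[-1] = 0.0
    return solve_linear_system(laplacian, scores)


# Hypothetical data: team 0 beat team 1, team 1 beat team 2,
# while teams 0 and 2 have not played each other.
ratings = least_squares_rating(3, [(0, 1, 1.0), (1, 2, 1.0)])
```

In this toy example team 0 is rated above team 2 although they never met, because the comparison is transmitted through their common opponent.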
The model is tested on the results of the 2011 and 2013 European Team Chess
Championship open tournaments, which supports the application of the least squares method due
to its relative insensitivity to the choice between match and board points, retrodictive
accuracy and stability. There is an opportunity to take into account the number of board
points scored by using a generalized results matrix favouring match points (λ close
to zero). The findings confirm that the official rankings have significant failures, therefore
recursive methods, similar to generalized row sum and least squares, are worth considering
for ranking purposes.
Figure 3: Robustness (distance between subsequent rounds), ETCC 2011
(a) Kemeny distance, results matrix R^{MP}
Horizontal axis: subsequent rounds 3−4, 4−5, 5−6, 6−7, 7−8, 8−9; series: Off, G1, S1, L1.
Naturally, the framework may have some disadvantages (Brozos-Vázquez et al., 2010):
a computer is needed to calculate the ranking of the tournament, and it will
be difficult for the players to verify and understand the whole procedure. However, we
agree with Forlano (2011) that ‘The fact that players are not able to foresee the final
standing should not be considered a disadvantage but a way to force the players to play
each round as the decisive one.’ as well as ‘The fact that the players cannot replicate
the method manually should be seen of no significance.’ While the least squares method
is more complicated than the usual tie-breaking rules, its simple graph interpretation (Csató,
2015a) and its similarity to an ‘infinite Buchholz’ may help in understanding it.
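The ‘infinite Buchholz’ idea can be illustrated by an iteration in which each step adds a term obtained from the opponents’ previous terms. The sketch below is written under explicit assumptions: every team has played the same number d of matches, the comparison multigraph is connected and not bipartite (otherwise the iteration need not converge), and the update rule q(k) = q(k−1) + (1/d)(A/d)^k s with q(0) = s/d is used, where A is the matches matrix and s is the score vector. This rule is chosen because its limit solves Lq = s for L = dI − A; the exact definition behind the decomposition is given in Csató (2015a). The function name and the toy data are hypothetical.

```python
def iterated_ratings(adjacency, scores, d, steps):
    """Return the list [q(0), q(1), ..., q(steps)] of iterated ratings.

    adjacency[i][j]: number of matches played between teams i and j
    scores: score vector derived from a skew-symmetric results matrix
    d: common number of matches played by every team
    """
    n = len(scores)
    term = [s / d for s in scores]  # q(0) = s / d, the average score
    q = term[:]
    history = [q[:]]
    for _ in range(steps):
        # Buchholz-like step: average the opponents' previous terms.
        term = [
            sum(adjacency[i][j] * term[j] for j in range(n)) / d
            for i in range(n)
        ]
        q = [q[i] + term[i] for i in range(n)]
        history.append(q[:])
    return history


# Hypothetical data: three teams that have all played each other (d = 2).
adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
scores = [1.0, 0.0, -1.0]
history = iterated_ratings(adjacency, scores, d=2, steps=50)
```

In this toy example the ratings move from s/2 towards s/3, the zero-sum solution of Lq = s, so only the scale changes and the order of the teams stays the same. In an incomplete tournament the order itself can change from one step to the next.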
Anyway, there usually exists a trade-off between simplicity and other favourable
properties (sample fit, robustness), and the use of more developed methods is worth considering
in the case of Swiss system tournaments in order to avoid anomalies of the ranking,9 such
as when a Hungarian commentator speaks about ‘the curse of the Swiss system’.10 This is
not necessarily a fault of the Swiss system but rather a failure of the official ranking, which
can be improved significantly by accounting for the strength of opponents.
Nevertheless, the choice between simplicity and more plausible rankings is not a
modelling issue. An alternative may be to use these methods only for tie-breaking purposes.
There are some obvious areas of future research. Several complications have been
neglected in the analysis, such as matches played with black or white (an unavoidable
problem in individual tournaments) or a different number of matches due to byes or
unplayed games. The choice of parameter ε also requires further investigation. Our findings
can be strengthened or falsified by the examination of other competitions and by simulations
of Swiss system tournaments.
Finally, two possible uses of the proposed ranking method are worth mentioning. First,
it can be incorporated into the pairing algorithm, resulting in more balanced schedules.
Second, extensive analysis of the stability of a ranking between subsequent rounds may
contribute to the choice of the number of rounds, which can be made endogenous as a
function of the number of participants and other restrictions.
Acknowledgements
We are grateful to two anonymous referees for their valuable comments and suggestions.
The research was supported by OTKA grant K 111797 and by the MTA Premium Post
Doctorate Research Program.
This research was partially supported by Pallas Athene Domus Scientiae Foundation. The
views expressed are those of the author and do not necessarily reflect the official opinion
of Pallas Athene Domus Scientiae Foundation.
9 An excellent example is Georgia’s 13th place in ETCC 2011, achieved without having played any team
ranked higher in the official ranking.
10 See http://sakkblog.postr.hu/sokan-palyaznak-dobogos-helyezesre-izgalmas-utolso-fordulo-dont.
References
Brozos-Vázquez, M., Campo-Cabana, M. A., Díaz-Ramos, J. C., and
González-Díaz, J. (2010). Recursive tie-breaks for chess tournaments.
http://eio.usc.es/pub/julio/Desempate/Performance_Recursiva_en.htm.
Chebotarev, P. (1989). Generalization of the row sum method for incomplete paired
comparisons. Automation and Remote Control, 50(8):1103–1113.
Csató, L. (2014). Additive and multiplicative properties of scoring methods for
preference aggregation. Corvinus Economics Working Papers 3/2014, Corvinus University of
Budapest, Budapest.
Csató, L. (2015a). A graph interpretation of the least squares ranking method. Social
Choice and Welfare, 44(1):51–69.
ECU (2013). European Team Chess Championship 2013. Tournament Rules.
http://etcc2013.com/wp-content/uploads/2013/06/ETCC-2013-tournament-rules-June-06-2
ECU stands for European Chess Union.
Forlano, L. (2011). A new way to rank the players in a Swiss systems tournament.
http://www.vegachess.com/Missing_point_score_system.pdf.
González-Díaz, J., Hendrickx, R., and Lohmann, E. (2014). Paired comparisons analysis:
an axiomatic approach to ranking methods. Social Choice and Welfare, 42(1):139–169.
Jeremic, V. M. and Radojicic, Z. (2010). A new approach in the evaluation of team chess
championships rankings. Journal of Quantitative Analysis in Sports, 6(3):Article 7.
Pasteur, R. D. (2010). When perfect isn’t good enough: Retrodictive rankings in
college football. In Gallian, J. A., editor, Mathematics & Sports, Dolciani Mathematical
Expositions 43, pages 131–146. Mathematical Association of America, Washington, DC.
Stefani, R. T. (1980). Improved least squares football, basketball, and soccer predictions.
IEEE Transactions on Systems, Man, and Cybernetics, 10(2):116–123.
Zermelo, E. (1929). Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der
Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 29:436–460.