Ranking Soccer Teams Using Graph Theory
Sukrit Tripathi
Sukrit.tripathi@gmail.com
Table of contents
1. Current Ranking System
1.1 M: Points gained as per the match result
1.2 I: Importance of match
1.3 T: Strength of opposing team
1.4 C: Strength of confederation
1.5 Ranking Formula
12. Application of Graph Theory
13. Method of ranking teams
14. The Hamiltonian Program
15. The Solution
16. Conclusion
1. Current Ranking System
The points won from a match depend on the following factors:
Whether the match is won or drawn (M).
How important the match is (I).
How strong the opposing team is in terms of its current rank (T).
How strong the opposing team's confederation is (C).
CONMEBOL 1.00
The members of CONMEBOL are as follows:
1. Argentina
2. Bolivia
3. Brazil
4. Chile
5. Colombia
6. Ecuador
7. Paraguay
8. Peru
9. Uruguay
10. Venezuela
UEFA 0.99
The members of UEFA are France, Spain, Germany, England and the other
European countries.
AFC/CAF/OFC/CONCACAF 0.85
We shall not look at these confederations in much detail, given their
lack of performance in important international competitions.
Taking all these factors together, the number of points gained from a match is
calculated with the formula:
P = M × I × T × C
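As a rough illustration of the formula, the sketch below multiplies the four
factors for a single match. The function name match_points and the sample
values passed to it are assumptions made purely for illustration; they are not
taken from the official tables.

```python
def match_points(m, i, t, c):
    """Return P = M x I x T x C for a single match."""
    return m * i * t * c


# Hypothetical inputs (assumed for illustration only): a result worth M = 3,
# a match of importance I = 1.0, an opponent of strength T = 150 and a
# confederation factor C = 0.99.
points = match_points(m=3, i=1.0, t=150, c=0.99)
print(round(points, 2))
```

Because the four factors are simply multiplied, halving any one of them halves
the points gained from the match.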
The ranking system takes the last 4 years of matches into account, with the
following weights:
4 years ago: 20 % weight
3 years ago: 30 % weight
2 years ago: 50 % weight
Current year: 100 % weight
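The time weighting can be sketched as a weighted sum of the points earned in
each period. The weights are the ones listed above; the function name
weighted_total, the dictionary keys and the yearly totals in the usage example
are hypothetical.

```python
# Weights from the list above, keyed by the period labels used there.
WEIGHTS = {
    "current year": 1.00,
    "2 years ago": 0.50,
    "3 years ago": 0.30,
    "4 years ago": 0.20,
}


def weighted_total(points_by_period):
    """Sum the points of each period, scaled by that period's weight."""
    return sum(WEIGHTS[period] * pts for period, pts in points_by_period.items())


# Hypothetical point totals per period, for illustration only.
example = {
    "current year": 500,
    "2 years ago": 400,
    "3 years ago": 300,
    "4 years ago": 200,
}
print(weighted_total(example))
```

With these made-up totals, the oldest period contributes only 40 of its 200
points, while the current year counts in full.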
While the formula has been used to rank countries for nearly 17 years, it has
been heavily criticized for oversimplifying, as it neglects factors such as
team form and injured players.
In the Barclays Premier League, teams are awarded points under a similar system,
with 3 points for a win, 1 for a draw and 0 for a loss. These points determine
the champion after a season-long competition in which each team plays 38
matches, roughly one a week. With 20 teams in the Premier League, each team
plays each of the other 19 teams twice, once at home and once away.
Being a die-hard soccer fan, I analyzed the system and found that the league
has no way of ranking its teams other than the points system.
While the international system has its own formula for ranking teams, based on
matches played, the Premier League doesn't. I felt the Premier League was just
crying out for a way to rank the teams besides the points, as the points
system might not always be the most accurate. Some teams might not field their
strongest players for all 38 games, while some matches might be won or lost on a
poor refereeing decision. So, I decided to use a slightly different system that
ranks teams according to their results against all the other teams. For example,
let us say we have teams A, B and C. If team A is the best, B is the second
best, and C is the worst, a normal ranking system would rank A as 1st, B as 2nd,
and C as 3rd. However, let us say A loses to C due to some very bad refereeing,
while B beats C and A beats B. Each team then has exactly one win, which puts
A, B and C all on 3 points and suggests that they are equally good. So, we need
an alternative method of ranking the teams to validate the accuracy of the
points system. This will be done using a field of mathematics called graph
theory.
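The three-team example can be checked with a few lines of Python. This is only
a sketch of the points count described above; the variable names are chosen
for illustration.

```python
# Each tuple is (winner, loser) for the three matches in the example:
# C beats A, B beats C, and A beats B.
results = [("C", "A"), ("B", "C"), ("A", "B")]

points = {"A": 0, "B": 0, "C": 0}
for winner, _loser in results:
    points[winner] += 3  # 3 points for a win, 0 for a loss

print(points)
```

All three teams finish level, so the points table alone cannot separate them.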
12. Application of Graph Theory
In mathematics and computer science, graph theory is the study of graphs, which
are structures that model pairwise relations between objects. Graph theory is a
part of discrete mathematics, which deals with mathematical structures that are
fundamentally discrete rather than continuous.
A graph is made of vertices, also called nodes, and edges, which are the lines
that connect pairs of vertices. A graph can be undirected or directed. In an
undirected graph an edge is an unordered pair of vertices and has no direction,
while in a directed graph each edge is an ordered pair and points from one
vertex to the other.
We will now see how these theoretical concepts can be applied to the system of
ranking to generate a new method to rank the teams.
13. Method of ranking teams
To start off, our data covers the 20 teams of the Premier League, with every
match of each team taken into consideration. We will create a graph of these
teams, with the 20 teams forming 20 nodes. Using the results, if team A beats
team B, an arrow will be drawn from node A to node B. By doing this, we will
create a directed graph involving all the teams. After creating the graph, we
will look for a Hamiltonian path, that is, a path that follows the arrows and
visits every node exactly once. Reading the teams off in the order in which the
path visits them gives a ranking, with the higher-ranked teams at the start of
the path. This will be done with a program written in Python, which makes the
process much easier.
For our input file for the program, we need the adjacency matrix for the graph of
all our teams.
            Chelsea     Arsenal     Man United  Liverpool   City
Chelsea     x           1           1           1           1
Arsenal     0           x           1           0           0
Man United  0           0           x           0           0
Liverpool   0           0           0           x           1
City        0           1           1           0           x
This is a sample adjacency matrix for 5 teams, as the matrix for all 20 teams
would be too large to fit on the page.
A 1 means that the team in that row beat the team in that column, and thus an
arrow is drawn from the node of the row team to the node of the column team.
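To make the reading of the matrix concrete, the sketch below turns the sample
matrix above into a list of outgoing arrows for each team. The helper name
to_adjacency_list is an assumption made for illustration, and the diagonal
entries marked x are stored as 0.

```python
teams = ["Chelsea", "Arsenal", "Man United", "Liverpool", "City"]

# Rows are the winning team, columns the beaten team.
adj = [
    [0, 1, 1, 1, 1],  # Chelsea
    [0, 0, 1, 0, 0],  # Arsenal
    [0, 0, 0, 0, 0],  # Man United
    [0, 0, 0, 0, 1],  # Liverpool
    [0, 1, 1, 0, 0],  # City
]


def to_adjacency_list(teams, adj):
    """Map each team to the list of teams it beat (its outgoing arrows)."""
    return {
        team: [teams[j] for j, won in enumerate(row) if won == 1]
        for team, row in zip(teams, adj)
    }


graph = to_adjacency_list(teams, adj)
for team, beaten in graph.items():
    print(team, "->", beaten)
```

Man United ends up with an empty list, which is why no path can continue once
it reaches that node.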
With this graph, let us try to find a Hamiltonian path. Since this is a
directed graph, the path can only move along the arrows. We start off with
Chelsea, as they have the most arrows leaving their node.
We then move on to Manchester United, as one arrow from Chelsea leads to
Manchester United. From there we cannot move any further, as Manchester
United has not won against any team. Since a Hamiltonian path must pass
through every node exactly once, we cannot take this route.
In our next try, we move from Chelsea to Liverpool. We then go from Liverpool
to City, City to Arsenal and Arsenal to United. Voilà! The path passes through
each of the nodes once and follows the directions of the arrows in the
directed graph.
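For a graph this small, the manual search can be double-checked by brute force.
The sketch below tries every ordering of the five teams and keeps those in
which each consecutive pair is joined by an arrow. It is a check on the
five-team example only, not the program used for the full league, and the
function name is_hamiltonian_path is chosen for illustration.

```python
from itertools import permutations

# Arrows of the five-team example: team -> teams it beat.
graph = {
    "Chelsea": ["Arsenal", "Man United", "Liverpool", "City"],
    "Arsenal": ["Man United"],
    "Man United": [],
    "Liverpool": ["City"],
    "City": ["Arsenal", "Man United"],
}


def is_hamiltonian_path(order, graph):
    """True if every consecutive pair in `order` is joined by an arrow."""
    return all(b in graph[a] for a, b in zip(order, order[1:]))


# Every permutation visits each team exactly once, so only the arrows
# between neighbours need to be checked.
paths = [list(p) for p in permutations(graph) if is_hamiltonian_path(p, graph)]
print(paths)
```

Trying all orderings is feasible only for a handful of teams: 5 teams give 120
orderings, whereas 20 teams give far too many to enumerate this way.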
According to our path, the teams are ranked as follows:
1) Chelsea
2) Liverpool
3) City
4) Arsenal
5) United
In 2014, the season from which these results were taken, the teams were ranked
as follows:
1) Chelsea
2) City
3) Arsenal
4) United
5) Liverpool
While the rankings are more or less similar, Liverpool shows the
biggest difference. This is because here we have taken only the top 5 teams,
while there are 20 teams in the Premier League. So, the reason for the
difference is that while Liverpool performed admirably against the other top
teams, it did quite poorly against the other 15 teams, causing a disparity
between its ranking amongst just the top 5 teams and its position in the
Barclays Premier League.
14. The Hamiltonian Program
import numpy as np
import graph_tool

# adj[i, j] == 1 means that team i beat team j.
adj = np.array([
    [0,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,1,1,1],
    [0,0,1,0,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1],
    [0,0,0,1,0,1,1,1,1,1,0,0,1,0,0,1,1,0,0,1],
    [0,1,0,0,1,0,1,0,0,0,1,1,1,1,1,0,1,0,1,1],
    [0,1,1,0,0,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1],
    [0,0,0,1,0,0,1,1,1,1,0,1,0,1,0,1,0,0,1,1],
    [1,1,0,1,0,1,0,1,1,1,1,1,0,0,1,1,0,1,1,0],
    [0,1,0,1,0,0,0,0,1,1,0,1,1,0,1,1,0,1,0,1],
    [1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1],
    [0,1,0,0,0,1,1,0,1,0,1,1,0,0,0,1,1,0,0,1],
    [0,1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,1,0],
    [0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
    [1,0,0,0,0,1,1,0,1,1,1,1,0,0,0,1,1,1,0,0],
    [0,0,0,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,1],
    [0,1,0,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1],
    [0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0],
    [0,0,0,1,0,1,0,1,0,1,1,1,1,1,1,1,0,0,1,0],
    [0,1,1,1,0,1,0,1,1,1,1,1,1,0,0,1,0,0,0,1],
    [0,1,0,1,0,0,0,1,1,1,1,1,0,0,1,1,1,1,0,1],
    [0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0]])
print(adj)
# A win has a direction (winner -> loser), so the graph is directed.
g = graph_tool.Graph(directed=True)
g.add_vertex(len(adj))
edge_weights = g.new_edge_property('double')
for i in range(adj.shape[0]):
    for j in range(adj.shape[1]):
        if adj[i, j] == 1:  # add an edge only where team i beat team j
            e = g.add_edge(i, j)
            edge_weights[e] = float(adj[i, j])
# grph maps each team (numbered '1' to '20') to the list of teams it beat.
grph = {'1': ['2', '3', '4', '5', '6', '8', '10', '11', '12', '14', '15', '16', '17', '18', '19'],
'2': ['3', '5', '6', '9', '10', '12', '13', '15', '16', '17', '19'],
'3': ['4', '6', '7', '8', '9', '10', '13', '16', '17', '20'],
'4': ['2', '5', '7', '12', '13', '14', '17', '19', '20'],
'5': ['2', '3', '6', '8', '10', '11', '14', '15', '16', '17', '18', '19'],
'6': ['4', '7', '8', '9', '10', '12', '14', '16', '20'],
'7': ['1', '2', '8', '9', '10', '11', '15', '16', '18', '19' ],
'8': ['2', '4', '9', '10', '12', '13', '15', '20'],
'9': ['1', '10', '12', '13', '11', '15', '16', '20'],
'10': ['2', '6', '7', '11', '12', '16', '19' ],
'11': ['2', '3', '9', '10', '14', '15', '16', '18', '20' ],
'12': ['4', '5', '20'],
'13': ['1', '6', '7', '9', '10', '11', '15', '20' ],
'14': ['6', '9', '11', '13', '16', '19', '20'],
'15': ['2', '4', '8', '9', '10', '11', '12', '16', '20' ],
'16': ['10' ],
'17': ['4', '6', '8', '10', '11', '12', '13', '14', '16', '20'],
'18': ['3', '6', '8', '9', '11', '14', '16', '18', '20'],
'19': ['4', '7', '9', '10', '11', '15', '16', '17', '20' ],
'20': ['2', '11' ]
}
                if (len(path)==len(grph)):
                    cycles.append(path)
    return cycles

def cycl(grph):
    # Collect Hamiltonian cycles: Hamiltonian paths whose last node
    # has an arrow leading back to the first node.
    cycles=[]
    for firstnode in grph:
        for lastnode in grph:
            npat = hamilpaths(grph, firstnode, lastnode)
            for path in npat:
                if (len(path)==len(grph)):
                    if path[0] in grph[path[len(grph)-1]]:
                        path.append(path[0])
                        cycles.append(path)
    return cycles

print("hello")
hamil = findpath(grph)
print("hello")
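The listing calls hamilpaths and findpath without showing their full
definitions. The sketch below is one possible way to write them, using a
depth-first search that extends a path one arrow at a time and backs up when it
gets stuck. It is an assumed reconstruction that only matches the call
signatures used above, not the original program, and the three-node graph in
the usage example is made up.

```python
def hamilpaths(grph, node, lastnode, path=None):
    """Return all Hamiltonian paths that start at `node` and end at `lastnode`."""
    path = (path or []) + [node]
    if node == lastnode:
        # A path may stop at lastnode only once it has visited every node.
        return [path] if len(path) == len(grph) else []
    found = []
    for nxt in grph[node]:
        if nxt not in path:  # never visit a node twice
            found.extend(hamilpaths(grph, nxt, lastnode, path))
    return found


def findpath(grph):
    """Collect the Hamiltonian paths for every (first, last) pair of nodes."""
    paths = []
    for firstnode in grph:
        for lastnode in grph:
            paths.extend(hamilpaths(grph, firstnode, lastnode))
    return paths


# Hypothetical three-team graph, for illustration only.
small = {'1': ['2', '3'], '2': ['3'], '3': []}
print(findpath(small))
```

The search can take a very long time on dense graphs, since the number of
candidate paths grows rapidly with the number of teams.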
15. The Solution
6) Southampton
7) Tottenham
8) Swansea
9) Crystal Palace
10) Stoke City
11) West Ham United
12) Everton
13) Leicester City
14) Newcastle
15) West Bromwich Albion
16) Aston Villa
17) Sunderland
18) Burnley
19) Hull City
20) Queens Park Rangers
Let us now compare this to the Barclays Premier League points system
ranking.
1. Chelsea
2. Manchester City
3. Arsenal
4. Manchester United
5. Tottenham
6. Liverpool
7. Southampton
8. Swansea
9. Stoke City
10. Crystal Palace
11. Everton
12. West Ham United
13. West Bromwich Albion
14. Leicester City
15. Newcastle
16. Sunderland
17. Aston Villa
18. Hull City
19. Burnley
20. Queens Park Rangers
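The comparison between the two lists can be made explicit by computing, for
each team, the difference between its two positions. The sketch below uses only
positions 6 to 20 of the Hamiltonian ranking and the points table, both exactly
as listed above; the variable names are chosen for illustration.

```python
# Positions 6 to 20 of the Hamiltonian-path ranking, in order.
hamiltonian_tail = [
    "Southampton", "Tottenham", "Swansea", "Crystal Palace", "Stoke City",
    "West Ham United", "Everton", "Leicester City", "Newcastle",
    "West Bromwich Albion", "Aston Villa", "Sunderland", "Burnley",
    "Hull City", "Queens Park Rangers",
]
hamiltonian_rank = {team: pos for pos, team in enumerate(hamiltonian_tail, start=6)}

# The points-based table, in order from 1st to 20th.
points_table = [
    "Chelsea", "Manchester City", "Arsenal", "Manchester United", "Tottenham",
    "Liverpool", "Southampton", "Swansea", "Stoke City", "Crystal Palace",
    "Everton", "West Ham United", "West Bromwich Albion", "Leicester City",
    "Newcastle", "Sunderland", "Aston Villa", "Hull City", "Burnley",
    "Queens Park Rangers",
]
points_rank = {team: pos for pos, team in enumerate(points_table, start=1)}

# Positive shift: the team sits higher in the Hamiltonian ranking
# than in the points table.
shift = {team: points_rank[team] - pos for team, pos in hamiltonian_rank.items()}

for team, delta in shift.items():
    print(f"{team}: {delta:+d}")
```

For these fifteen teams, no position differs by more than two places between
the two rankings.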
16. Conclusion
While the rankings are similar, many teams shift by one or two
places. This is because the Premier League allots 3 points for every win
but does not distinguish between types of win.
For example, when a small team defeats a bigger, more successful team, it
gets the same number of points as a big team that defeats a much smaller
team. However, since the Hamiltonian path takes into account small teams
defeating big teams in the directed graph, it handles this problem quite well.
For example, Newcastle is ranked higher than West Brom in my ranking
because it defeated Chelsea, the eventual champions of the league,
a result that the directed graph takes into account.
This method of ranking can also be extended to the international system of
ranking teams. Its points formula is regarded as highly inaccurate, as it fails
to account for factors such as the fitness of the team and whether a match is
played at home or away. So, a possible extension of this project is to use this
method to rank teams internationally.