Graph-BFS
▪ Mini-HW 7 released
▪ Due on 11/30 (Thur) 17:20
▪ Class change!!
▪ Starts on 12/07 (Thur) and applies from then on (NOT next week!!)
▪ Location: R103
▪ Graph Basics
▪ Graph Theory
▪ Graph Representations
▪ Graph Traversal
▪ Breadth-First Search (BFS)
▪ Depth-First Search (DFS)
▪ DFS Applications
▪ Connected Components
▪ Strongly Connected Components
▪ Topological Sorting
▪ A graph 𝐺 = (𝑉, 𝐸) is defined as
▪ V: a finite, nonempty set of vertices
▪ E: a set of edges / pairs of vertices
[Figure: example graph on vertices 1 to 5]
▪ Graph type
▪ Undirected: edge (𝑢, 𝑣) = (𝑣, 𝑢)
▪ Directed: edge (𝑢, 𝑣) goes from vertex 𝑢 to vertex 𝑣; (𝑢, 𝑣) ≠ (𝑣, 𝑢)
▪ Weighted: each edge is associated with a weight
[Figure: two example graphs on vertices 1 to 5]
How many edges can an undirected (or directed) graph have at most?
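A short worked answer, assuming simple graphs (no parallel edges and no self-loops; the slide does not state this restriction): each unordered pair of distinct vertices carries at most one undirected edge, and each ordered pair at most one directed edge.

```latex
\[
|E|_{\text{undirected}} \le \binom{|V|}{2} = \frac{|V|\,(|V|-1)}{2},
\qquad
|E|_{\text{directed}} \le |V|\,(|V|-1).
\]
```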
▪ Adjacent
▪ If there is an edge (𝑢, 𝑣), then 𝑢 and 𝑣 are adjacent.
▪ Incident
▪ If there is an edge (𝑢, 𝑣), the edge (𝑢, 𝑣) is incident from 𝑢 and is incident to 𝑣.
▪ Subgraph
▪ If a graph 𝐺′ = (𝑉′, 𝐸′) is a subgraph of 𝐺 = (𝑉, 𝐸), then 𝑉′ ⊆ 𝑉 and 𝐸′ ⊆ 𝐸
▪ Degree
▪ The degree of a vertex 𝑢 is the number of edges incident on 𝑢
▪ In-degree of 𝑢: #edges (𝑥, 𝑢) in a directed graph
▪ Out-degree of 𝑢: #edges (𝑢, 𝑥) in a directed graph
▪ Degree = in-degree + out-degree
▪ Isolated vertex: degree = 0
▪ |𝐸| = (∑ᵢ 𝑑ᵢ) / 2, where 𝑑ᵢ is the degree of vertex 𝑖
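The degree-sum identity can be checked on a small example. The following is a minimal sketch for an undirected graph given as an edge list; the function name `degrees` and the sample edges are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def degrees(edges):
    """Map each vertex to its degree in an undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # the edge is incident on u
        deg[v] += 1  # and on v
    return deg

# Hypothetical sample graph for illustration.
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6), (4, 6)]
deg = degrees(edges)

# Every edge contributes 2 to the degree sum, so |E| = (sum of degrees) / 2.
assert sum(deg.values()) == 2 * len(edges)
```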
▪ Path
▪ a sequence of edges that connect a sequence of vertices
▪ If there is a path from 𝑢 (source) to 𝑣 (target), there is a sequence of edges (𝑢, 𝑖₁), (𝑖₁, 𝑖₂), …, (𝑖ₖ₋₁, 𝑖ₖ), (𝑖ₖ, 𝑣)
▪ Reachable: 𝑣 is reachable from 𝑢 if there exists a path from 𝑢 to 𝑣
▪ Simple Path
▪ All vertices on the path are distinct, except possibly the endpoints 𝑢 and 𝑣
▪ Cycle
▪ A simple path where 𝑢 and 𝑣 are the same
▪ Subpath
▪ A contiguous subsequence of the path
▪ Connected
▪ Two vertices are connected if there is a path between them
▪ A connected graph has a path from every vertex to every other vertex
▪ Tree
▪ a connected, acyclic, undirected graph
▪ Forest
▪ an acyclic, undirected but possibly disconnected graph
[Figure: three example graphs on vertices 1 to 5]
▪ Theorem. Let 𝐺 be an undirected graph. The following
statements are equivalent:
▪ 𝐺 is a tree
▪ Any two vertices in 𝐺 are connected by a unique simple path
▪ 𝐺 is connected, but if any edge is removed from 𝐸, the
resulting graph is disconnected.
▪ 𝐺 is connected and |𝐸| = |𝑉| − 1
▪ 𝐺 is acyclic, and |𝐸| = |𝑉| − 1
▪ 𝐺 is acyclic, but if any edge is added to 𝐸, the resulting graph
contains a cycle
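One of the equivalent statements above, "connected and |𝐸| = |𝑉| − 1", translates directly into a test. This is a minimal sketch, assuming an undirected simple graph given as a vertex list and an edge list; `is_tree` is a hypothetical helper name.

```python
from collections import deque

def is_tree(vertices, edges):
    """Check 'connected and |E| = |V| - 1' for an undirected graph."""
    vertices = list(vertices)
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Breadth-first traversal from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(vertices)
```

Checking the edge count first keeps the traversal from running on graphs that cannot be trees.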
[Figure: two example graphs on vertices A, B, C, D]
▪ Euler path
▪ Can you traverse each edge in a connected graph exactly once
without lifting the pen from the paper?
▪ Euler tour
▪ Can you finish where you started?
[Figure: three example graphs on vertices A, B, C, D; for each, decide whether it has an Euler path and whether it has an Euler tour]
Is it possible to determine whether a graph has an Euler path or an Euler
tour, without necessarily having to find one explicitly?
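The classical answer is yes: for a connected undirected graph, an Euler tour exists exactly when every vertex has even degree, and an Euler path exists exactly when zero or two vertices have odd degree. The sketch below applies that degree test; it assumes the input is connected and has at least one edge, and `euler_status` is a hypothetical name.

```python
from collections import Counter

def euler_status(edges):
    """Return (has_euler_path, has_euler_tour) for a connected undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    odd = sum(1 for d in deg.values() if d % 2 == 1)
    # 0 odd-degree vertices: tour (and hence path); 2: path only; otherwise neither.
    return odd in (0, 2), odd == 0
```

Only the degrees are inspected, so no path is ever constructed.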
▪ Hamiltonian Path
▪ A path that visits each vertex exactly once
▪ Hamiltonian Cycle
▪ A Hamiltonian path where the start and destination are the same
▪ Modeling applications using graph theory
▪ What do the vertices represent?
▪ What do the edges represent?
▪ Undirected or directed?
▪ Examples: social network, knowledge graph
▪ How to represent a graph in computer programs?
▪ Two standard ways to represent a graph 𝐺 = (𝑉, 𝐸)
▪ Adjacency matrix
▪ Adjacency list
Matrix
▪ Adjacency matrix = |𝑉| × |𝑉| matrix 𝐴 with 𝐴[𝑢][𝑣] = 1 if (𝑢, 𝑣) is an edge, and 0 otherwise
[Figure: example undirected graph on vertices 1 to 6]
     1  2  3  4  5  6
 1   0  1  1  0  0  0
 2   1  0  0  1  1  0
 3   1  0  0  1  0  1
 4   0  1  1  0  0  1
 5   0  1  0  0  0  0
 6   0  0  1  1  0  0
▪ For undirected graphs, 𝐴 is symmetric; i.e., 𝐴 = 𝐴ᵀ
▪ If weighted, store weights instead of bits in 𝐴
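A minimal sketch of the matrix representation, assuming vertices are numbered 1 to 𝑛 and the graph arrives as an edge list. The function name `build_adjacency_matrix` and its `directed` flag are illustrative choices.

```python
def build_adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 matrix for vertices numbered 1..n."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1      # shift to 0-based indices
        if not directed:
            A[v - 1][u - 1] = 1  # undirected: keep A symmetric
    return A

# Edge query is a single lookup, which is the main strength of this layout.
A = build_adjacency_matrix(3, [(1, 2), (2, 3)])
has_edge_1_2 = A[0][1] == 1
```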
Matrix
▪ Space:
▪ Time for querying an edge:
▪ Time for inserting an edge:
▪ Time for deleting an edge:
▪ Time for listing all neighbors of a vertex:
▪ Time for identifying all edges:
▪ Time for finding in-degree and out-degree of a vertex?
List
▪ Adjacency lists = vertex indexed array of lists
▪ One list per vertex, where for 𝑢 ∈ 𝑉, 𝐴[𝑢] consists of all
vertices adjacent to 𝑢
[Figure: example undirected graph on vertices 1 to 6]
 1: 2, 3
 2: 1, 4, 5
 3: 1, 4, 6
 4: 2, 3, 6
 5: 2
 6: 3, 4
If weighted, store weights also in adjacency lists
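A minimal sketch of the list representation, assuming vertices numbered 1 to 𝑛 and an edge list as input. The name `build_adjacency_list` is hypothetical, and a Python dict of lists stands in for the vertex-indexed array.

```python
def build_adjacency_list(n, edges, directed=False):
    """Map each vertex 1..n to the list of its adjacent vertices."""
    adj = {u: [] for u in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected: record the edge on both endpoints
    return adj

# Edge list read off a six-vertex example graph.
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6), (4, 6)]
adj = build_adjacency_list(6, edges)
```

For a weighted graph, each list entry could hold a `(neighbor, weight)` pair instead of a bare vertex.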
List
▪ Space:
▪ Time for querying an edge:
▪ Time for inserting an edge:
▪ Time for deleting an edge:
▪ Time for listing all neighbors of a vertex:
▪ Time for identifying all edges:
▪ Time for finding in-degree and out-degree of a vertex?
▪ Matrix representation is suitable for dense graphs
▪ List representation is suitable for sparse graphs
▪ Besides graph density, you may also choose a data structure
based on the performance of other operations
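                   Space      Query an edge   Insert an edge   Delete an edge   List a vertex's neighbors   Identify all edges
Adjacency Matrix   Θ(V²)      O(1)            O(1)             O(1)             Θ(V)                        Θ(V²)
Adjacency List     Θ(V + E)   O(deg(𝑢))       O(1)             O(deg(𝑢))        Θ(deg(𝑢))                   Θ(V + E)
(List bounds assume unsorted lists and insertion without a duplicate check; deg(𝑢) is the degree of the queried vertex.)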
[Figure: BFS explores the graph layer by layer, starting from the source 𝑠]
▪ Input: directed/undirected graph 𝐺 = (𝑉, 𝐸) and source 𝑠
▪ Output: a breadth-first tree with root 𝑠 (𝑇BFS ) that contains
all reachable vertices
▪ 𝑣. 𝑑: distance from 𝑠 to 𝑣, for all 𝑣 ∈ 𝑉
▪ Distance is the length of a shortest path in G
▪ 𝑣. 𝑑 = ∞ if 𝑣 is not reachable from 𝑠
▪ 𝑣. 𝑑 is also the depth of 𝑣 in 𝑇BFS
▪ 𝑣.𝜋 = 𝑢 if (𝑢, 𝑣) is the last edge on a shortest path to 𝑣
▪ 𝑢 is 𝑣’s predecessor in 𝑇BFS
▪ Initially 𝑇BFS contains only 𝑠
▪ As 𝑣 is discovered from 𝑢, 𝑣 and (𝑢, 𝑣) are added to 𝑇BFS
▪ 𝑇BFS is not explicitly stored; it can be reconstructed from 𝑣.𝜋
▪ Implemented via a FIFO queue
▪ Color the vertices to keep track of progress:
▪ GRAY: discovered (first time encountered)
▪ BLACK: finished (all adjacent vertices discovered)
▪ WHITE: undiscovered

BFS(G, s)
  for each vertex u in G.V-{s}
    u.color = WHITE
    u.d = ∞
    u.pi = NIL
  s.color = GRAY
  s.d = 0
  s.pi = NIL
  Q = {}
  ENQUEUE(Q, s)
  while Q != {}
    u = DEQUEUE(Q)
    for each v in G.Adj[u]
      if v.color == WHITE
        v.color = GRAY
        v.d = u.d + 1
        v.pi = u
        ENQUEUE(Q, v)
    u.color = BLACK
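The pseudocode can be sketched in Python as follows. This is one possible rendering, not the course's reference implementation: it assumes the graph is a dict mapping every vertex to its adjacency list, and it uses `d[v] == inf` in place of the WHITE color. The helper `path_to` is a hypothetical addition showing how 𝑇BFS is recovered from the predecessors.

```python
from collections import deque
import math

def bfs(adj, s):
    """Return (d, pi): distances from s and predecessors in the BFS tree."""
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    queue = deque([s])            # FIFO queue of discovered, unfinished vertices
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if d[v] == math.inf:  # v is undiscovered (WHITE)
                d[v] = d[u] + 1
                pi[v] = u
                queue.append(v)
    return d, pi

def path_to(pi, s, v):
    """Rebuild the tree path from s to v, or None if v is unreachable."""
    path = []
    while v is not None:
        path.append(v)
        if v == s:
            return path[::-1]
        v = pi[v]
    return None
```

For example, `d, pi = bfs(adj, 1)` followed by `path_to(pi, 1, 6)` yields one shortest path from vertex 1 to vertex 6 when such a path exists.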
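BFS example: contents of the queue 𝑄 after each step, with the 𝑑 value of each queued vertex
▪ 𝑄 = ⟨𝑠⟩, 𝑑 = ⟨0⟩
▪ 𝑄 = ⟨𝑤, 𝑟⟩, 𝑑 = ⟨1, 1⟩
▪ 𝑄 = ⟨𝑟, 𝑡, 𝑥⟩, 𝑑 = ⟨1, 2, 2⟩
▪ 𝑄 = ⟨𝑡, 𝑥, 𝑣⟩, 𝑑 = ⟨2, 2, 2⟩
▪ 𝑄 = ⟨𝑥, 𝑣, 𝑢⟩, 𝑑 = ⟨2, 2, 3⟩
▪ 𝑄 = ⟨𝑣, 𝑢, 𝑦⟩, 𝑑 = ⟨2, 3, 3⟩
▪ 𝑄 = ⟨𝑢, 𝑦⟩, 𝑑 = ⟨3, 3⟩
▪ 𝑄 = ⟨𝑦⟩, 𝑑 = ⟨3⟩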
▪ Definition of 𝛿(𝑠, 𝑣): the shortest-path distance from 𝑠 to 𝑣 = the
minimum number of edges in any path from 𝑠 to 𝑣
▪ If there is no path from 𝑠 to 𝑣, then 𝛿(𝑠, 𝑣) = ∞
Lemma 22.1
Let 𝐺 = (𝑉, 𝐸) be a directed or undirected graph, and let 𝑠 ∈ 𝑉 be an
arbitrary vertex. Then, for any edge (𝑢, 𝑣) ∈ 𝐸, 𝛿(𝑠, 𝑣) ≤ 𝛿(𝑠, 𝑢) + 1.
(In words: the shortest-path distance from 𝑠 to 𝑣 is at most the shortest-path distance from 𝑠 to 𝑢 plus 1.)
▪ Proof
▪ Case 1: 𝑢 is reachable from 𝑠
▪ A shortest path from 𝑠 to 𝑢 followed by the edge (𝑢, 𝑣) is a path from 𝑠 to 𝑣 with length 𝛿(𝑠, 𝑢) + 1
▪ Hence, 𝛿(𝑠, 𝑣) ≤ 𝛿(𝑠, 𝑢) + 1
▪ Case 2: 𝑢 is unreachable from 𝑠
▪ Then 𝛿(𝑠, 𝑢) = ∞ (in a directed graph, 𝑣 may still be reachable by another path)
▪ Hence, the inequality still holds.
Lemma 22.2
Let 𝐺 = (𝑉, 𝐸) be a directed or undirected graph, and suppose BFS is run
on 𝐺 from a given source vertex 𝑠 ∈ 𝑉. Then upon termination, for each
vertex 𝑣 ∈ 𝑉, the value 𝑣.𝑑 computed by BFS satisfies 𝑣.𝑑 ≥ 𝛿(𝑠, 𝑣).
(In words: the 𝑑 value computed by BFS is never smaller than the true distance.)
▪ Proof by induction
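Inductive hypothesis, for queue 𝑄 = ⟨𝑣₁, 𝑣₂, …, 𝑣ᵣ⟩ with head 𝑣₁ and tail 𝑣ᵣ:
▪ H1: 𝑣ᵣ.𝑑 ≤ 𝑣₁.𝑑 + 1 (the 𝑑 value of the last vertex in 𝑄 is at most the 𝑑 value of the first vertex in 𝑄 plus 1)
▪ H2: 𝑣ᵢ.𝑑 ≤ 𝑣ᵢ₊₁.𝑑 (the 𝑑 value of the 𝑖-th vertex in 𝑄 is at most the 𝑑 value of the (𝑖+1)-th vertex)
▪ Dequeue op: 𝑄 changes from ⟨𝑣₁, 𝑣₂, …, 𝑣ᵣ₋₁, 𝑣ᵣ⟩ to ⟨𝑣₂, …, 𝑣ᵣ₋₁, 𝑣ᵣ⟩
▪ H1 holds: 𝑣ᵣ.𝑑 ≤ 𝑣₁.𝑑 + 1 (induction hypothesis H1) and 𝑣₁.𝑑 ≤ 𝑣₂.𝑑 (induction hypothesis H2), so 𝑣ᵣ.𝑑 ≤ 𝑣₂.𝑑 + 1
▪ H2 holds: the remaining inequalities are unchanged
▪ Enqueue op: 𝑄 changes from ⟨𝑣₁, 𝑣₂, …, 𝑣ᵣ₋₁, 𝑣ᵣ⟩ to ⟨𝑣₁, 𝑣₂, …, 𝑣ᵣ₋₁, 𝑣ᵣ, 𝑣ᵣ₊₁⟩
▪ Let 𝑢 be 𝑣ᵣ₊₁'s predecessor, so 𝑣ᵣ₊₁.𝑑 = 𝑢.𝑑 + 1
▪ Since 𝑢 has been removed from 𝑄, the new head 𝑣₁ satisfies 𝑢.𝑑 ≤ 𝑣₁.𝑑 (induction hypothesis H2)
▪ H1 holds: 𝑣ᵣ₊₁.𝑑 = 𝑢.𝑑 + 1 ≤ 𝑣₁.𝑑 + 1
▪ H2 holds: 𝑣ᵣ.𝑑 ≤ 𝑢.𝑑 + 1 (induction hypothesis H1) = 𝑣ᵣ₊₁.𝑑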
Corollary 22.4
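Suppose that vertices 𝑣ᵢ and 𝑣ⱼ are enqueued during the execution of BFS,
and that 𝑣ᵢ is enqueued before 𝑣ⱼ. Then 𝑣ᵢ.𝑑 ≤ 𝑣ⱼ.𝑑 at the time that 𝑣ⱼ is
enqueued.
(In words: if 𝑣ᵢ joins the queue earlier than 𝑣ⱼ, then 𝑣ᵢ.𝑑 ≤ 𝑣ⱼ.𝑑.)
▪ Proof: follows from the queue invariant H2 and the fact that each vertex receives a finite 𝑑 value at most once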
Theorem 22.5 – BFS Correctness
Let 𝐺 = (𝑉, 𝐸) be a directed or undirected graph, and suppose that BFS is
run on 𝐺 from a given source vertex 𝑠 ∈ 𝑉.
1) BFS discovers every vertex 𝑣 ∈ 𝑉 that is reachable from the source 𝑠
2) Upon termination, 𝑣.𝑑 = 𝛿(𝑠, 𝑣) for all 𝑣 ∈ 𝑉
3) For any vertex 𝑣 ≠ 𝑠 that is reachable from 𝑠, one of the shortest paths
from 𝑠 to 𝑣 is a shortest path from 𝑠 to 𝑣.𝜋 followed by the edge (𝑣.𝜋, 𝑣)
▪ Proof of (1)
▪ All vertices 𝑣 reachable from 𝑠 must be discovered; otherwise they
would have 𝑣.𝑑 = ∞ > 𝛿(𝑠, 𝑣), contradicting 𝑣.𝑑 = 𝛿(𝑠, 𝑣) from (2). (Lemma 22.2 alone only gives 𝑣.𝑑 ≥ 𝛿(𝑠, 𝑣).)
(2)
▪ BFS(G, s) forms a BFS tree with all reachable 𝑣 from 𝑠
▪ We can extend the algorithm to find a BFS forest that contains every
vertex in 𝐺
// explores the full graph and builds up a collection of BFS trees
BFS(G)
  for u in G.V
    u.color = WHITE
    u.d = ∞
    u.π = NIL
  for s in G.V
    if (s.color == WHITE)
      // build a BFS tree
      BFS-Visit(G, s)

BFS-Visit(G, s)
  s.color = GRAY
  s.d = 0
  s.π = NIL
  Q = empty
  ENQUEUE(Q, s)
  while Q ≠ empty
    u = DEQUEUE(Q)
    for v in G.adj[u]
      if v.color == WHITE
        v.color = GRAY
        v.d = u.d + 1
        v.π = u
        ENQUEUE(Q, v)
    u.color = BLACK
▪ Classification of Edges in 𝐺
▪ Tree Edge (GRAY to WHITE)
▪ Edges in the DFS forest
▪ Found when encountering a new vertex 𝑣 by exploring (𝑢, 𝑣)
▪ Back Edge (GRAY to GRAY)
▪ (𝑢, 𝑣), from descendant 𝑢 to ancestor 𝑣 in a DFS tree
▪ Forward Edge (GRAY to BLACK)
▪ (𝑢, 𝑣), from ancestor 𝑢 to descendant 𝑣. Not a tree edge.
▪ Cross Edge (GRAY to BLACK)
▪ Any other edge between trees or subtrees. Can go between vertices in
same DFS tree or in different DFS trees
In an undirected graph, back edge = forward edge.
To avoid ambiguity, classify edge as the first type in the list that applies.
▪ Edge classification by the color of 𝑣 when visiting (𝑢, 𝑣)
▪ WHITE: tree edge
▪ GRAY: back edge
▪ BLACK: forward edge or cross edge
▪ 𝑢.𝑑 < 𝑣.𝑑 ⇒ forward edge
▪ 𝑢.𝑑 > 𝑣.𝑑 ⇒ cross edge
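The color rule above can be sketched as a recursive DFS over a directed graph. This is an illustration under assumptions: the graph is a dict of adjacency lists, `classify_edges` is a hypothetical name, and Python's recursion limit restricts it to modest graph sizes.

```python
def classify_edges(adj):
    """Run DFS on a directed graph and label every edge (u, v)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    d = {}     # discovery times
    kind = {}  # (u, v) -> "tree" | "back" | "forward" | "cross"
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                kind[(u, v)] = "tree"
                visit(v)
            elif color[v] == GRAY:
                kind[(u, v)] = "back"
            elif d[u] < d[v]:        # v is BLACK and was discovered after u
                kind[(u, v)] = "forward"
            else:                    # v is BLACK and was discovered before u
                kind[(u, v)] = "cross"
        color[u] = BLACK

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return kind
```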
Theorem 22.10
In DFS of an undirected graph, every edge is either a tree edge or a back edge;
there are no forward or cross edges.
Why?
▪ Connected Components
▪ Strongly Connected Components
▪ Topological Sort
▪ Input: a graph 𝐺 = (𝑉, 𝐸)
▪ Output: the connected components of 𝐺
▪ a connected component is a maximal subset 𝑈 of 𝑉 s.t. any two nodes in 𝑈 are connected in 𝐺
Why must the connected components of a graph be disjoint?
[Figure: example graph on vertices 1 to 10]
Time Complexity: Θ(|𝑉| + |𝐸|) with adjacency lists
BFS and DFS both find the connected components with the same complexity
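A minimal sketch of component finding with repeated BFS, assuming an undirected graph stored as a dict of adjacency lists. The name `connected_components` is hypothetical, and DFS could be swapped in without changing the result.

```python
from collections import deque

def connected_components(adj):
    """Return the components of an undirected graph as lists of vertices."""
    seen = set()
    components = []
    for s in adj:
        if s in seen:
            continue
        # s is undiscovered: one traversal from s collects exactly its component.
        seen.add(s)
        component = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components
```

Each vertex is enqueued once and each adjacency list is scanned once, which is where the linear bound comes from.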
▪ Step 1: Run DFS on 𝐺 to obtain the finish time 𝑣. 𝑓 for 𝑣 ∈ 𝑉.
▪ Step 2: Run DFS on the transpose of 𝐺 where the vertices 𝑉 are
processed in the decreasing order of their finish time.
▪ Step 3: output the vertex partition by the second DFS
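The three steps can be sketched as follows, assuming a directed graph stored as a dict that maps every vertex to its list of out-neighbors. The function name is hypothetical, and the recursive DFS limits the sketch to graphs that fit within Python's recursion depth.

```python
def strongly_connected_components(adj):
    """Two-pass DFS over a directed graph given as {u: [out-neighbors]}."""
    visited = set()

    def dfs(u, graph, out):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v, graph, out)
        out.append(u)  # u finishes here

    # Step 1: DFS on G; `order` lists vertices by increasing finish time.
    order = []
    for u in adj:
        if u not in visited:
            dfs(u, adj, order)

    # Transpose of G: reverse every edge.
    transpose = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            transpose[v].append(u)

    # Step 2: DFS on the transpose in decreasing order of finish time.
    visited.clear()
    components = []
    for u in reversed(order):
        if u not in visited:
            component = []
            dfs(u, transpose, component)
            components.append(component)  # Step 3: one DFS tree per component
    return components
```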
[Figure: graph 𝐺 with an edge (𝑣, 𝑤) entering the component 𝐶]
Lemma
Let 𝐶 be the strongly connected component of 𝐺 (and 𝐺ᵀ) that contains
the node 𝑢 with the largest finish time 𝑢.𝑓. Then 𝐶 cannot have any
incoming edge from any node of 𝐺 not in 𝐶.
▪ Proof by contradiction
▪ Assume that (𝑣, 𝑤) is an incoming edge to 𝐶, i.e., 𝑣 ∉ 𝐶 and 𝑤 ∈ 𝐶.
▪ Since 𝐶 is a strongly connected component of 𝐺, there cannot be any
path from any node of 𝐶 to 𝑣 in 𝐺 (otherwise 𝑣 would belong to 𝐶).
▪ Therefore, the finish time of 𝑣 has to be larger than that of any node in 𝐶,
including 𝑢: 𝑣.𝑓 > 𝑢.𝑓, contradicting the choice of 𝑢.
Theorem
By continuing the process from the vertex 𝑢* whose finish time 𝑢*.𝑓 is
the largest excluding those in 𝐶, the algorithm returns the strongly
connected components.
▪ Exercise: prove this by induction
[Figure: the component 𝐶 containing 𝑢, shown in 𝐺 and in 𝐺ᵀ]
▪ Step 1: Run DFS on 𝐺 to obtain the finish time 𝑣. 𝑓 for 𝑣 ∈ 𝑉.
▪ Step 2: Run DFS on the transpose of 𝐺 where the vertices 𝑉 are
processed in the decreasing order of their finish time.
▪ Step 3: output the vertex partition by the second DFS
Time Complexity: Θ(|𝑉| + |𝐸|) with adjacency lists (two DFS passes plus computing the transpose)
▪ Definition (directed acyclic graph, DAG)
▪ a directed graph without any directed cycle
▪ Courses must be taken in an order that respects their prerequisites
▪ How to find a valid order for taking courses?
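[Figure: course prerequisite graph over Calculus I, Calculus II, Probability, Computer Networks, Introduction to Computer Science, Computer Organization, Operating Systems, Computer Programming, Data Structures, Algorithms]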
▪ Input: a directed acyclic graph 𝐺 = (𝑉, 𝐸)
▪ Output: a linear order of 𝑉 s.t. all edges of 𝐺 go from lower-indexed nodes to higher-indexed nodes
[Figure: example DAG on vertices a to f with two valid topological orders: f a b c e d and a f b c e d]
▪ Run DFS on the input DAG G.
▪ Output the nodes in decreasing order of their finish time.
DFS(G)
  for each vertex u in G.V
    u.color = WHITE
    u.pi = NIL
  time = 0
  for each vertex u in G.V
    if u.color == WHITE
      DFS-VISIT(G, u)

DFS-VISIT(G, u)
  time = time + 1
  u.d = time
  u.color = GRAY
  for each v in G.Adj[u] (outgoing)
    if v.color == WHITE
      v.pi = u
      DFS-VISIT(G, v)
  u.color = BLACK
  time = time + 1
  u.f = time // finish time
Example Illustration
▪ Vertices finish in the order d (1), e (2), c (3), b (4), a (5), f (6)
▪ Output in decreasing order of finish time: f a b c e d
Example Illustration
▪ Vertices finish in the order d (1), e (2), c (3), b (4), f (5), a (6)
▪ Output in decreasing order of finish time: a f b c e d
▪ Run DFS on the input DAG G.
▪ Output the nodes in decreasing order of their finish time.
▪ As each vertex is finished, insert it onto the front of a linked list
▪ Return the linked list of vertices
Time Complexity: Θ(|𝑉| + |𝐸|), since DFS takes Θ(|𝑉| + |𝐸|) and each insertion at the front of the list takes O(1)

DFS(G)
  for each vertex u in G.V
    u.color = WHITE
    u.pi = NIL
  time = 0
  for each vertex u in G.V
    if u.color == WHITE
      DFS-VISIT(G, u)

DFS-VISIT(G, u)
  time = time + 1
  u.d = time
  u.color = GRAY
  for each v in G.Adj[u]
    if v.color == WHITE
      v.pi = u
      DFS-VISIT(G, v)
  u.color = BLACK
  time = time + 1
  u.f = time // finish time
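A minimal sketch of the procedure, assuming the input is a DAG stored as a dict of out-neighbor lists. Here `topological_sort` is a hypothetical name, a `deque` stands in for the linked list, and explicit timestamps are omitted because inserting at the front on finish already yields decreasing finish time. No cycle check is performed.

```python
from collections import deque

def topological_sort(adj):
    """Return the vertices of a DAG in decreasing order of DFS finish time."""
    visited = set()
    order = deque()  # stands in for the linked list

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.appendleft(u)  # u is finished: insert it at the front

    for u in adj:
        if u not in visited:
            visit(u)
    return list(order)
```

A different vertex iteration order can produce a different but equally valid topological order.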
Lemma 22.11
A directed graph is acyclic ⇔ a DFS yields no back edges.
▪ Proof
▪ ⇒: suppose there is a back edge (𝑢, 𝑣)
▪ 𝑣 is an ancestor of 𝑢 in the DFS forest
▪ There is a path from 𝑣 to 𝑢 in 𝐺, and (𝑢, 𝑣) completes the cycle
▪ ⇐: suppose there is a cycle 𝑐
▪ Let 𝑣 be the first vertex in 𝑐 to be discovered, and let 𝑢 be the predecessor of 𝑣 in 𝑐
▪ Upon discovering 𝑣, every other vertex of the cycle is still WHITE
▪ At time 𝑣.𝑑, the vertices of 𝑐 form a path of white vertices from 𝑣 to 𝑢
▪ By the white-path theorem, vertex 𝑢 becomes a descendant of 𝑣 in the DFS forest
▪ Therefore, (𝑢, 𝑣) is a back edge
Theorem 22.12
The algorithm produces a topological sort of the input DAG. That is, if
(𝑢, 𝑣) is a directed edge (from 𝑢 to 𝑣) of 𝐺, then 𝑢.𝑓 > 𝑣.𝑓.
▪ Proof
▪ When (𝑢, 𝑣) is being explored, 𝑢 is GRAY and there are three cases for 𝑣:
▪ Case 1 – GRAY
▪ (𝑢, 𝑣) would be a back edge (contradicting Lemma 22.11), so 𝑣 cannot be GRAY
▪ Case 2 – WHITE
▪ 𝑣 becomes a descendant of 𝑢
▪ 𝑣 will be finished before 𝑢, so 𝑣.𝑓 < 𝑢.𝑓
▪ Case 3 – BLACK
▪ 𝑣 is already finished while 𝑢 is not, so 𝑣.𝑓 < 𝑢.𝑓
▪ Since cycle detection becomes back edge detection (Lemma
22.11), DFS can be used to test whether a graph is a DAG
▪ Is there a topological order for cyclic graphs?
▪ Given a topological order, is there always a DFS traversal
that produces such an order?
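The DAG test mentioned in the first bullet can be sketched by stopping DFS at the first back edge, that is, the first edge into a GRAY vertex. The graph format (dict of out-neighbor lists) and the name `is_dag` are assumptions made for illustration.

```python
def is_dag(adj):
    """Return True iff DFS over the directed graph finds no back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}

    def has_back_edge(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:  # edge into an ancestor still on the DFS path
                return True
            if color[v] == WHITE and has_back_edge(v):
                return True
        color[u] = BLACK
        return False

    return not any(color[u] == WHITE and has_back_edge(u) for u in adj)
```

A self-loop counts as a back edge here, since the vertex is GRAY when its own edge is explored.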