8300 Gui SV
COMPUTATION
Copyright © 2023 the author(s). This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract: Sequential algorithms for problems on graph networks are not only complex to build but also have considerable computational complexity. Such algorithms must therefore be parallelized to share the work and reduce computation time. For these reasons, it is crucial to parallelize the algorithms that find the shortest path in an extended graph. A study of an algorithm that finds the shortest path from a source node to all other nodes on the MapReduce architecture is thus essential for handling the many real-world problems with huge input data. MapReduce operates on (Key, Value) pairs that are independent between processes, so multiple processes can be assigned to execute simultaneously on the Hadoop system to reduce calculation time.
1. INTRODUCTION
Given an extended graph G = (V, E) with a set of vertices V and a set of edges E, where edges can be directed or undirected, each edge (u,v) ∈ E has a weight w(u,v). The shortest path problem has three cases:
(a) finding the shortest path from a source node to all nodes (1-n);
(b) finding the shortest path from a source node to a destination node (1-1);
(c) finding the shortest path between every pair of vertices (n-n).
To solve these problems effectively on computers, it is crucial to build parallel algorithms. The common approach is to convert sequential algorithms into parallel algorithms, or to convert parallel algorithms into other suitable parallel algorithms that are equivalent to the original algorithms.
In papers [3], [4], the authors construct a parallel all-pairs shortest path algorithm with a MapReduce
architecture. Parallel shortest path algorithms based on A* with a MapReduce architecture are
implemented in [5], [6], [7], [8]. In papers [9], [10], [11], [12], the authors perform parallel data-
cluster basically consists of a Master node and Worker nodes. The Master node is responsible for
managing and regulating the Workers [9-12, 14, 15].
According to the Hadoop documentation [6, 9], Hadoop is an Apache open source framework
inspired by the Google File System [6], [10]. It allows parallel processing of distributed data sets
across a cluster of multiple nodes connected in a master-slave architecture. Hadoop consists
of two main components: HDFS [6], [7], [11] and MapReduce [11], [12].
The first component is the Hadoop Distributed File System (HDFS). HDFS is designed to
support very large files and data sets. It is also distributed, scalable and fault-tolerant. A Big Data
file uploaded into HDFS is split into blocks of a specific size defined by the client, and the blocks are
replicated across the cluster nodes. The master node (NameNode) manages the distributed file
system, namespace and metadata, while the slave nodes (DataNodes) manage the storage of
the blocks and periodically report their status to the NameNode.
3. Each reducer iterates over the sorted intermediate data and passes each key-value pair to
the reduce function.
4. Each reducer writes its result to the distributed file system
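The sort and reduce steps listed above can be pictured with a small single-process simulation. This is only an illustrative sketch: the names `run_mapreduce`, `emit_neighbors` and `count_values` are hypothetical, no Hadoop API is used, and the toy job (counting how often each node appears as a neighbor) is chosen only to show the data flow.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, map_fn, reduce_fn):
    """Simulate map, sort and reduce in one process (no Hadoop involved)."""
    # Map: every input record may emit any number of (key, value) pairs.
    intermediate = [pair for record in records for pair in map_fn(record)]
    # Sort: bring pairs that share a key next to each other.
    intermediate.sort(key=itemgetter(0))
    # Reduce: pass each key and its values to the reduce function.
    results = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reduce_fn(key, values))
    # On a real cluster each reducer would write its result to the
    # distributed file system instead of returning a list.
    return results


def emit_neighbors(record):
    """Hypothetical map function: emit (neighbor, 1) for every outgoing edge."""
    _node, edges = record
    for neighbor, _weight in edges:
        yield neighbor, 1


def count_values(key, values):
    """Hypothetical reduce function: count the values collected for a key."""
    return key, sum(values)
```

For example, `run_mapreduce([(5, [(7, 20)]), (7, [(11, 7)])], emit_neighbors, count_values)` groups the emitted pairs by neighbor before reducing them.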
5 7,20
6 3,6 7,2 8,15 11,10
7 11,7
8 3,4 4,18 6,10 9,20
10 8,6
11 10,1 12,5
3.2. Shortest path algorithm
The algorithm for finding the shortest path from vertex a to vertex z in the graph is as follows:
Example 3: The graph is shown in figure 3. Apply the BFS algorithm to find a spanning tree of
G.
[Figure 3. The graph G]
So far, parallel shortest path algorithms have been implemented on multi-core
processors with shared external memory. What is new in this approach is implementing the
parallel shortest path algorithm on the MapReduce structure, in this case the Hadoop framework, to make
the algorithm efficient when run in parallel on several machines. The algorithm for finding the shortest
path on MapReduce is based on the adjacency list.
4.1. Proposed MapReduce shortest path algorithm
The problem investigated in this section is finding the set of shortest paths from the source node to all
other nodes in the graph on the MapReduce architecture.
Map stage:
The mapper class takes the entire file as input and parses it line by line.
Reduce stage:
The output of the mapper is the input to the reducer class. The reducer class takes the
minimum of all the path weights and adds it to the adjacency list of the keyId node.
Data representation:
A connected graph G = (V, E, w) with w(i, j) ≥ 0 ∀ (i, j) ∈ E and a specified source node v.
Initialize: the adjacency-list representation is as follows:
{Node i | ∀ i ∈ V, Node Label, Node Status} TAB {Node j | ∀ j adjacent to i, w(i,j)}
Where:
- Node Label: denoted L. Assign L(v) := 0 and, ∀ x ≠ v, assign L(x) := ∞ (INF).
- Node Status: assign Node Status = Unmarked ∀ x ∈ V.
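The initialization above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `init_records` and `to_line` and the exact text layout of a line (fields joined by commas, key and value separated by a TAB) are assumptions made for the example.

```python
INF = float("inf")


def init_records(graph, source):
    """Build node -> (label, status, edges) with L(source) = 0 and L(x) = INF otherwise."""
    records = {}
    for node, edges in graph.items():
        label = 0 if node == source else INF
        # Every node starts as UNMARKED.
        records[node] = (label, "UNMARKED", list(edges))
    return records


def to_line(node, record):
    """Serialize one record as 'node,label,status<TAB>neighbor,weight ...' (assumed layout)."""
    label, status, edges = record
    label_text = "INF" if label == INF else str(label)
    value = " ".join(f"{j},{w}" for j, w in edges)
    return f"{node},{label_text},{status}\t{value}"
```

With `graph = {1: [(2, 7), (3, 5)], 2: [(3, 7), (4, 6)]}` and source 1, `to_line` gives one text line per node that a mapper could later parse line by line.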
Example 4: For the graph shown in figure 3, the adjacency-list representation is as follows (Table
2).
Table 2. Initialize: Adjacency-List
{Node i | ∀ i ∈ V, Node Label, Node Status}    {Node j | ∀ j adjacent to i, w(i,j)}
{1,0,UNMARKED}       {2,7} {3,5}
{2,INF,UNMARKED}     {3,7} {4,6}
{3,INF,UNMARKED}     {4,11} {5,10} {6,10}
{4,INF,UNMARKED}     {1,4} {2,1} {8,5}
{5,INF,UNMARKED}     {7,20}
{6,INF,UNMARKED}     {3,6} {7,2} {8,15} {11,10}
{7,INF,UNMARKED}     {11,7}
{8,INF,UNMARKED}     {3,4} {4,18} {6,10} {9,20}
{10,INF,UNMARKED}    {8,6}
{11,INF,UNMARKED}    {10,15} {12,5}
2. Sort: sort the (Key, Value) pairs by the field Node i, i ∈ V.
END.
Theorem 2. The algorithm for finding the shortest path from one vertex to many vertices in
MapReduce is correct.
Proof:
The entire graph is read from the HDFS, transferred from Mappers to Reducers, and then, with
updated distance values, written back to the HDFS.
- Mapper:
The Mapper processes a single vertex u: for each vertex v in u's adjacency list, it computes the
weight Node label + w(u,v) and emits it as a Key sent to the Reducers (step 1.2 of algorithm 3).
Once a vertex v has been tested and mapped by the mapper, it is marked as Marked and
will not be mapped in subsequent iterations. This corresponds to the assignment T = T\{v} in step 2 of
algorithm 1.
In subsequent iterations, the Mapper keeps recomputing paths for vertices whose shortest path was
already found, through the statement Path=Path+"-"+node i in step 1.2 of algorithm 3.
Like BFS, the program distinguishes between “Marked” and “Unmarked” vertices. Marked
vertices are those that could potentially help reduce the distance for another vertex. I define a
vertex to be marked if and only if its distance value changed in the previous iteration. The only
exception to this rule is source vertex a, which is set to “Marked” before the first iteration. Note
that a vertex that was marked in one iteration could become unmarked in the next, and vice
versa.
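The Mapper behaviour described above might look roughly like the sketch below. It follows the convention of the tables in section 4.2, where a vertex becomes MARKED once it has been expanded. The function name `map_record`, the record layout and the `"NODE"` / `"CANDIDATE"` tags are assumptions introduced for illustration, since algorithm 3 itself is not reproduced here.

```python
INF = float("inf")


def map_record(node, label, status, path, edges):
    """Yield (key, value) pairs for one adjacency-list record."""
    if status == "UNMARKED" and label != INF:
        # Relax every outgoing edge: the candidate label for v is L(u) + w(u, v).
        for neighbor, weight in edges:
            yield neighbor, ("CANDIDATE", label + weight, path)
        # u has now been expanded with its current label.
        status = "MARKED"
    # Always pass the node's own record on so its adjacency list is not lost.
    yield node, ("NODE", label, status, path, edges)
```

Unreached vertices (label INF) emit only their own record, so they cost little until a candidate label reaches them.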
- Reducer:
For every vertex v, whether or not its shortest path was already found in previous iterations,
the statement (Key, Value) = {Node k | k = j, Node label_min, Node status, [path]} in
algorithm 4 creates a (Key, Value) pair whose Node label has the smallest value, as
expressed by the formula Node label_min = min {Node label_j ∀ j = s_i}. So in each iteration the
Reduce function is executed for every vertex v to recompute its shortest distance, and
algorithm 4 updates the shortest path accordingly.
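The minimum selection performed by the Reducer can be sketched as below. This is an illustrative reading rather than algorithm 4 itself: the function name `reduce_node`, the `"NODE"` / `"CANDIDATE"` value tags, and the choice to set a vertex back to UNMARKED when its label improves are assumptions.

```python
INF = float("inf")


def reduce_node(node, values):
    """Combine a node's own record with its candidate labels, keeping the minimum."""
    label, status, path, edges = INF, "UNMARKED", "", []
    best_label, best_path = INF, ""
    for value in values:
        if value[0] == "NODE":
            # The node's own record carries its adjacency list.
            _, label, status, path, edges = value
        else:
            _, cand_label, cand_path = value
            if cand_label < best_label:
                best_label, best_path = cand_label, cand_path
    if best_label < label:
        # A shorter path was found: record it and leave the node to be expanded again.
        label, status, path = best_label, "UNMARKED", f"{best_path}-{node}"
    return node, label, status, path, edges
```

A vertex that has no record of its own (it appears only as a neighbor) still receives a label and path, with an empty adjacency list.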
From the above analysis, it can be seen that in each iteration the algorithms update the
path labels, keeping the shortest path found so far, and mark vertices as Marked or Unmarked. Thus, MapReduce
executes iteratively until all vertices have been marked as “Marked”. The output of the final iteration is the
solution to the problem.
Mapping and reducing operate on (Key, Value) pairs that are independent between processes, so
multiple processes can be assigned to execute simultaneously on the Hadoop system to reduce
calculation time.
■
The next part is the implementation of Mapper and Reducer for a specific graph
4.2. How to perform MapReduce on a specific graph
Table 3. Input and Output for MapReduce
Mapper Input: Table 2
Mapper Output (sorted)
Key Value
{1,0,MARKED,1} {2,7} {3,5}
{2,7,UNMARKED,1}
{2,INF,UNMARKED} {3,7} {4,6}
{3,5,UNMARKED,1}
{3,INF,UNMARKED} {4,11} {5,10} {6,10}
{4,INF,UNMARKED} {1,4} {2,1} {8,5}
{5,INF,UNMARKED} {7,20}
{6,INF,UNMARKED} {3,6} {7,2} {8,15} {11,10}
{7,INF,UNMARKED} {11,7}
{8,INF,UNMARKED} {3,4} {4,18} {6,10} {9,20}
{10,INF,UNMARKED} {8,6}
{11,INF,UNMARKED} {10,15} {12,5}
The output of the mapper will be the input to the reducer class
The output emitted by the reducer is
Key Value
{1,0,MARKED,1} {2,7} {3,5}
{10,INF,UNMARKED} {8,6}
{11,INF,UNMARKED} {10,15} {12,5}
The output of the mapper will be the input to the reducer class.
The MapReduce job is looped until all nodes are marked, and then it stops.
The output emitted by the reducer in the next iteration is:
Key Value
{1,0,MARKED,1} {2,7} {3,5}
{2,7,MARKED,1-2} {3,7} {4,6}
{3,5,MARKED,1-3} {4,11} {5,10} {6,10}
{4,13,MARKED,1-2-4} {1,4} {2,1} {8,5}
{5,15,MARKED,1-3-5} {7,20}
{6,15,MARKED,1-3-6} {3,6} {7,2} {8,15} {11,10}
{7,17,MARKED,1-3-6-7} {11,7}
{8,18,MARKED,1-2-4-8} {3,4} {4,18} {6,10} {9,20}
{9,38,MARKED,1-2-4-8-9}
{10,39,MARKED,1-3-6-7-11-10} {8,6}
{11,24,MARKED,1-3-6-7-11} {10,15} {12,5}
{12,29,MARKED,1-3-6-7-11-12}
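The iteration "until all nodes are marked" can be checked with a single-process simulation on the Table 2 graph. The sketch below is not the Hadoop program used in the experiments. It assumes that a vertex is set back to UNMARKED whenever its label improves and that the loop stops when no reached vertex is UNMARKED. The names `one_iteration` and `shortest_paths` are hypothetical.

```python
from collections import defaultdict

INF = float("inf")

# Adjacency list copied from Table 2: node -> [(neighbor, weight), ...]
GRAPH = {
    1: [(2, 7), (3, 5)],
    2: [(3, 7), (4, 6)],
    3: [(4, 11), (5, 10), (6, 10)],
    4: [(1, 4), (2, 1), (8, 5)],
    5: [(7, 20)],
    6: [(3, 6), (7, 2), (8, 15), (11, 10)],
    7: [(11, 7)],
    8: [(3, 4), (4, 18), (6, 10), (9, 20)],
    10: [(8, 6)],
    11: [(10, 15), (12, 5)],
}


def one_iteration(state):
    """One simulated MapReduce job over state: node -> [label, status, path]."""
    # Map: expand every reached but unmarked node, then mark it.
    candidates = defaultdict(list)
    for node, record in state.items():
        label, status, path = record
        if status == "UNMARKED" and label != INF:
            for neighbor, weight in GRAPH.get(node, []):
                candidates[neighbor].append((label + weight, path))
            record[1] = "MARKED"
    # Reduce: keep the minimum label per node; an improved node is unmarked again.
    for node, found in candidates.items():
        best_label, best_path = min(found)
        record = state.setdefault(node, [INF, "UNMARKED", ""])
        if best_label < record[0]:
            state[node] = [best_label, "UNMARKED", f"{best_path}-{node}"]


def shortest_paths(source):
    """Repeat the simulated job until no reached node is left unmarked."""
    state = {node: [INF, "UNMARKED", ""] for node in GRAPH}
    state[source] = [0, "UNMARKED", str(source)]
    while any(s == "UNMARKED" and l != INF for l, s, _ in state.values()):
        one_iteration(state)
    return state
```

Under these assumptions, `shortest_paths(1)` ends with every node MARKED and with the labels and paths of the table above, for example label 39 and path 1-3-6-7-11-10 for node 10.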
[Figure: bar chart; horizontal axis: 10000 nodes, 15000 nodes, 20000 nodes; vertical axis: 0 to 200; data labels: 174, 141, 123]
This paper presents new parallel algorithms (algorithms 3, 4 and 5) based on actual
requirements and proves their correctness. In addition, the paper parallelizes existing
algorithms and indicates the advantages of the new algorithms over previous ones.
In particular, this work develops experimental programs on the Hadoop 3.3.0 parallel system
and provides specific data to evaluate and compare the results of the new parallel algorithms with
sequential algorithms.
Compared to other papers, the algorithms have the following novel aspects:
- The algorithms create a random graph.
- The algorithms are generalized.
- The algorithms are demonstrated on Hadoop 3.3.0 systems.
- The algorithms are proven.
As part of future work:
- Proving the complexity of the MapReduce shortest path algorithm for a
given graph size.
- Applying the MapReduce shortest path approach to a real road network.
IMPROVE THE SHORTEST PATH COMPUTATION
[2] Robert Sedgewick, Algorithms in C part 5: graph algorithms (third edition), Addison-Wesley, (2000).
[3] V. Dragomir, All-pair shortest path modified matrix multiplication based algorithm for a one-chip MapReduce
[4] Voichiţa DRAGOMIR, Gheorghe M. ŞTEFAN, All-pair shortest path on a hybrid Map-Reduce based
architecture, Proceedings of The Romanian Academy, Series A, the publishing House of the Romanian
[5] Sabeur Aridhi, Vincent Benjamin, Philippe Lacomme, Libo Ren, Shortest path resolution using Hadoop,
[6] Wilfried Yves Hamilton Adoni, Tarik Nahhal, Brahim Aghezzaf* and Abdeltif Elbyed, The MapReduce-based
approach to improve the shortest path computation in large-scale road networks: the case of A* algorithm,
[7] Wilfried Yves Hamilton Adoni, Tarik Nahhal, Brahim Aghezzaf, and Abdeltif Elbyed, MRA*: Parallel and
Distributed Path in Large-Scale Graph Using MapReduce-A* Based Approach, Springer International
[8] Sabeur Aridhi, Philippe Lacomme, Libo Ren, Benjamin Vincent, A MapReduce-based approach for shortest
path problem in large-scale networks, Elsevier, Journal of Engineering Applications of Artificial Intelligence 41
(2015) 151–165.
[10] Ghemawat S, Gobioff H, Leung ST. The google file system. In: ACM SIGOPS operating systems review, vol.
[11] Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters. Commun ACM. (2008), 51(1)
pp 107–13.
[12] Vavilapalli VK, Seth S, Saha B, Curino C, O’Malley O, Radia S, Reed B, Baldeschwieler E, Murthy AC,
Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H. Apache hadoop YARN: yet another
resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing. Santa Clara: ACM
[13] Nguyen Dinh Lau, Tran Quoc Chien, Le Manh Thanh, Improved Computing Performance for Algorithm
Finding the Shortest Path in Extended Graph, proceedings of the 2014 international conference on foundations
[14] M. Hena, N. Jeyanthi, A Three-Tier Authentication Scheme for Kerberized Hadoop Environment, Cybernetics
[15] Davit Petrosyan, Hrachya Astsatryan, Serverless (2022) High-Performance Computing over Cloud,