5 RK - MapReduce - v3
(2021-2022, I-SEMESTER)
MapReduce
By
Dr. Tene Ramakrishnudu
Assistant Professor
Department of Computer Science & Engineering
National Institute of Technology (NIT), Warangal, TS, India
11-11-2021 RK-CSE-NITW 3
MapReduce
❖MapReduce was created at Google and described in the paper:
▪ J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI 2004
MapReduce
❖Programming Model:
❖The intermediate values are supplied to the reduce function via an
iterator. This allows us to handle lists of values that are too large
to fit in memory.
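To illustrate why an iterator matters here, the following is a minimal Python sketch. The names `stream_counts` and `reduce_sum` are hypothetical and not part of any MapReduce library; the generator merely stands in for intermediate values streamed from disk.

```python
from typing import Iterator, Tuple


def stream_counts(n: int) -> Iterator[int]:
    """Yield n counts one at a time, standing in for values read from disk."""
    for _ in range(n):
        yield 1


def reduce_sum(key: str, values: Iterator[int]) -> Tuple[str, int]:
    """Consume the iterator value by value; only the running total is held in memory."""
    total = 0
    for v in values:
        total += v
    return key, total


# One million values are processed without ever building a list of them.
result = reduce_sum("the", stream_counts(1_000_000))
```

Because `reduce_sum` never materializes the values as a list, its memory use stays constant regardless of how many values belong to the key.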
MapReduce
Source: [3][4]
MapReduce
❖MapReduce execution: when the user program calls the
MapReduce function, the following sequence of actions occurs.
MapReduce
❖7. When all map tasks and reduce tasks have been
completed, the master wakes up the user program.
❖At this point, the MapReduce call in the user program
returns to the user code.
MapReduce
❖Input: a big document
❖Map: read the input and produce a set of key-value pairs
❖Group by key: collect all pairs with the same key
(hash merge, shuffle, sort, partition)
❖Reduce: collect all values belonging to each key and output the result
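The three phases above can be sketched as a single-process Python driver. This is a simplified illustration under stated assumptions, not how a real cluster runs: `run_mapreduce`, `partition`, `first_letter_mapper`, and `sum_reducer` are hypothetical names, and the "shuffle" is just an in-memory grouping into one bucket per reducer.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioning: a given key always goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapreduce(records: Iterable[str],
                  mapper: Callable[[str], Iterator[Tuple[str, int]]],
                  reducer: Callable[[str, Iterator[int]], int],
                  num_reducers: int = 2) -> Dict[str, int]:
    # Map + partition: route every emitted pair to the bucket of its reducer,
    # grouping values by key inside each bucket (the "shuffle").
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in mapper(record):
            buckets[partition(key, num_reducers)][key].append(value)
    # Sort + reduce: each reducer visits its keys in sorted order.
    output: Dict[str, int] = {}
    for bucket in buckets:
        for key in sorted(bucket):
            output[key] = reducer(key, iter(bucket[key]))
    return output


def first_letter_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: emit (first letter, 1) for every word."""
    for word in line.split():
        yield word[0].lower(), 1


def sum_reducer(key: str, values: Iterator[int]) -> int:
    """Hypothetical reduce function: add up all values for the key."""
    return sum(values)


output = run_mapreduce(["big data big cluster", "data node"],
                       first_letter_mapper, sum_reducer)
```

A deterministic hash (`zlib.crc32`) is used instead of Python's built-in `hash`, which is randomized per process for strings and would make the partition assignment unrepeatable.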
MapReduce: Example
reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)
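The pseudocode can be turned into a small runnable Python sketch. The `map_fn` shown here is an assumption (the usual word-count formulation that emits a count of 1 per word), and `word_count` is a hypothetical single-process driver that sorts and groups the pairs before reducing.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(doc_name, text):
    """Assumed map function: emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word, 1


def reduce_fn(key, values):
    """key: a word; values: an iterator over counts (mirrors the pseudocode)."""
    result = 0
    for v in values:
        result += v
    return key, result


def word_count(documents):
    """Hypothetical driver: map every document, sort by key, reduce each group."""
    pairs = [kv for name, text in documents.items() for kv in map_fn(name, text)]
    pairs.sort(key=itemgetter(0))
    return dict(
        reduce_fn(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


counts = word_count({"d1": "to be or not to be", "d2": "to do"})
```

Sorting before `groupby` plays the role of the group-by-key step, so each call to `reduce_fn` sees all counts for exactly one word.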
MapReduce: Master Data Structures
❖For each map task and reduce task, the master stores the state
(idle, in-progress, or completed) and the identity of the
worker machine (for non-idle tasks).
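One possible way to picture this bookkeeping is the Python sketch below. The class and method names (`TaskState`, `Task`, `Master`, `assign`, `complete`) and the task-id format are illustrative assumptions, not the data structures of any actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class TaskState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str                          # "map" or "reduce"
    state: TaskState = TaskState.IDLE
    worker: Optional[str] = None       # identity of the worker; None while idle


class Master:
    """Keeps one Task record per map task and per reduce task."""

    def __init__(self, num_map: int, num_reduce: int) -> None:
        self.tasks: Dict[str, Task] = {}
        for i in range(num_map):
            self.tasks[f"map-{i}"] = Task(kind="map")
        for i in range(num_reduce):
            self.tasks[f"reduce-{i}"] = Task(kind="reduce")

    def assign(self, task_id: str, worker: str) -> None:
        """Hand an idle task to a worker and record who is running it."""
        task = self.tasks[task_id]
        if task.state is not TaskState.IDLE:
            raise ValueError(f"{task_id} is not idle")
        task.state = TaskState.IN_PROGRESS
        task.worker = worker

    def complete(self, task_id: str) -> None:
        """Mark an in-progress task as completed; the worker identity is kept."""
        self.tasks[task_id].state = TaskState.COMPLETED


master = Master(num_map=2, num_reduce=1)
master.assign("map-0", "worker-A")
master.complete("map-0")
```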
MapReduce: Fault Tolerance
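❖When a map task is executed first by worker A and later re-executed
by worker B (because A failed), all workers executing reduce tasks
are notified of the re-execution.
❖Any reduce task that has not already read the data from
worker A will read the data from worker B.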
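The master's reaction to a failed worker can be sketched as follows. The rules encoded here follow the description in the original MapReduce paper: in-progress tasks on the failed worker are reset to idle, and completed map tasks are also reset because their output lives on the failed machine's local disk, while completed reduce tasks are left alone because their output is in the global file system. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional

TaskRecord = Dict[str, Optional[str]]  # keys: "kind", "state", "worker"


def handle_worker_failure(tasks: Dict[str, TaskRecord], failed_worker: str) -> List[str]:
    """Reset the affected tasks to idle and return their ids for rescheduling."""
    rescheduled: List[str] = []
    for task_id, task in tasks.items():
        if task["worker"] != failed_worker:
            continue
        in_progress = task["state"] == "in-progress"
        completed_map = task["kind"] == "map" and task["state"] == "completed"
        if in_progress or completed_map:
            task["state"] = "idle"
            task["worker"] = None
            rescheduled.append(task_id)
    return rescheduled


tasks = {
    "map-0": {"kind": "map", "state": "completed", "worker": "worker-A"},
    "map-1": {"kind": "map", "state": "in-progress", "worker": "worker-A"},
    "map-2": {"kind": "map", "state": "completed", "worker": "worker-B"},
    "reduce-0": {"kind": "reduce", "state": "completed", "worker": "worker-A"},
}
rescheduled = handle_worker_failure(tasks, "worker-A")
```

In a fuller sketch, the returned ids of re-executed map tasks would be used to notify reduce workers where to read the data from next.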
MapReduce: Fault Tolerance
❖Master Failure:
❖The master can write periodic checkpoints of the master data
structures described above.
❖If the master task dies, a new copy can be started from the last
checkpointed state.
❖In the original implementation, however, the MapReduce computation
is aborted if the master fails.
❖Clients can check for this condition and retry the MapReduce
operation if they desire.
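Checkpointing the master state could look like the minimal Python sketch below. The JSON format, the file name `master.ckpt`, and the functions `write_checkpoint` and `load_checkpoint` are assumptions made for illustration; the source does not specify how checkpoints are stored.

```python
import json
import os
import tempfile


def write_checkpoint(state: dict, path: str) -> None:
    """Write the state to a temporary file, then atomically replace the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # readers never observe a half-written file


def load_checkpoint(path: str) -> dict:
    """Restore the master data structures from the last checkpoint."""
    with open(path) as f:
        return json.load(f)


checkpoint_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(checkpoint_dir, "master.ckpt")
state = {
    "map-0": {"state": "completed", "worker": "worker-A"},
    "reduce-0": {"state": "idle", "worker": None},
}
write_checkpoint(state, checkpoint_path)
restored = load_checkpoint(checkpoint_path)  # what a new master copy would start from
```

Writing to a temporary file and renaming it means a crash during checkpointing leaves the previous checkpoint intact.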
Algorithms Using MapReduce
❖In matrix-vector multiplication, the vector v may be so large that it
will not fit in its entirety in main memory.
❖Divide the matrix into vertical stripes of equal width, and divide the
vector into an equal number of horizontal stripes of the same height.
❖The goal is to use enough stripes that the portion of the vector in
one stripe fits conveniently into main memory at a compute node.
❖The i-th stripe of the matrix multiplies only components from the
i-th stripe of the vector.
❖For example, the matrix and vector might each be divided into
five stripes.
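The striping idea can be sketched in plain Python. This is a single-process illustration under stated assumptions: the matrix is given as sparse `(i, j, m_ij)` triples, the names `map_stripe` and `striped_matvec` are hypothetical, and slicing `v` stands in for loading only one vector stripe into memory at a time.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[int, int, float]  # (row i, column j, value m_ij)


def map_stripe(entries: List[Entry], v_stripe: List[float],
               offset: int) -> Iterator[Tuple[int, float]]:
    """Map: for each m_ij in this matrix stripe, emit (i, m_ij * v_j)."""
    for i, j, m in entries:
        yield i, m * v_stripe[j - offset]


def striped_matvec(entries: List[Entry], v: List[float], num_stripes: int) -> List[float]:
    n = len(v)
    width = -(-n // num_stripes)  # ceiling division; the last stripe may be narrower
    # Assign each matrix entry to a vertical stripe by its column index.
    matrix_stripes: Dict[int, List[Entry]] = defaultdict(list)
    for i, j, m in entries:
        matrix_stripes[j // width].append((i, j, m))
    # Run the map step stripe by stripe; only one vector stripe is needed at a time.
    grouped: Dict[int, List[float]] = defaultdict(list)
    for s in range(num_stripes):
        offset = s * width
        v_stripe = v[offset:offset + width]
        for i, product in map_stripe(matrix_stripes[s], v_stripe, offset):
            grouped[i].append(product)
    # Reduce: sum all products that share the same row index i.
    return [sum(grouped[i]) for i in range(n)]


entries = [(0, 0, 1.0), (0, 3, 2.0), (1, 1, 3.0),
           (2, 0, 4.0), (2, 2, 5.0), (3, 3, 6.0)]
v = [1.0, 2.0, 3.0, 4.0]
x = striped_matvec(entries, v, num_stripes=2)
```

Each map call touches only the vector components of its own stripe, which is the property that lets a compute node keep just that stripe in main memory.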
References
❖2. Jimmy Lin and Chris Dyer, “Data-Intensive Text Processing with MapReduce”.