Lecture MapReduce
Master node
Also known as the NameNode in HDFS
Stores metadata
Might be replicated
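To make "stores metadata" concrete, here is a minimal Python sketch of the kind of lookup tables a master keeps: one map from each file to its ordered blocks, and one from each block to the servers holding replicas of it. The class name `MasterMetadata`, the file path, and the block IDs are hypothetical. The real HDFS NameNode keeps more state than this (for example the namespace tree and permissions), none of which is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    """Hypothetical master-side metadata: which blocks make up each file, and where they live."""

    # file path -> ordered list of block IDs that make up the file
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    # block ID -> set of servers currently holding a replica of that block
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_file(self, path: str, block_ids: list[str]) -> None:
        self.file_blocks[path] = list(block_ids)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def report_replica(self, block_id: str, server: str) -> None:
        # A server tells the master that it stores a replica of block_id.
        self.block_locations.setdefault(block_id, set()).add(server)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        # For each block of the file, in order: (block ID, sorted replica holders).
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.file_blocks[path]]


meta = MasterMetadata()
meta.add_file("/logs/day1.txt", ["blk_1", "blk_2"])
meta.report_replica("blk_1", "server-a")
meta.report_replica("blk_1", "server-b")
meta.report_replica("blk_2", "server-c")
print(meta.locate("/logs/day1.txt"))
# [('blk_1', ['server-a', 'server-b']), ('blk_2', ['server-c'])]
```

The master answers "where is this file?" from these tables alone; the file contents themselves stay on the servers.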
Map tasks emit Key Value pairs; each task reads Input <filename, file text>
MAP TASK 1
Input: Welcome Everyone / Hello Everyone
Output: Welcome 1, Everyone 1, Hello 1, Everyone 1
MAP TASK 2
Input: Why are you here / I am also here / They are also here / Yes, it’s THEM! / The same people we were thinking of / …….
Output: Why 1, Are 1, You 1, Here 1, …….
Key Value pairs into the reduce step: Welcome 1, Everyone 1, Hello 1, Everyone 1
Key Value pairs out of the reduce step: Everyone 2, Hello 1, Welcome 1
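For the reduce step to output "Everyone 2", both (Everyone, 1) pairs must reach the same reducer. One common way to arrange that is to assign each intermediate key to a reducer with hash(key) mod R (the default partitioning function described in the original MapReduce paper) and then group values by key inside each partition. The sketch below is a minimal single-process illustration; `partition` and `shuffle` are hypothetical names, and `zlib.crc32` stands in for the hash because Python's built-in string `hash` is randomized per process.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    # hash(key) mod R, using crc32 so the assignment is stable across runs
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route (key, value) pairs to reducers and group each key's values."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer processes its keys in sorted order.
    return [sorted(bucket.items()) for bucket in buckets]


map_output = [("Welcome", 1), ("Everyone", 1), ("Hello", 1), ("Everyone", 1)]
for reducer_id, groups in enumerate(shuffle(map_output, num_reducers=2)):
    for key, values in groups:
        # Reduce step for word count: sum the values for this key.
        print(f"reducer {reducer_id}: {key} {values} -> {sum(values)}")
```

Which reducer a given word lands on depends on the hash, but every occurrence of the same word always lands on the same one.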
map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)
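The pseudocode translates almost line for line into Python. The sketch below adds a hypothetical single-process driver, `run_word_count`, that plays the framework's role (call map, group values by key, call reduce). It assumes words are separated by whitespace and does not strip punctuation or fold case.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(key: str, value: str) -> Iterator[tuple[str, int]]:
    # key: document name; value: text of document
    for w in value.split():
        yield (w, 1)


def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    # key: a word; values: an iterator over counts
    result = 0
    for v in values:
        result += v
    return (key, result)


def run_word_count(documents: dict[str, str]) -> dict[str, int]:
    # Stand-in for the framework: run map, group values by key, run reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)
    return dict(reduce_fn(w, iter(cs)) for w, cs in sorted(groups.items()))


print(run_word_count({"doc1": "Welcome Everyone", "doc2": "Hello Everyone"}))
# {'Everyone': 2, 'Hello': 1, 'Welcome': 1}
```

In a real deployment the map calls and reduce calls run on different servers; only the grouping step in the middle needs to see all the data for a given key.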
Sort
Input: Series of (key, value) pairs
Output: Sorted <value>s
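The slide states only the input and output of Sort. One common way to express it as a MapReduce job, assumed here rather than taken from the slide, is: map emits each value as the intermediate key, a range partitioner sends contiguous key ranges to reducers in order, the framework sorts keys within each reducer, and reduce is the identity. Concatenating the reducer outputs in partition order then gives a globally sorted list. The function names and the split points are hypothetical; in practice split points would be chosen (for example, by sampling the data) so that reducers get similar amounts of work.

```python
import bisect


def sort_map(key, value):
    # Emit the value as the intermediate key; the original key is dropped.
    yield (value, None)


def range_partition(k, split_points):
    # Reducer 0 gets k < split_points[0]; reducer i gets
    # split_points[i-1] <= k < split_points[i]; the last gets the rest.
    return bisect.bisect_right(split_points, k)


def mapreduce_sort(pairs, split_points):
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key, value in pairs:
        for k, _ in sort_map(key, value):
            partitions[range_partition(k, split_points)].append(k)
    output = []
    for part in partitions:  # reducers in key-range order
        # The framework sorts keys within each reducer; reduce is the identity.
        output.extend(sorted(part))
    return output


pairs = [("r1", 42), ("r2", 7), ("r3", 19), ("r4", 88), ("r5", 7)]
print(mapreduce_sort(pairs, split_points=[20, 50]))  # [7, 7, 19, 42, 88]
```

Range partitioning is what makes the concatenation sorted; with hash partitioning each reducer's output would be sorted, but the reducers' outputs would interleave.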
[Figure: blocks from DFS are processed by map tasks A, B, C on servers; reduce tasks I, II, III on other servers fetch the map output (local write, remote read).]
• If a server fails, the RM (ResourceManager) lets all affected AMs (ApplicationMasters) know, and the AMs take appropriate action
– The NM (NodeManager) keeps track of each task running at its server
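A minimal Python sketch of this failure-handling path, assuming (this is not stated on the slide) that each NM reports its task list to the RM in periodic heartbeats and that the RM declares a server failed after a heartbeat timeout. All class and method names are hypothetical and are not the Hadoop YARN API. The AM's "appropriate action" is modeled as queuing the lost tasks for re-execution elsewhere.

```python
from collections import defaultdict


class ApplicationMaster:
    def __init__(self, name):
        self.name = name
        self.to_rerun = []  # tasks this AM will re-run on other servers

    def on_server_failed(self, server, lost_tasks):
        # "Appropriate action" assumed here: queue the lost tasks for re-execution.
        self.to_rerun.extend(lost_tasks)


class NodeManager:
    def __init__(self, server):
        self.server = server
        self.tasks = defaultdict(list)  # AM name -> task IDs running on this server

    def start_task(self, am_name, task_id):
        self.tasks[am_name].append(task_id)

    def heartbeat(self):
        # Snapshot of the tracked tasks, sent to the RM.
        return {am: list(ts) for am, ts in self.tasks.items()}


class ResourceManager:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.ams = {}          # AM name -> ApplicationMaster
        self.last_seen = {}    # server -> time of its last heartbeat
        self.last_report = {}  # server -> last task snapshot from its NM

    def register_am(self, am):
        self.ams[am.name] = am

    def receive_heartbeat(self, server, report, now):
        self.last_seen[server] = now
        self.last_report[server] = report

    def check_servers(self, now):
        # A server whose NM has been silent longer than the timeout is treated as failed.
        failed = [s for s, t in self.last_seen.items() if now - t > self.timeout_s]
        for server in failed:
            del self.last_seen[server]
            # Let every affected AM know which of its tasks were lost.
            for am_name, tasks in self.last_report.pop(server).items():
                self.ams[am_name].on_server_failed(server, tasks)


rm = ResourceManager(timeout_s=10)
job1, job2 = ApplicationMaster("job1"), ApplicationMaster("job2")
rm.register_am(job1)
rm.register_am(job2)
nm = NodeManager("server-7")
nm.start_task("job1", "map-3")
nm.start_task("job2", "reduce-1")
rm.receive_heartbeat("server-7", nm.heartbeat(), now=0)
rm.check_servers(now=15)  # silent for 15 s, longer than the 10 s timeout
print(job1.to_rerun, job2.to_rerun)  # ['map-3'] ['reduce-1']
```

The RM relies on the last report it received because a failed server cannot be asked what it was running; this is why the NM tracks its tasks while it is still alive.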
[Figure: Input Files -> Map phase -> Intermediate Files on Disk -> Reduce phase -> Output Files]
7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns to the user code.
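Step 7 implies a barrier: the MapReduce call returns only after every map task and every reduce task has finished. A minimal in-process sketch, assuming threads stand in for worker machines and intermediate data stays in memory instead of in files on disk; `run_job`, `map_task`, and `reduce_task` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(text):
    # One map task: emit (word, 1) for every whitespace-separated word.
    return [(w, 1) for w in text.split()]


def reduce_task(key, values):
    return key, sum(values)


def run_job(splits, workers=4):
    """Hypothetical master: returns only after every task has finished."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: one task per input split; result() blocks until that task is done.
        map_futures = [pool.submit(map_task, s) for s in splits]
        groups = defaultdict(list)
        for f in map_futures:
            for k, v in f.result():
                groups[k].append(v)
        # Reduce phase starts only after all map output has been collected.
        reduce_futures = [pool.submit(reduce_task, k, vs)
                          for k, vs in sorted(groups.items())]
        results = dict(f.result() for f in reduce_futures)
    # All map and reduce tasks are complete: control returns to the caller.
    return results


output = run_job(["Welcome Everyone", "Hello Everyone"])
print(output)  # {'Everyone': 2, 'Hello': 1, 'Welcome': 1}
```

The caller of `run_job` blocks just as the user program does in step 7; it sees no partial results.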
map(key=url, val=contents):
    For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
    Sum all "1"s in values list
    Emit result "(word, sum)"
Input: see bob run / see spot throw
Map output: see 1, bob 1, run 1, see 1, spot 1, throw 1
Reduce output: bob 1, run 1, see 2, spot 1, throw 1
3:3
4:3
5:2
8:2
A -> B C D
B -> A C D E
C -> A B D E
D -> A B C E
E -> B C D
For map(C -> A B D E):
(A C) -> A B D E
(B C) -> A B D E
(C D) -> A B D E
(C E) -> A B D E
For map(D -> A B C E):
(A D) -> A B C E
(B D) -> A B C E
(C D) -> A B C E
(D E) -> A B C E
And finally for map(E -> B C D):
(B E) -> B C D
(C E) -> B C D
(D E) -> B C D
(A B) -> (A C D E) (B C D)
(A C) -> (A B D E) (B C D)
(A D) -> (A B C E) (B C D)
(B C) -> (A B D E) (A C D E)
(B D) -> (A B C E) (A C D E)
(B E) -> (A C D E) (B C D)
(C D) -> (A B C E) (A B D E)
(C E) -> (A B D E) (B C D)
(D E) -> (A B C E) (B C D)
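The common-friends job can be run end to end in Python. Map emits each (person, friend) pair, sorted so both people produce the same key, with the person's whole friend list as the value; after grouping, each pair holds two lists, and reduce intersects them. This sketch assumes friendship is symmetric, as in the adjacency lists above, so every key receives exactly two lists; the function names are hypothetical.

```python
from collections import defaultdict


def friends_map(person, friends):
    # Key is the sorted pair, so (A B) and (B A) become the same key.
    for f in friends:
        yield tuple(sorted((person, f))), friends


def friends_reduce(pair, friend_lists):
    # Intersect the friend lists received for this pair (two, if friendship is symmetric).
    common = set(friend_lists[0])
    for fl in friend_lists[1:]:
        common &= set(fl)
    return pair, sorted(common)


def common_friends(graph):
    grouped = defaultdict(list)
    for person, friends in graph.items():
        for key, value in friends_map(person, friends):
            grouped[key].append(value)
    return dict(friends_reduce(k, v) for k, v in sorted(grouped.items()))


graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "D", "E"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "C", "D"],
}
for pair, common in common_friends(graph).items():
    print(pair, "->", common)
# first line printed: ('A', 'B') -> ['C', 'D']
```

For example, the grouped value for (A B) is (A C D E) (B C D), and its intersection is C D, so A and B have C and D as common friends. No key (A E) is produced because A and E are not friends.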
http://labs.google.com/papers/mapreduce.html