CTBD Sol02
Hadoop MapReduce
Concepts and Technologies for Distributed Systems and Big Data Processing – SS 2017
Solution 2 Implementation
You can download the code for the solution for this task from the course website.
Solution 3 Completion
Complete the following code for WordLength, which should count how many words belong to each of the following four
length categories:
tiny: 1 letter; small: 2–4 letters; medium: 5–9 letters; big: 10 or more letters
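The categorization step can be sketched in plain Java, independent of Hadoop. This is not the course's WordLength solution: the class and method names (WordLengthSketch, category, countCategories) are hypothetical, words are assumed to be separated by whitespace, and the big category is assumed to start at 10 letters so that every non-empty word falls into exactly one category. In a MapReduce job, the mapper would emit (category, 1) for each word and the reducer would sum the values per category.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordLengthSketch {
    // Maps a non-empty word to its length category.
    // Assumption: 10 or more letters counts as "big".
    static String category(String word) {
        int n = word.length();
        if (n == 1) return "tiny";
        if (n <= 4) return "small";
        if (n <= 9) return "medium";
        return "big";
    }

    // Counts how many whitespace-separated words fall into each category.
    static Map<String, Integer> countCategories(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String c : new String[] {"tiny", "small", "medium", "big"}) {
            counts.put(c, 0);
        }
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // empty input yields one empty token
            counts.merge(category(word), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {tiny=0, small=1, medium=3, big=0}
        System.out.println(countCategories("Hello World Bye World"));
    }
}
```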
Solution 4 Comprehension
Understand and explain what the following code does. What is the output of the program for the following input?
file01: Hello World Bye World
file02: Hello Hadoop Goodbye Hadoop
The code computes an inverted index for the given documents: for each word, it lists the documents in which the word
occurs. Duplicates are not removed, so a document is listed once per occurrence of the word. It produces the following
output:
Bye file01
Goodbye file02
Hadoop file02,file02
Hello file02,file01
World file01,file01
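The behavior described above can be reproduced with a small plain-Java simulation that does not use the Hadoop API. This is only a sketch: the class and method names (InvertedIndexSketch, build) are hypothetical, and words are assumed to be separated by whitespace. The loop over tokens plays the role of the map phase, emitting one (word, document) pair per occurrence, and the grouping and joining play the role of the shuffle and reduce phases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {
    // Builds word -> comma-separated document list, one entry per occurrence.
    static Map<String, String> build(Map<String, String> docs) {
        // "Map" and "shuffle": collect the document name under every word it contains.
        Map<String, List<String>> postings = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                postings.computeIfAbsent(word, k -> new ArrayList<>()).add(doc.getKey());
            }
        }
        // "Reduce": join the collected document names without removing duplicates.
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : postings.entrySet()) {
            index.put(e.getKey(), String.join(",", e.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("file01", "Hello World Bye World");
        docs.put("file02", "Hello Hadoop Goodbye Hadoop");
        for (Map.Entry<String, String> e : build(docs).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In this sketch the documents appear in input order, so Hello maps to file01,file02. Hadoop does not guarantee the order in which values reach a reducer, which is why the order within a list can differ in the actual job output.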