Assignment-3 IBDA
VIDHYADEEP INSTITUTE OF
ENGINEERING AND TECHNOLOGY
Vidhyadeep Campus, Anita (Kim), Ta. Olpad,
Dist. Surat
Assignment 3
UNIT: III
1. What is MapReduce? Explain its role in distributed data processing and how it helps in
processing large datasets in a parallel and fault-tolerant manner.
2. Describe the steps involved in developing a MapReduce application. What are the key elements
of a MapReduce program in terms of input, map function, reduce function, and output?
3. How does the MapReduce framework work? Explain the flow of data from the Map phase to the
Reduce phase, including how data is split and processed in parallel.
4. Discuss the anatomy of a MapReduce job run. What happens at each stage, from job submission
to completion, including the setup, execution, and cleanup phases?
5. Explain how MapReduce handles failures. What are the common failure scenarios, and how
does MapReduce ensure job recovery and fault tolerance in distributed environments?
6. What is job scheduling in MapReduce? Discuss how tasks are scheduled and how scheduling
ensures that they are executed efficiently on the resources available in a distributed system.
7. Describe the Shuffle and Sort process in MapReduce. How do the Map and Reduce tasks
benefit from the Shuffle and Sort steps in the data processing pipeline?
8. What are the different types of MapReduce jobs and input/output formats? Provide examples of
how different input/output formats (e.g., TextInputFormat, KeyValueTextInputFormat) can be used
in MapReduce applications.
9. What are the key features of MapReduce that make it suitable for large-scale data processing?
Discuss its scalability, fault tolerance, and its ability to handle huge datasets.
10. Explain how MapReduce can be customized with different input/output formats and partitioners.
How can MapReduce be optimized for specific data and use cases?