Real-World Applications - Coursera

This document contains questions and answers about real-world applications of MapReduce. It discusses: 1. Choosing a MapReduce join when one dataset fits in memory to find the intersection of two datasets. 2. Choosing a MapReduce join type to find the union of two datasets, where result records may come from one dataset or from both. 3. Distinguishing records from two datasets on the Reduce phase, possibly using tags added in the Map phase based on the filename. 4. When secondary sorting is useful, such as for reduce-side joins where one dataset has many repeating keys. 5. The filename _SUCCESS is generated in the output directory of a succeeded MapReduce

Uploaded by

Rupesh Kumar Sah

Real-World Applications

LATEST SUBMISSION GRADE

100%

1. There are two datasets: A is the large one, B is small enough to fit in the memory of the cluster node. What type of join do you choose to make their intersection A&B? (1 / 1 point)

Records in A: keyA, valueA

Records in B: keyB, valueB

Records in the result:

key (=keyA=keyB), valueA, valueB

Map

Reduce

Correct

Yes, because B fits in memory, each keyA can be looked up in B during the Map phase
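
The Map-side lookup described above can be sketched in plain Python. This is a minimal illustration, not a Hadoop job: the function name `map_side_join` and the sample data are hypothetical, and keys in B are assumed to be unique.

```python
from typing import Dict, Iterable, Iterator, Tuple


def map_side_join(
    records_a: Iterable[Tuple[str, str]],
    small_b: Dict[str, str],
) -> Iterator[Tuple[str, str, str]]:
    """Emit (key, valueA, valueB) for every record of A whose key is in B."""
    for key_a, value_a in records_a:
        # B lives entirely in memory, so the lookup needs no shuffle or reducer.
        if key_a in small_b:
            yield key_a, value_a, small_b[key_a]


# Hypothetical data: B is loaded once per mapper, A arrives as an input split.
b_in_memory = {"k1": "b1", "k3": "b3"}
a_split = [("k1", "a1"), ("k2", "a2"), ("k3", "a3")]
result = list(map_side_join(a_split, b_in_memory))
```

Records of A without a match in B are simply dropped, which is what makes this an intersection.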

2. There are two datasets: A is the large one, B is small enough to fit in the memory of the cluster node. What type of join do you choose to make the union A U B (records from A, from B, or from both datasets)? (1 / 1 point)

A: keyA, valueA

B: keyB, valueB

Result has three types of records:

keyA, valueA, null

keyB, null, valueB

key (=keyA=keyB), valueA, valueB

Map

Reduce

Correct

Yes, a Reduce-side join can perform any type of join
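
A Reduce-side union (full outer join) producing the three record types listed in the question might look like the following sketch. The shuffle is simulated with an in-memory sort and `groupby`; the tags "A" and "B" and all function names are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def tag_records(records, tag):
    """Map phase: attach a dataset tag to every (key, value) record."""
    for key, value in records:
        yield key, tag, value


def reduce_full_outer(key, tagged_values):
    """Reduce phase: combine the values of one key from both datasets."""
    values_a, values_b = [], []
    for tag, value in tagged_values:
        (values_a if tag == "A" else values_b).append(value)
    # A missing side is represented by a single None (the "null" in the quiz).
    for value_a in values_a or [None]:
        for value_b in values_b or [None]:
            yield key, value_a, value_b


def full_outer_join(records_a, records_b):
    tagged = list(tag_records(records_a, "A")) + list(tag_records(records_b, "B"))
    tagged.sort(key=itemgetter(0))  # stand-in for the shuffle/sort step
    for key, group in groupby(tagged, key=itemgetter(0)):
        yield from reduce_full_outer(key, ((tag, value) for _, tag, value in group))


joined = list(full_outer_join([("k1", "a1"), ("k2", "a2")],
                              [("k2", "b2"), ("k3", "b3")]))
```

Note that this reducer buffers all values of a key in memory, which is acceptable only when keys do not repeat heavily.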


3. How do you distinguish records of two datasets on the Reduce phase? (1 / 1 point)

By format of the values

Correct

Yes, it's possible if the formats of the two datasets differ (for example, their values contain a different number of fields)

By the filename of the dataset, obtained from the environment variable

By a tag added to the records on the Map phase; tags are selected by the filename from the environment

Correct

Yes, the filenames are known on the Map phase, so use them to select a tag for each record in the mapper
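
Tagging in the mapper could be sketched as below, assuming a Hadoop Streaming job with tab-separated `key<TAB>value` input. The environment variable name `mapreduce_map_input_file` is an assumption (Streaming exposes job configuration with dots replaced by underscores, and older versions use `map_input_file`); the path pattern `dataset_a` and the helper names are hypothetical.

```python
import os


def current_input_path(default: str = "") -> str:
    """Return the input file of this map task, if the framework exposes it."""
    # Assumed variable name; check the configuration of your Hadoop version.
    return os.environ.get("mapreduce_map_input_file", default)


def tag_for_input_file(path: str) -> str:
    """Pick a dataset tag from the input path (hypothetical naming scheme)."""
    return "A" if "dataset_a" in path else "B"


def map_line(line: str, input_path: str) -> str:
    """Turn 'key<TAB>value' into 'key<TAB>tag<TAB>value'."""
    key, value = line.rstrip("\n").split("\t", 1)
    return f"{key}\t{tag_for_input_file(input_path)}\t{value}"


tagged_a = map_line("k1\tv1\n", "/data/dataset_a/part-00000")
tagged_b = map_line("k1\tv2", "/data/dataset_b/part-00000")
```

In a real mapper, `map_line` would be applied to every line of standard input with `current_input_path()` as the second argument.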

4. When is Secondary Sort really useful? (1 / 1 point)

Always with a Reduce-side join

When you join two datasets with a Reduce-side join and one of them has many records with repeating keys

Correct

Yes, thanks to Secondary Sort you know the order in which records from the different datasets arrive, so the reducer does not have to store them in memory

When you want to avoid containers in memory on the reducers and therefore decrease the memory required by your
tasks.

Correct

Yes, Secondary Sort defines the order of input records on the reducers, so containers (trees, hash tables) are not needed to calculate some aggregation functions (for example, 'uniq')
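
Both uses of Secondary Sort mentioned in the answers can be illustrated with a short sketch. It assumes the reducer input is already sorted by (key, tag), that tag "0" marks the dataset with unique keys and tag "1" the dataset with many repeating keys, and that the join is an inner join; all names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def join_sorted(tagged_records):
    """Inner join over (key, tag, value) records sorted by (key, tag)."""
    for key, group in groupby(tagged_records, key=itemgetter(0)):
        unique_side = None
        for _, tag, value in group:
            if tag == "0":
                # Arrives first because of the sort; the only value kept in memory.
                unique_side = value
            elif unique_side is not None:
                # Repeating records are streamed out one by one, never buffered.
                yield key, unique_side, value


def count_unique(sorted_values):
    """Count distinct values of one key without a set or hash table."""
    count = 0
    previous = object()  # sentinel that equals no real value
    for value in sorted_values:
        if value != previous:
            count += 1
            previous = value
    return count


# sorted() stands in for the framework's sort by (key, tag).
records = sorted([("k1", "1", "x"), ("k1", "0", "s"), ("k1", "1", "y"), ("k2", "1", "z")])
pairs = list(join_sorted(records))
distinct = count_unique(["a", "a", "b", "c", "c"])
```

In both functions the memory used per key is constant, regardless of how many records share that key.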

5. What file is in the output directory of the succeeded MapReduce job (input the exact filename)? (1 / 1 point)

_SUCCESS

Correct

Yes, a hidden success file (its name starts with an underscore)
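
A downstream step can use this marker to decide whether the output is complete. The sketch below assumes the output directory is reachable as a local path (for example, after copying it from HDFS); the function name `job_succeeded` is hypothetical.

```python
from pathlib import Path


def job_succeeded(output_dir: str) -> bool:
    """Return True if the job's output directory contains the _SUCCESS marker."""
    return (Path(output_dir) / "_SUCCESS").is_file()
```

For output that stays on HDFS, an equivalent check from the shell would be `hdfs dfs -test -e <output_dir>/_SUCCESS`.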
