Real-World Applications - Coursera
100%
1. There are two datasets: A is the large one, B is small enough to fit in the memory of the
cluster node. What type of join do you choose to make their intersection A&B? (1 / 1 point)
Map
Reduce
Correct
Yes, dataset B can be held in memory, so each keyA can be looked up in B during the Map phase
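A minimal sketch of such a Map-side join, assuming tab-separated `key<TAB>value` records and unique keys in B. The function names and the sample records are hypothetical and only illustrate the in-memory lookup described above.

```python
def load_small_dataset(lines):
    """Build an in-memory lookup of dataset B from 'key<TAB>value' lines."""
    table = {}
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        table[key] = value  # assumption: keys in B are unique
    return table


def map_side_intersection(a_lines, b_table):
    """Yield (key, valueA, valueB) for every record of A whose key is in B."""
    for line in a_lines:
        key, value_a = line.rstrip("\n").split("\t", 1)
        if key in b_table:  # in-memory lookup, no shuffle or reducer needed
            yield key, value_a, b_table[key]


# Hypothetical sample data: B is small, A is the streamed input split.
b_table = load_small_dataset(["k1\tb1", "k3\tb3"])
joined = list(map_side_intersection(["k1\ta1", "k2\ta2", "k3\ta3"], b_table))
```

Each mapper would load B once and then stream its own split of A, so only matching keys are ever emitted.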
2. There are two datasets: A is the large one, B is small enough to fit in the memory of the
cluster node. What type of join do you choose to make the union A U B (records from A or
from B or from the both datasets)? (1 / 1 point)
A: keyA, valueA
B: keyB, valueB
Map
Reduce
Correct
Correct
Yes, it's possible if the formats of the two datasets are different (for example, their values contain different numbers
of fields)
By a tag added to the records in the Map phase; tags are selected by the filename from the environment
Correct
Yes, the filenames are known in the Map phase; use them to select a tag for each record in the mapper
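The tagging step can be sketched as follows. This assumes tab-separated records and a hypothetical layout in which the input path contains `dataset_a` or `dataset_b`; the `TAGS` mapping and the function names are illustrative, not part of the quiz.

```python
# Hypothetical mapping from a fragment of the input path to a dataset tag.
TAGS = {"dataset_a": "A", "dataset_b": "B"}


def select_tag(input_path, tags=TAGS):
    """Pick the tag whose path fragment occurs in the mapper's input path."""
    for fragment, tag in tags.items():
        if fragment in input_path:
            return tag
    raise ValueError(f"no tag configured for input path: {input_path}")


def tag_records(lines, input_path):
    """Emit 'key<TAB>tag<TAB>value' so the reducer can tell the datasets apart."""
    tag = select_tag(input_path)
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        yield f"{key}\t{tag}\t{value}"


tagged = list(tag_records(["k1\tv1\n"], "hdfs://nn/data/dataset_a/part-00000"))
```

In a Hadoop Streaming mapper, the input path would typically come from the environment, for example `os.environ["mapreduce_map_input_file"]`; the exact variable name depends on the Hadoop version, so treat it as an assumption to verify.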
When you join two datasets with a Reduce-side join and one of them has many records with repeating keys
Correct
Yes, because of Secondary Sort you know the order in which records from the different datasets arrive. This means the
reducer does not have to store them in memory
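A sketch of a reducer that relies on this ordering, assuming records arrive as `(key, tag, value)` sorted by key and then by tag, that tag `"A"` sorts before `"B"`, and that A has at most one record per key while B may repeat keys many times. These conventions are assumptions made for illustration.

```python
def reduce_side_join(sorted_records):
    """Join A and B records that arrive sorted by (key, tag).

    Only the single A value of the current key is kept; the many B
    records are streamed through without being collected.
    """
    current_key = None
    current_a = None
    for key, tag, value in sorted_records:
        if key != current_key:
            # New key: forget the A value of the previous key.
            current_key = key
            current_a = None
        if tag == "A":
            current_a = value
        elif current_a is not None:
            yield key, current_a, value


# Hypothetical input; sorted() stands in for the shuffle with Secondary Sort.
records = sorted([
    ("k1", "B", "b1"),
    ("k1", "A", "a1"),
    ("k2", "B", "b2"),
    ("k1", "B", "b3"),
])
joined_pairs = list(reduce_side_join(records))
```

Without the guaranteed order, the reducer would have to buffer all B records of a key until the A record showed up.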
When you want to avoid containers in memory on the reducers and therefore decrease the memory required by your
tasks.
Correct
Yes, Secondary Sort defines the order of input records on the reducers, so you can avoid using containers (trees,
hash tables) to calculate some aggregation functions (for example, 'uniq')
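As an illustration of a container-free 'uniq', the sketch below counts distinct values per key by comparing each value with the previous one. It assumes the reducer input is sorted by key and then by value; the function name and sample data are hypothetical.

```python
def count_unique_per_key(sorted_pairs):
    """Yield (key, number of distinct values) from pairs sorted by (key, value).

    Equal values are adjacent in sorted input, so remembering only the
    previous value is enough; no set or hash table is needed.
    """
    current_key = None
    previous_value = None
    count = 0
    for key, value in sorted_pairs:
        if key != current_key:
            if current_key is not None:
                yield current_key, count
            current_key, previous_value, count = key, value, 1
        elif value != previous_value:
            previous_value = value
            count += 1
    if current_key is not None:
        yield current_key, count


# Hypothetical input; sorted() stands in for the shuffle with Secondary Sort.
pairs = sorted([("u1", "x"), ("u1", "y"), ("u1", "x"), ("u2", "z")])
unique_counts = list(count_unique_per_key(pairs))
```

The memory used is constant per reducer, regardless of how many values a key has.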
5. What file is in the output directory of the succeeded MapReduce job (input the exact
filename)? (1 / 1 point)
_SUCCESS
Correct
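A downstream step can use this marker to decide whether to read the output. The sketch below checks a local directory only and uses a temporary directory as a stand-in for a job's output; the helper name `job_succeeded` is hypothetical.

```python
import os
import tempfile


def job_succeeded(output_dir):
    """Return True if the marker file _SUCCESS exists in output_dir."""
    return os.path.isfile(os.path.join(output_dir, "_SUCCESS"))


# A temporary directory stands in for a job's output directory.
with tempfile.TemporaryDirectory() as output_dir:
    before = job_succeeded(output_dir)  # no marker written yet
    open(os.path.join(output_dir, "_SUCCESS"), "w").close()
    after = job_succeeded(output_dir)  # marker is now present
```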