Module 3
Department of
Computer Science & Engineering
www.cambridge.edu.in
Prerequisite
Hadoop Distributed File System
(HDFS)
• Concurrent processing: MapReduce splits large amounts of data into smaller chunks and processes them in parallel.
• Consolidated output: MapReduce aggregates the data from multiple servers to return a consolidated output.
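The two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads on one machine stand in for servers in a cluster, and the helper names (`split_into_chunks`, `process_chunk`, `run_job`) are hypothetical, not part of Hadoop or any MapReduce API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split the input into consecutive chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Stand-in for the work done on one server: sum the values in a chunk."""
    return sum(chunk)


def run_job(records, chunk_size=3, workers=4):
    """Process the chunks concurrently, then consolidate the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Consolidated output: one value built from every partial result.
    return sum(partial_results)


total = run_job([5, 3, 8, 1, 9, 2, 7])
print(total)  # 35
```

Summation is used here because partial sums can be combined in any order, which is what lets each chunk be processed independently.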
In order to get the product revenue report, you’ll have to visit every machine
in the cluster and examine many records on each machine.
The first stage in a map-reduce job is the map. A map is a function whose input
is a single aggregate and whose output is a bunch of key-value pairs.
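A map function of this kind might look like the following sketch. The order layout (`line_items`, `product`, `quantity`, `price`) and the sample values are assumptions made for illustration; the slides do not define a record format.

```python
def map_order(order):
    """Take one aggregate (an order) and emit one key-value pair per line item.

    Key: the product name. Value: the quantity and price of that line item.
    """
    return [
        (item["product"], {"quantity": item["quantity"], "price": item["price"]})
        for item in order["line_items"]
    ]


# Hypothetical order aggregate, used only to show the shape of the output.
order = {
    "id": 1001,
    "line_items": [
        {"product": "black tea", "quantity": 2, "price": 10.0},
        {"product": "coffee", "quantity": 1, "price": 6.5},
    ],
}

pairs = map_order(order)
for key, value in pairs:
    print(key, value)
```

Because the function looks at a single order and nothing else, many copies of it can run on different machines at the same time.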
[Figure: input mapping, with "Black tea" as an example product]
Partitioning
Combining
Composing Map-Reduce Calculations
One simple limitation is that you have to structure your calculations around
operations that fit in well with the notion of a reduce operation.
Combine with reduce calculation
Suppose we want to know the average ordered quantity of each product.
An important property of averages is that they are not composable; that is, if I take two groups of orders, I can't combine their averages alone. Instead, I need to take the total amount and the count of orders from each group, combine those, and then calculate the average from the combined sum and count.
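A small worked example makes the point concrete. The group values and the function names (`summarize`, `combine`, `average`) are hypothetical; the sketch only assumes that each group reports its sum and count.

```python
def summarize(quantities):
    """Reduce one group of order quantities to a (total, count) pair."""
    return (sum(quantities), len(quantities))


def combine(a, b):
    """Combine two (total, count) pairs; this step is composable."""
    return (a[0] + b[0], a[1] + b[1])


def average(summary):
    """Compute the average only at the end, from the combined total and count."""
    total, count = summary
    return total / count


group_a = [2, 4]        # average 3.0
group_b = [9, 9, 9, 9]  # average 9.0

# Averaging the two averages ignores that group_b has more orders.
naive = (average(summarize(group_a)) + average(summarize(group_b))) / 2
correct = average(combine(summarize(group_a), summarize(group_b)))

print(naive)    # 6.0
print(correct)  # 7.0
```

The reducer therefore carries the pair (total, count) through every combining step and divides only once.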
Mapping with reduce calculation
7.3.1. A Two Stage Map-Reduce Example
As map-reduce calculations get more complex, it’s useful to break them down into stages
using a pipes-and-filters approach, with the output of one stage serving as input to the
next, rather like the pipelines in UNIX.
The first stage is similar to the map-reduce examples we've seen so far. The only new feature is using a composite key so that we can reduce records based on the values of multiple fields.
The second-stage mappers create base records for year-on-year comparisons.
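The two stages can be sketched as a small in-memory pipeline. Everything specific here is an assumption for illustration: the record fields, the comparison year (`CURRENT_YEAR = 2011`), the `change` field, and the tiny `run_map_reduce` driver, which stands in for a real framework.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Tiny in-memory driver: map, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: total quantity per product and month, using a composite key.
def stage1_map(order):
    yield (order["product"], order["year"], order["month"]), order["quantity"]


def stage1_reduce(key, quantities):
    product, year, month = key
    return {"product": product, "year": year, "month": month,
            "quantity": sum(quantities)}


CURRENT_YEAR = 2011  # assumed year under comparison


# Stage 2: build base records for the year-on-year comparison.
def stage2_map(record):
    key = (record["product"], record["month"])
    if record["year"] == CURRENT_YEAR:
        yield key, {"current": record["quantity"], "prior": 0}
    elif record["year"] == CURRENT_YEAR - 1:
        yield key, {"current": 0, "prior": record["quantity"]}
    # Records from other years produce no output.


def stage2_reduce(key, partials):
    product, month = key
    current = sum(p["current"] for p in partials)
    prior = sum(p["prior"] for p in partials)
    return {"product": product, "month": month, "current": current,
            "prior": prior, "change": current - prior}


orders = [
    {"product": "black tea", "year": 2010, "month": 12, "quantity": 4},
    {"product": "black tea", "year": 2011, "month": 12, "quantity": 5},
    {"product": "black tea", "year": 2011, "month": 12, "quantity": 3},
    {"product": "black tea", "year": 2009, "month": 12, "quantity": 9},
]

monthly = run_map_reduce(orders, stage1_map, stage1_reduce)   # output of stage 1
report = run_map_reduce(monthly, stage2_map, stage2_reduce)   # input is stage 1's output
print(report)
```

As in a UNIX pipeline, the second stage knows nothing about orders; it consumes only the records the first stage emits, so the intermediate output could be saved and reused by other reports.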
Chapter 8.
Key-Value Databases
A key-value store is a simple hash table, primarily used when all access to the database is via primary key. Think of a table in a traditional RDBMS with two columns, such as ID and NAME, the ID column being the key and the NAME column storing the value.
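The hash-table view can be sketched with a Python dictionary. The class name `KeyValueStore` and its `put`, `get`, and `delete` methods are hypothetical and do not correspond to any particular product's API; the keys and names are sample data.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert the value, or overwrite it if the key already exists."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look up a value by its key; return default if the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key if present; do nothing otherwise."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put(1, "Alice")   # ID column is the key, NAME column is the value
store.put(2, "Bob")
store.put(2, "Robert")  # same key: the old value is replaced

print(store.get(2))     # Robert
store.delete(1)
print(store.get(1))     # None
```

Every operation goes through the key; there is no way to ask the store for "all IDs whose NAME starts with R" without scanning, which is why this model suits access by primary key.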
8.2.1. Consistency
8.2.2. Transactions
8.2.3. Query Features
8.2.4. Structure of Data
8.2.5. Scaling
8.3. Suitable Use Cases