
NOSQL DATABASE

MODULE -3

Department of
Computer Science & Engineering

www.cambridge.edu.in
Pre-Requisite
Hadoop Distributed File System (HDFS)

• Concurrent processing
MapReduce splits large amounts of data into smaller chunks and processes them in parallel.

• Consolidated output
MapReduce aggregates the data from multiple servers to return a consolidated output.

MapReduce is often used in data warehouses to analyze large volumes of data and build specialized business logic.
What is MapReduce?

MapReduce is a data processing tool used to process data in parallel in a distributed environment. It was introduced in 2004 in the paper "MapReduce: Simplified Data Processing on Large Clusters," published by Google.

MapReduce is a paradigm with two phases: the mapper phase and the reducer phase.
1. In the mapper phase, the input is given in the form of key-value pairs.
2. The output of the mapper is fed to the reducer as input. The reducer runs only after the mapper is over. The reducer also takes input in key-value format, and the output of the reducer is the final output.
A Word Count Example of MapReduce

• Let us understand how MapReduce works with an example where we have a text file called example.txt whose contents are as follows:
Dear, Bear, River,
Car, Car, River,
Deer, Car and Bear
• Now, suppose we have to perform a word count on example.txt using MapReduce. We will be finding the unique words and the number of occurrences of each unique word.
• First, we divide the input into three splits, as shown in the figure. This distributes the work among all the map nodes.

• Then, we tokenize the words in each of the mappers and give a hardcoded value (1) to each of the tokens or words. The rationale behind giving a hardcoded value equal to 1 is that every word, in itself, occurs once.

• Now, a list of key-value pairs is created where the key is the individual word and the value is one. So, for the first line, the mapper emits (Dear, 1), (Bear, 1), (River, 1).
• After the mapper phase, a partition process takes place where sorting and shuffling happen, so that all the tuples with the same key are sent to the corresponding reducer.

• So, after the sorting and shuffling phase, each reducer will have a unique key and a list of values for that key.
• Now, each reducer counts the values present in its list of values. As shown in the figure, the reducer gets the list of values [1, 1] for the key Bear. It then counts the number of ones in the list and gives the final output: Bear, 2.

• Finally, all the output key-value pairs are collected and written to the output file.
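The whole walkthrough can be sketched in plain Python (no Hadoop involved; the three splits are hardcoded from the example text above):

```python
from collections import defaultdict

# The three input splits from example.txt.
splits = ["Dear Bear River", "Car Car River", "Deer Car Bear"]

def mapper(line):
    # Tokenize the line and emit (word, 1) for every token.
    return [(word, 1) for word in line.split()]

# Map phase: run the mapper on each split.
mapped = [pair for line in splits for pair in mapper(line)]

# Shuffle/sort phase: group all values by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the list of ones for each word.
result = {word: sum(counts) for word, counts in grouped.items()}
print(result)  # {'Dear': 1, 'Bear': 2, 'River': 2, 'Car': 3, 'Deer': 1}
```

In a real cluster, each split would be mapped on a different node and the grouped lists would be shipped to the reducers, but the data flow is the same.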
Basic Map-Reduce

To explain the basic idea, let's assume we have chosen orders as our aggregate, with each order having line items. Each line item has a product ID, a quantity, and the price charged. This aggregate makes a lot of sense, as people usually want to see the whole order in one access.

This is exactly the kind of situation that calls for map-reduce.

 However, sales analysis people want to see a product and its total revenue for the last seven days. This report doesn't fit the aggregate structure that we have, which is the downside of using aggregates.

 In order to get the product revenue report, you'll have to visit every machine in the cluster and examine many records on each machine.



MAP FUNCTION

The first stage in a map-reduce job is the map. A map is a function whose input
is a single aggregate and whose output is a bunch of key-value pairs.
(Figure: an order's line items, such as black tea, brown rice tea, and dragonwell tea, mapped to key-value pairs.)
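As a sketch, assuming a hypothetical order aggregate whose line items carry productId, quantity, and price fields (these names are illustrative, not from any particular store), a map function could look like:

```python
# Hypothetical order aggregate: one order with its line items.
order = {
    "orderId": 99,
    "lineItems": [
        {"productId": "puerh", "quantity": 8, "price": 24.0},
        {"productId": "dragonwell", "quantity": 12, "price": 62.4},
    ],
}

def map_order(aggregate):
    # Input: a single order aggregate.
    # Output: one key-value pair per line item, keyed by product ID.
    for item in aggregate["lineItems"]:
        yield item["productId"], {"quantity": item["quantity"],
                                  "price": item["price"]}

pairs = list(map_order(order))
```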
REDUCE FUNCTION

A map operation only operates on a single record; the reduce function takes multiple map outputs with the same key and combines their values. So, a map function might yield 1000 line items from orders for "Database Refactoring"; the reduce function would reduce them down to one record, with the totals for the quantity and revenue. While the map function is limited to working only on data from a single aggregate, the reduce function can use all values emitted for a single key (see Figure 7.2).
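A minimal sketch of such a reduce function, assuming each map output for a product is a dict with quantity and price fields (hypothetical names, matching nothing in any specific product):

```python
def reduce_product(product_id, values):
    # Combine all map outputs emitted for one key into totals.
    total_quantity = sum(v["quantity"] for v in values)
    total_revenue = sum(v["price"] for v in values)
    return product_id, {"quantity": total_quantity, "revenue": total_revenue}

# Two map outputs for the same key, collapsed into one record.
key, totals = reduce_product("puerh", [{"quantity": 8, "price": 24.0},
                                       {"quantity": 4, "price": 12.0}])
```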
Partitioning and Combining

Partitioning
Combining
Composing Map-Reduce Calculations

One simple limitation is that you have to structure your calculations around operations that fit in well with the notion of a reduce operation.

Combine with reduce calculation: suppose we want to know the average ordered quantity of each product. An important property of averages is that they are not composable; that is, if I take two groups of orders, I can't combine their averages alone. Instead, I need to take the total amount and the count of orders from each group, combine those, and then calculate the average from the combined sum and count.
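A quick arithmetic illustration of why averages are not composable, and why carrying the sum and count through the reduce works:

```python
# Two groups of ordered quantities for the same product.
group_a = [5, 15]        # average 10.0
group_b = [10, 20, 30]   # average 20.0

# Averaging the two averages gives the wrong answer:
wrong = (sum(group_a) / len(group_a) + sum(group_b) / len(group_b)) / 2  # 15.0

# Instead, combine the totals and counts, then divide once at the end:
total = sum(group_a) + sum(group_b)  # 80
count = len(group_a) + len(group_b)  # 5
right = total / count                # 16.0
```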
Mapping with reduce calculation
7.3.1. A Two-Stage Map-Reduce Example

As map-reduce calculations get more complex, it's useful to break them down into stages using a pipes-and-filters approach, with the output of one stage serving as input to the next, rather like the pipelines in UNIX.

Consider an example where we want to compare the sales of products for each month in 2011 to the same month in the prior year. To do this, we'll break the calculation down into two stages.

The first stage produces records showing the aggregate figures for a single product in a single month of the year.

The second stage then uses these as inputs and produces the result for a single product by comparing one month's results with the same month in the prior year.

The first stage: creating records for the monthly sales of a product. This stage is similar to the map-reduce examples we've seen so far. The only new feature is using a composite key, so that we can reduce records based on the values of multiple fields.

The second-stage mappers: the second-stage mapper creates base records for year-on-year comparisons.
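A minimal sketch of the two stages wired together, with hypothetical field names and a tiny hardcoded stage-1 output (a real job would compute stage 1 from the order aggregates):

```python
from collections import defaultdict

# Hypothetical stage-1 output: aggregate sales per product, month, and year.
stage1 = [
    {"product": "puerh", "year": 2011, "month": 5, "quantity": 120},
    {"product": "puerh", "year": 2010, "month": 5, "quantity": 100},
]

def stage2_map(record):
    # Re-key by the composite (product, month) so both years meet in one reduce.
    yield (record["product"], record["month"]), (record["year"], record["quantity"])

grouped = defaultdict(list)
for rec in stage1:
    for key, value in stage2_map(rec):
        grouped[key].append(value)

def stage2_reduce(key, values):
    # Compare 2011 with the prior year for this (product, month).
    by_year = dict(values)
    return key, by_year.get(2011, 0) - by_year.get(2010, 0)

results = [stage2_reduce(k, v) for k, v in grouped.items()]
# results: [(('puerh', 5), 20)]
```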
Chapter 8.
Key-Value Databases
Implement
 A key-value store is a simple hash table,
 primarily used when all access to the database is via a primary key.
 Think of a table in a traditional RDBMS with two columns, such as ID and NAME, with the ID column being the key and the NAME column storing the value.
 In an RDBMS, the NAME column is restricted to storing data of type String.
 The application can provide an ID and VALUE and persist the pair; if the ID already exists, the current value is overwritten; otherwise, a new entry is created.

Let's look at how terminology compares in Oracle and Riak.

8.1. What Is a Key-Value Store?
 Key-value stores are the simplest NoSQL data stores to use from an API perspective.
 The client can either
• get the value for a key,
• put a value for a key, or
• delete a key from the data store.
 The value is a blob that the data store just stores, without caring or knowing what's inside; it's the responsibility of the application to understand what was stored.
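A minimal in-memory sketch of this get/put/delete API (a plain Python class as a stand-in, not a real Riak or Redis client):

```python
class KeyValueStore:
    """In-memory sketch of the key-value store API: get, put, delete."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrites the value if the key already exists;
        # otherwise a new entry is created.
        self._data[key] = value

    def get(self, key):
        # The value is an opaque blob: the store never interprets it.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:1", {"name": "Ann"})
store.put("user:1", {"name": "Ann", "lang": "en"})  # second put overwrites
```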
Some of the popular key-value databases:
1. Riak [Riak],
2. Redis (often referred to as Data Structure server) [Redis],
3. Memcached DB and its flavors [Memcached],
4. Berkeley DB [Berkeley DB],
5. HamsterDB (especially suited for embedded use) [HamsterDB],
6. Amazon DynamoDB [Amazon's Dynamo] (not open-source), and
7. Project Voldemort [Project Voldemort] (an open-source implementation of Amazon DynamoDB).
Riak Databases

Riak lets us store keys into buckets. If we wanted to store user session data, shopping cart information, and user preferences in Riak, we could just store all of them in the same bucket, with a single key and a single value for all of these objects.

In Riak, such buckets are known as domain buckets, allowing the serialization and deserialization to be handled by the client driver.
8.2. Key-Value Store Features

8.2.1. Consistency:
8.2.2. Transactions
8.2.3. Query Features
8.2.4. Structure of Data
8.2.5. Scaling
8.3. Suitable Use Cases

8.3.1. Storing Session Information
1. Generally, every web session is unique and is assigned a unique session-id value.
2. Applications that store the session id on disk or in an RDBMS will greatly benefit from moving to a key-value store, since everything about the session can be stored by a single PUT request and retrieved using a single GET.
3. This single-request operation makes it very fast, as everything about the session is stored in a single object.
4. Solutions such as Memcached are used by many web applications, and Riak can be used when availability is important.
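A sketch of the single-PUT/single-GET pattern, using a plain dict as a stand-in for the key-value store and a hypothetical session object (the field names are illustrative):

```python
import json

# Everything about the session lives in one value, so one PUT stores it
# and one GET retrieves it.
session = {"user": "vishnu", "cart": ["sku-1"], "theme": "dark"}
session_id = "sess-8f2a"

db = {}                                # stand-in for the key-value store
db[session_id] = json.dumps(session)   # single PUT: serialize and store
restored = json.loads(db[session_id])  # single GET: fetch and deserialize
```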
8.3.2. User Profiles, Preferences
1. Almost every user has a unique user id, username, or some other attribute, as well as preferences such as language, color, timezone, and which products the user has access to.
2. This can all be put into an object, so getting the preferences of a user takes a single GET operation. Similarly, product profiles can be stored.
8.3.3. Shopping Cart Data
E-commerce websites have shopping carts tied to the user. As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value, where the key is the user id. A Riak cluster would be best suited for these kinds of applications.
8.4. When Not to Use

8.4.1. Relationships among Data
If you need to have relationships between different sets of data, or correlate the data between different sets of keys, key-value stores are not the best solution to use, even though some key-value stores provide link-walking features.

8.4.2. Multioperation Transactions
If you're saving multiple keys and there is a failure to save any one of them, and you want to revert or roll back the rest of the operations, key-value stores are not the best solution to be used.
8.4.3. Query by Data
If you need to search the keys based on something found in the value part of the key-value pairs, then key-value stores are not going to perform well for you. There is no way to inspect the value on the database side, with the exception of some products like Riak Search or indexing engines like Lucene [Lucene] or Solr [Solr].

8.4.4. Operations by Sets
Since operations are limited to one key at a time, there is no way to operate upon multiple keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side.
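A sketch of handling multiple keys from the client side, with a plain dict standing in for the store; each key still costs one GET, and the client must deal with any key that fails:

```python
# Stand-in for the key-value store: it only supports one key per operation.
store = {"k1": "a", "k2": "b", "k3": "c"}

def multi_get(keys):
    # Client-side helper: issue one GET per key and collect the results.
    # A missing or failed key simply yields None here; a real client
    # would need its own retry or error-handling policy.
    return {k: store.get(k) for k in keys}

values = multi_get(["k1", "k3"])  # {'k1': 'a', 'k3': 'c'}
```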
