NoSQL DBs

The document provides an introduction to NoSQL databases, comparing them with relational databases and discussing their strengths and weaknesses. It highlights the evolution of NoSQL, various types of NoSQL databases, and the importance of the aggregate data model. Additionally, it covers Hadoop and its role in processing large data volumes through distributed applications and the Map-Reduce framework.

Uploaded by

YASWANTH P 717822I163

Introduction to NoSQL

Lecture Plan
• Introductions
• What is NoSQL?
• Relational vs. NoSQL databases
• Aggregate data model
• Map-Reduce and Hadoop
Relational databases: strengths
• Persistence: large amounts of data can be safely and
securely kept on storage device(s)
– ability to get small bits of information quickly and easily
• Concurrency: many applications may look at the same
body of data at once, possibly modifying that data:
– RDBs handle concurrency by controlling the access to their
data through transactions
– if an error occurs during the processing of changes,
transactions can be rolled back
• Integration: several applications need to communicate
and collaborate to solve a complex task:
– concurrency control automatically handles multiple
applications
Relational databases: weaknesses
• Impedance mismatch: difference between the
relational model and in-memory data structures
– RDBs organize data into a structure of relations and
tuples (tables and rows)
– values in a relational tuple have to be simple (i.e. no
structures, such as nested records or lists)
– in-memory data structures can be more complex than
simple relations
– as a result, in-memory data structures need to be
translated into a relational representation in order to
be stored on disk
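The translation step can be sketched as follows. This is a minimal illustration, not a real object-relational mapper; the order structure, its field names and the two helper functions are assumptions made up for the example.

```python
# Hypothetical in-memory order: a nested record containing a list of items.
order = {
    "order_id": 99,
    "customer": "Ann",
    "items": [
        {"product": "book", "qty": 1},
        {"product": "pen", "qty": 5},
    ],
}


def to_relational(order):
    """Flatten one nested order into simple tuples (rows) for two tables."""
    order_row = (order["order_id"], order["customer"])
    # Each nested item becomes its own row, linked back by order_id.
    item_rows = [
        (order["order_id"], item["product"], item["qty"])
        for item in order["items"]
    ]
    return order_row, item_rows


def from_relational(order_row, item_rows):
    """Rebuild the nested in-memory structure from the flat rows."""
    order_id, customer = order_row
    return {
        "order_id": order_id,
        "customer": customer,
        "items": [
            {"product": product, "qty": qty}
            for (oid, product, qty) in item_rows
            if oid == order_id
        ],
    }


order_row, item_rows = to_relational(order)
```

Every save and every load has to pass through a mapping like this one, which is the cost that the impedance mismatch refers to.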
Relational data model
Relational databases: major weakness
• RDBs are designed to be run on a single machine
• Sharding: RDBs could be run as separate servers for
different sets of data
– sharding is controlled by an application, which keeps track
of which RDB server to talk to for each bit of data
– …but querying, referential integrity, transactions and
consistency control across shards still need to be
implemented
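A minimal sketch of application-controlled sharding, assuming three hypothetical server names and a simple hash-modulo rule (neither comes from the slides):

```python
import hashlib

# Hypothetical shard servers; a real application would hold connections here.
SHARDS = ["rdb-server-0", "rdb-server-1", "rdb-server-2"]


def shard_for(key: str) -> str:
    """Return the server responsible for a key, using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


# The application must call shard_for() before every query it sends.
server = shard_for("customer:42")
```

The routing itself is short, but anything that touches keys on different servers (joins, foreign keys, transactions) is left to the application, as the last bullet notes.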
Why NoSQL?
• Relational DBMSs have been a successful technology for more than
twenty years, since they provided reliable persistence, concurrency
control and integration mechanisms
• RDBs are designed to run on a single machine and do not scale
horizontally
• However, the need to process large volumes of data led to a shift
from scaling vertically to scaling horizontally on clusters
• Cluster: a large number of commodity machines connected by a
network
History of NoSQL
• Early efforts were focused on proprietary systems by
Amazon and Google in the 2000s:
– Bigtable from Google
– Dynamo from Amazon
• The term “NoSQL” traces back to a meetup on June
11, 2009 in San Francisco, after which NoSQL DBMSs
became an open-source phenomenon
Relational DBs
Key-Value Stores
Document Stores
Graph Databases
Wide Column Database
Types of NoSQL databases
• Key-value: BerkeleyDB, LevelDB, Memcached, Project
Voldemort, Redis, Riak
• Document: OrientDB, RavenDB, Terrastore, CouchDB,
MongoDB
• Column-family: Amazon SimpleDB, Cassandra, Hypertable,
HBase
• Graph: FlockDB, HyperGraphDB, InfiniteGraph, Neo4j
DB-Engines Ranking
https://db-engines.com/en/ranking
NoSQL: aggregate data model
• Explicit storage of a rich structure of closely related
data that is accessed as a unit (called an aggregate)
• Aggregates provide a natural unit of interaction for
many applications
• Suitable for distributed environment
• Downside: difficulty in handling relationships
between entities in different aggregates
Aggregate
• Complex record allowing lists and other record
structures to be nested inside it
• Collection of related objects that are treated
as a unit
Relational schema
Relational data model
Example of aggregates
Aggregate vs. relational data model
• No normalization:
– instead of referencing records by ID, some records may be duplicated and
copied into an aggregate
– this minimizes the number of aggregates accessed during data
interaction
– and so minimizes the number of nodes to query and the data
transfer overhead when gathering the data
• Relations between aggregates are still possible:
– e.g., between orders and customers
– aggregate boundaries are context-specific (i.e. depend on the
task and how the data is manipulated by the application)
• Relational databases are aggregate-ignorant:
– and so are NoSQL graph databases
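As an illustration of these points, the sketch below stores one order as a single aggregate. The key format, field names and values are assumptions made up for the example, and a plain dictionary stands in for a key-value or document store.

```python
# Hypothetical aggregate store: one complete aggregate per key.
orders = {
    "order:99": {
        # Customer data is copied into the order, not only referenced by ID.
        "customer": {"id": 1, "name": "Ann"},
        "items": [
            {"product": "book", "price": 30, "qty": 1},
            {"product": "pen", "price": 2, "qty": 5},
        ],
        # Duplicated from the customer record (no normalization).
        "shipping_address": {"city": "Chicago"},
    }
}


def order_total(order_key):
    """Compute an order total from a single aggregate, with no joins."""
    aggregate = orders[order_key]  # one lookup fetches everything needed
    return sum(item["price"] * item["qty"] for item in aggregate["items"])


total = order_total("order:99")
```

The price of this layout is the duplicated customer data: if the customer's name changes, every order that copied it has to be updated by the application.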
Relational vs. NoSQL DBs: atomicity
• RDBs allow any combination of rows from any
tables to be manipulated in a single ACID (Atomic,
Consistent, Isolated and Durable) transaction:
– many rows spanning many tables are updated as a
single atomic operation
– atomic operations succeed or fail entirely
• NoSQL databases support atomic manipulation of
a single aggregate at a time:
– cross-aggregate atomic operations need to be
implemented programmatically
• Aggregate-ignorant NoSQL DBs (e.g. graph databases) support ACID
transactions similar to relational DBs
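The difference can be sketched with a toy in-memory store. The class, its methods and the account example are hypothetical and do not model any particular NoSQL product; the store guarantees atomicity for one aggregate only, so the transfer between two aggregates has to undo its first step by hand.

```python
import threading


class AggregateStore:
    """Toy store: an update to ONE aggregate is atomic, nothing more."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = aggregate

    def get(self, key):
        with self._lock:
            return dict(self._data[key])

    def update(self, key, fn):
        """Apply fn to a copy of one aggregate; store it only if fn succeeds."""
        with self._lock:
            self._data[key] = fn(dict(self._data[key]))


def transfer(store, src, dst, amount):
    """Cross-aggregate change implemented in application code."""

    def withdraw(account):
        if account["balance"] < amount:
            raise ValueError("insufficient funds")
        account["balance"] -= amount
        return account

    def deposit(account):
        account["balance"] += amount
        return account

    store.update(src, withdraw)      # atomic on src only
    try:
        store.update(dst, deposit)   # atomic on dst only
    except Exception:
        store.update(src, deposit)   # compensate: undo the withdrawal
        raise
```

Between the two updates another client can observe the money as missing, so this gives neither isolation nor true atomicity across aggregates; it only restores a consistent state after a failure.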
CAP theorem
The CAP theorem
• Many database systems forgo transactions
entirely, because the performance impact is
too high
• Early MySQL was popular partly because it was lightweight
and, with its then-default MyISAM storage engine, did not
support transactions
• Consistency can and should often be relaxed
The CAP theorem
Choose DBs
https://www.dataversity.net/choose-right-nosql-database-application/#
Map-Reduce and Hadoop
What is Hadoop?
• A software framework that supports data-intensive distributed
applications.
• It enables applications to work with thousands of nodes and petabytes of
data.
• Hadoop was inspired by Google's MapReduce and Google File System
(GFS).
• Hadoop is a top-level Apache project being built and used by a global
community of contributors, using the Java programming language.
• Yahoo! has been the largest contributor to the project, and uses Hadoop
extensively across its businesses.
Who uses Hadoop?
http://wiki.apache.org/hadoop/PoweredBy
• Yahoo!
– More than 100,000 CPUs in >36,000 computers.
• Facebook
– Used in reporting/analytics and machine learning, and also
as a storage engine for logs.
– A 1100-machine cluster with 8800 cores and about 12 PB
raw storage.
– A 300-machine cluster with 2400 cores and about 3 PB raw
storage.
– Each (commodity) node has 8 cores and 12 TB of storage.
Very Large Storage Requirements
• Facebook has Hadoop clusters with 15 PB of raw storage
(15,000,000 GB).
• No single storage device can handle this amount of data.
• We need a large set of nodes, each storing part of the data.
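A quick back-of-the-envelope check of how large that set of nodes is, assuming the 12 TB per node quoted for the Facebook clusters and decimal units (1 PB = 1000 TB, consistent with 15 PB = 15,000,000 GB):

```python
import math

TB_PER_PB = 1000                 # decimal units, as used on the slide
total_raw_tb = 15 * TB_PER_PB    # 15 PB of raw storage
per_node_tb = 12                 # storage per commodity node

# Minimum number of nodes if every node were filled completely.
nodes_needed = math.ceil(total_raw_tb / per_node_tb)
```

Under these assumptions at least 1250 nodes are required, which is the same order of magnitude as the 1100-machine cluster mentioned earlier.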

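HDFS: Hadoop Distributed File System
[Figure: a Client, the Namenode and several Data Nodes]
1. The Client sends the filename and block index to the Namenode
2. The Namenode returns the Datanodes holding the block and the Blockid
3. The Client reads the data from the Data Nodes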
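The three steps of an HDFS read can be mimicked with a small in-memory simulation. This is a sketch of the interaction pattern only: the classes, method names and data layout are hypothetical and are not the Hadoop API.

```python
class NameNode:
    """Holds metadata only: a file's block IDs and where the replicas live."""

    def __init__(self):
        self.file_blocks = {}      # filename -> list of block IDs
        self.block_locations = {}  # block ID -> list of DataNode names

    def locate(self, filename, index):
        block_id = self.file_blocks[filename][index]
        return self.block_locations[block_id], block_id


class DataNode:
    """Holds the actual block contents."""

    def __init__(self):
        self.blocks = {}           # block ID -> bytes

    def read(self, block_id):
        return self.blocks[block_id]


def read_block(namenode, datanodes, filename, index):
    # Steps 1 and 2: send (filename, index), receive (DataNodes, block ID).
    locations, block_id = namenode.locate(filename, index)
    # Step 3: read the data directly from one of the listed DataNodes.
    return datanodes[locations[0]].read(block_id)
```

The point of the design is that file contents never pass through the Namenode; it only answers the lookup, and the bulk transfer happens between the client and the Data Nodes.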
Terabyte Sort Benchmark
• http://sortbenchmark.org/
• Task: sorting 100 TB of data and writing the results
to disk (10^12 records of 100 bytes each).
• Yahoo’s Hadoop cluster is the current winner:
– 173 minutes
– 3452 nodes x (2 quad-core Xeons, 8 GB RAM)
This is the first time that a Java program has won this competition.
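The figures above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes) and derives the end-to-end sorting rate, which is not the same thing as raw disk bandwidth.

```python
records = 10**12
bytes_per_record = 100
total_bytes = records * bytes_per_record   # 10**14 bytes = 100 TB

seconds = 173 * 60                         # 173 minutes
nodes = 3452

cluster_rate = total_bytes / seconds       # bytes sorted per second, whole cluster
per_node_rate = cluster_rate / nodes       # bytes sorted per second, per node
```

This works out to roughly 9.6 GB/s for the whole cluster and about 2.8 MB/s of sorted data per node.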
Example: word count
Counting Words by MapReduce
• Input:
– Hello World
– Bye World
– Hello Hadoop
– Goodbye Hadoop
• Split 1:
– Hello World
– Bye World
• Split 2:
– Hello Hadoop
– Goodbye Hadoop
Counting Words by MapReduce (Node 1)
• Input split: Hello World / Bye World
• Mapper: Hello, <1>; World, <1>; Bye, <1>; World, <1>
• Sort & Merge: Bye, <1>; Hello, <1>; World, <1, 1>
• Combiner: Bye, <1>; Hello, <1>; World, <2>
Counting Words by MapReduce
• Combiner output of the first node: Bye, <1>; Hello, <1>; World, <2>
• Combiner output of the second node: Goodbye, <1>; Hadoop, <2>; Hello, <1>
• Sort & Merge: Bye, <1>; Goodbye, <1>; Hadoop, <2>; Hello, <1, 1>; World, <2>
• Split:
– Bye, <1>; Goodbye, <1>; Hadoop, <2>
– Hello, <1, 1>; World, <2>
Counting Words by MapReduce
• Node 1:
– Reducer input: Bye, <1>; Goodbye, <1>; Hadoop, <2>
– Reducer output: Bye, <1>; Goodbye, <1>; Hadoop, <2>
– Write on Disk (part-00000): Bye 1; Goodbye 1; Hadoop 2
• Node 2:
– Reducer input: Hello, <1, 1>; World, <2>
– Reducer output: Hello, <2>; World, <2>
– Write on Disk (part-00001): Hello 2; World 2
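The whole word-count pipeline can be simulated in a single process. The sketch below follows the stages named on the slides (mapper, sort and merge, combiner, reducer) on the same four input lines; the function names are illustrative, and real Hadoop jobs are written against its Java API and run distributed across nodes.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Sort and merge pairs by word on one node, then sum locally."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return {word: sum(counts) for word, counts in sorted(grouped.items())}


def shuffle(node_outputs):
    """Merge the combiner outputs of all nodes: word -> list of partial counts."""
    merged = defaultdict(list)
    for output in node_outputs:
        for word, count in output.items():
            merged[word].append(count)
    return dict(sorted(merged.items()))


def reducer(word, counts):
    """Sum the partial counts for one word."""
    return word, sum(counts)


splits = [
    ["Hello World", "Bye World"],        # split processed on the first node
    ["Hello Hadoop", "Goodbye Hadoop"],  # split processed on the second node
]

node_outputs = []
for split in splits:
    pairs = [pair for line in split for pair in mapper(line)]
    node_outputs.append(combine(pairs))

merged = shuffle(node_outputs)
result = dict(reducer(word, counts) for word, counts in merged.items())
```

The combiner is an optimization: summing "World" to 2 on the first node means one pair instead of two crosses the network before the reduce phase.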
High Level Architecture of MapReduce
• Client Computer: submits jobs to the JobTracker
• Master Node: runs the JobTracker
• Slave Nodes: each runs a TaskTracker, which runs one or more Tasks

High Level Architecture of Hadoop
• MapReduce layer: JobTracker on the Master Node; a TaskTracker on
every node (Master Node and Slave Nodes)
• HDFS layer: NameNode on the Master Node; a DataNode on
every node (Master Node and Slave Nodes)

Hadoop Job Scheduling
• A FIFO queue matches incoming jobs to
available nodes
– No notion of fairness
– Never switches out a running job
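Why FIFO scheduling has no notion of fairness can be seen in a toy model. This is a simplification made up for illustration, not Hadoop's scheduler code: jobs are (name, pending tasks) pairs in arrival order, and free task slots always go to the oldest job first.

```python
def fifo_assign(jobs, free_slots):
    """Hand out free task slots to jobs strictly in arrival order.

    jobs: list of (name, pending_tasks), oldest job first.
    Returns a dict mapping job name -> number of slots granted.
    """
    assignment = {}
    for name, pending in jobs:
        if free_slots == 0:
            break  # later jobs get nothing until earlier ones finish
        granted = min(pending, free_slots)
        assignment[name] = granted
        free_slots -= granted
    return assignment


# A large job that arrived first takes every slot; the small job waits.
starved = fifo_assign([("big", 100), ("small", 2)], free_slots=10)
```

In this model the two-task job receives no slots at all while the hundred-task job is still running, even though it could finish almost immediately.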
Distributed File Cache
• The Distributed Cache facility allows you to
transfer files from the distributed file system
to the local file system (for reading only) of all
participating nodes before the beginning of a
job.
References
• Hadoop Project Page:
http://hadoop.apache.org/
