Bda Module 3
NoSQL databases (also known as "not only SQL") store data differently from
relational tables. They come in several types based on their data model.
The main types are document, key-value, wide-column, and graph. They
provide flexible schemas and scale easily with large volumes of data and
high user loads.
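As a rough illustration of the four data models named above, the sketch below holds one sample record in each shape using plain Python structures. The record, the key names, and the helper city_of are hypothetical and do not correspond to any particular database product.

```python
import json

# Document model: one nested, self-describing record per entity.
document = {
    "_id": "u1",
    "name": "Asha",
    "emails": ["asha@example.com"],
    "address": {"city": "Pune"},
}

# Key-value model: an opaque value looked up by a single key.
key_value = {"user:u1": json.dumps(document)}

# Wide-column model: row key -> column family -> column -> value.
wide_column = {
    "u1": {
        "profile": {"name": "Asha", "city": "Pune"},
        "contact": {"email0": "asha@example.com"},
    }
}

# Graph model: nodes plus labelled edges between them.
nodes = {"u1": {"name": "Asha"}, "c1": {"name": "Pune"}}
edges = [("u1", "LIVES_IN", "c1")]


def city_of(user_id):
    """Follow LIVES_IN edges from a user node and return the city names."""
    return [
        nodes[dst]["name"]
        for src, label, dst in edges
        if src == user_id and label == "LIVES_IN"
    ]
```

The same fact (the user's city) is reached by a nested field in the document, by decoding the stored value in the key-value pair, by a column-family lookup in the wide-column row, and by following an edge in the graph.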
RDBMS: Relational Database Management System
OLAP: Online Analytical Processing
NoSQL: Not only SQL (1998)
History of NoSQL Databases
• 1998 - Carlo Strozzi uses the term NoSQL for his lightweight,
open-source relational database.
• 2000 - Graph database Neo4j is launched.
• 2004 - Google BigTable is launched.
• 2005 - CouchDB is launched.
• 2007 - The research paper on Amazon Dynamo is released.
• 2008 - Facebook open-sources the Cassandra project.
• 2009 - The term NoSQL is reintroduced.
WHY NoSQL
• A NoSQL database is best for handling indeterminate, unrelated, or
rapidly changing data.
• It is intuitive to use for developers when the application dictates the
database schema.
• It suits applications that need flexible schemas, which enable
faster and more iterative development.
CAP THEOREM
• The CAP theorem, originally introduced as the CAP principle, can be
used to explain some of the competing requirements in a distributed
system with replication. It is a tool used to make system designers
aware of the trade-offs while designing networked shared-data
systems.
• The three letters in CAP refer to three desirable properties of
distributed systems with replicated data: consistency (among
replicated copies), availability (of the system for read and write
operations) and partition tolerance (in the face of the nodes in the
system being partitioned by a network fault).
• The CAP theorem states that it is not possible to guarantee all three of
the desirable properties (consistency, availability, and partition
tolerance) at the same time in a distributed system with data replication.
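The trade-off described above can be sketched with a toy two-replica store. This is a minimal illustration under stated assumptions, not a real replication protocol: the class names Replica and TwoNodeStore, the "CP"/"AP" mode flag, and the partitioned switch are all hypothetical.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy store with two replicas, 'a' and 'b'; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True means a and b cannot communicate

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write (availability is lost).
                raise RuntimeError("unavailable during partition")
            # AP: stay available by writing to the reachable replica only,
            # so the two copies now disagree (consistency is lost).
            self.a.data[key] = value
            return
        # No partition: both replicas receive the write.
        self.a.data[key] = value
        self.b.data[key] = value

    def read(self, replica, key):
        return getattr(self, replica).data.get(key)


cp = TwoNodeStore("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
    cp_write_rejected = False
except RuntimeError:
    cp_write_rejected = True

ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)
```

During the partition, the CP store rejects the write and both replicas still agree on the old value, while the AP store accepts the write and its replicas return different values until they are reconciled.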
NoSQL features (diagram)
• Schema-free / relaxed schemas: no schema definition required;
heterogeneous structures.
• Non-relational: flat fixed column records; self-contained aggregates;
does not require object-relational mapping, data normalization, or ACID.
• Simple API: easy interface; text-based protocols; no standards-based
query language; web-enabled databases.
• Open source: no expensive licensing or hardware requirements.
• Distributed: multiple databases; auto-scaling and failover capabilities;
only provides eventual consistency.
Advantages and disadvantages of NoSQL

Relational Database                                       | NoSQL
----------------------------------------------------------+---------------------------------------------------
It is used to handle data coming in at low velocity.      | It is used to handle data coming in at high velocity.
It gives only read scalability.                           | It gives both read and write scalability.
It manages structured data.                               | It manages all types of data.
Data arrives from one or a few locations.                 | Data arrives from many locations.
It supports complex transactions.                         | It supports simple transactions.
It has a single point of failure.                         | No single point of failure.
It handles data in less volume.                           | It handles data in high volume.
Transactions are written in one location.                 | Transactions are written in many locations.
It supports ACID properties compliance.                   | It doesn't support ACID properties.
It is difficult to change the database once it is defined.| It enables easy and frequent changes to the database.
A schema is mandatory to store the data.                  | Schema design is not required.
NoSQL Business Drivers
NoSQL Data Architecture Patterns
• An architecture pattern is a logical way of categorizing the data that will be stored in the database.
NoSQL is a type of database that supports operations on big data and stores it in a
valid format. It is widely used because of its flexibility and its wide variety of services.
• Advantages:
• Can handle large amounts of data and heavy load.
• Easy retrieval of data by keys.
• Limitations:
• Complex queries may involve multiple key-value pairs, which can slow
performance.
• Data may involve many-to-many relationships, which may collide.
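The retrieval-by-key advantage and the multi-pair query limitation listed above can be sketched with an in-memory dictionary standing in for a key-value store. The helpers put, get, and total_for_customer, the "order:" key prefix, and the sample orders are all hypothetical.

```python
store = {}


def put(key, value):
    """Store a value under a key, overwriting any existing value."""
    store[key] = value


def get(key):
    """Fetch by key: a single direct lookup."""
    return store.get(key)


put("order:1", {"customer": "c1", "total": 40})
put("order:2", {"customer": "c2", "total": 15})
put("order:3", {"customer": "c1", "total": 25})


def total_for_customer(customer_id):
    """Query across many pairs: with no secondary index, every pair is scanned."""
    return sum(
        value["total"]
        for key, value in store.items()
        if key.startswith("order:") and value["customer"] == customer_id
    )
```

Here get("order:2") touches one entry, whereas total_for_customer("c1") must visit every stored pair, which is why queries spanning many key-value pairs tend to be slower than lookups by key.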