NoSQL MongoDB notes

Uploaded by

khisakipsang
Introduction to NoSQL and MongoDB
Dorcas Mwigereri
JKUAT University

Outline for today
• Introduction to NoSQL
• Architecture
• Sharding
• Replica sets
• NoSQL Assumptions and the CAP Theorem
• Strengths and weaknesses of NoSQL
• MongoDB
• Functionality
• Examples

Taxonomy of NoSQL
• Key-value
• Graph database
• Document-oriented
• Column family

Typical NoSQL architecture
• A hashing function maps each key (K) to a server (node)

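The key-to-server mapping on this slide can be sketched as follows. This is a minimal illustration, not how any particular NoSQL product does it: the node names, the `hashKey` helper, and the modulo placement rule are all assumptions made for the example.

```javascript
// Hypothetical node names; a real cluster would discover its members dynamically.
const nodes = ["node-a", "node-b", "node-c"];

// Simple deterministic string hash (djb2 variant), kept in the unsigned 32-bit range.
function hashKey(key) {
  let h = 5381;
  for (const ch of key) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Map a key to one node by taking the hash modulo the number of nodes.
function nodeFor(key) {
  return nodes[hashKey(key) % nodes.length];
}

console.log(nodeFor("user:42"), nodeFor("user:43"));
```

The same key always lands on the same node, which is what lets any client find the data without a central lookup. Plain modulo placement reshuffles most keys when a node is added or removed, so production systems usually refine this idea.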
CAP theorem for NoSQL
What the CAP theorem really says:
• If you cannot limit the number of faults, requests can be directed to any server, and you insist on serving every request you receive, then you cannot possibly be consistent. (Eric Brewer, 2001)

How it is interpreted:
• You must always give something up: consistency, availability, or tolerance to failure and reconfiguration

Theory of NoSQL: CAP
GIVEN:
• Many nodes
• Nodes contain replicas of partitions of the data

• Consistency (C)
  • All replicas contain the same version of data
  • Client always has the same view of the data (no matter what node)
• Availability (A)
  • System remains operational on failing nodes
  • All clients can always read and write
• Partition tolerance (P)
  • Multiple entry points
  • System remains operational on system split (communication malfunction)
  • System works well across physical network partitions

CAP Theorem: satisfying all three at the same time is impossible

• Available, Partition-Tolerant (AP) systems achieve "eventual consistency" through replication and verification
• Consistent, Available (CA) systems have trouble with partitions and typically deal with it with replication
• Consistent, Partition-Tolerant (CP) systems have trouble with availability while keeping data consistent across partitioned nodes

Source: http://blog.nahurst.com/visual-guide-to-nosql-s

Sharding of data
• Distributes a single logical database system across a cluster of machines
• Uses range-based partitioning to distribute documents based on a specific shard key
• Automatically balances the data associated with each shard
• Can be turned on and off per collection (table)

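Range-based partitioning on a shard key can be sketched like this. The chunk boundaries, shard names, and the `shardFor` helper are hypothetical. The sketch only shows the routing decision, not balancing or chunk splitting.

```javascript
// Hypothetical chunk table: each shard owns shard-key values in [min, max).
const chunks = [
  { shard: "shard0", min: -Infinity, max: 1000 },
  { shard: "shard1", min: 1000, max: 5000 },
  { shard: "shard2", min: 5000, max: Infinity },
];

// Find the shard whose range contains the given shard-key value.
function shardFor(shardKeyValue) {
  const chunk = chunks.find(
    c => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  return chunk.shard;
}

console.log(shardFor(42), shardFor(1000), shardFor(9999));
```

Because neighbouring key values fall in the same range, a range query on the shard key touches only the shards whose ranges overlap it.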
Replica Sets
• Redundancy and Failover
• Zero downtime for upgrades and maintenance
• Master-slave replication
  • Strong Consistency
  • Delayed Consistency
• Geospatial features
[Diagram: a client connected to replica set "replica1" with members Host1:10000, Host2:10001, Host3:10002]

How does NoSQL vary from RDBMS?
• Looser schema definition
• Applications written to deal with specific documents/data
  • Applications aware of the schema definition as opposed to the data
• Designed to handle distributed, large databases
• Trade-offs:
  • No strong support for ad hoc queries but designed for speed and growth of database
  • Query language through the API
  • Relaxation of the ACID properties

Benefits of NoSQL
Elastic Scaling
• RDBMS scale up – bigger load, bigger server
• NoSQL scale out – distribute data across multiple hosts seamlessly
DBA Specialists
• RDBMS require a highly trained expert to monitor the DB
• NoSQL require less management, with automatic repair and simpler data models
Big Data
• Huge increase in data
• RDBMS: capacity and constraints of data volumes at its limits
• NoSQL designed for big data

Benefits of NoSQL
Flexible data models
• Changes to the schema of an RDBMS have to be carefully managed
• NoSQL databases more relaxed in structure of data
  • Database schema changes do not have to be managed as one complicated change unit
  • Application already written to address an amorphous schema
Economics
• RDBMS rely on expensive proprietary servers to manage data
• NoSQL: clusters of cheap commodity servers to manage the data and transaction volumes
• Cost per gigabyte or transaction/second for NoSQL can be lower than the cost for an RDBMS

Drawbacks of NoSQL
Support
• RDBMS vendors provide a high level of support to clients
  • Stellar reputation
• NoSQL – are open source projects with startups supporting them
  • Reputation not yet established
Maturity
• RDBMS is a mature product: means stable and dependable
  • Also means old, no longer cutting edge nor interesting
• NoSQL are still implementing their basic feature set

Drawbacks of NoSQL
Administration
• RDBMS administrator is a well defined role
• NoSQL's goal: no administrator necessary; however, NoSQL still requires effort to maintain
Lack of Expertise
• Whole workforce of trained and seasoned RDBMS developers
• Still recruiting developers to the NoSQL camp
Analytics and Business Intelligence
• RDBMS designed to address this niche
• NoSQL designed to meet the needs of a Web 2.0 application - not designed for ad hoc query of the data
• Tools are being developed to address this need

RDB ACID to NoSQL BASE
ACID (RDB):
• Atomicity
• Consistency
• Isolation
• Durability
BASE (NoSQL):
• Basically Available (CP)
• Soft-state (state of system may change over time)
• Eventually consistent (asynchronous propagation)

Pritchett, D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id=1394128)

First example:

What is MongoDB?
• Developed by 10gen
  • Founded in 2007
• A document-oriented, NoSQL database
  • Hash-based, schema-less database
  • No Data Definition Language
  • In practice, this means you can store hashes with any keys and values that you choose
  • Keys are a basic data type but in reality stored as strings
  • Document identifiers (_id) will be created for each document; the field name is reserved by the system
  • Application tracks the schema and mapping
• Uses BSON format
  • Based on JSON – B stands for Binary
• Written in C++
• Supports APIs (drivers) in many computer languages
  • JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++, Haskell, Erlang

Functionality of MongoDB
• Dynamic schema
  • No DDL
• Document-based database
• Secondary indexes
• Query language via an API
• Atomic writes and fully-consistent reads
  • If system configured that way
• Master-slave replication with automated failover (replica sets)
• Built-in horizontal scaling via automated range-based partitioning of data (sharding)
• No joins nor transactions

Why use MongoDB?
• Simple queries
• Functionality provided applicable to most web applications
• Easy and fast integration of data
• No ERD diagram
• Not well suited for heavy and complex transaction systems

Where to Use MongoDB?
• Big Data
• Content Delivery and Management
• Data Hub
• Mobile and Social Infrastructure
• User Data Management

MongoDB: CAP approach
Focus on Consistency and Partition tolerance
• Consistency (C)
  • all replicas contain the same version of the data
• Availability (A)
  • system remains operational on failing nodes
• Partition tolerance (P)
  • multiple entry points
  • system remains operational on system split
CAP Theorem: satisfying all three at the same time is impossible

MongoDB: Hierarchical Objects
• A MongoDB instance may have zero or more 'databases'.
• A database may have zero or more 'collections'.
• A collection may have zero or more 'documents'.
• A document may have one or more 'fields'.
• MongoDB 'Indexes' function much like their RDBMS counterparts.

RDB Concepts to NoSQL
RDBMS          MongoDB
Database       Database
Table, View    Collection
Row            Document (BSON)
Column         Field
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard
Notes:
• Collection is not strict about what it stores
• Schema-less
• Hierarchy is evident in the design

MongoDB Processes and configuration
• mongod – Database instance
• mongos – Sharding process
  • Analogous to a database router
  • Processes all requests
  • Decides how many and which mongods should receive the query
  • Collates the results and sends them back to the client
• mongo – an interactive shell (a client)
  • Fully functional JavaScript environment for use with a MongoDB
• You can have one mongos for the whole system no matter how many mongods you have
• OR you can have one local mongos for every client if you want to minimize network latency

Choices made for Design of MongoDB
• Scale horizontally over commodity hardware
  • Lots of relatively inexpensive servers
• Keep the functionality that works well in RDBMSs
  – Ad hoc queries
  – Fully featured indexes
  – Secondary indexes
• What doesn't distribute well in RDB?
  – Long running multi-row transactions
  – Joins
  – Both artifacts of the relational data model (row x column)

BSON format
• Binary-encoded serialization of JSON-like documents
• Zero or more key/value pairs are stored as a single entity
• Each entry consists of a field name, a data type, and a value
• Large elements in a BSON document are prefixed with a length field to facilitate scanning

Schema Free
• MongoDB does not need any pre-defined data schema
• Every document in a collection could have different data
  • Addresses NULL data fields

{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], loc: [32.7, 63.4], boss: "ben"}
{name: "jeff", eyes: "blue", loc: [40.7, 73.4], boss: "ben"}
{name: "brendan", aliases: ["el diablo"]}
{name: "matt", pizza: "DiGiorno", height: 72, loc: [44.6, 71.3]}
{name: "ben", hat: "yes"}

JSON format
• Data is in name/value pairs
• A name/value pair consists of a field name followed by a colon, followed by a value:
  • Example: "name": "R2-D2"
• Data is separated by commas
  • Example: "name": "R2-D2", "race": "Droid"
• Curly braces hold objects
  • Example: {"name": "R2-D2", "race": "Droid", "affiliation": "rebels"}
• An array is stored in brackets []
  • Example: [ {"name": "R2-D2", "race": "Droid", "affiliation": "rebels"},
    {"name": "Yoda", "affiliation": "rebels"} ]

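As a small check of the rules above, the array example can be parsed with the standard `JSON.parse` function. The variable names are illustrative. Strict JSON requires every field name to be in double quotes, so the names are quoted here.

```javascript
// The array example from the slide, written as strict JSON text.
const text =
  '[{"name": "R2-D2", "race": "Droid", "affiliation": "rebels"},' +
  ' {"name": "Yoda", "affiliation": "rebels"}]';

// Parse the text into an array of plain objects.
const characters = JSON.parse(text);

// Objects in the same array do not need to share the same fields.
const names = characters.map(c => c.name);
const hasRace = characters.map(c => "race" in c);

console.log(names, hasRace);
```

The second object simply has no `race` field, which previews the schema-free behaviour of document collections.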
MongoDB Features
• Document-Oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce functionality
(Slide callouts: Agile, Scalable)

Index Functionality
• B+ tree indexes
• An index is automatically created on the _id field (the primary key)
• Users can create other indexes to improve query performance or to enforce unique values for a particular field
• Supports single field indexes as well as compound indexes
  • As in SQL, the order of the fields in a compound index matters
  • If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array
• The sparse property of an index ensures that the index only contains entries for documents that have the indexed field (so it ignores records that do not have the field defined)
• If an index is both unique and sparse, then the system will reject records that have a duplicate key value but allow records that do not have the indexed field defined
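
The combined unique and sparse behaviour can be illustrated with a toy in-memory index. This is a sketch of the rule only, not of MongoDB's B-tree implementation; the `SparseUniqueIndex` class and the `email` field are made up for the example.

```javascript
// Toy in-memory index on one field, enforcing unique + sparse semantics.
class SparseUniqueIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // indexed value -> document
  }

  // Returns true if the document is accepted, false if it is rejected.
  insert(doc) {
    if (!(this.field in doc)) return true; // sparse: no entry for a missing field
    const value = doc[this.field];
    if (this.entries.has(value)) return false; // unique: duplicate value rejected
    this.entries.set(value, doc);
    return true;
  }
}

const idx = new SparseUniqueIndex("email");
const results = [
  idx.insert({ name: "ann", email: "a@example.com" }),
  idx.insert({ name: "bob" }),
  idx.insert({ name: "cat" }),
  idx.insert({ name: "dan", email: "a@example.com" }),
];

console.log(results); // [ true, true, true, false ]
```

Two documents without the field are both accepted because neither produces an index entry, while the repeated `email` value is rejected.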


CRUD operations
• Create
  • db.collection.insert( <document> )
  • db.collection.save( <document> )
  • db.collection.update( <query>, <update>, { upsert: true } )
• Read
  • db.collection.find( <query>, <projection> )
  • db.collection.findOne( <query>, <projection> )
• Update
  • db.collection.update( <query>, <update>, <options> )
• Delete
  • db.collection.remove( <query>, <justOne> )

collection specifies the collection or the 'table' to store the document

Create Operations
db.collection specifies the collection or the 'table' to store the document
• db.collection_name.insert( <document> )
  • Omit the _id field to have MongoDB generate a unique key
  • Example: db.parts.insert( { type: "screwdriver", quantity: 15 } )
  • db.parts.insert( { _id: 10, type: "hammer", quantity: 1 } )
• db.collection_name.update( <query>, <update>, { upsert: true } )
  • Will update 1 or more records in a collection satisfying query
• db.collection_name.save( <document> )
  • Updates an existing record or creates a new record

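The "update or create" behaviour of an upsert can be sketched in plain JavaScript. This is an in-memory illustration under simplifying assumptions: the `upsert` helper, the `nextId` counter standing in for a generated `_id`, and equality-only matching are all made up for the example and do not reproduce the server's logic.

```javascript
// Toy collection: an array of documents.
const parts = [{ _id: 10, type: "hammer", quantity: 1 }];
let nextId = 11; // stand-in for a generated _id

// Update the first document matching every field in `query`;
// if none matches, insert a new document built from query + changes.
function upsert(collection, query, changes) {
  const match = collection.find(doc =>
    Object.keys(query).every(k => doc[k] === query[k])
  );
  if (match) {
    Object.assign(match, changes);
    return "updated";
  }
  collection.push({ _id: nextId++, ...query, ...changes });
  return "inserted";
}

const first = upsert(parts, { type: "hammer" }, { quantity: 5 });
const second = upsert(parts, { type: "wrench" }, { quantity: 2 });

console.log(first, second, parts);
```

The first call finds the existing hammer and changes its quantity; the second finds nothing and creates a new wrench document.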
Read Operations
• db.collection.find( <query>, <projection> ).<cursor modifier>
  • Provides functionality similar to the SELECT command
  • <query> is the where condition; <projection> lists the fields in the result set
  • Example: var PartsCursor = db.parts.find( { type: "hammer" } ).limit(5)
  • Has cursors to handle a result set
  • Can modify the query to impose limits, skips, and sort orders
  • Can specify to return the 'top' number of records from the result set
• db.collection.findOne( <query>, <projection> )

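What query, projection, sort, skip, and limit do to a result set can be mimicked with ordinary array methods. The `inventory` data is invented, and the array pipeline is only an analogy for the cursor modifiers, not the shell API itself.

```javascript
const inventory = [
  { type: "hammer", quantity: 4 },
  { type: "hammer", quantity: 9 },
  { type: "saw", quantity: 2 },
  { type: "hammer", quantity: 1 },
];

// Roughly: find({type: "hammer"}, {quantity: 1}) sorted by quantity descending,
// skipping 1 result and limiting to 2.
const page = inventory
  .filter(doc => doc.type === "hammer")       // <query>
  .sort((a, b) => b.quantity - a.quantity)    // sort descending by quantity
  .slice(1, 1 + 2)                            // skip 1, then limit 2
  .map(doc => ({ quantity: doc.quantity }));  // <projection>

console.log(page); // [ { quantity: 4 }, { quantity: 1 } ]
```

Sorting happens before skip and limit are applied, which is why the largest hammer count (9) is the one skipped.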
Query Operators
Name        Description
$eq         Matches values that are equal to a specified value
$gt, $gte   Matches values that are greater than (or equal to) a specified value
$lt, $lte   Matches values that are less than (or equal to) a specified value
$ne         Matches values that are not equal to a specified value
$in         Matches any of the values specified in an array
$nin        Matches none of the values specified in an array
$or         Joins query clauses with a logical OR
$and        Joins query clauses with a logical AND
$not        Inverts the effect of a query expression
$nor        Joins query clauses with a logical NOR
$exists     Matches documents that have a specified field

https://docs.mongodb.org/manual/reference/operator/query/

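To make the operator table concrete, here is a small matcher that evaluates a query document against a single in-memory document. It is a teaching sketch covering only part of the table ($not is omitted) and only top-level fields; the `ops` table and `matches` function are assumptions for the example, not MongoDB's query engine.

```javascript
// Comparison operators from the table, as plain predicates.
const ops = {
  $eq: (v, arg) => v === arg,
  $ne: (v, arg) => v !== arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $exists: (v, arg) => (v !== undefined) === arg,
};

// True when `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") return cond.some(q => matches(doc, q));
    if (key === "$and") return cond.every(q => matches(doc, q));
    if (key === "$nor") return !cond.some(q => matches(doc, q));
    const value = doc[key];
    // A bare value is an implicit equality test.
    if (cond === null || typeof cond !== "object") return value === cond;
    return Object.entries(cond).every(([op, arg]) => ops[op](value, arg));
  });
}

const part = { type: "hammer", quantity: 15 };
console.log(matches(part, { quantity: { $gt: 10, $lte: 15 } })); // true
console.log(matches(part, { type: { $in: ["saw", "drill"] } })); // false
```

Several operators on one field are combined with an implicit AND, as in the first query.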
Update Operations
• db.collection_name.insert( <document> )
  • Omit the _id field to have MongoDB generate a unique key
  • Example: db.parts.insert( { type: "screwdriver", quantity: 15 } )
  • db.parts.insert( { _id: 10, type: "hammer", quantity: 1 } )
• db.collection_name.save( <document> )
  • Updates an existing record or creates a new record
• db.collection_name.update( <query>, <update>, { upsert: true } )
  • Will update 1 or more records in a collection satisfying query
• db.collection_name.findAndModify( { query: <query>, sort: <sort>, update: <update>, new: <boolean>, fields: <fields>, upsert: <boolean> } )
  • Modify an existing record – retrieve the old or new version of the record

Delete Operations
• db.collection_name.remove( <query>, <justOne> )
  • Delete all records from a collection or those matching a criterion
  • <justOne> specifies to delete only 1 record matching the criterion
  • Example: db.parts.remove( { type: /^h/ } ) – remove all parts with a type starting with h
  • db.parts.remove( {} ) – delete all documents in the parts collection

CRUD examples
> db.user.insert({
    first: "John",
    last: "Doe",
    age: 39
  })

> db.user.find()
{ "_id" : ObjectId("51"),
  "first" : "John",
  "last" : "Doe",
  "age" : 39
}

> db.user.update(
    { "_id" : ObjectId("51") },
    { $set: { age: 40, salary: 7000 } }
  )

> db.user.remove({ "first": /^J/ })

SQL vs. MongoDB entities
MySQL:
START TRANSACTION;
INSERT INTO contacts VALUES (NULL, 'joeblow');
INSERT INTO contact_emails VALUES
  ( NULL, "joe@blow.com", LAST_INSERT_ID() ),
  ( NULL, "joseph@blow.com",

MongoDB:
db.contacts.save( {
  userName: "joeblow",
  emailAddresses: [ "joe@blow.com", "joseph@blow.com" ]
} );

• Similar to IDS from the 70's, Bachman's brainchild
• DIFFERENCE: MongoDB separates physical structure from logical structure
• Designed to deal with large

Aggregated functionality
Aggregation framework provides SQL-like aggregation functionality
• Pipeline: documents from a collection pass through an aggregation pipeline, which transforms these objects as they pass through
• Expressions produce output documents based on calculations performed on input documents
• Example: db.parts.aggregate( [ { $group: { _id: "$type", totalquantity: { $sum: "$quantity" } } } ] )

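The grouping in the example (total quantity per part type) can be reproduced on an in-memory array to show what a group-and-sum stage computes. The `partDocs` data is invented, and the loop is an illustration of the result, not of how the server executes a pipeline.

```javascript
const partDocs = [
  { type: "hammer", quantity: 1 },
  { type: "screwdriver", quantity: 15 },
  { type: "hammer", quantity: 4 },
];

// Group by `type` and sum `quantity` for each group.
const totals = new Map();
for (const doc of partDocs) {
  totals.set(doc.type, (totals.get(doc.type) || 0) + doc.quantity);
}

// Shape the output like grouped documents: one per distinct type.
const grouped = [...totals].map(([type, total]) => ({
  _id: type,
  totalquantity: total,
}));

console.log(grouped);
```

Each distinct `type` value becomes the `_id` of one output document, and the input documents themselves do not appear in the result.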
Map reduce functionality
• Performs complex aggregator functions given a collection of key/value pairs
• Must provide at least a map function, a reduce function, and the name of the result set
• db.collection.mapReduce( <mapfunction>, <reducefunction>,
    { out: <collection>, query: <document>, sort: <document>,
      limit: <number>, finalize: <function>, scope: <document>,
      jsMode: <boolean>, verbose: <boolean> } )
• More description of map reduce next lecture

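The map and reduce phases can be sketched over an in-memory array. In the shell, the map function reads the current document through `this` and calls a global `emit`; in this sketch both are passed in explicitly, and the `mapReduce` helper and `orders` data are assumptions made for the example.

```javascript
// Run mapFn over every document, collect emitted values per key,
// then reduce each key's values to a single result.
function mapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach(doc => mapFn(doc, emit));

  const out = {};
  for (const [key, values] of emitted) {
    out[key] = reduceFn(key, values);
  }
  return out;
}

const orders = [
  { customer: "ann", amount: 20 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
];

const totalsByCustomer = mapReduce(
  orders,
  (doc, emit) => emit(doc.customer, doc.amount),      // map: key = customer
  (key, values) => values.reduce((a, b) => a + b, 0)  // reduce: sum amounts
);

console.log(totalsByCustomer); // { ann: 27, bob: 5 }
```

The map step decides the grouping key, and the reduce step decides how the values for one key are combined.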
Indexes: High performance read
• Typically used for frequently used queries
• Necessary when the total size of the documents exceeds the amount of available RAM
• Defined on the collection level
• Can be defined on 1 or more fields
  • Composite index (SQL) → Compound index (MongoDB)
• B-tree index
• Only 1 index can be used by the query optimizer when retrieving data
• Index covers a query: match the query conditions and return the results using only the index
  • Use index to provide the results

Replication of data
• Ensures redundancy, backup, and automatic failover
  • Recovery manager in the RDBMS
• Replication occurs through groups of servers known as replica sets
  • Primary – the member that clients direct updates to
  • Secondaries – members used for duplication of data
  • A replica set can have at most 12 members
  • Many different properties can be associated with a secondary, e.g. secondary-only, hidden, delayed, arbiters, non-voting
• If the primary fails, the secondaries 'vote' to elect the new primary

Consistency of data
• All read operations issued to the primary of a replica set are consistent with the last write operation
  • Reads to a primary have strict consistency
  • Reads reflect the latest changes to the data
• Reads to a secondary have eventual consistency
  • Updates propagate gradually
• If clients permit reads from secondaries, then a client may read a previous state of the database
• Failure occurs before the secondary nodes are updated
  • System identifies when a rollback needs to occur
  • Users are responsible for manually applying rollback changes

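The difference between reading from the primary and reading from a secondary can be shown with a toy model of asynchronous replication. Everything here (the two plain objects, the `pending` queue, the `write` and `replicate` helpers) is a made-up illustration, not the replica set protocol.

```javascript
// Toy replica set: one primary, one secondary, and a queue of unreplicated writes.
const primary = {};
const secondary = {};
const pending = [];

function write(key, value) {
  primary[key] = value;        // applied on the primary immediately
  pending.push([key, value]);  // propagated to the secondary later
}

// Apply all queued writes to the secondary, in order.
function replicate() {
  while (pending.length > 0) {
    const [key, value] = pending.shift();
    secondary[key] = value;
  }
}

write("stock", 10);
replicate();
write("stock", 9);

const primaryRead = primary.stock; // 9: reflects the latest write
const staleRead = secondary.stock; // 10: the secondary has not caught up yet
replicate();
const caughtUp = secondary.stock;  // 9: eventually consistent

console.log(primaryRead, staleRead, caughtUp);
```

A client that reads from the secondary between the second write and the next replication pass sees the previous state, which is the trade-off the slide describes.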
Provides Memory Mapped Files
• "A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource."
• mmap()

Other additional features
• Supports geospatial data of type
  • Spherical
    • Provides longitude and latitude
  • Flat
    • 2 dimensional points on a plane
• Geospatial indexes

Interactive session: query through API

Summary
• NoSQL built to address a distributed database system
  • Sharding
  • Replica sets of data
• CAP Theorem: consistency, availability and partition tolerance
• MongoDB
  • Document oriented data, schema-less database, supports secondary indexes, provides a query language, consistent reads on the primary
  • Lacks transactions, joins

Limited BNF of a BSON document
document ::= int32 e_list "\x00"          BSON Document
e_list   ::= element e_list               Sequence of elements
element  ::= "\x01" e_name data           Element of a specific data type (leading byte is the type)
e_name   ::= cstring                      Key name
string   ::= int32 (byte*) "\x00"         String
cstring  ::= (byte*) "\x00"               CString
binary   ::= int32 subtype (byte*)        Binary
subtype  ::= "\x00"                       Binary / Generic
           | "\x01"                       Function
           | "\x02"                       Binary (Old)
           | "\x03"                       UUID (Old)
           | "\x04"                       UUID
           | "\x05"                       MD5
           | "\x80"                       User defined
code_w_s ::= int32 string document        Code w/ scope
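
As a worked instance of the grammar, the sketch below encodes a flat document whose values are all strings. It follows the `document`, `e_name`, and `string` rules above; the type byte 0x02 for UTF-8 strings comes from the BSON specification rather than from this slide, and the `encodeStringDocument` helper is a made-up name. It relies on Node.js `Buffer` and is not a general BSON encoder.

```javascript
// Encode a flat document whose values are all strings, following
// document ::= int32 e_list "\x00" and string ::= int32 (byte*) "\x00".
function encodeStringDocument(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const nameBytes = Buffer.from(name, "utf8");
    const valueBytes = Buffer.from(value, "utf8");
    const length = Buffer.alloc(4);
    length.writeInt32LE(valueBytes.length + 1); // string length includes trailing NUL
    parts.push(
      Buffer.from([0x02]),             // element type: UTF-8 string
      nameBytes, Buffer.from([0x00]),  // e_name as a cstring
      length, valueBytes, Buffer.from([0x00])
    );
  }
  const body = Buffer.concat(parts);
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1); // size prefix + elements + terminator
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

const encoded = encodeStringDocument({ hello: "world" });
console.log(encoded.toString("hex"));
// 160000000268656c6c6f0006000000776f726c640000
```

The leading four bytes (0x16 = 22, little-endian) give the total document size, which is the length prefix that lets a reader skip over a document without parsing it.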
