Lec 6 - Big Data Storage Technologies II - NoSQL

This document discusses big data storage concepts, focusing on the integration of Hadoop, Spark, and NoSQL databases for analytics. It covers key topics such as MapReduce, the characteristics and types of NoSQL databases, and the emergence of NewSQL databases that combine ACID properties with NoSQL scalability. The lecture highlights the advantages of these technologies in handling large datasets and real-time processing needs.

Uploaded by

amirosama21


Big Data Storage Concepts II

Lecture 6
A General View

▪ Hadoop, Spark, and NoSQL databases can work together to create a powerful big-data analytics platform.
▪ Hadoop can be used for data storage and batch processing.
▪ Spark can be used for real-time processing and analysis.
▪ NoSQL databases can be used for storing and querying large volumes of unstructured data.
Outline

▪ In order to understand the underlying mechanisms behind Big Data storage technology, the following topics are introduced in this lecture:
▪ MapReduce – Brief
▪ What is Spark
▪ NoSQL Databases
▪ Characteristics
▪ NoSQL and CAP
▪ Four types of NoSQL Datastores
▪ Who Uses NoSQL
▪ NewSQL Databases
▪ Distributed SQL

MapReduce – Brief

▪ MapReduce is a batch-oriented processing engine used to process large datasets using parallel processing deployed over clusters of commodity hardware.
▪ It is highly scalable and reliable, and is based on the principle of divide-and-conquer, which provides built-in fault tolerance.
▪ MapReduce does not require that the input data conform to any particular data model.

Processing in Batch Mode

Ref. No. [3] - Chapter 6

MapReduce: Simple Programming for Big Data

▪ A dataset is broken down into multiple smaller parts.
▪ Operations are performed on each part independently and in parallel.
▪ The MapReduce system sends computation code (map and reduce functions) to where the data resides.
▪ This favours data locality and cluster rack affinity, rather than bringing data to your application.
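
The split, map, and reduce steps above can be sketched in plain Python. This is only a single-process illustration of the programming model, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample chunks are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of the dataset."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently; on a cluster these calls would
    # run in parallel on the nodes that already hold the chunks.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical dataset, already broken into three smaller parts.
chunks = ["big data big", "data storage", "big storage"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the framework handles the shuffle and ships the map and reduce functions to the nodes; here everything runs locally so the data flow is easy to follow.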

MapReduce: Simple Programming for Big Data

Source: https://www.todaysoftmag.com/article/1358/hadoop-mapreduce-deep-diving-and-tuning

Shortcomings of MapReduce

▪ Forces your data processing into Map and Reduce
▪ Based on “Acyclic Data Flow” from Disk to Disk (HDFS)
▪ Read and write to Disk before and after Map and Reduce
▪ Not efficient for iterative tasks, e.g. Machine Learning
▪ The implementation is primarily written in Java
▪ Only for batch processing

What Is Apache Spark?
▪ Apache Spark is a cluster-computing platform that provides an API for distributed programming like the MapReduce model but is designed to be fast for interactive queries and iterative algorithms.
▪ Spark provides in-memory storage for intermediate computations, where programs can checkpoint data and refer back to it without reloading it from disk.
▪ It incorporates libraries for machine learning (MLlib), SQL for interactive queries (Spark SQL), stream processing (Structured Streaming) for interacting with real-time data, and graph processing (GraphX).

The Spark stack

Spark Uses Memory instead of Disk

[Figure: Hadoop uses disk for data sharing. Each iteration reads its input from HDFS and writes its output back to HDFS before the next iteration reads it again. Spark uses in-memory data sharing. After a single HDFS read, Iteration 1 and Iteration 2 share data in memory.]

Afzal Godil, Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis, Information Access Division, ITL, NIST
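
The difference between the two data-sharing styles can be made concrete by counting storage accesses. The sketch below is a toy simulation, not Hadoop or Spark code: `FakeDisk` is a hypothetical stand-in for HDFS that only counts reads and writes, and `step` is an arbitrary placeholder for one iteration of an algorithm.

```python
class FakeDisk:
    """Hypothetical stand-in for HDFS that only counts reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def step(values):
    """One iteration of some iterative algorithm (here: halve every value)."""
    return [v / 2 for v in values]


def disk_based(disk, iterations):
    # MapReduce style: every iteration reads from disk and writes back to disk.
    for _ in range(iterations):
        disk.write(step(disk.read()))
    return disk.read()


def in_memory(disk, iterations):
    # Spark style: read once, then keep intermediate results in memory.
    values = disk.read()
    for _ in range(iterations):
        values = step(values)
    return values


a = FakeDisk([8.0, 16.0])
b = FakeDisk([8.0, 16.0])
result_a = disk_based(a, 3)
result_b = in_memory(b, 3)
print(result_a, a.reads, a.writes)
print(result_b, b.reads, b.writes)
```

Both variants compute the same result; only the number of disk round trips differs, and that gap grows with the number of iterations.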
NoSQL Datastores

▪ The emergence of NoSQL datastores can primarily be attributed to the volume, velocity and variety characteristics of Big Data datasets.
▪ NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables.
▪ They offer better horizontal scaling, fault tolerance, and improved performance for big data, at the cost of less rigorous consistency models.
▪ These systems are optimized for fast retrieval and appending operations on records where real-time performance is more important than consistency.
▪ NoSQL databases come in a variety of types based on their data model.

NoSQL vs SQL - 4 Reasons Why NoSQL is better for Big Data applications, Feb 2023
NoSQL Datastores Characteristics

▪ This list should only be considered a general guide, as not all NoSQL storage devices exhibit all of these features:
• Schema-less data model – Data can exist in its raw form.
• Scale out rather than scale up – More nodes can be added to efficiently meet the needs of varying workloads.
• Highly available – This is built on cluster-based technologies that provide fault tolerance out of the box.
• Lower operational costs – Many NoSQL databases are built on open-source platforms with no licensing costs. They can often be deployed on commodity hardware.
• BASE not ACID – Maintain high availability in the event of network/node failure, while not requiring the database to be in a consistent state whenever an update occurs. The database can be in a soft/inconsistent state until it eventually attains consistency.
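
The "soft state until it eventually attains consistency" behaviour can be illustrated with a toy replicated store. This is a minimal sketch under stated assumptions, not how any particular NoSQL product works: the class names, the last-writer-wins version counter, and the explicit `sync()` call standing in for background replication are all hypothetical.

```python
class Replica:
    """One copy of the data; stores key -> (version, value)."""

    def __init__(self):
        self.store = {}

    def apply(self, key, version, value):
        # Last-writer-wins: keep an update only if it is newer than what we hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates not yet propagated to other replicas
        self.version = 0

    def write(self, key, value):
        # Acknowledge after updating one replica; the others catch up later.
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.version, value))

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)

    def sync(self):
        # Stand-in for background replication between nodes.
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("stock", 10)
before = [store.read("stock", r) for r in range(3)]  # replicas disagree
store.sync()
after = [store.read("stock", r) for r in range(3)]   # replicas have converged
print(before, after)
```

Between the write and the sync, a read may return stale data depending on which replica answers. That window is the price paid for staying available without coordinating every update.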
NoSQL Datastores Characteristics

• Auto sharding and replication – To support horizontal scaling and provide high availability, a NoSQL storage device automatically employs sharding and replication techniques where the dataset is partitioned horizontally and then copied to multiple nodes.
• Distributed query support – NoSQL storage devices maintain consistent query behaviour across multiple shards.
• Polyglot persistence – An approach of persisting data using different types of storage technologies within the same solution architecture.
• Aggregate-oriented – NoSQL storage devices store de-normalized aggregated data, thereby eliminating the need for joins.
• One exception, however, is that graph database storage devices are not aggregate-focused.
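
Sharding plus replication can be sketched as follows. This is an illustrative toy, assuming simple hash-modulo partitioning and placement on consecutive nodes; the names `shard_for`, `replicas_for`, and `ShardedStore` are hypothetical, and production systems commonly use more elaborate schemes such as consistent hashing.

```python
import hashlib


def shard_for(key, n_shards):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


def replicas_for(shard, nodes, replication_factor):
    """Place the shard on `replication_factor` consecutive nodes."""
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]


class ShardedStore:
    def __init__(self, nodes, n_shards=4, replication_factor=2):
        self.nodes = nodes
        self.n_shards = n_shards
        self.rf = replication_factor
        self.data = {node: {} for node in nodes}  # per-node storage

    def put(self, key, value):
        # Write the record to every node that holds a copy of its shard.
        shard = shard_for(key, self.n_shards)
        for node in replicas_for(shard, self.nodes, self.rf):
            self.data[node][key] = value

    def get(self, key, failed=()):
        # Read from the first replica that is still available.
        shard = shard_for(key, self.n_shards)
        for node in replicas_for(shard, self.nodes, self.rf):
            if node not in failed:
                return self.data[node].get(key)
        raise RuntimeError("all replicas unavailable")


store = ShardedStore(["n1", "n2", "n3"])
store.put("user:1", {"name": "Ada"})
owners = replicas_for(shard_for("user:1", store.n_shards), store.nodes, store.rf)
# The record survives the loss of its first owner because a second copy exists.
survivor_read = store.get("user:1", failed=(owners[0],))
print(owners, survivor_read)
```

A stable hash (here MD5) is used instead of Python's built-in `hash()`, which is randomized per process for strings and would route the same key differently on each run.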
Aggregate Data Models

The relational model divides the information into tables of tuples. This simple structure for data is one of the key aspects of its success and dominance.

Aggregate-oriented models take a different approach. They tend to operate on data in units that have a more complex structure.

▪ An aggregate is a collection of data that we manipulate and manage as a unit.
➢ A complex record with simple fields, arrays, and records nested inside.
▪ Aggregate-oriented databases work best when most data interaction is done with the same aggregate (intra).
▪ Aggregate-ignorant databases are better when interactions use data organized in many different formations (inter).
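
To make the contrast concrete, the sketch below stores the same hypothetical order once as a single aggregate (a nested record) and once as relational-style tuples. The order data, field names, and helper functions are invented for illustration.

```python
# Aggregate-oriented: the whole order is read and written as one unit.
order_aggregate = {
    "order_id": 1001,
    "customer": {"id": 7, "name": "Ada"},
    "items": [
        {"sku": "A1", "qty": 2, "price": 5.0},
        {"sku": "B2", "qty": 1, "price": 12.5},
    ],
}


def order_total(order):
    """Intra-aggregate access: everything needed is inside the one record."""
    return sum(item["qty"] * item["price"] for item in order["items"])


# Relational: the same information divided into tables of tuples.
orders = [(1001, 7)]                      # (order_id, customer_id)
customers = [(7, "Ada")]                  # (customer_id, name)
order_items = [                           # (order_id, sku, qty, price)
    (1001, "A1", 2, 5.0),
    (1001, "B2", 1, 12.5),
]


def order_total_relational(order_id):
    """The order must be reassembled from rows (a join or filter by key)."""
    return sum(qty * price
               for oid, _sku, qty, price in order_items
               if oid == order_id)


total_from_aggregate = order_total(order_aggregate)
total_from_tables = order_total_relational(1001)
print(total_from_aggregate, total_from_tables)
```

The aggregate form needs no join to answer questions about one order, while the tabular form makes it easier to slice the same rows in other ways, for example per SKU across all orders.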
Schemaless Databases
Freedom and flexibility: a double-edged sword

Schemaless databases allow more flexibility than schema-based databases.

▪ However, there is less opportunity to automatically enforce data integrity rules.
▪ There should be an implicit schema expected by users of the data: a set of assumptions about the structure of the data in the code that manipulates it.
▪ A schemaless database shifts the schema into the application code that accesses it.
▪ This becomes problematic if multiple applications, developed by different people, access the same database.
▪ In order to understand the structure of the data, you have to understand the application code.
▪ Being schemaless affects the efficiency of storing and retrieving the data.
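
The idea that the schema moves into application code can be sketched like this. The documents, the `REQUIRED_FIELDS` mapping, and the `validate` helper are hypothetical; the point is only that the store accepts all three records and the application has to notice the differences.

```python
# Three documents a schemaless store would accept without complaint.
documents = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "mail": "bob@example.com"},  # different field name
    {"name": "Cy"},                              # field missing entirely
]

# The implicit schema, written down explicitly in application code.
REQUIRED_FIELDS = {"name": str, "email": str}


def validate(doc):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


report = [validate(doc) for doc in documents]
print(report)
```

If a second application is written against the `mail` spelling instead, both applications will silently disagree about what a valid record looks like, which is the multi-application problem noted in the list above.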
Polyglot Persistence

▪ Using multiple specialized persistent stores rather than one single general-purpose database.
▪ “Monoglot” was (and still is) fine for simple applications (one type of workload).
▪ But applications become complex.
▪ A simple e-commerce platform must have:
• Session data (Add to Basket)
• Search engine (Search for products)
• Recommendation engine
• Payment platform
• Geo-location service
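
A rough sketch of polyglot persistence for two of the workloads listed above: each workload gets a store shaped for it. Both classes are in-memory stand-ins invented for this example (a key-value map for session data and an inverted index for search), not real database clients.

```python
class SessionStore:
    """Key-value stand-in for basket/session data."""

    def __init__(self):
        self._data = {}

    def add_to_basket(self, session_id, sku):
        self._data.setdefault(session_id, []).append(sku)

    def basket(self, session_id):
        return list(self._data.get(session_id, []))


class SearchIndex:
    """Inverted-index stand-in for a product search engine."""

    def __init__(self):
        self._index = {}

    def add(self, sku, title):
        # Index every word of the title so it can be looked up directly.
        for word in title.lower().split():
            self._index.setdefault(word, set()).add(sku)

    def search(self, word):
        return sorted(self._index.get(word.lower(), set()))


# One application, two specialized stores.
sessions = SessionStore()
index = SearchIndex()

index.add("A1", "Red Running Shoes")
index.add("B2", "Blue Running Jacket")

hits = index.search("running")
sessions.add_to_basket("s1", hits[0])
basket = sessions.basket("s1")
print(hits, basket)
```

The trade-off is operational: every additional store is another system to deploy, monitor, and keep in sync with the others.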

NoSQL Systems and CAP

Four Types of NoSQL Databases

Who Uses NoSQL

▪ Over the last few years, NoSQL database technology has experienced explosive growth and accelerating use by large enterprises. For example:
• Tesco uses NoSQL to support its catalogue, pricing, inventory, and coupon applications.
• McGraw-Hill uses NoSQL to power its online learning platform.
• Sky uses NoSQL to manage user profiles for 20 million subscribers.

The Top 10 Enterprise NoSQL Use Cases, 2015
NewSQL Databases

▪ NewSQL storage devices combine the ACID properties with the scalability and fault tolerance offered by NoSQL storage devices.
▪ They generally support SQL-compliant syntax for data definition and data manipulation operations, and they often use a logical relational data model for data storage.
▪ NewSQL databases can be used for developing OLTP systems with very high volumes of transactions, for example a banking system, as they leverage in-memory storage.
▪ They can also be used for real-time analytics.
▪ Compared to a NoSQL storage device, a NewSQL storage device provides an easier transition from a traditional RDBMS to a highly scalable database.
▪ Examples of NewSQL databases include VoltDB, NuoDB and InnoDB.
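
The ACID guarantee that NewSQL systems keep can be illustrated with a bank transfer. The sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in for any SQL engine with transactions. SQLite is not a NewSQL database and does not scale out; the table layout and the `transfer` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; False means rolled back."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception is raised inside it.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
rejected = transfer(conn, "alice", "bob", 500)
after_rejected = balances(conn)
print(ok, after_ok, rejected, after_rejected)
```

The second transfer credits bob before the debit fails, yet the final balances show neither change. That all-or-nothing behaviour is what a BASE-style store does not promise.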
