Big Data Unit-II Notes

Uploaded by

t88699857

NoSQL Data Management

NoSQL databases are designed to store large volumes of unstructured or semi-structured data
and to scale out across many servers. Unlike traditional relational databases, NoSQL databases
do not rely on fixed schemas, making them more flexible and suitable for modern applications
such as web services, IoT, and big data.

Introduction to NoSQL

 Definition: NoSQL is commonly read as "Not Only SQL." It encompasses a wide variety of
database technologies designed to overcome limitations of relational databases.
 Characteristics:
o Schema-less
o Horizontal scalability
o High availability and fault tolerance
o Support for unstructured, semi-structured, and structured data
 Applications: Big Data analytics, real-time web applications, content management, etc.
 Types:
o Key-Value Stores
o Document Stores
o Column-Family Stores
o Graph Databases

Aggregate Data Models

 Definition: An aggregate data model groups related data into a single unit, called an
aggregate.
 Example: A document in MongoDB that contains a user's profile, preferences, and
purchase history.
 Benefits:
o Simplifies data access patterns.
o Reduces the need for complex joins.
 Types:
o Key-Value Models: Simple key-value pairs.
o Document Models: JSON or BSON-like objects.
o Column-Family Models: Rows identified by a key, with columns grouped into column families.
 Use Cases: Aggregates make it easy to replicate and distribute data.
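
The user-profile example above can be sketched as a single document. This is a minimal illustration in plain Python, not MongoDB code; the field names and values are made up for the example.

```python
import json

# Hypothetical user aggregate: profile, preferences, and purchase history
# are stored together and read or written as one unit.
user_aggregate = {
    "_id": "user-1001",
    "profile": {"name": "Asha", "email": "asha@example.com"},
    "preferences": {"language": "en", "newsletter": True},
    "purchases": [
        {"order_id": "o-1", "total": 25.0},
        {"order_id": "o-2", "total": 40.0},
    ],
}

# The whole aggregate serializes to one JSON document, so a single lookup
# by key returns everything without joins.
serialized = json.dumps(user_aggregate)
restored = json.loads(serialized)

total_spent = sum(p["total"] for p in restored["purchases"])
print(total_spent)
```

Because the aggregate is self-contained, it can be copied to another node or replica as one unit.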

Aggregates

 Definition: Aggregates are collections of related data treated as a single unit.
 Importance:
o Defines boundaries for transactions.
o Improves scalability and data locality.
 Examples:
o Shopping cart (key-value)
o Blog post with comments (document model)

Key-Value and Document Data Models

 Key-Value Stores:
o Simplest NoSQL model.
o Data stored as key-value pairs.
o Examples: Redis, DynamoDB.
o Use Cases: Session management, caching.
 Document Stores:
o Data stored as JSON, BSON, or XML documents.
o Examples: MongoDB, CouchDB.
o Use Cases: Content management, real-time analytics.
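
The difference between the two models can be sketched with two small in-memory classes. These are hypothetical teaching classes, not the API of Redis, DynamoDB, MongoDB, or CouchDB: the key-value store treats values as opaque, while the document store can look inside them.

```python
from typing import Any


class KeyValueStore:
    """Key-value model: values are opaque, access is by key only."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


class DocumentStore(KeyValueStore):
    """Document model: values are dicts, so fields can be queried."""

    def find(self, field: str, value: Any) -> list:
        return [doc for doc in self._data.values() if doc.get(field) == value]


# Session data is only ever fetched by its key.
sessions = KeyValueStore()
sessions.put("session:42", b"opaque-session-bytes")

# Articles are fetched by key or filtered by a field.
articles = DocumentStore()
articles.put("a1", {"title": "Sharding 101", "author": "meera"})
articles.put("a2", {"title": "Graph basics", "author": "john"})
articles.put("a3", {"title": "MapReduce", "author": "meera"})

by_meera = [doc["title"] for doc in articles.find("author", "meera")]
print(by_meera)
```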

Relationships

 Relational databases handle relationships using foreign keys and joins.
 NoSQL manages relationships differently:
o Embedding: Nest related data within a single document.
o Referencing: Link data using identifiers.
o Graph Databases: Use edges and nodes to model relationships explicitly.
 Examples:
o Embedded documents in MongoDB.
o Relationships in Neo4j.
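
Embedding and referencing can be compared side by side. The sketch below uses plain dictionaries as stand-ins for collections; the post and comment data are invented for illustration.

```python
# Embedding: the comments live inside the post document.
post_embedded = {
    "_id": "post-1",
    "title": "Intro to NoSQL",
    "comments": [
        {"author": "ravi", "text": "Helpful!"},
        {"author": "lena", "text": "More examples please."},
    ],
}

# Referencing: the post stores only comment ids; the comments sit in
# their own collection and are looked up separately.
posts = {"post-1": {"title": "Intro to NoSQL", "comment_ids": ["c-1", "c-2"]}}
comments = {
    "c-1": {"author": "ravi", "text": "Helpful!"},
    "c-2": {"author": "lena", "text": "More examples please."},
}


def load_comments(post_id):
    """Resolve references with one extra lookup per id (an application-side join)."""
    return [comments[cid] for cid in posts[post_id]["comment_ids"]]


print(load_comments("post-1") == post_embedded["comments"])
```

Embedding gives one read per post; referencing avoids duplicating data that is shared or grows without bound, at the cost of extra lookups.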

Graph Databases

Graph databases are a type of database designed to store and query data structured as a graph,
where entities (nodes) are connected by relationships (edges). They are particularly suited for
applications that require modeling and querying complex, interconnected data efficiently.

Key Concepts in Graph Databases

1. Nodes: Represent entities, such as people, places, or objects.
2. Edges: Represent relationships or connections between nodes. For example,
"FRIENDS_WITH" or "LIKES."
3. Properties: Metadata attached to nodes or edges, such as a person's name, age, or the
date a relationship was established.
4. Labels: Tags assigned to nodes to classify them (e.g., "Person" or "Movie").
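
The four concepts above can be sketched as a tiny in-memory property graph. The class and its method names are hypothetical and only illustrate the model; real graph databases expose this through a query language.

```python
from collections import defaultdict


class PropertyGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> {"label": ..., "props": {...}}
        self.edges = defaultdict(list)  # source_id -> [(rel_type, target_id, props)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, "props": props}

    def add_edge(self, source, rel_type, target, **props):
        # Edges are directed and carry their own properties.
        self.edges[source].append((rel_type, target, props))

    def neighbors(self, node_id, rel_type):
        return [t for r, t, _ in self.edges.get(node_id, []) if r == rel_type]


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice", age=30)
g.add_node("bob", "Person", name="Bob", age=28)
g.add_node("carol", "Person", name="Carol", age=35)
g.add_node("m1", "Movie", title="Some Movie")
g.add_edge("alice", "FRIENDS_WITH", "bob", since="2020-01-01")
g.add_edge("bob", "FRIENDS_WITH", "carol", since="2021-06-15")
g.add_edge("alice", "LIKES", "m1")

# Two-hop traversal: friends of Alice's friends.
friends_of_friends = {
    f2
    for f1 in g.neighbors("alice", "FRIENDS_WITH")
    for f2 in g.neighbors(f1, "FRIENDS_WITH")
} - {"alice"}
print(friends_of_friends)
```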

Advantages of Graph Databases

1. Efficient Querying of Relationships: Ideal for traversing and querying relationships in
highly interconnected datasets.
2. Flexible Schema: No fixed schema, allowing for changes to the data model without
disrupting the database.
3. Performance: Queries involving relationships can be faster than in relational databases,
as relationships are stored directly with the data instead of being computed with joins at
query time.
4. Visualization: Data and relationships are easy to visualize for better understanding.

Common Use Cases

 Social Networks: Modeling friendships, followers, or group memberships.
 Recommendation Engines: Suggesting products, movies, or content based on user
preferences and behaviors.
 Fraud Detection: Identifying patterns and anomalies in transactions.
 Network Analysis: Telecommunications, transport, or logistics optimization.
 Knowledge Graphs: Representing and querying knowledge bases.

Popular Graph Databases

1. Neo4j: A widely used graph database, with Cypher as its query language.
2. Amazon Neptune: A cloud-based graph database service.
3. ArangoDB: A multi-model database supporting graph, document, and key-value data.
4. JanusGraph: Open-source, scalable graph database optimized for large graphs.
5. TigerGraph: Focused on real-time analytics and scalability.

Query Languages

 Cypher: Used by Neo4j; a declarative, pattern-matching language with SQL-like keywords.
 Gremlin: A traversal language used with Apache TinkerPop-compliant databases like
JanusGraph.
 SPARQL: A query language for querying RDF (Resource Description Framework) data,
often used in semantic web applications.

Schema-Less Databases

Types of Schema-less Databases

1. Key-Value Stores
o Data is stored as key-value pairs.
o Example: Redis, Amazon DynamoDB, Riak.
o Use Case: Session management, caching, and user preferences.
2. Document Stores
o Data is stored as documents (e.g., JSON, BSON, or XML).
o Example: MongoDB, Couchbase, RavenDB.
o Use Case: Content management systems, e-commerce, and real-time analytics.
3. Column-Family Stores
o Data is stored in columns grouped into families.
o Example: Apache Cassandra, HBase.
o Use Case: Time-series data, log analysis, and IoT applications.
4. Graph Databases
o Focus on storing data as nodes and edges (relationships).
o Example: Neo4j, Amazon Neptune, TigerGraph.
o Use Case: Social networks, fraud detection, and recommendation engines.

Advantages of Schema-less Databases

1. Rapid Development: Developers can iterate quickly without worrying about schema
changes.
2. Adaptability: Supports semi-structured and unstructured data, such as JSON or
multimedia.
3. Scalability: Suited for distributed architectures with high availability and fault tolerance.
4. Cost-Effective: Handles large data volumes by scaling out on commodity servers instead of
relying on expensive, high-end hardware.

Disadvantages of Schema-less Databases

1. Complexity in Queries: May lack the rich querying capabilities of SQL.
2. Data Integrity: Schema enforcement must often be handled at the application level.
3. Consistency: May sacrifice strict consistency (in favor of eventual consistency) for better
performance and availability.
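
Point 2 above means the application checks documents before writing them. A minimal sketch follows, assuming a hypothetical user document with three required fields; the field list is invented for illustration.

```python
# Expected fields and their types (an assumption for this example).
REQUIRED_FIELDS = {"name": str, "email": str, "age": int}


def validate_user(doc):
    """Return a list of problems; an empty list means the document is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors


good = {"name": "Asha", "email": "asha@example.com", "age": 30}
bad = {"name": "Ravi", "age": "thirty"}

print(validate_user(good))
print(validate_user(bad))
```

The database would accept both documents; only the application-side check rejects the second one.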

Popular Schema-less Databases

1. MongoDB: A document-oriented database widely used for modern web applications.
2. Apache Cassandra: A column-family database optimized for high availability.
3. Redis: An in-memory key-value store known for its speed.
4. Amazon DynamoDB: A cloud-native key-value and document database.
5. Couchbase: A distributed, document-oriented NoSQL database.

When to Use Schema-less Databases

 Rapidly changing data models.
 Applications with large-scale or distributed requirements.
 Use cases involving unstructured or semi-structured data.
 High-performance, low-latency requirements (e.g., caching, real-time analytics).

Materialized Views

 Definition: Precomputed query results stored for faster access.
 Benefits:
o Improves performance for frequently run queries.
o Reduces computation overhead.
 Examples: Used in Cassandra for denormalized queries.
 Challenges: Keeping materialized views up-to-date.
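
The idea and its main challenge can be sketched in a few lines. This is an in-memory illustration with invented names, not Cassandra syntax: the view is updated on every write so that reads avoid rescanning the base data.

```python
from collections import defaultdict

orders = []                               # base data
revenue_by_customer = defaultdict(float)  # materialized view


def add_order(customer, amount):
    orders.append({"customer": customer, "amount": amount})
    # Keeping the view up to date is the cost paid on every write.
    revenue_by_customer[customer] += amount


def revenue_by_scan(customer):
    """The same answer computed from scratch, scanning all orders."""
    return sum(o["amount"] for o in orders if o["customer"] == customer)


add_order("c1", 10.0)
add_order("c2", 5.0)
add_order("c1", 20.0)

print(revenue_by_customer["c1"], revenue_by_scan("c1"))
```

If a write path forgets to update the view, the two results diverge, which is the staleness problem named under Challenges.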

Introduction to Distribution Models

 Definition:
Distribution models refer to architectures and strategies for distributing data and
computational tasks across multiple nodes or servers.
 Key Goals:
o Scalability: Handle growing data and workload.
o Fault Tolerance: Maintain reliability despite failures.
o Efficiency: Maximize resource utilization and minimize latency.
 Common Models:
o Centralized vs. Decentralized
o Master-Slave Architectures
o Peer-to-Peer Systems

Sharding

 Definition: Sharding is a database architecture pattern that splits large datasets into
smaller, more manageable pieces called shards.
 Purpose:
o Improves scalability and performance.
o Distributes load evenly across servers.
 Key Techniques:
o Horizontal Partitioning: Split rows across shards.
o Vertical Partitioning: Split columns across shards.
 Examples in Practice:
o Database sharding in NoSQL systems (e.g., MongoDB, Cassandra).
o URL shortening services.
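
Horizontal partitioning by key can be sketched with a hash-based router. The shard count, key format, and helper names below are assumptions for the example; production systems typically add rebalancing (for instance consistent hashing), which this sketch omits.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for this example


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Each shard is modeled as its own dictionary (one per server).
shards = {i: {} for i in range(NUM_SHARDS)}


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for i in range(100):
    put(f"user-{i}", {"id": i})

print({shard_id: len(rows) for shard_id, rows in shards.items()})
print(get("user-7"))
```

A stable hash is used instead of Python's built-in hash(), whose value for strings changes between processes.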

MapReduce: A Distributed Computing Paradigm

 Overview:
o Introduced by Google to process large-scale data.
o Works by dividing tasks into smaller sub-tasks that are processed in parallel.
 Core Phases:
1. Map: Processes input data to generate intermediate key-value pairs.
2. Shuffle and Sort: Groups intermediate data by keys.
3. Reduce: Aggregates and combines data for the final output.
 Advantages:
o Fault tolerance through re-execution of failed tasks.
o Scalability to handle petabytes of data.
 Examples of Use Cases:
o Word count
o Log analysis
o Machine learning tasks
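
The three phases can be sketched for the word count use case. This runs in a single process and only mirrors the data flow; it is not the Hadoop API, and the function names are chosen for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group intermediate values by key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: combine all values for one key into the final result."""
    return key, sum(values)


def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_and_sort(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big", "data store"]))
```

In a real cluster each call to map_phase or reduce_phase could run on a different machine, because no call depends on another call in the same phase.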

Partitioning and Combining

 Partitioning in MapReduce:
o Assigns each intermediate key to a reducer, aiming for a balanced workload. (Splitting the
input into chunks for the mappers is a separate step.)
o Controlled by a partition function (e.g., hash-based partitioning).
 Combining:
o A local Reduce step performed on intermediate data before the shuffle phase.
o Optimizes the process by reducing the volume of data transferred.
o Example: Pre-aggregating word counts before the reduce phase.
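
Both ideas can be sketched on the word count example. The code below is a single-process illustration with invented helper names: a hash-based partition function routes each word to one reducer, and a combiner pre-aggregates counts per input line before anything is "sent".

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    """Hash-based partition function: the same key always goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_with_combiner(line):
    """Map one line and pre-aggregate its counts locally (the combiner step)."""
    return Counter(word.lower() for word in line.split())


def run(lines, num_reducers=2):
    reducer_inputs = [Counter() for _ in range(num_reducers)]
    shuffled_pairs = 0  # how many (word, count) pairs cross the "network"
    for line in lines:
        for word, count in map_with_combiner(line).items():
            reducer_inputs[partition(word, num_reducers)][word] += count
            shuffled_pairs += 1
    merged = Counter()
    for part in reducer_inputs:
        merged.update(part)
    return dict(merged), shuffled_pairs


counts, shuffled = run(["a a a b", "a b b"])
print(counts, shuffled)
```

Without the combiner, the seven words in the input would produce seven intermediate pairs; with it, only four pairs are transferred.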

Composing MapReduce Calculations

 Concept of Composition:
o Breaking down complex problems into multiple MapReduce jobs.
o Each job’s output serves as the input for the next.
 Techniques:
o Chaining Jobs: A pipeline of dependent MapReduce operations.
o Directed Acyclic Graph (DAG): Frameworks like Apache Spark generalize
MapReduce with DAGs for more complex workflows.
 Practical Applications:
o Data transformations (e.g., ETL pipelines).
o Multi-stage machine learning workflows.
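
Chaining can be sketched with a small generic job runner, where the output of one job is the input of the next. The runner and the two jobs are hypothetical examples: job 1 counts words, job 2 groups words by their count.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map, shuffle, reduce cycle in a single process."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


# Job 1: word count.
def wc_map(line):
    return [(word.lower(), 1) for word in line.split()]


def wc_reduce(word, ones):
    return (word, sum(ones))


# Job 2: takes job 1's (word, count) pairs and groups words by count.
def freq_map(pair):
    word, count = pair
    return [(count, word)]


def freq_reduce(count, words):
    return (count, sorted(words))


lines = ["big data big", "data store"]
stage1 = run_job(lines, wc_map, wc_reduce)
stage2 = run_job(stage1, freq_map, freq_reduce)
print(stage1)
print(stage2)
```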

Practical Examples and Exercises

 Sharding Example:
o Design a sharding strategy for a social media application with billions of users.
 MapReduce Example:
o Implement a MapReduce job to count the frequency of words in a large text
dataset.
 Partitioning and Combining Example:
o Optimize a MapReduce job by adding a Combiner to pre-aggregate intermediate
results.
 Composing MapReduce Example:
o Build a multi-stage pipeline to compute the PageRank of web pages.
