Unit 4

Big Data solutions require distributed computing architectures like HDFS and NoSQL databases to store and access large, distributed data in a scalable way. HDFS allows for sequential access of data files while NoSQL databases like HBase allow for random read/write access and are more flexible in their data schemas. MongoDB and Cassandra are examples of NoSQL databases that can store data in HDFS-compatible distributed data stores and include query languages. MongoDB is an open-source, document-oriented NoSQL database that provides high performance, high availability, and automatic scaling for distributed data.

Uploaded by

manik

Unit-4

Big Data solutions require a scalable distributed computing model with a shared-nothing
architecture. One solution is to store Big Data in HDFS files. NoSQL data stores also hold
Big Data and facilitate random read/write accesses, whereas accesses to HDFS data are
sequential. HBase is a NoSQL solution (Section 2.6.3). Other examples are MongoDB
and Cassandra. The MongoDB and Cassandra DBMSs create HDFS-compatible distributed
data stores and include their own query processing languages.

NoSQL is an altogether new way of thinking about databases. Its features include schema
flexibility, simple relationships, dynamic schemas, auto-sharding, replication, integrated
caching, horizontal scalability of shards, distributable tuples and support for
semi-structured data. Issues with NoSQL data stores are the lack of standardization in
approaches, processing difficulties for complex queries, and dependence on eventually
consistent results in place of consistency in all states.

Schema-Less Models
NoSQL data do not necessarily have a fixed table schema. The systems do not use the
concept of a join (between distributed datasets). A cluster of highly distributed nodes
manages a single large data store with a NoSQL DB. Data written at one node replicate
to multiple nodes. The data store is therefore partitioned into shards, and the identical
replicas make it fault-tolerant.
Distributed databases can store and process a set of information on more than one
computing node. The NoSQL data model relaxes one or more of the ACID properties
(atomicity, consistency, isolation and durability) of the database. Distribution
follows the CAP theorem. The CAP theorem states that, of the three properties
(consistency, availability and partition tolerance), a distributed data store can
guarantee at most two at the same time.

NOSQL DATA ARCHITECTURE PATTERNS

Key-Value Store

The simplest way to implement a schema-less data store is to use key-value pairs. The
data store characteristics are high performance, scalability and flexibility. Data retrieval
is fast in key-value pair data. A simple string, called the key, maps to a large data string
or BLOB (Binary Large Object). Key-value store accesses use a primary key for accessing
the values. Therefore, the store can be easily scaled up for very large data.

Typical uses of a key-value store are: (i) image store, (ii) document or file store,
(iii) lookup table, and (iv) query cache.
Redis, Dynamo and Riak are some examples of NoSQL key-value store databases. Riak follows
the design described in Amazon's Dynamo paper; Redis is an in-memory key-value store
with a different design.
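
To make the key-to-BLOB mapping concrete, here is a minimal in-memory sketch. The class name `KeyValueStore` and the sample keys are made up for illustration; a real product such as Redis or Riak adds persistence, networking and distribution on top of this idea.

```python
class KeyValueStore:
    """Toy in-memory key-value store: a string key maps to an opaque value (BLOB)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it is just bytes.
        self._data[key] = value

    def get(self, key: str, default=None):
        # All access goes through the primary key.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True only if the key existed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("image:42", b"\x89PNG...")          # image store
store.put("cache:q=top10", b'[{"id": 1}]')    # query cache
```

Because every operation touches exactly one key, the keys can be spread over many machines without any cross-machine coordination, which is why this model scales so easily.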

Document Data Store

Characteristics of a document data store are high performance and flexibility. Scalability
varies and depends on the stored contents. Complexity is low compared to tabular, object
and graph data stores. The features of a document store are:
1. A document stores unstructured data.
2. Storage is similar to an object store.
3. Data are stored in nested hierarchies, for example in the JSON data model
[Example 3.3(ii)], the XML document object model (DOM), or machine-readable data as one
BLOB. Hierarchical information is stored in a single unit called a document tree.
Logically related data are stored together in a unit.

CSV and JSON File Formats
CSV is a format for storing records [Example 1.9 and Example 3.3(i)]. CSV does not
represent object-oriented databases or hierarchical data records. JSON and XML represent
semi-structured data, object-oriented records and hierarchical data records.

Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular
document-oriented DBMSs.
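
The difference between a hierarchical document and flat CSV records can be seen in a small sketch. The student record below is hypothetical; only Python's standard `json`, `csv` and `io` modules are used.

```python
import csv
import io
import json

# Hypothetical student record with a nested list of courses.
document = {
    "id": 101,
    "name": "Asha",
    "courses": [
        {"code": "BDA", "grade": "A"},
        {"code": "DBMS", "grade": "B"},
    ],
}

# JSON keeps the hierarchy: one document holds the record and its nested parts.
as_json = json.dumps(document)
restored = json.loads(as_json)

# CSV has only flat rows, so the parent fields are repeated for every nested item.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name", "course_code", "grade"])
for course in document["courses"]:
    writer.writerow([document["id"], document["name"], course["code"], course["grade"]])
flat_rows = buffer.getvalue().splitlines()
```

The JSON round trip returns the same nested structure, while the CSV form needs one row per course and loses the information that the two rows belong to a single document.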

Tabular Data

Tabular data stores use rows and columns. The row-head field may be used as a key that
accesses and retrieves multiple values from the successive columns in that row. OLTP is
fast on in-memory row-format data.

Oracle DBs provide both options: columnar and row format storages.

Generally, relational DBs store row-based data, in which the key in the first column of
a row is at a memory address and the values in successive columns are at successive
memory addresses. That makes OLTP easier, because all fields of a row are accessed
together during OLTP. Different rows are stored at different addresses in memory or on
disk. An in-memory row-based DB stores a row as a consecutive memory or disk entry. This
strategy makes data searching and accessing faster during transaction processing.

In-memory column-based data have the keys (row-head keys) in the first column of each
row at successive memory addresses. The values in the next column of each row are
likewise at successive memory addresses.

Column Family Store

Columnar Data Store
A way to implement a schema is to divide it into columns. The successive values of each
column are stored at successive memory addresses. In-memory analytics processing (AP)
uses columnar storage in memory. A pair of row-head and column-head is a key-pair. The
pair accesses a field in the table.
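
The row and column layouts described above can be contrasted with a small sketch. Python lists stand in for successive memory addresses, and the table contents and the helper name `get_field` are invented for the example.

```python
# Hypothetical table: (row key, name, sales)
rows = [
    ("r1", "pen", 120),
    ("r2", "ink", 80),
    ("r3", "pad", 200),
]

# Row format: the fields of one row sit next to each other (suits OLTP).
row_layout = [field for row in rows for field in row]

# Column format: the values of one column sit next to each other (suits analytics).
columns = ["key", "name", "sales"]
column_layout = {col: [row[i] for row in rows] for i, col in enumerate(columns)}


def get_field(row_key, column):
    """Access one field with a (row-head, column-head) key pair."""
    position = column_layout["key"].index(row_key)
    return column_layout[column][position]


# An aggregate over one column scans a single contiguous list.
total_sales = sum(column_layout["sales"])
```

Reading a whole row is one contiguous slice of `row_layout`, whereas summing sales touches only `column_layout["sales"]` and never reads the names.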

Column-Family Data Store
A column-family data store has a group of columns as a column family. A combination of
row-head, column-family head and table-column head can also be a key to access a field
in a column of the table during querying.
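
As a rough illustration of the three-part key, the sketch below models a table as nested dictionaries. The row keys, family names and the helpers `read_field` and `read_family` are hypothetical and do not correspond to the API of HBase or Cassandra.

```python
# table[row_key][column_family][column] -> value; all names are hypothetical.
table = {
    "user1": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2024-01-05", "posts": 12},
    },
    "user2": {
        "profile": {"name": "Ravi"},   # rows need not carry the same columns
        "activity": {"posts": 3},
    },
}


def read_field(row_key, family, column, default=None):
    """Look up one field by (row-head, column-family head, column head)."""
    return table.get(row_key, {}).get(family, {}).get(column, default)


def read_family(row_key, family):
    """Fetch all columns of one family for a row, as stored together."""
    return dict(table.get(row_key, {}).get(family, {}))
```

Grouping columns into families means a query that needs only `profile` data never has to read the `activity` columns.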

A column-family data store offers very high performance and scalability, a moderate
level of flexibility and lower complexity when compared to object and graph databases.

Examples of widely used column-family data stores are Google's BigTable, HBase and
Cassandra.

Graph Database

One way to implement a data store is to use a graph database. A characteristic of graphs
is high flexibility: any number of nodes and any number of edges can be added to
expand a graph. The complexity is high, and the performance varies with scale.

Graph databases enable fast network searches. A graph uses linked datasets, such as
social media data. The data store uses graphs with nodes and edges connecting each other
through relations, associations and properties. Querying for data uses graph traversal
along the paths. Traversal may use single-step, path expressions or full recursion. A
relationship represents a key. A node possesses properties, including an ID. An edge may
have a label which may specify a role. Characteristics of graph databases are:

1. Use specialized query languages, such as SPARQL for RDF data.

2. Create a database system which models the data in a completely different way than the
key-value, document, columnar and object data store models.

3. Can have hyper-edges. A hyper-edge is a set of vertices of a hypergraph. A hypergraph
is a generalization of a graph in which an edge can join any number of vertices (not only
the neighboring vertices).

Typical uses of graph databases are: (i) link analysis, (ii) friend-of-friend queries,
(iii) rules and inference, (iv) rule induction and (v) pattern matching.
Examples of graph DBs are Neo4j, AllegroGraph, HyperGraphDB, InfiniteGraph, Titan and
FlockDB. The Neo4j graph database is easy for Java developers to use. Neo4j supports
fully ACID-compliant transactions.
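
A friend-of-friend query is a two-step traversal, and a bounded breadth-first search generalizes it to any number of hops. The sketch below uses a plain adjacency list rather than a graph database; the social graph and function names are assumptions for illustration.

```python
from collections import deque

# Hypothetical social graph as an adjacency list: node -> set of friends.
friends = {
    "ann": {"bob", "cal"},
    "bob": {"ann", "dev"},
    "cal": {"ann", "dev", "eve"},
    "dev": {"bob", "cal"},
    "eve": {"cal"},
}


def friends_of_friends(graph, start):
    """People two hops away who are not already direct friends."""
    direct = graph.get(start, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {start}


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal; returns the hop distance of each reached node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, set()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

A graph database runs this kind of traversal directly over stored edges, so the cost depends on the neighborhood visited rather than on the total size of the data.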

Using NoSQL to manage big data

Characteristics of Big Data NoSQL solution are:

1. High and easy scalability: NoSQL data stores are designed to expand horizontally.
Horizontal scaling means scaling out by adding more machines as data nodes
(servers) to the pool of resources (processing, memory, network connections). The
design scales out using multi-utility cloud services.

2. Support for replication: Multiple copies of data are stored across multiple nodes of a
cluster. This ensures high availability, partition tolerance, reliability and fault
tolerance.

3. Distributable: Big Data solutions permit sharding and distributing the shards on
multiple clusters, which enhances performance and throughput.

4. Usage of less expensive NoSQL servers: NoSQL data stores require less management
effort. They support features such as automatic repair, easier data distribution and
simpler data models, which make database administration (DBA) and tuning requirements
less stringent.

5. Usage of open-source tools: Many NoSQL data stores are open source and cheap to run.
Database implementation is easy and typically uses cheap servers to manage the
exploding data and transactions, while RDBMS databases are expensive and use big
servers and storage systems. So the cost per gigabyte of storing and processing data
can be many times less than the cost of an RDBMS.

6. Support for a schema-less data model: A NoSQL data store is schema-less, so data can
be inserted without any predefined schema. The format or data model can therefore be
changed at any time without disrupting the application. Managing such changes is a
difficult problem in SQL.

7. Support for integrated caching: NoSQL data stores support caching in system
memory, which increases output performance. An SQL database needs a separate
infrastructure for that.

8. No inflexibility unlike the SQL/RD
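
The replication and sharding characteristics in the list above can be sketched in a few lines. This is a simplified illustration, not how any particular product places data: the node names, the replication factor and the use of an MD5 hash with modulo placement are all assumptions.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical data nodes
REPLICATION_FACTOR = 3                              # assumed number of copies


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def replicas_for(key: str):
    """The shard's home node plus the next nodes in the list hold the copies."""
    first = shard_for(key, len(NODES))
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def read(key: str, failed_nodes):
    """Return the first healthy replica for the key, or None if all copies are down."""
    for node in replicas_for(key):
        if node not in failed_nodes:
            return node
    return None
```

With three copies, a read still succeeds when one or two of the replica nodes fail. Plain modulo placement moves most keys whenever a node is added, which is one reason real systems tend to use schemes such as consistent hashing instead.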

MongoDB
MongoDB is an open-source document database that provides high performance, high
availability, and automatic scaling.
In simple words, MongoDB is a document-oriented database. It is an open-source
product, developed and supported by a company named 10gen (renamed MongoDB, Inc. in 2013).
MongoDB Community Server is available for free (originally under the GNU Affero General
Public License and, since 2018, under the Server Side Public License), and it is also
available under a commercial license from the vendor.
The company 10gen has defined MongoDB as:
"MongoDB is a scalable, open source, high performance, document-oriented database." -
10gen
MongoDB was designed to work with commodity servers. It is now used by companies
of all sizes, across all industries.

Features of MongoDB

1. Support for ad hoc queries
In MongoDB, you can search by field or by range, and it also supports regular-expression searches.
2. Indexing
You can index any field in a document.
3. Replication
MongoDB supports replication through replica sets (older releases called this master-slave replication).
The primary (master) performs reads and writes; a secondary (slave) copies data from the primary and can
only be used for reads or backup (not writes).
4. Duplication of data
MongoDB can run over multiple servers. The data is duplicated to keep the system up and running in case
of hardware failure.
5. Load balancing
It balances load automatically because the data is placed in shards.
6. Supports MapReduce and aggregation tools.
7. Uses JavaScript instead of stored procedures.
8. It is a schema-less database written in C++.
9. Provides high performance.
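
The three kinds of ad hoc query mentioned in feature 1 (by field, by range, by regular expression) can be illustrated without a running server. The sketch below is a toy matcher over Python dictionaries, not the MongoDB driver; it borrows the operator names `$gte`, `$lte` and `$regex` from MongoDB's query language, and the sample documents and the helpers `check`, `matches` and `find` are invented for the example.

```python
import re

# Hypothetical collection of documents; fields may differ between documents.
people = [
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Ravi", "age": 24},
    {"name": "Anil", "age": 45, "city": "Delhi"},
]


def check(operator, value, operand):
    """Evaluate one comparison operator against a field value."""
    if operator == "$gte":
        return value >= operand
    if operator == "$lte":
        return value <= operand
    if operator == "$regex":
        return re.search(operand, str(value)) is not None
    raise ValueError(f"unsupported operator: {operator}")


def matches(document, query):
    """True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if field not in document:
            return False
        if isinstance(condition, dict):
            if not all(check(op, document[field], arg) for op, arg in condition.items()):
                return False
        elif document[field] != condition:
            return False
    return True


def find(collection, query):
    """Return the documents that match the query, in collection order."""
    return [doc for doc in collection if matches(doc, query)]


by_field = find(people, {"city": "Pune"})
by_range = find(people, {"age": {"$gte": 30, "$lte": 50}})
by_regex = find(people, {"name": {"$regex": "^A"}})
```

Note that the second document has no `city` field at all, which the schema-less model permits. This toy version scans every document; an index on the queried field is what lets a real database avoid that scan.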

Cassandra

Apache Cassandra is an open-source, distributed and decentralized storage system
(database) for managing very large amounts of structured data spread out
across the world. It provides a highly available service with no single point of failure.
Listed below are some of the notable points of Apache Cassandra −
It is scalable and fault-tolerant, and its consistency level is tunable.
It is a column-oriented database.
Its distribution design is based on Amazon's Dynamo and its data model on Google's
Bigtable.
Created at Facebook, it differs sharply from relational database management systems.
Cassandra implements a Dynamo-style replication model with no single point of failure,
but adds a more powerful "column family" data model.
Cassandra is used by some of the biggest companies, such as Facebook, Twitter,
Cisco, Rackspace, eBay, Netflix, and more.
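
Dynamo-style distribution places nodes and keys on a hash ring, and each key is stored on the next nodes clockwise from its position. The sketch below shows the idea only: it gives each node a single token and uses MD5, both simplifying assumptions, and the class name `Ring` and node names are hypothetical rather than Cassandra's own partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Position of a node name or key on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring with one token per node."""

    def __init__(self, nodes):
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, count: int):
        """The first `count` nodes clockwise from the key's position."""
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        count = min(count, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring3 = Ring(["n1", "n2", "n3"])
ring4 = Ring(["n1", "n2", "n3", "n4"])

# Keys whose first replica changes after n4 joins the cluster.
keys = [f"key-{i}" for i in range(200)]
moved = [k for k in keys if ring3.replicas(k, 1) != ring4.replicas(k, 1)]
```

Every key in `moved` now belongs to the new node `n4`; keys owned by other nodes stay where they were. This is the property that lets a cluster grow one node at a time without reshuffling all the data, and since every node runs the same placement rule, no coordinator is needed.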

Features of Cassandra
Cassandra has become popular because of its technical features. Given
below are some of the features of Cassandra:

Elastic scalability − Cassandra is highly scalable; it allows more hardware to be added
to accommodate more customers and more data as required.

Always-on architecture − Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.

Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore it maintains a
quick response time.

Flexible data storage − Cassandra accommodates all possible data formats, including
structured, semi-structured, and unstructured. It can dynamically accommodate changes
to your data structures according to your need.

Easy data distribution − Cassandra provides the flexibility to distribute data where you
need by replicating data across multiple data centers.

Transaction support − Cassandra provides atomicity, isolation and durability for writes
within a single partition, with tunable consistency, rather than full multi-row ACID
transactions.

Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs
very fast writes and can store hundreds of terabytes of data without sacrificing
read efficiency.

GraphQL

GraphQL is an open-source query language and runtime for APIs, developed by
Facebook in 2012. It is designed to provide a more efficient, powerful, and flexible
alternative to traditional REST APIs. It allows clients to request only the data they need,
reducing the amount of data transferred over the network and improving performance.

Some major features of GraphQL are:

1. It allows for more flexible and dynamic APIs, as clients can modify their
queries as needed without requiring changes to the server-side code.

2. GraphQL provides a structured way to define the schema of an API and the
types of data that can be queried, which allows for easy documentation and better tooling
support for clients and servers.

3. It can handle nested and complex queries, which can be challenging with
traditional REST APIs. With GraphQL, clients can specify complex
nested queries with multiple levels of depth, and the server can respond with the
requested data in a single response.

4. GraphQL enables a consistent and predictable API interface, regardless of
changes in the underlying data model. This is made possible by using a version-less API,
allowing for easier evolution and iteration of APIs over time without breaking existing
client applications.
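
The idea that a client names exactly the fields it wants, including nested ones, and gets them back in one response can be sketched as follows. This is not a GraphQL implementation: the `select` helper, the nested-dictionary selection format and the sample record are assumptions used only to illustrate field selection.

```python
# Hypothetical server-side record with more fields than a client usually needs.
user = {
    "id": 7,
    "name": "Asha",
    "email": "asha@example.com",
    "posts": [
        {"id": 1, "title": "NoSQL notes", "body": "...", "comments": [{"text": "Nice"}]},
        {"id": 2, "title": "CAP theorem", "body": "...", "comments": []},
    ],
}


def select(data, selection):
    """Return only the fields named in `selection`; nested dicts select nested fields."""
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = select(value, sub_selection) if sub_selection else value
    return result


# Roughly corresponds to the GraphQL query: { name posts { title comments { text } } }
selection = {"name": None, "posts": {"title": None, "comments": {"text": None}}}
response = select(user, selection)
```

The response carries the name, post titles and comment texts but not the email or post bodies, and a different client can ask for a different shape without any change to the server-side function.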

