4.2 NoSQL Databases UNIT-1

nosql unit 1

Uploaded by

LIKHITH SAI RAM


NOSQL DATABASES

VISWANADH. MNV
UNIT-1 NoSQL, AGGREGATE DATA MODEL, Data Models
1.1 NoSQL
1.1.1 Introduction to NOSQL and Why NoSQL

SQL:
A SQL database is a system that stores and organizes information in tables made up of
rows and columns. It lets users manage, retrieve, and manipulate data using a language
called SQL (Structured Query Language). SQL databases are commonly used in applications
where structured data storage and retrieval are essential, such as websites, business
applications, and data analysis systems.

NOSQL Databases
A NoSQL database is a type of database system that provides a flexible way to store and
manage data, diverging from the rigid structure of traditional SQL databases. NoSQL
databases can handle large volumes of unstructured or semi-structured data more
effectively, offering different data models such as document stores, key-value pairs,
column-family stores, and graph databases. They are commonly used in applications
requiring scalability, high availability, and dynamic data schemas, such as web
applications, big data processing, and real-time analytics.

SQL vs NOSQL Databases

Feature               SQL           NoSQL
DB Type               Relational    Non-Relational
Server Type           Centralized   Distributed
Data Integrity        Yes           No
Scaling               Vertical      Horizontal & Vertical
Data Type             Structured    Unstructured / Semi-structured
Normalization         Yes           No
ACID Properties       Yes           No
High Availability     No            Yes
Eventual Consistency  No            Yes

When to Use SQL Databases
● SQL databases are suited to applications where the integrity of the data is
important.
● If your application handles critical data, such as financial information, prefer
a SQL database.
● Use a relational database when you need to be sure that every query returns the
correct response and that you will not accidentally lose any data. In this case
you want maximum consistency, possibly at the cost of some availability
compared with NoSQL.
● When your data is structured and can be organized into schemas relatively easily,
a SQL database is a natural fit.
● With a well-designed schema, normalization reduces data redundancy by
factoring out duplicated information. This also helps improve the quality of
your data.

When to Use NoSQL Databases

● If you develop an application that handles a large number of simultaneous
requests from multiple clients and has to maintain a high level of availability,
then you might want to use a NoSQL database.
● Applications such as a live chat app or a message queue have to respond to the
client with minimal latency.
● With a NoSQL database you can achieve a low response time, accepting that in the
worst case you might lose a message that has not yet been written to the database.
● If you can afford to sacrifice a level of data consistency in favor of high
availability, you should definitely choose a NoSQL database.
● When you have to deal with big data that is not so well structured you may also
choose to use a NoSQL database.
● This will allow you to easily scale horizontally, in contrast to relational databases
where this is not as easy. NoSQL databases offer you the choice to distribute
your data across multiple servers, a fact that allows some NoSQL systems to
easily recover from unexpected crashes by having no single point of failure.
● When you want to perform complex and fast search queries on huge amounts of
data, Elasticsearch is a NoSQL solution that fits that scenario. An example of such
an application might be a text search app that needs to perform search queries
to find a specific term through thousands or millions of documents.
● Given their flexibility, NoSQL databases may also be useful when developing
prototypes and MVPs. Time can be saved in not designing a schema, and
analyzing the collected data can help guide the schema design process for the
final product.

1.1.2 The Value of Relational Databases

Relational databases have become such an embedded part of our computing
culture that it’s easy to take them for granted. It’s therefore useful to revisit the benefits
they provide.

Getting at Persistent Data

● Persistence refers to the ability to store data permanently even after a system
restart.
● Probably the most obvious value of a database is keeping large amounts of
persistent data.
● Most computer architectures have the notion of two areas of memory: a fast
volatile “main memory” and a larger but slower “backing store.”
● Main memory is both limited in space and loses all data when you lose power or
something bad happens to the operating system. Therefore, to keep data
around, we write it to a backing store, commonly seen as a disk (although these
days that disk can be persistent memory).
● The backing store can be organized in all sorts of ways. For many productivity
applications (such as word processors), it’s a file in the file system of the
operating system.
● For most enterprise applications, however, the backing store is a database. The
database allows more flexibility than a file system in storing large amounts of
data in a way that allows an application program to get at small bits of that
information quickly and easily.

Concurrency
● Enterprise applications tend to have many people looking at the same body of
data at once, possibly modifying that data.
● Most of the time they are working on different areas of that data, but
occasionally they operate on the same bit of data. As a result, we have to worry
about coordinating these interactions to avoid such things as double booking of
hotel rooms.
● Concurrency is notoriously difficult to get right, with all sorts of errors that can
trap even the most careful programmers. Since enterprise applications can have
lots of users and other systems all working concurrently, there’s a lot of room
for bad things to happen. Relational databases help handle this by controlling all
access to their data through transactions.
● While this isn’t a cure-all (you still have to handle a transactional error when you
try to book a room that’s just gone), the transactional mechanism has worked
well to contain the complexity of concurrency.
● Transactions also play a role in error handling. With transactions, you can make a
change, and if an error occurs during the processing of the change you can roll
back the transaction to clean things up.
Example

Figure: Concurrent transactions (a few operations of transaction T1 are executed, then
T2, and then the remaining operations of T1).
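
The rollback behaviour described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, not a production booking system: the table names (rooms, bookings), the room number, and the guest names are assumptions made up for the example.

```python
import sqlite3

def book_room(conn, room_id, guest):
    """Record a booking and claim the room inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO bookings (room_id, guest) VALUES (?, ?)",
                (room_id, guest),
            )
            cur = conn.execute(
                "UPDATE rooms SET guest = ? WHERE id = ? AND guest IS NULL",
                (guest, room_id),
            )
            if cur.rowcount == 0:
                # The room is already taken: the INSERT above is undone too.
                raise ValueError("room already booked")
        return True
    except ValueError:
        return False

# Hypothetical schema used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, guest TEXT)")
conn.execute("CREATE TABLE bookings (room_id INTEGER, guest TEXT)")
conn.execute("INSERT INTO rooms (id, guest) VALUES (101, NULL)")
conn.commit()

first = book_room(conn, 101, "alice")   # succeeds and commits
second = book_room(conn, 101, "bob")    # fails and is rolled back
booking_count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
room_guest = conn.execute("SELECT guest FROM rooms WHERE id = 101").fetchone()[0]
```

The second attempt leaves no trace in the bookings table, which is the "clean things up" effect of a rollback: the application still has to handle the failed booking, but the data stays consistent.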

Integration
● Enterprise applications live in a rich ecosystem that requires multiple
applications, written by different teams, to collaborate in order to get things
done.
● This kind of inter-application collaboration is awkward because it means pushing
the human organizational boundaries. Applications often need to use the same
data and updates made through one application have to be visible to others.
● A common way to do this is shared database integration where multiple
applications store their data in a single database. Using a single database allows
all the applications to use each others’ data easily, while the database’s
concurrency control handles multiple applications in the same way as it handles
multiple users in a single application.

A Standard Model
● Relational databases have succeeded because they provide the core benefits we
outlined earlier in a (mostly) standard way.
● As a result, developers and database professionals can learn the basic relational
model and apply it in many projects.
● Although there are differences between different relational databases, the core
mechanisms remain the same: Different vendors’ SQL dialects are similar,
transactions operate in mostly the same way.

1.1.3 Impedance Mismatch

● For application developers, the biggest problem has been what’s commonly
called the impedance mismatch:
● The difference between the relational model and the in-memory data structures.
● The relational data model organizes data into a structure of tables and rows, or
more properly, relations and tuples. In the relational model, a tuple is a set of
name-value pairs and a relation is a set of tuples.
● All operations in SQL consume and return relations, which leads to the
mathematically elegant relational algebra.
● The application layer of an application is typically written in an object-oriented
language. However, the object-oriented and the relational data model don't fit
well together.
● In the object-oriented world you have objects that are connected via references.
They build an object hierarchy or graph. Contrarily, the relational model saves
data in two-dimensional tables with rows for each entry and columns for the
entry’s properties.
● If you want to store your object graph in a relational database, you have to slice
and flatten your object graph until it fits into multiple normalized tables. This is
complex and unnatural (following the OO notion).
● Moreover, if you want to recover the objects you have to join several tables,
which can lead to complex queries and performance issues.
Consider the following example:

Comparing the object-oriented and the relational data model. These two worlds do not fit
together naturally.

The Customer Karl has references to two BankAccount objects and to two
Address objects. In the schema, there is one table for each of the three classes
(Customers, Addresses, BankAccounts), and each table is filled with the corresponding
data. Furthermore, the entries for the addresses and the bank accounts have a foreign
key pointing to the entry in the Customers table. Notably, the direction of the
relationship in the relational model is the reverse of the original one, which is why
the relational model feels unnatural to an object-oriented developer. Moreover, the
data distribution over several tables gets even more complicated when intermediate
tables are necessary (for n:m relationships).
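
The "slice and flatten" step can be sketched in Python as follows. This is an illustrative sketch only: the class fields (city, number) and the sample values are assumptions, since the original figure is not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Address:
    city: str

@dataclass
class BankAccount:
    number: str

@dataclass
class Customer:
    id: int
    name: str
    # In the object model the customer holds references to its children.
    addresses: list = field(default_factory=list)
    accounts: list = field(default_factory=list)

def flatten(customer):
    """Slice one object graph into rows for three normalized tables."""
    customer_rows = [(customer.id, customer.name)]
    # In the relational model the foreign key sits on the child row and
    # points back at the customer: the reverse of the object references.
    address_rows = [(customer.id, a.city) for a in customer.addresses]
    account_rows = [(customer.id, b.number) for b in customer.accounts]
    return customer_rows, address_rows, account_rows

# Hypothetical sample data.
karl = Customer(
    1, "Karl",
    addresses=[Address("Berlin"), Address("Hamburg")],
    accounts=[BankAccount("ACC-1"), BankAccount("ACC-2")],
)
customer_rows, address_rows, account_rows = flatten(karl)
```

Rebuilding the Customer object from these rows would require matching the child rows back to the customer by id, which is the join mentioned earlier.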

1.1.4 Application and Integration Databases

Integration Databases:
● There are downsides to shared database integration.
● A structure that’s designed to integrate many applications ends up being more
complex than any single application needs.
● Furthermore, should an application want to make changes to its data storage, it
needs to coordinate with all the other applications using the database.
● Different applications have different structural and performance needs, so an
index required by one application may cause a problematic hit on inserts for
another.
● The fact that each application is usually a separate team also means that the
database usually cannot trust applications to update the data in a way that
preserves database integrity and thus needs to take responsibility for that within
the database itself.

Application Databases
● A different approach is to treat your database as an application database which
is only directly accessed by a single application codebase that’s looked after by a
single team.
● With an application database, only the team using the application needs to know
about the database structure, which makes it much easier to maintain and
evolve the schema.
● Since the application team controls both the database and the application code,
the responsibility for database integrity can be put in the application code.

1.1.5 Attack of the Clusters

● In the 2000s, several large web properties grew dramatically in scale. This increase
in scale happened along many dimensions. Websites started tracking activity
and structure in a very detailed way. Large sets of data appeared: links, social
networks, activity in logs, mapping data.
● With this growth in data came a growth in users, as the biggest websites grew to
be vast estates regularly serving huge numbers of visitors.
● Coping with the increase in data and traffic required more computing resources.
To handle this kind of increase, there are two choices: scale up or scale out.
● Scaling up implies bigger machines: more processors, disk storage, and memory.
● But bigger machines get more and more expensive, not to mention that there
are real limits as your size increases.
● Cluster: The alternative is to use lots of small machines in a cluster. A cluster of
small machines can use commodity hardware and ends up being cheaper at
these kinds of scales.
● It can also be more resilient: while individual machine failures are common, the
overall cluster can be built to keep going despite such failures, providing high
reliability.
● As large properties moved towards clusters, that revealed a new problem:
relational databases are not designed to be run on clusters.
● Clustered relational databases, such as the Oracle RAC or Microsoft SQL Server,
work on the concept of a shared disk subsystem.
● They use a cluster-aware file system that writes to a highly available disk
subsystem— but this means the cluster still has the disk subsystem as a single
point of failure. Relational databases could also be run as separate servers for
different sets of data, effectively sharding the database.

● While this separates the load, all the sharding has to be controlled by the
application which has to keep track of which database server to talk to for each
bit of data. Also, we lose any querying, referential integrity, transactions, or
consistency controls that cross shards.
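
Application-controlled sharding of this kind can be sketched as below. The server names are hypothetical, and an in-memory dict stands in for each database server; hashing the key is just one possible routing rule, assumed here for illustration.

```python
import hashlib

# Hypothetical shard names; each would be a separate database server.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]

def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]

# One dict per shard stands in for the data held by that server.
stores = {name: {} for name in SHARDS}

def put(key, value):
    # The application, not the database, decides where the data lives.
    stores[shard_for(key)][key] = value

def get(key):
    return stores[shard_for(key)].get(key)

put("customer:1", {"name": "Martin"})
put("customer:2", {"name": "Karl"})
```

Nothing in this sketch can run a query, enforce a foreign key, or commit a transaction across two shards, which is exactly the loss described above.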

1.1.6 The Emergence of NoSQL

● Origin of "NoSQL" - First appeared in the late 90s with the Strozzi NoSQL
database. - Named because it didn't use SQL; data was manipulated via shell
scripts.
● Modern Use of "NoSQL"- Popularized in 2009 during a meetup in San Francisco
organized by Johan Oskarsson. - Inspired by projects like BigTable and Dynamo.
- Named by Eric Evans as a Twitter-friendly hashtag.
● Characteristics of NoSQL Databases - Do not use SQL; may have SQL-like
query languages (e.g., Cassandra’s CQL).
● Applications and Benefits - Suitable for handling large-scale data that requires
clustering. - Enhances application development productivity through easier data
interaction. - Advocates for application databases over integration databases for
encapsulating data in services.
● Future of Databases - NoSQL has expanded the range of data storage options.
Organizations will likely use a mix of data storage technologies based on their
specific requirements.

Key Points
● Relational databases have been a successful technology for twenty years,
providing persistence, concurrency control, and an integration mechanism.
● Application developers have been frustrated with the impedance mismatch
between the relational model and the in-memory data structures.
● There is a movement away from using databases as integration points towards
encapsulating databases within applications and integrating through services.
● The vital factor for a change in data storage was the need to support large
volumes of data by running on clusters. Relational databases are not designed to
run efficiently on clusters.
● NoSQL is an accidental neologism. There is no prescriptive definition—all you can
make is an observation of common characteristics.
● The common characteristics of NoSQL databases are
○ Not using the relational model

○ Running well on clusters
○ Open-source
○ Built for the 21st century web estates
○ Schema less
● The most important result of the rise of NoSQL is Polyglot Persistence.

1.2 AGGREGATE DATA MODEL
1.2.1 Introduction
● Data Model: How we view and interact with database data.
● Types of Data Models
Relational Data Model:
○ Uses tables (like spreadsheets).
○ Rows represent entities; columns represent attributes.
○ Relationships are defined by linking rows from different tables.
NoSQL Data Models or Aggregate Data Models
● Aggregate means a collection of objects that are treated as a unit. In NoSQL
databases, an aggregate is a collection of data that we interact with as a unit.
These aggregates form the boundaries for ACID operations.
● Aggregate data models make it easier for the database to manage data storage
over clusters, because each aggregate can reside as a unit on any of the machines.
When an aggregate is retrieved from the database, all of its data comes with it.
● Aggregate-oriented databases generally don’t support ACID transactions that span
multiple aggregates; atomicity is guaranteed only within a single aggregate. OLAP
operations can be performed on data stored in aggregate form.
● Aggregate data models are most efficient when data transactions and
interactions take place within the same aggregate.

1.2.2 Aggregates
● Definition: An aggregate is a collection of objects that are treated as a unit. In
NoSQL databases, an aggregate is a collection of data that we interact with as a
unit. These aggregates form the boundaries for ACID operations.
● Aggregate: A complex record with nested structures (lists, other records).
● An aggregate-oriented database is a NoSQL database that generally does not support
ACID transactions spanning multiple aggregates. Aggregate-oriented operations
differ from relational database operations. OLAP operations can be performed on
an aggregate-oriented database.
● The efficiency of an aggregate-oriented database is high if the data transactions
and interactions take place within the same aggregate. Fields that are commonly
accessed together can be put in the same aggregate. Only a single aggregate
can be manipulated atomically at a time; multiple aggregates cannot be
manipulated together in an atomic way.
● NoSQL databases are classified into four major data models. The first three are
aggregate-oriented; graph databases are not:
○ Key-value
○ Document
○ Column family
○ Graph-based

Example of Relations and Aggregates

This example of the E-Commerce Data Model has two main aggregates – customer and
order. The customer contains data related to billing addresses while the order
aggregate consists of ordered items, shipping addresses, and payments. The payment
also contains the billing address.

Figure 2.3. An aggregate data model

The diagram shows two aggregates:
● Customer and Order; the link between them represents a relationship between
aggregates.
● The diamond shows how data fits into the aggregate structure.
● Customer contains a list of billing addresses.
● Payment also contains the billing address.
● The address appears three times and is copied each time.
● This fits a domain where we don’t want the shipping and billing addresses of an
order to change.

A single logical address record appears three times in the data, and its value is
copied wherever it is used. The whole address can be copied into an aggregate as
needed. There is no predefined rule for drawing aggregate boundaries; it depends
entirely on how you want to manipulate the data.

The Data Model for customer and order would look like this.

// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}

In these aggregate data models, if you want to access a customer together with all
of that customer’s orders at once, a single aggregate is preferable. But if you
want to access a single order at a time, you should have a separate aggregate for
each order. The choice is very context-specific.
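
The trade-off between the two designs can be sketched with plain Python dicts standing in for the stored aggregates. The function names and the simplified order fields are assumptions for illustration only.

```python
# Design A: one aggregate per customer, with the orders embedded.
customers_a = {
    1: {"name": "Martin", "orders": [{"id": 99, "price": 32.45}]},
}

# Design B: customers and orders are separate aggregates,
# linked only by the customerId stored in each order.
customers_b = {1: {"name": "Martin"}}
orders_b = {99: {"customerId": 1, "price": 32.45}}

def customer_with_orders_a(customer_id):
    # One lookup returns the customer and every order.
    return customers_a[customer_id]

def customer_with_orders_b(customer_id):
    # The application has to collect the orders itself.
    customer = dict(customers_b[customer_id])
    customer["orders"] = [
        {"id": order_id, **order}
        for order_id, order in orders_b.items()
        if order["customerId"] == customer_id
    ]
    return customer

def order_b(order_id):
    # One lookup returns a single order without loading the customer.
    return orders_b[order_id]
```

Design A makes "customer plus all orders" cheap; design B makes "one order" cheap. Neither is better in general, which is why the boundary depends on how the application reads the data.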

1.2.3 Consequences of Aggregate Orientation

● Aggregation is not a logical data property. It is all about how the data is being
used by applications.
● An aggregate structure may help with some data interactions but be an obstacle
for others.
● Aggregate orientation has an important consequence for transactions.
● Aggregate-oriented NoSQL databases generally don’t support ACID transactions that
span multiple aggregates.
● Instead, aggregate-oriented databases support the atomic manipulation of a single
aggregate at a time.
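
Atomic manipulation of a single aggregate can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how any particular NoSQL product implements it: the AggregateStore class, its lock, and the copy-then-swap approach are all invented for the example.

```python
import copy
import threading

class AggregateStore:
    """Toy in-memory store: each update touches exactly one aggregate."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, change):
        """Apply change to a private copy; swap it in only if change succeeds."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            change(draft)             # may raise; stored aggregate is untouched
            self._data[key] = draft   # all of the change becomes visible at once

store = AggregateStore()
store.put("order:99", {"items": [], "status": "open"})

def add_item(order):
    order["items"].append({"productId": 27, "price": 32.45})
    order["status"] = "pending"

def failing_change(order):
    order["status"] = "shipped"
    raise RuntimeError("payment declined")

store.update("order:99", add_item)
try:
    store.update("order:99", failing_change)
except RuntimeError:
    pass  # the half-applied change was discarded

final_order = store.get("order:99")
```

The store offers no way to update "order:99" and another aggregate in one all-or-nothing step; coordinating that would be left to application code.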

Advantages:
● It can be used as a primary data source for online applications.
● Easy replication.
● No single point of failure.
● It provides fast performance and horizontal scalability.
● It can handle structured, semi-structured, and unstructured data with equal
effort.

Disadvantages:
● No standard rules.
● Limited query capabilities.
● Doesn’t work well with relational data.
● Not so popular in the enterprise.
● When the volume of data increases, it is difficult to maintain unique keys.

1.2.4 Key-Value and Document Data Models

Key-Value Data Model in NoSQL
A key-value data model or database is also referred to as a key-value store. It is a
non-relational type of database. It uses an associative array as its basic data
structure, in which each key is linked with exactly one value in a collection. Keys
are unique identifiers for the values, and a value can be any kind of entity. A
collection of key-value pairs stored as separate records is called a key-value
database, and it has no predefined structure.

Example of a key-value data model

How do key-value databases work?

A key-value database associates a value, which can be anything from a simple
string to a complicated entity, with a key that is used to keep track of the entity.
As in many programming paradigms, a key-value database resembles a map object, array,
or dictionary, but one that is stored persistently and managed by a DBMS.
The key-value store uses an efficient and compact index structure to find a value
quickly and reliably by its key. For example, Redis is a key-value store used to
track lists, maps, heaps, and primitive types (which are simple data structures) in a
persistent database. By supporting only a predetermined number of value types, Redis
can expose a very simple interface for querying and manipulating them, and when
configured appropriately it is capable of high throughput.
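
The map-like interface can be sketched with a small Python class. This is a toy stand-in, not the API of Redis or any other product: the KeyValueStore class and its put/get/delete methods are assumptions, and the data lives only in memory rather than being stored persistently.

```python
import json

class KeyValueStore:
    """Toy key-value store: values are kept as opaque bytes under a key."""

    def __init__(self):
        self._data = {}  # key -> serialized value

    def put(self, key, value):
        # The store never looks inside the value; it only saves the bytes.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        # Returns True if the key existed.
        return self._data.pop(key, None) is not None

kv = KeyValueStore()
kv.put("session:42", {"user": "abc", "cart": [27]})
kv.put("counter", 7)

session = kv.get("session:42")
counter = kv.get("counter")
removed = kv.delete("counter")
missing = kv.get("counter")
removed_again = kv.delete("counter")
```

Every operation goes through the key. There is no way to ask this store for "all sessions whose cart contains product 27" without reading each value in application code.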

When to use a key-value database:

Here are a few situations in which you can use a key-value database:-
● User session attributes in an online app like finance or gaming, which is referred
to as real-time random data access.

● Caching mechanism for repeatedly accessing data or key-based design.
● The application is developed on queries that are based on keys.
Features:
● One of the simplest kinds of NoSQL data models.
● Key-value databases use simple functions for storing, getting, and removing data.
● Key-value databases have no query language.
● Built-in redundancy makes this database more reliable.
Advantages:
● It is very easy to use. Because the database is so simple, values can be of any
type, or even of different types when required.
● Its response time is fast due to its simplicity, provided that the surrounding
environment is well built and optimized.
● Key-value store databases are scalable vertically as well as horizontally.
● Built-in redundancy makes this database more reliable.
Disadvantages:
● Because key-value databases have no query language, queries cannot be ported
from one database to another.
● The key-value store is not sophisticated: you cannot query the database
without a key.
Some examples of key-value databases:
Here are some popular, widely used key-value databases:
● Redis: One of the most popular and widely used key-value databases.
● Amazon DynamoDB: The key-value database most commonly used on AWS. It can
handle a large number of requests every day and provides various security options.
● Riak: A distributed key-value database used to build applications.
Document Data Model:
● A document data model differs from other data models because it
stores data in JSON, BSON, or XML documents.
● In this data model, documents can be nested inside other documents, and
particular elements can be indexed to run queries faster.
● Documents are often stored and retrieved in a form close to the data objects
used in applications, so very little translation is required to use the data in an
application. JSON is a native format that is often used both to store and to query
data.

● In the document data model, each document consists of key-value pairs. Below is
an example:
{
"Name" : "abc",
"Address" : "Narsapur",
"Email" : "abc@gmail.com",
"Contact" : "12345"
}

Working of Document Data Model:

This is a data model which works as a semi-structured data model in which the records
and data associated with them are stored in a single document which means this data
model is not completely unstructured. The main thing is that data here is stored in a
document.
Features:
● Document Type Model: Data is stored in documents rather than tables or graphs,
so it is easy to map to the data structures of many programming languages.
● Flexible Schema: The schema is flexible; not all documents in a collection need
to have the same fields.
● Distributed and Resilient: Document databases are distributed, which enables
horizontal scaling and distribution of data.
● Manageable Query Language: The query language allows developers to perform
CRUD (Create, Read, Update, Delete) operations on the data.
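
The flexible schema and field indexing described in this section can be sketched in Python. The Collection class below is a toy written for illustration and is not the API of MongoDB, CouchDB, or any other document database; the sample documents reuse the field names from the example above.

```python
from collections import defaultdict

class Collection:
    """Toy document collection with an index on one field."""

    def __init__(self, indexed_field):
        self.docs = {}                     # document id -> document
        self.indexed_field = indexed_field
        self.index = defaultdict(set)      # field value -> set of document ids

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        # Documents without the indexed field are simply left out of the index.
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(doc_id)

    def find(self, field_name, value):
        if field_name == self.indexed_field:
            ids = self.index.get(value, set())   # index lookup
        else:
            ids = [i for i, d in self.docs.items()
                   if d.get(field_name) == value]  # full scan
        return sorted(ids)

people = Collection(indexed_field="Address")
people.insert(1, {"Name": "abc", "Address": "Narsapur", "Email": "abc@gmail.com"})
people.insert(2, {"Name": "xyz", "Address": "Narsapur"})   # no Email field
people.insert(3, {"Name": "pqr", "Contact": "12345"})      # no Address field

in_narsapur = people.find("Address", "Narsapur")
named_pqr = people.find("Name", "pqr")
nowhere = people.find("Address", "Nowhere")
```

Unlike a key-value store, the database here can look inside a document, so queries on fields are possible, and the three documents do not need to share the same set of fields.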
Examples of Document Data Models :
● Amazon DocumentDB
● MongoDB
● CouchDB
Advantages:
● Schema-less: Good at retaining existing data at massive volumes
because there are no restrictions on the format or structure of
data storage.
● Faster creation and maintenance of documents: It is very simple to create a
document, and maintenance requires very little effort.
● Open formats: Documents are built using open formats such as XML, JSON, and their
variants.
● Built-in versioning: As documents grow in size they can also grow in
complexity; built-in versioning decreases conflicts.
Disadvantages:
● Weak Atomicity: It often lacks support for multi-document ACID transactions. A
change involving two collections requires two separate queries, one for each
collection, which breaks atomicity requirements.
● Consistency Check Limitations: One can search collections and documents
that are not connected to, for example, an author collection, but doing so may hurt
database performance.
● Security: Many web applications lack security, which can result in
the leakage of sensitive data, so one must pay
attention to web app vulnerabilities.
Applications of Document Data Model :
● Content Management: These data models are widely used for video streaming
platforms, blogs, and similar services, because each piece of content is
stored as a single document and the database is easier to maintain as
the service evolves over time.
● Book Database: These are useful for book databases because
the data model lets us nest data.
● Catalog: These data models are widely used for storing and reading catalog files
because they read quickly even when catalogs have thousands of attributes stored.
● Analytics Platform: These data models are widely used in analytics
platforms.

1.2.5 Column-Family Stores

A column store database is a type of database that stores data using a column oriented
model.
A column store database can also be referred to as a:
● Column database
● Column family database
● Column oriented database

● Wide column store database
● Wide column store
● Columnar database
● Columnar store

The Structure of a Column Store Database

Column store databases use a concept called a keyspace. A keyspace is kind of like a
schema in the relational model. The keyspace contains all the column families (kind of
like tables in the relational model), which contain rows, which contain columns.
Like this:

Diagram of a keyspace containing column families.

Here’s a closer look at a column family:

Figure: column family containing 3 rows. Each row contains its own set of columns.
As the above diagram shows:
● A column family consists of multiple rows.
● Each row can contain a different number of columns from the other rows, and the
columns don’t have to match the columns in the other rows (i.e. they can have
different column names, data types, etc).
● Each column is contained in its row. It doesn’t span all rows like in a relational
database. Each column contains a name/value pair, along with a timestamp.
Note that this example uses Unix/Epoch time for the timestamp.
Here’s how each row is constructed:

Diagram of rows and columns in a wide column store database.

Here’s a breakdown of each element in the row:

● Row Key. Each row has a unique key, which is a unique identifier for that row.

● Column. Each column contains a name, a value, and timestamp.
● Name. This is the name of the name/value pair.
● Value. This is the value of the name/value pair.
● Timestamp. This provides the date and time that the data was inserted. This can
be used to determine the most recent version of data.
Some DBMSs expand on the column family concept to provide extra
functionality/storage ability. For example, Cassandra has the concept of composite
columns, which allow you to nest objects inside a column.
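
The row key, column name, value, and timestamp structure described above can be sketched as nested Python dicts. The ColumnFamily class is a toy for illustration, and the rule that the write with the newest timestamp wins is an assumption about how "the most recent version" is chosen; real systems may resolve conflicts differently.

```python
import time

class ColumnFamily:
    """Toy column family: row key -> {column name: (value, timestamp)}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, name, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        columns = self.rows.setdefault(row_key, {})
        current = columns.get(name)
        # Assumed rule for this sketch: keep the write with the newest timestamp.
        if current is None or ts >= current[1]:
            columns[name] = (value, ts)

    def get(self, row_key, name):
        return self.rows[row_key][name][0]

    def column_names(self, row_key):
        return sorted(self.rows[row_key])

# Hypothetical rows; each row carries its own set of columns.
users = ColumnFamily()
users.put("bob", "email", "bob@example.com", timestamp=100)
users.put("bob", "gender", "male", timestamp=100)
users.put("tori", "country", "Sweden", timestamp=100)
users.put("bob", "email", "bob@new.example.com", timestamp=200)
users.put("bob", "email", "stale@example.com", timestamp=150)  # older, ignored

bob_email = users.get("bob", "email")
bob_columns = users.column_names("bob")
tori_columns = users.column_names("tori")
```

The rows "bob" and "tori" share no column names at all, which is the point made above: a column belongs to its row rather than spanning every row.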

Benefits of Column Store Databases

Some key benefits of columnar databases include:
● Compression. Column stores are very efficient at data compression and/or
partitioning.
● Aggregation queries. Due to their structure, columnar databases perform
particularly well with aggregation queries (such as SUM, COUNT, AVG, etc).
● Scalability. Columnar databases are very scalable. They are well suited to
massively parallel processing (MPP), which involves having data spread across a
large cluster of machines – often thousands of machines.
● Fast to load and query. Columnar stores can be loaded extremely fast. A billion
row table could be loaded within a few seconds. You can start querying and
analyzing almost immediately.
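
Why a column-oriented layout helps aggregation and compression can be sketched in a few lines of Python. The sample rows are invented, and run-length encoding is used here only as one simple example of a compression scheme; it is not claimed to be what any listed product uses.

```python
# Hypothetical sample data in the usual row-oriented form.
rows = [
    {"id": 1, "city": "Chicago", "amount": 32.5},
    {"id": 2, "city": "Boston", "amount": 10.0},
    {"id": 3, "city": "Chicago", "amount": 7.5},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

# An aggregation reads only the one column it needs.
total = sum(columns["amount"])
count = len(columns["amount"])
average = total / count

# Values of one column sit together, so repeated values compress well.
encoded_city = run_length_encode(sorted(columns["city"]))
```

SUM, COUNT, and AVG over "amount" never touch the "id" or "city" values, whereas a row-oriented layout would have to read every full row to get the same answer.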
Examples of Column Store DBMSs
● Bigtable
● Cassandra
● HBase
● Vertica
● Druid
● Accumulo
● Hypertable

1.2.6 Summarizing Aggregate-Oriented Databases

● This section described three different styles of aggregate-oriented data models
and how they differ.
● What they all share is the notion of an aggregate indexed by a key that you can
use for lookup.

● This aggregate is central to running on a cluster, as the database will ensure that
all the data for an aggregate is stored together on one node.
● The aggregate also acts as the atomic unit for updates, providing a useful, if
limited, amount of transactional control.
● Within that notion of aggregate, we have some differences. The key-value data
model treats the aggregate as an opaque whole, which means you can only do
key lookup for the whole aggregate; you cannot run a query or retrieve a part
of the aggregate.

Key Points
● An aggregate is a collection of data that we interact with as a unit. Aggregates
form the boundaries for ACID operations with the database.
● Key-value, document, and column-family databases can all be seen as forms of
aggregate-oriented databases.
● Aggregates make it easier for the database to manage data storage over clusters.
● Aggregate-oriented databases work best when most data interaction is done
with the same aggregate; aggregate-ignorant databases are better when
interactions use data organized in many different formations.

1.3 More Details on Data Models
Introduction
● Aggregates are the central feature of aggregate-oriented databases, which model
data using aggregates.
● NoSQL databases also involve other data modeling concepts that affect how data
is organized and accessed.
● These include relationships, the graph data model, schemaless databases, and
materialized views.

1.3.2 Relationships
● Purpose of Aggregates: - Combine commonly accessed data.
Examples: Customer and their order history.
● Different Access Needs: - Some applications need customer data with order
history (single aggregate). - Others process orders individually (separate
aggregates).
● Linking Separate Aggregates: - Use customer ID in the order's data. - Read
order data to get customer ID, then fetch customer data. - Database won't know
the relationship by default.
● Database Relationship Visibility: - Some databases can show these links. -
Document stores index and query aggregate content. - Key-value stores like Riak
use metadata for links and partial retrieval.
● Handling Updates: - Aggregate-oriented databases: data retrieval unit is the
aggregate. - Atomicity within a single aggregate only. - Relational databases:
support multiple record transactions with ACID guarantees.
● Complexity in Multiple Aggregates: - Operating across multiple aggregates is
harder in aggregate-oriented databases. - Relational databases handle
relationships through joins, but queries with many joins become complex and slow.
● Database Choice: - Relational databases suit data with many relationships better
than aggregate-oriented databases do. - Aggregate-oriented databases can be
awkward when operations span multiple aggregates. - Consider other NoSQL
databases, such as graph databases, for complex queries and relationships.
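
The "Linking Separate Aggregates" point can be sketched as a two-step lookup. The dictionaries below stand in for two aggregate collections, and `customer_for_order` is a hypothetical helper; the point is that the application, not the database, follows the reference.

```python
# Two separate aggregate collections, each indexed by its own key.
customers = {1: {"customerId": 1, "name": "Martin"}}
orders = {99: {"orderId": 99, "customerId": 1, "orderDate": "Nov-20-2011"}}

def customer_for_order(order_id):
    """Follow the customerId reference held inside the order aggregate."""
    order = orders[order_id]                 # first read: the order
    return customers[order["customerId"]]    # second read: the customer

print(customer_for_order(99)["name"])
```

Each read is atomic on its own, but nothing here guarantees that the two reads see a consistent pair, which is the limitation described under "Handling Updates".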

1.3.3 Graph Databases

The graph-based data model in NoSQL focuses on the relationships between data
elements. Each element is stored as a node, and the associations between elements
are known as links or edges. Associations are stored directly because they are
first-class elements of the data model. These data models give us a conceptual view of
the data.

These data models are based on a network topology. Graph theory uses the terms
nodes, edges, and properties; in the graph-based data model they mean the following:
● Nodes: Instances of data that represent the objects to be tracked.
● Edges: Relationships between nodes.
● Properties: Information associated with nodes (and, in many graph databases,
with edges).
The image below shows nodes with properties, connected by edges that represent
relationships.

Fig: Graph Based Data Model

Working of Graph Data Model :

In these data models, connected nodes are linked physically in storage, and the
connection itself is treated as a piece of data. Connecting data in this way makes
relationships easy to query: the database reads the relationship directly from storage
instead of computing the connection at query time. Like many other NoSQL databases,
graph databases usually have no fixed schema, which keeps the model flexible and
easy to change.
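
A minimal sketch of this idea in Python, assuming an in-memory adjacency list: the node identifiers, the `FRIEND` relationship type, and the helper names are all invented for illustration and are not tied to any graph database's API.

```python
from collections import defaultdict

# Nodes carry properties; the identifiers and names here are made up.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "carol": {"label": "Person", "name": "Carol"},
}

# Edges are stored directly with each node, so traversal needs no join.
edges = defaultdict(list)

def add_edge(source, relation, target):
    """Record a directed, labeled relationship between two nodes."""
    edges[source].append((relation, target))

add_edge("alice", "FRIEND", "bob")
add_edge("bob", "FRIEND", "carol")

def neighbors(node_id, relation):
    """Return the nodes reached from node_id over edges of one type."""
    return [target for rel, target in edges[node_id] if rel == relation]

def friends_of_friends(node_id):
    """Two-hop traversal that follows stored FRIEND edges."""
    result = []
    for friend in neighbors(node_id, "FRIEND"):
        result.extend(neighbors(friend, "FRIEND"))
    return result

print(friends_of_friends("alice"))
```

Each hop is a direct lookup of stored edges rather than a search over all records, which is what reading the relationship from storage means in practice.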

Examples of Graph Data Models :
● JanusGraph: An open-source, scalable graph database that is useful for big data
analytics. Its features include:
○ Storage: Several storage backends are available for graph data, such as Cassandra.
○ Support for transactions: Transactions are supported, with ACID
(Atomicity, Consistency, Isolation, and Durability) or eventual consistency
depending on the storage backend, and it is designed to serve
thousands of concurrent users.
○ Searching options: Complex search is available through optional support
for external index backends.
● Neo4j: A graph database written in Java, with native graph storage and
processing. Its features include:
○ Scalable: Scales by partitioning data into pieces known as shards.
○ Higher Availability: High availability through continuous backups and
rolling upgrades.
○ Query Language: Uses Cypher, a programmer-friendly graph query language.
● DGraph: An open-source distributed graph database designed for scalability.
Its main features are:
○ Query Language: It uses GraphQL, a query language originally designed for APIs.
○ Open-source system: Supports many open standards.
Advantages of Graph Data Model :
● Structure: The structures are agile and easy to work with.
● Explicit Representation: Relationships between entities are represented
explicitly.
● Real-time O/P Results: Queries over relationships can return results in real time.
Disadvantages of Graph Data Model :
● No standard query language: The query language depends on the platform
used, so there is no single language shared by all graph databases.
● Poor fit for transactional systems: Graph databases are generally not well suited
to transaction-heavy systems.
● Small User Base: The user base is small, which can make it difficult to get
support when running into a problem.
Applications of Graph Data Model:
● Graph data models are widely used in fraud detection, where connections
between accounts and transactions matter.

● It is used in Digital asset management which provides a scalable database model
to keep track of digital assets.
● It is used in Network management which alerts a network administrator about
problems in a network.
● It is used in context-aware services, for example to give traffic updates.
● It is used in Real-Time Recommendation Engines which provide a better user
experience.

1.3.4 Schemaless Databases

A schemaless database manages information without the need for a blueprint.
Building one does not depend on conforming to predefined fields, tables, or data
model structures, and there is no Relational Database Management System (RDBMS)
to enforce a specific structure. In other words, it is a non-relational database, and it
may follow any of the NoSQL data models: key-value store, document store,
in-memory, column-oriented, or graph. The flexibility of NoSQL databases is
responsible for the rising popularity of the schemaless approach, which is often
considered easier to work with than evolving the schema of an SQL database.

How does a schemaless database work?

● With a schemaless database, you don’t need a fully realized vision of your data
structure up front. Because the database does not enforce a schema, data is
stored as it arrives.
● A relational database, on the other hand, accepts only data that fits its schema:
non-conforming data must be changed to fit or left out.
● Going schemaless allows every bit of detail from the data to remain unaltered
and be completely accessible at any time.
● For businesses whose operations change according to real-time data, keeping
that data untouched matters, because any part of it may later prove important to
how the database is updated.
● Without a fixed data structure, schemaless databases can include or remove
data types, tables, and fields without major repercussions, like complex schema
migrations and outages.
● Because they can withstand sudden changes and accept any data type,
schemaless databases are popular in industries that run on real-time data, like
financial services, gaming, and social media.
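
A small Python sketch of the behavior described above, assuming a list of dictionaries as a stand-in for a schemaless collection. The field names (`score`, `guild`) and the helper `guild_of` are hypothetical.

```python
# A toy schemaless collection: each record is a free-form dictionary.
players = []

# Records in the same collection may carry different fields.
players.append({"id": 1, "name": "Asha", "score": 120})
players.append({"id": 2, "name": "Ravi", "score": 95, "guild": "north"})

def guild_of(record):
    """Read an optional field; the reading code supplies the default."""
    return record.get("guild", "none")

guilds = [guild_of(p) for p in players]
print(guilds)
```

Both records are stored unchanged and no migration was needed to add `guild`. The reading code still assumes a field with that name, so a schema exists implicitly in the application even though the store does not enforce one.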

Schemaless vs. schema databases pros and cons

Schemaless Database Pros | Schemaless Database Cons
All data (and metadata) remains unaltered and accessible | No universal language available to query data in a non-relational database
There is no existing “schema” for the data to be structured around | Though the NoSQL community is still growing at a tremendous rate, not all troubleshooting issues have been properly documented
Can add additional fields that SQL databases can’t accommodate | Lack of compatibility with SQL instructions
Accommodates key-value store, document store, in-memory, column-oriented, or graph data models | No ACID-level compliance, as data retrievals can have inconsistencies given their distributed approach

1.3.5 Materialized Views

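A materialized view is a duplicate data table created by combining data from multiple
existing tables for faster data retrieval. For example, consider a retail application with
two base tables for customer and product data. Listing the products each customer
bought requires cross-referencing both tables on every query; a materialized view can
instead store the combined result in a single table. Because the result is precomputed,
querying the materialized view is faster than running the query against the base tables,
and the difference can be significant when a query is run frequently or is sufficiently
complex. As a result, materialized views can speed up expensive aggregation,
projection, and selection operations, especially those that run frequently and that run
on large data sets.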

How Materialized Views Work

Materialized views precompute and store the results of a query as a physical table in the
database. This precomputation occurs at regular intervals or can be triggered by specific
events.

Creating a Materialized View

1. Define the Query: Specify a query to retrieve data from source tables, including any
filtering, aggregations, or joins.

2. Populate the View: The database runs the query and stores the results as a physical
table.

Updating a Materialized View

Materialized views need periodic updates to reflect changes in the source data. The
refresh frequency varies based on requirements.

Data Refresh Approaches

1. Full Refresh: Completely recomputes and replaces the materialized view. Simple but
resource-intensive.

2. Incremental Refresh: Applies only changes from the source data, more efficient for
large datasets.

3. On-Demand Refresh: Refreshes are triggered by specific events or requests, offering
more control but requiring careful management.
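
The difference between a full and an incremental refresh can be sketched in Python. The view here is a plain dictionary of totals per product; the data and the function names `full_refresh` and `incremental_refresh` are assumptions for illustration, and this incremental version only handles newly inserted rows, not updates or deletes.

```python
# Hypothetical source data: order rows with a product and an amount.
orders = [
    {"product": "pen", "amount": 2},
    {"product": "book", "amount": 10},
]

def full_refresh(source_rows):
    """Recompute the whole view (total amount per product) from scratch."""
    view = {}
    for row in source_rows:
        view[row["product"]] = view.get(row["product"], 0) + row["amount"]
    return view

def incremental_refresh(view, new_rows):
    """Apply only the newly arrived rows to an existing view, in place."""
    for row in new_rows:
        view[row["product"]] = view.get(row["product"], 0) + row["amount"]
    return view

# Create the view once, then keep it current as new orders arrive.
sales_view = full_refresh(orders)
new_orders = [{"product": "pen", "amount": 3}]
orders.extend(new_orders)
incremental_refresh(sales_view, new_orders)

print(sales_view)
```

The incremental path reads one new row instead of the whole source, which is why it scales better on large datasets. An on-demand refresh would simply call either function when an event or request asks for it.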

Use Cases of Materialized Views

1. Data Distribution: Efficiently replicate and distribute data to multiple locations,
reducing network load and improving access for remote workers.

2. Time Series Analysis: Store precomputed data summaries (e.g., monthly or weekly)
for business intelligence and reporting.

3. Optimizing Remote Data: Store remote data locally to reduce network
communication and enhance performance in distributed databases.

MongoDB offers a similar capability in a NoSQL environment: the results of an
aggregation pipeline can be written to a collection and then queried like a
materialized view.

Fig: Materialized view in MongoDB (NoSQL)

Advantages of Materialized Views

Materialized views offer key benefits:

1. Speed: They improve query performance by allowing you to query precomputed data
instead of recalculating it each time, saving time on complex queries.

2. Simplicity: They consolidate complex queries into one table, making data
transformations and maintenance easier. This also helps reduce the data replicated in
the view.

3. Consistency: They provide a consistent snapshot of data, ensuring reliable access
even if the source data changes or is deleted. This is useful for time-based reporting.

4. Access Control: They allow you to control data access, letting users see specific data
without accessing the underlying source tables.

Challenges with Materialized Views:

Materialized views introduce complexity in terms of maintenance, requiring a balance
between query efficiency and storage costs, along with data consistency issues.

1. Maintenance: Creating effective rules for updates is necessary to keep materialized
views beneficial.

2. Performance Impact: Frequent updates can degrade system performance,
especially during peak periods.

3. Storage Costs: Materialized views replicate data, leading to significant storage
demands, particularly in large, frequently updated databases.

4. Management: Clear refresh rules and schedules are needed, along with strategies to
handle data inconsistencies, refresh failures, and storage strain.

The following table shows key similarities and differences between regular tables,
regular views, and materialized views:

                  | Performance Benefits | Security Benefits | Simplifies Query Logic | Supports Clustering | Uses Storage | Uses Credits for Maintenance
Regular table     |                      |                   |                        | ✔                   | ✔            |
Regular view      |                      | ✔                 | ✔                      |                     |              |
Materialized view | ✔                    | ✔                 | ✔                      | ✔                   | ✔            | ✔

1.3.6 Modeling for Data Access

when the order is placed by the customer.

Figure 3.3. Customer is stored separately from Order.
In document stores, since we can query inside documents, removing references to
Orders from the Customer object is possible. This change allows us to not update the
Customer object when new orders are placed by the Customer.
{
"customerId": 1,
"name": "Martin",
"billingAddress": [{"city": "Chicago"}],
"payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}]
}
{
"orderId": 99,
"customerId": 1,
"orderDate": "Nov-20-2011",
"orderItems": [{"productId": 27, "price": 32.45}],
"orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
"shippingAddress": {"city": "Chicago"}
}
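
Querying inside documents can be sketched in Python. The list below stands in for a collection of order documents shaped like the example above; the second order and the helper functions are made up, and a real document store would normally use indexes rather than scanning every document.

```python
# Order documents shaped like the example above; the second order is made up.
orders = [
    {"orderId": 99, "customerId": 1,
     "orderItems": [{"productId": 27, "price": 32.45}]},
    {"orderId": 100, "customerId": 2,
     "orderItems": [{"productId": 31, "price": 12.00}]},
]

def orders_for_customer(customer_id):
    """Find orders by a field inside the document, not by the order key."""
    return [o for o in orders if o["customerId"] == customer_id]

def orders_with_product(product_id):
    """Match on a field nested inside the orderItems array."""
    return [
        o for o in orders
        if any(item["productId"] == product_id for item in o["orderItems"])
    ]

print([o["orderId"] for o in orders_for_customer(1)])
print([o["orderId"] for o in orders_with_product(31)])
```

Because orders can be found through their `customerId` field, the Customer document needs no list of order references and stays untouched when a new order is placed.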

Figure 3.4. Conceptual view into a column data store

Figure 3.5. Graph model of e-commerce data

This type of relationship traversal is very easy with graph databases. It is
especially convenient when you need to use the data to recommend products to users
or to find patterns in actions taken by users.

Key Points
● Aggregate-oriented databases make inter-aggregate relationships more difficult
to handle than intra-aggregate relationships.
● Graph databases organize data into node and edge graphs; they work best for
data that has complex relationship structures.
● Schemaless databases allow you to freely add fields to records, but there is
usually an implicit schema expected by users of the data.
● Aggregate-oriented databases often compute materialized views to provide data
organized differently from their primary aggregates. This is often done with
map-reduce computations.
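
The last point can be sketched as a single-process map-reduce in Python. The order data, the field names `qty` and `price`, and the functions `map_order` and `reduce_revenue` are assumptions for illustration; a real system would run the map step in parallel across the nodes that hold the aggregates.

```python
from collections import defaultdict

# Order aggregates; the product IDs, quantities, and prices are made up.
orders = [
    {"orderId": 1, "orderItems": [{"productId": 27, "qty": 2, "price": 10.0}]},
    {"orderId": 2, "orderItems": [{"productId": 27, "qty": 1, "price": 10.0},
                                  {"productId": 31, "qty": 3, "price": 4.0}]},
]

def map_order(order):
    """Map step: emit one (productId, revenue) pair per order line."""
    for item in order["orderItems"]:
        yield item["productId"], item["qty"] * item["price"]

def reduce_revenue(pairs):
    """Reduce step: sum the emitted revenue for each product."""
    totals = defaultdict(float)
    for product_id, revenue in pairs:
        totals[product_id] += revenue
    return dict(totals)

# The materialized view: revenue organized by product instead of by order.
emitted = [pair for order in orders for pair in map_order(order)]
revenue_by_product = reduce_revenue(emitted)
print(revenue_by_product)
```

The orders remain the primary aggregates; `revenue_by_product` is a derived structure that answers a question the order-centric layout handles poorly.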
