Unit 1-NoSQL

The document provides an overview of NoSQL databases, highlighting their advantages such as high scalability, flexibility, and cost-effectiveness, while also addressing disadvantages like lack of standardization and ACID compliance. It discusses various types of NoSQL databases, including document-oriented, key-value pair, graph-based, and column-based databases, along with their use cases in industries like e-commerce, mobile applications, and IoT. Additionally, it explains the CAP theorem, which outlines the trade-offs between consistency, availability, and partition tolerance in database design.

Uploaded by

zackbhavsar1209

NoSQL

Unit 1
1.1. Introduction to Big Data, Overview of Big Data and NoSQL
1.2. Databases
1.1. Introduction to Big Data, Overview of Big Data and NoSQL
Advantages of NoSQL
The main advantages are high scalability and high availability.

1. High scalability: NoSQL databases use sharding for horizontal scaling. This
means adding more machines to handle the data.
2. Flexibility: NoSQL databases are designed to handle unstructured or
semi-structured data, which means that they can accommodate dynamic
changes to the data model.
3. High availability: data is replicated automatically, so in case of a failure the
database can return to the previous consistent state.
4. Scalability in volume: this makes them a good fit for applications that need to
handle large amounts of data or traffic.
5. Performance: designed to handle large amounts of data while keeping
performance high.
6. Cost-effectiveness: compared with SQL databases, capacity can be added
cheaply by adding servers.
7. Agility: ideal for agile development.
Scale-out (horizontal scaling)
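The idea of sharding can be sketched in a few lines. This is only a minimal illustration, assuming a simple hash-based scheme; the names `shard_for` and `ShardedStore` are hypothetical and do not come from any particular NoSQL product, and each "machine" is modelled as a plain in-memory dictionary.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy store that spreads keys over several shards (one dict per 'machine')."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # The key alone decides which shard holds the value.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Only one shard has to be consulted for a lookup.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"visits": i})

print([len(shard) for shard in store.shards])  # how the 100 keys were spread
```

Real systems usually need more than a modulo, because changing the number of shards here would move most keys; the sketch only shows why adding machines spreads the data.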
Disadvantages of NoSQL
Lack of standardization: there are many NoSQL databases, each with its own design and query interface. This lack of
standardization can make it difficult to choose the right database for a specific application.

Lack of ACID compliance: many NoSQL databases are not fully ACID compliant, so they may not guarantee the consistency,
integrity, and durability of data. This can be a drawback for applications that require strong data consistency guarantees.

Open source: many NoSQL databases are open-source projects, and there is no reliable standard for NoSQL yet.

Lack of support for complex queries: NoSQL databases are generally not designed to handle complex queries, which
means that they are often a poor fit for applications that require complex data analysis or reporting.

Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational databases.
This can make them less reliable and less secure than traditional databases.

Backup: backup is a weak point for some NoSQL databases. Taking a backup that is consistent across a distributed
cluster (for example, a sharded MongoDB deployment) is harder than backing up a single relational server.

Large document size: documents can be quite large, which affects storage, network bandwidth, and speed. Descriptive
key names add to the problem, since they are repeated in every document and increase its size.
History of NoSQL
The term NoSQL was first used in 1998 by Carlo Strozzi. NoSQL can mean either “No SQL
systems” or, in the more commonly accepted reading, “Not only SQL”.

NoSQL systems can handle both structured and unstructured data, and they can process unstructured Big
Data quickly. This led to organizations such as Facebook, Twitter, LinkedIn, and Google adopting NoSQL
systems.
Use of NoSQL in Industry
Session Store:

● Managing session data in a relational database is difficult, especially once an application has grown
very large.
● In such cases the right approach is to use a global session store, which manages session information for every user
who visits the site.
● NoSQL is suitable for storing such web application session information, which is very large in size.
● Since session data is unstructured, it is easier to store it in schema-less documents than in
relational database records.

User Profile Store

● To enable online transactions, user preferences, user authentication and more, web and mobile applications
need to store user profiles.
● In recent times the number of users of web and mobile applications has grown very rapidly. A relational database
struggles to handle such a large and rapidly growing volume of user profile data, as it is typically limited to a single server.
● With NoSQL, capacity can easily be increased by adding servers, which makes scaling cost-effective.
Content and Metadata Store

● Many companies, such as publishing houses, need a place to store large amounts of data,
including articles, digital content and e-books, in order to merge various learning tools into a single
platform.
● In content-based applications, metadata is very frequently accessed
data that needs low response times.
● For building content-based applications, NoSQL provides flexibility, faster access to data, and
the ability to store different types of content.

Mobile Application:

● Since the number of smartphone users is increasing very rapidly, mobile applications face problems related to
growth and volume.
● With a NoSQL database, mobile application development can start small and can easily be
expanded as the number of users increases, which is very difficult with relational databases.
● Mobile app companies like Kobo and Playtika use NoSQL and serve millions of users across the
world.
Third-Party Data Aggregation:

● A business frequently needs to access data produced by a third party. For instance, a consumer packaged
goods company may need sales data from stores as well as shoppers’ purchase histories.
● In such scenarios NoSQL databases are suitable, since they can manage huge amounts of
data generated at high speed from various data sources.

Internet of Things

● Nowadays, billions of devices are connected to the internet, such as smartphones, tablets, home
appliances, and systems installed in hospitals, cars and warehouses. These devices generate a large volume and
variety of data, and keep on generating it.
● Relational databases struggle to store such data. NoSQL permits organizations to expand
concurrent access to data from billions of connected devices and systems, store huge amounts
of data, and meet the required performance.
E-Commerce:

● E-commerce companies use NoSQL to store huge volumes of data and to handle a large number of user requests.

Social Gaming:

● Data-intensive applications such as social games can grow to millions of users. Such growth in the
number of users and the amount of data requires a database system that can store this data and can
be scaled to accommodate the growing number of users. NoSQL is suitable for such applications.
● NoSQL has been used by some mobile gaming companies, such as Electronic Arts, Zynga and Tencent.

Ad Targeting

● Displaying ads or offers on the current web page is a decision with a direct effect on income. To determine which
group of users to target and where on the web page to display ads, platforms gather behavioral and
demographic characteristics of users.
● A NoSQL database enables ad companies to track user details and place ads very quickly, which
increases the probability of clicks.
● AOL, Mediamind and PayPal are some of the companies that use NoSQL for ad targeting.
1.2. Databases
The Definition of the Four Types of NoSQL Data models
1. Document Oriented Database.

Database → Collection → Document

A database that stores information in documents.


Document → A record that stores the information about one object and any of its related
metadata.
Collection → A group of documents. It stores documents that have similar contents.
Document databases are suitable for storing and managing Big Data-sized collections of literal
documents like text documents, email messages, XML documents, etc. Documents are
de-normalised (aggregate) representations of a database entity, and are suitable for storing
semi-structured data that would require the extensive use of nulls in an RDBMS

In MongoDB, a record is a document that gets stored in a binary JSON (BSON) format, and documents
are grouped together into collections.

Collections are similar to the tables from relational databases


Document databases are similar to key-value pair databases. The only difference is that
the value contains structured or semi-structured data. This structured/semi-structured
data is called a “document”.

Examples of document databases are MongoDB, OrientDB, Apache CouchDB, ApacheDB, etc.

[Figure: a DATABASE holding several COLLECTIONs; each collection contains documents (Document #1 to Document #4), and each document is made up of Key:Value pairs.]
Converting the table to the document

Table:

_id   sname                              srollno   sphno
201   Name = Shruti, Surname = Agarwal   32        1234567890
202   Aman                               11        4730158754

Documents:

{
  _id: 201,
  sname: {“Name”: “Shruti”, “Surname”: “Agarwal”},
  srollno: 32,
  sphno: 1234567890
}

{
  _id: 202,
  sname: “Aman”,
  srollno: 11,
  sphno: 4730158754
}
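The table-to-document conversion can also be written as a small program. This is a sketch only: the flat row layout (the `first` and `surname` keys) and the helper name `row_to_document` are assumptions made for illustration, while the values are the ones from the example.

```python
import json

# Rows as they might come out of a relational table (flat columns are assumed).
rows = [
    {"_id": 201, "first": "Shruti", "surname": "Agarwal", "srollno": 32, "sphno": 1234567890},
    {"_id": 202, "first": "Aman", "surname": None, "srollno": 11, "sphno": 4730158754},
]


def row_to_document(row: dict) -> dict:
    """Turn one table row into a document (hypothetical helper)."""
    if row["surname"] is not None:
        # A nested value replaces two separate columns.
        sname = {"Name": row["first"], "Surname": row["surname"]}
    else:
        # No surname: store a plain string instead of a column full of nulls.
        sname = row["first"]
    return {
        "_id": row["_id"],
        "sname": sname,
        "srollno": row["srollno"],
        "sphno": row["sphno"],
    }


documents = [row_to_document(row) for row in rows]
print(json.dumps(documents, indent=2))
```

Note that the two documents end up with differently shaped `sname` values, which a single relational column could not hold without nulls or an extra table.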
Advantages:

● Flexibility:

Documents in one database do not need a consistent structure. They do not have to be of the same type,
nor do they have to be structured the same way.

● Easy to update

Any new piece of information, when added to a relational database, has to be added to all data sets
to maintain the unified structure within a table of a relational database. With document stores, you
can add new pieces of information easily without having to add them to all existing data sets.

● Improved Performance

Rather than pulling data from multiple related tables, you can find everything you need within one
document. With everything kept in a single location, it is much faster to reach and retrieve the data.
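The flexibility and easy-update points can be illustrated with plain Python dictionaries standing in for a collection. This is a sketch under that assumption, not the API of any document database; the `email` field and its value are made up for the example.

```python
# A toy "collection": documents keyed by _id, each with its own structure.
collection = {
    201: {"_id": 201, "sname": {"Name": "Shruti", "Surname": "Agarwal"}, "srollno": 32},
    202: {"_id": 202, "sname": "Aman", "srollno": 11},
}

# Add a new piece of information to one document only.
# No other document has to change, and no schema has to be altered.
collection[201]["email"] = "shruti@example.com"  # hypothetical field and value

# Everything about one object is read with a single lookup, with no joins.
student = collection[201]
print(student)

# Each document keeps its own set of fields.
fields_per_document = {doc_id: sorted(doc) for doc_id, doc in collection.items()}
print(fields_per_document)
```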
Disadvantages:

NoSQL databases are simple when compared to relational databases. If you jeopardize the simplicity of a
document store, you will also jeopardize the previously mentioned improved performance. You can create
references between documents of a document store by interlinking them, but doing so can create complex
systems and deprive you of fast performance.

For a large-volume database, if we want to create a network of mutually referenced data, we need to find a
way of mapping it and fitting it into a relational database.


2. Key- Value Pair Database
It is a non-relational database, and the simplest form among all databases.

It has two columns: one is the key and the other is the value.

The values can be simple data types like strings and numbers, or complex objects.

The key-value store uses an efficient and compact index structure so that it can
rapidly and dependably find a value using its key.

When to use a key-value database:

Here are a few situations in which you can use a key-value database:
● User session attributes in an online app like finance or gaming, which is referred to as
real-time random data access.
● A caching mechanism for repeatedly accessed data, or a key-based design.
● An application that is built on queries based on keys.
Features
● One of the simplest kinds of NoSQL data models.
● Key-value databases use simple functions for storing, getting, and removing data.
● Key-value databases have no query language.
● Built-in redundancy makes this database more reliable.
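The "simple functions for storing, getting, and removing data" can be sketched as follows. The class `KeyValueStore` and its method names are hypothetical, chosen only to mirror those three operations; the store is an in-memory dictionary, so it shows the access pattern rather than a real database.

```python
class KeyValueStore:
    """Minimal in-memory key-value store (illustrative only)."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key, value) -> None:
        # Store or overwrite the value for a key.
        self._data[key] = value

    def get(self, key, default=None):
        # Values can only be found through their key; there is no query language.
        return self._data.get(key, default)

    def delete(self, key) -> bool:
        # Return True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


sessions = KeyValueStore()
# The value may be a simple type or a complex object.
sessions.put("session:42", {"user": "aman", "cart": ["pen", "notebook"]})
print(sessions.get("session:42"))
print(sessions.delete("session:42"))
print(sessions.get("session:42"))
```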

Advantages

● Its response time is fast due to its simplicity, provided that the surrounding environment is
well built and optimized.
● Key-value store databases are scalable vertically as well as horizontally.
● Built-in redundancy makes this database more reliable.
Disadvantages
● As key-value databases have no query language, queries cannot be ported from
one database to a different database.
● The key-value store is not refined: you cannot query the database without a key.

Examples
Couchbase, Amazon DynamoDB, Riak, Aerospike
3. Graph based database
This type of database stores data as a network of nodes and edges, which allows for the efficient
representation and manipulation of complex data relationships.

A graph database is any storage system that provides index-free adjacency. This means that every node
contains a direct pointer to its adjacent elements, and no index lookups are necessary.

As the number of nodes increases, the cost of a hop remains the same.

Graph databases are optimized for traversing connected data, e.g. traversing a list
of contacts on your social network to find out the degree of connections.

Graph databases usually come with a flexible data model, which means there is no need to define
the types of edges and vertices.
● The speed depends upon the number of relationships among the database elements.
● Updating data is also easy, as adding a new node or edge to a graph database is a
straightforward task that does not require significant schema changes.
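The social-network traversal mentioned above can be sketched with an adjacency list, where each node directly holds references to its neighbours. This is a simplified illustration, not how any particular graph database is implemented; the people in the graph and the function name `degree_of_connection` are made up.

```python
from collections import deque

# Each node maps directly to its adjacent nodes (no separate index is consulted).
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}


def degree_of_connection(graph, start, target):
    """Breadth-first search: number of hops between two people, or None."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == target:
                return distance + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # the two nodes are not connected


print(degree_of_connection(graph, "alice", "erin"))

# Adding a node and an edge needs no schema change.
graph["frank"] = ["erin"]
graph["erin"].append("frank")
print(degree_of_connection(graph, "alice", "frank"))
```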

Features of Graph database:

Flexibility, Agility, Improved performance with huge data.


Real-world examples: recommendation engines, social networking sites, etc.

Pros:

– Extremely powerful, Connected data is locally indexed, Can provide ACID, Results in real-time,
Agile Structure

Cons:

– Difficult to scale out, though can scale up


Use Cases of Graph Databases:
Risk assessment, Fraud Detection, Recommendation applications in ML, Logistics etc.
4. Column Based DB
A columnar database is used in a database management system (DBMS) to store data in columns rather than
rows. It is responsible for reducing the time required to return a particular query.

The main purpose of a columnar database is to read and write data efficiently.

When to use the Columnar Database:

1. Queries that involve only a few columns.


2. Compression, but only column-wise.
3. Clustering queries against a huge amount of data.
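The difference between row and column layout can be sketched with plain Python lists. The sample data, the column names, and the helper `run_length_encode` are assumptions made for illustration; run-length encoding is used here as one simple example of column-wise compression.

```python
# Row-oriented layout: one record per entry.
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Surat", "amount": 80.0},
    {"id": 3, "city": "Pune", "amount": 100.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# A query that involves a single column reads only that column's values.
average_amount = sum(columns["amount"]) / len(columns["amount"])
print(average_amount)


def run_length_encode(values):
    """Compress consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Values within one column are of the same type and often repeat,
# which is why column-wise compression tends to work well.
print(run_length_encode(sorted(columns["city"])))
```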
Advantages:

1. Columnar databases can be used for different tasks. In particular, column-oriented databases receive
great attention for applications related to big data.
2. The data in a columnar database is highly compressible, and aggregate operations
such as AVG, MIN, and MAX can be run efficiently on the compressed columns.
3. Efficiency and speed: analytical queries run faster in columnar
databases.
4. Self-indexing: Another benefit of a column-based DBMS is self-indexing, which uses less disk
space than a relational database management system containing the same data.

Disadvantages:

For Online transaction processing (OLTP) applications, Row oriented databases are more appropriate
than columnar databases.
CAP Theorem
Consistency:

The data should remain consistent even after the execution of an operation. This means
once data is written, any future read request should contain that data. For example,
after updating the order status, all the clients should be able to see the same data.

Availability:

The database should always be available and responsive. It should not have any
downtime. Important: availability has a particular meaning in the context of CAP. It means
that if you can talk to a node in the cluster, it can read and write data.

Partition Tolerance:

Partition tolerance means that the system should continue to function even if the
communication among the servers is not stable. For example, the servers can be
partitioned into multiple groups which may not communicate with each other. In other words,
partition tolerance means that the cluster can survive communication breakages
that separate it into multiple partitions unable to communicate
with each other.
A single-server system is the obvious example of a CA system: a system that has Consistency and
Availability but not Partition tolerance. A single machine can’t partition, so it does not have to worry
about partition tolerance.
Real life examples of CAP theorem
Let us consider a company that has two stocks and one offer. Each stock supports the following
operations:

1. Place orders
2. Check the product’s total available quantity

Stock with a consistency design: if the stock has chosen a consistent design, then during a partition it will
respond: “I can’t accept the sale right now because I can’t update the quantity in the other stock.”

Stock with an availability design: “I will allow you to place the order and will keep track of what
happened. Later, when the partition with the other stock heals, the quantity will be
updated in the other stock.”

Stock with a degree of both consistency and availability: for example, when a partition happens,
the stocks can:

● Accept partial orders
● Not provide the total quantity service

Stocks can also provide information about the quantity, but only as a tentative figure. This means
we are not sure that this is the correct quantity, but it is probably right if you
haven’t been running between stocks.
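The consistency and availability designs from the stock example can be simulated in a few lines. This is a toy sketch under strong assumptions: two replicas, a manually toggled partition flag, and the hypothetical names `Stock`, `make_pair`, `place_order` and `heal`. The mode labels "CP" and "AP" are shorthand for the consistency design and the availability design.

```python
class Stock:
    """One of two replicas that each track the shared available quantity."""

    def __init__(self, quantity: int, mode: str) -> None:
        self.quantity = quantity
        self.mode = mode          # "CP" = consistency design, "AP" = availability design
        self.peer = None          # the other stock
        self.partitioned = False  # True while the stocks cannot communicate
        self.pending = []         # orders accepted during a partition

    def place_order(self, amount: int) -> bool:
        if self.partitioned:
            if self.mode == "CP":
                # Consistency design: refuse, because the peer cannot be updated.
                return False
            # Availability design: accept now and remember what happened.
            self.quantity -= amount
            self.pending.append(amount)
            return True
        # No partition: update both stocks together.
        self.quantity -= amount
        self.peer.quantity -= amount
        return True

    def heal(self) -> None:
        # The partition heals: replay the remembered orders on the other stock.
        self.partitioned = False
        for amount in self.pending:
            self.peer.quantity -= amount
        self.pending.clear()


def make_pair(quantity: int, mode: str):
    a, b = Stock(quantity, mode), Stock(quantity, mode)
    a.peer, b.peer = b, a
    return a, b


# Consistency design: the order is rejected during the partition.
a, b = make_pair(10, "CP")
a.partitioned = True
print(a.place_order(3), a.quantity, b.quantity)

# Availability design: the order is accepted, and the other stock is stale until healing.
a, b = make_pair(10, "AP")
a.partitioned = True
print(a.place_order(3), a.quantity, b.quantity)
a.heal()
print(a.quantity, b.quantity)
```

While the partition lasts, a quantity read from the other stock in the availability design is only tentative, which matches the last point of the example.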
How does the CAP Theorem affect making database decisions?

It will be all about the context in which your database is operating, the needs of the business, and
the expectations and needs of users.

You will have to consider things like:

● Is it important to avoid throwing up errors in the client?
● Or are we willing to sacrifice the visible user experience to ensure consistency?
● Is consistency actually an important part of the user’s experience?
● Or can we actually do what we want with a relational database and avoid the need for
partition tolerance altogether?

These are ultimately user experience questions. To properly understand those, you need to be
sensitive to the overall goals of the project, and, as said above, the context in which your database
solution is operating. (Is it powering an internal analytics dashboard? Or is it supporting a widely
used external-facing website or application?)
