Unit 1-NoSQL
1.1. Introduction to Big Data: Overview of Big Data and NoSQL
1.2. Databases
1.1. Introduction to Big Data: Overview of Big Data and NoSQL
Advantages of NoSQL
The main advantages are high scalability and high availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling, which means adding more machines to handle the growing data.
2. Flexibility: NoSQL databases are designed to handle unstructured or
semi-structured data, which means that they can accommodate dynamic
changes to the data model.
3. High Availability: data is replicated across several servers, so in case of a failure another copy can keep serving requests and the data can be restored to its previous consistent state.
4. Scalability in volume: the ability to scale out makes them a good fit for applications that need to handle large amounts of data or traffic.
5. Performance: they are designed to handle large amounts of data efficiently, which improves performance.
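6. Cost Effectiveness: compared with SQL databases, capacity can be increased by adding more servers rather than buying a single larger, more expensive machine.
7. Agility: the flexible data model makes them ideal for agile development, where the data model changes often.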
Figure: Scale-out (horizontal scaling)
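A minimal sketch of how sharding spreads data over several machines, assuming simple hash-modulo placement. This is one placement scheme among several, the node names are hypothetical, and no particular database's algorithm is implied:

```python
import hashlib

# Hypothetical node names; a real cluster would use server addresses.
NODES = ["node-a", "node-b", "node-c"]


def shard_for(key: str, nodes: list) -> str:
    """Pick the node responsible for `key` by hashing it (hash-modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes: list) -> dict:
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


users = [f"user:{i}" for i in range(9)]
for node, keys in distribute(users, NODES).items():
    print(node, keys)

# Scaling out means adding a machine: the same function now spreads keys over 4 nodes.
print(distribute(users, NODES + ["node-d"]))
```

Hash-modulo placement moves many keys to new nodes whenever a node is added. Production systems often use consistent hashing or range partitioning to reduce that movement.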
Disadvantages of NoSQL
Lack of standardization: there is no common query language or data model across NoSQL databases, and this lack of standardization can make it difficult to choose the right database for a specific application.
Lack of ACID compliance: many NoSQL databases do not fully guarantee the atomicity, consistency, isolation, and durability (ACID) of transactions. This can be a drawback for applications that require strong data consistency guarantees.
Open source: many NoSQL databases are open source, and there is no reliable standard for NoSQL yet.
Lack of support for complex queries: NoSQL databases are generally not designed to handle complex queries, so they are not a good fit for applications that require complex data analysis or reporting.
Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational databases.
This can make them less reliable and less secure than traditional databases.
Backup: backup is a weak point for some NoSQL databases such as MongoDB, where backing up data in a consistent manner has been difficult.
Large document size: documents can be quite large (which affects Big Data storage, network bandwidth, and speed), and descriptive key names actually hurt because they increase the document size.
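Because field names are stored inside every document, long key names cost space in each one. The sketch below compares the same record with descriptive and abbreviated keys. The field names are made up, and it measures compact JSON, whereas MongoDB actually stores BSON, so absolute sizes differ while the effect of repeated key names is the same:

```python
import json

# The same record with descriptive keys and with abbreviated keys (illustrative names).
verbose = {"student_first_name": "Shruti", "student_roll_number": 32,
           "student_phone_number": "1234567890"}
compact = {"fn": "Shruti", "rn": 32, "ph": "1234567890"}


def encoded_size(doc: dict) -> int:
    """Size in bytes of the document serialized as compact JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Key names are repeated in every document, so the per-document difference multiplies.
n_docs = 1_000_000
extra = encoded_size(verbose) - encoded_size(compact)
print(extra, "extra bytes per document")
print(extra * n_docs, "extra bytes for", n_docs, "documents")
```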
History of NoSQL
The acronym NoSQL was first used in 1998 by Carlo Strozzi. The term NoSQL can mean either “No SQL systems” or, in the more commonly accepted translation, “Not only SQL”.
NoSQL systems handle both structured and unstructured data, and they can process unstructured Big Data quickly. This led organizations such as Facebook, Twitter, LinkedIn, and Google to adopt NoSQL systems.
Use of NoSQL in Industry
Session Store:
● Managing session data using a relational database is very difficult, especially when applications grow very large.
● In such cases the right approach is to use a global session store, which manages session information for every user
who visits the site.
● NoSQL is suitable for storing such web application session information, which can be very large in size.
● Since session data is unstructured, it is easier to store it in schema-less documents than in relational database records.
User Profile Store:
● Web and mobile applications need to store user profiles to enable online transactions, user preferences, user authentication, and more.
● In recent times, the number of users of web and mobile applications has grown very rapidly. A relational database, being limited to a single server, could not handle such a large and rapidly growing volume of user profile data.
● With NoSQL, capacity can easily be increased by adding servers, which makes scaling cost-effective.
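The global session store described above boils down to a lookup by session ID into a schema-less document that expires after a timeout. The sketch below illustrates only that access pattern with an in-memory dictionary. The class and its methods are hypothetical; a real deployment would use a NoSQL server instead:

```python
import time
import uuid


class SessionStore:
    """Toy global session store: session id -> (expiry time, schema-less document)."""

    def __init__(self, ttl_seconds: float = 1800.0):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (expires_at, document)

    def create(self, document: dict) -> str:
        session_id = uuid.uuid4().hex
        self._data[session_id] = (time.monotonic() + self.ttl, document)
        return session_id

    def get(self, session_id: str):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, document = entry
        if time.monotonic() >= expires_at:
            del self._data[session_id]  # lazily drop an expired session
            return None
        return document

    def update(self, session_id: str, **fields) -> bool:
        document = self.get(session_id)
        if document is None:
            return False
        document.update(fields)  # new fields need no schema change
        return True


store = SessionStore(ttl_seconds=60)
sid = store.create({"user": "shruti", "cart": ["book-42"]})
store.update(sid, theme="dark")
print(store.get(sid))  # {'user': 'shruti', 'cart': ['book-42'], 'theme': 'dark'}
```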
Content and Metadata Store
● Many companies, such as publishing houses, need a place to store large amounts of data, including articles, digital content, and e-books, so that they can merge various learning tools into a single platform.
● In content-based applications, metadata is accessed very frequently and needs short response times.
● For building content-based applications, NoSQL provides flexibility, faster access to data, and the ability to store different types of content.
Mobile Application:
● Since the number of smartphone users is increasing very rapidly, mobile applications face problems related to growth and volume.
● With a NoSQL database, mobile application development can start small and be expanded easily as the number of users increases, which is very difficult with relational databases.
● Mobile app companies such as Kobo and Playtika use NoSQL and serve millions of users across the world.
Third-Party Data Aggregation:
● A business frequently needs to access data produced by third parties. For instance, a consumer packaged goods company may need to obtain sales data from stores as well as shoppers’ purchase histories.
● In such scenarios NoSQL databases are suitable, since they can manage huge amounts of data generated at high speed from various data sources.
Internet of Things
● Nowadays, billions of devices are connected to the internet, such as smartphones, tablets, home appliances, and systems installed in hospitals, cars, and warehouses. These devices generate, and keep generating, a large volume and variety of data.
● Relational databases struggle to store such data. NoSQL lets organizations scale concurrent access to data from billions of connected devices and systems, store huge amounts of data, and meet the required performance.
E-Commerce:
● E-commerce companies use NoSQL to store huge volumes of data and to handle large numbers of user requests.
Social Gaming:
● Data-intensive applications such as social games can grow to millions of users. Such growth in the number of users and in the amount of data requires a database system that can store the data and scale to accommodate the growing number of users. NoSQL is suitable for such applications.
● NoSQL has been used by some mobile gaming companies, such as Electronic Arts, Zynga, and Tencent.
Ad Targeting
● Deciding which ads or offers to display on the current web page directly affects income. To determine which group of users to target and where on the web page to display ads, the platforms gather the behavioral and demographic characteristics of users.
● A NoSQL database enables ad companies to track user details and place ads very quickly, which increases the probability of clicks.
● AOL, Mediamind, and PayPal are some of the ad targeting companies that use NoSQL.
1.2. Databases
Definitions of the Four Types of NoSQL Data Models
1. Document-Oriented Database
In MongoDB, a record is a document stored in BSON, a binary form of JSON, and documents are grouped together into collections.
Figure: a DATABASE contains several COLLECTIONs; each collection holds documents (Document #1 to Document #4) made of Key:Value pairs, which can also be nested ({Key:Value}) or arrays ([Key:Value]).
Converting the table to the document

Table:
_id | sname                            | srollno | sphno
201 | Name = Shruti, Surname = Agarwal | 32      | 1234567890
202 | Aman                             | 11      | 4730158754

Documents:
{
  _id: 201,
  sname: {"Name": "Shruti", "Surname": "Agarwal"},
  srollno: 32,
  sphno: 1234567890
}
{
  _id: 202,
  sname: "Aman",
  srollno: 11,
  sphno: 4730158754
}
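The same conversion can be sketched with plain Python dictionaries standing in for documents and a list standing in for a collection. This only illustrates the mapping and is not a MongoDB API. Note that the two documents do not share one structure: `sname` is a nested document in the first and a plain string in the second:

```python
import json

# Rows of the table above: (_id, sname, srollno, sphno).
rows = [
    (201, {"Name": "Shruti", "Surname": "Agarwal"}, 32, 1234567890),
    (202, "Aman", 11, 4730158754),
]
columns = ("_id", "sname", "srollno", "sphno")


def row_to_document(row) -> dict:
    """Turn one table row into a document keyed by column name."""
    return dict(zip(columns, row))


students = [row_to_document(r) for r in rows]  # plays the role of a collection

for doc in students:
    print(json.dumps(doc))
```

With the official pymongo driver, a list like `students` could be written to a collection with `collection.insert_many(students)`.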
Advantages:
● Flexibility:
documents in one database do not need to be consistent with each other. They do not have to be of the same type, nor do they have to have the same structure.
● Easy to update
Any new piece of information, when added to a relational database, has to be added to all data sets
to maintain the unified structure within a table of a relational database. With document stores, you
can add new pieces of information easily without having to add them to all existing data sets.
● Improved Performance
Rather than pulling data from multiple related tables, you can find everything you need within one
document. With everything kept in a single location, it is much faster to reach and retrieve the data.
Disadvantages:
NoSQL databases are simple compared with relational databases. If you compromise the simplicity of a document store, you also compromise the improved performance mentioned above. You can create references between documents of a document store by interlinking them, but doing so can create complex
For a large-volume database, if we want to create a network of mutually referenced data, we need to find a
2. Key-Value Database
The values can be simple data types, such as strings and numbers, or complex objects.
A key-value store uses an efficient and compact index structure so that it can find a value quickly and reliably using its key.
Here are a few situations in which you can use a key-value database:
● User session attributes in an online application such as finance or gaming, which require real-time random data access.
● Caching mechanism for repeatedly accessing data or key-based design.
● Applications whose queries are based on keys.
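A caching layer, as in the second case above, is itself a small key-value store. The sketch below assumes a least-recently-used (LRU) eviction policy, which is one common choice and not the only one, in front of a hypothetical slow lookup function:

```python
from collections import OrderedDict


class LRUCache:
    """Key-value cache holding at most `capacity` entries; evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def slow_lookup(product_id: int) -> str:
    """Hypothetical stand-in for an expensive query against the primary database."""
    return f"details of product {product_id}"


cache = LRUCache(capacity=2)


def get_product(product_id: int) -> str:
    cached = cache.get(product_id)
    if cached is not None:
        return cached  # cache hit: answered by key, no query needed
    value = slow_lookup(product_id)  # cache miss: go to the source
    cache.put(product_id, value)
    return value


print(get_product(7))
print(get_product(7))  # second call is served from the cache
```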
Features
● One of the least complex kinds of NoSQL data models.
● Key-value databases use simple functions for storing, getting, and removing data.
● Key-value databases have no query language.
● Built-in redundancy makes this database more reliable.
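The features above can be sketched as a toy store that exposes only put, get, and delete, and copies every write to several in-memory replicas to mimic built-in redundancy. The class, its replica count, and the key naming are hypothetical, not any product's API:

```python
class KeyValueStore:
    """Toy key-value store: only put/get/delete, with each write copied to several replicas."""

    def __init__(self, replicas: int = 2):
        self._replicas = [dict() for _ in range(replicas)]

    def put(self, key: str, value) -> None:
        for replica in self._replicas:
            replica[key] = value

    def get(self, key: str, default=None):
        # Serve the read from the first replica that still holds the key.
        for replica in self._replicas:
            if key in replica:
                return replica[key]
        return default

    def delete(self, key: str) -> None:
        for replica in self._replicas:
            replica.pop(key, None)

    def fail_replica(self, index: int) -> None:
        """Simulate losing one copy of the data."""
        self._replicas[index].clear()


kv = KeyValueStore(replicas=2)
kv.put("user:201:name", "Shruti")
kv.put("user:201:cart", ["book-42", "pen-7"])  # values can be complex objects
kv.fail_replica(0)
print(kv.get("user:201:name"))  # Shruti, still served by the second replica
```

There is no way to ask this store "which users are named Shruti?" without scanning every key, which is the limitation listed under the disadvantages.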
Advantages
● Response time is fast because of the model's simplicity, provided that the surrounding environment is well built and optimized.
● Key-value store databases are scalable vertically as well as horizontally.
● Built-in redundancy makes this database more reliable.
Disadvantages
● Because key-value databases have no query language, queries cannot be ported from one database to another.
● The key-value store is not refined: you cannot query the database without a key.
Examples
Couchbase, Amazon DynamoDB, Riak, Aerospike
3. Graph-Based Database
This type of database stores data as a network of nodes and edges, which allows efficient representation and manipulation of complex data relationships.
Graph databases are optimized for traversing through connected data, e.g. traversing through a list
of contacts on your social network to find out the degree of connections.
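The "degree of connection" traversal amounts to a breadth-first search over edges. The sketch below runs it over an in-memory adjacency list with made-up names. A graph database performs the equivalent traversal natively through its own query language:

```python
from collections import deque

# A small social graph: each person maps to the set of people they are connected to.
friends = {
    "Asha": {"Ravi", "Meera"},
    "Ravi": {"Asha", "Kiran"},
    "Meera": {"Asha", "Kiran"},
    "Kiran": {"Ravi", "Meera", "Dev"},
    "Dev": {"Kiran"},
    "Zoya": set(),
}


def degree_of_connection(graph: dict, start: str, target: str):
    """Breadth-first search: number of edges on the shortest path, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return depth + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None


print(degree_of_connection(friends, "Asha", "Dev"))  # 3 (Asha -> Ravi or Meera -> Kiran -> Dev)
```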
Graph databases usually come with a flexible data model, which means there is no need to define
the types of edges and vertices.
● The speed depends upon the number of relationships among the database elements.
● Updating data is also easy, as adding a new node or edge to a graph database is a
straightforward task that does not require significant schema changes.
Pros:
● Extremely powerful
● Connected data is locally indexed
● Can provide ACID
● Results in real time
● Agile structure
Cons:
4. Column-Oriented (Columnar) Database
The main goal of a columnar database is to read and write data efficiently.
Advantages:
1. Columnar databases can be used for different tasks; in particular, column-oriented databases receive greater attention in applications related to big data.
2. The data in a columnar database is highly compressible, and the compression still permits operations such as AVG, MIN, and MAX.
3. Efficiency and speed: analytical queries run faster in columnar databases.
4. Self-indexing: Another benefit of a column-based DBMS is self-indexing, which uses less disk
space than a relational database management system containing the same data.
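The advantages above come from storing each column contiguously. The sketch below lays the same small table out row-wise and column-wise, computes aggregates by reading a single column, and compresses a column with run-length encoding. The data is made up, and run-length encoding is just one simple compression scheme, not the one any particular product uses:

```python
# The same three-column table stored row by row (illustrative values).
row_store = [
    {"city": "Pune", "year": 2023, "sales": 120},
    {"city": "Pune", "year": 2023, "sales": 90},
    {"city": "Pune", "year": 2024, "sales": 150},
    {"city": "Delhi", "year": 2024, "sales": 200},
]

# Column store: one list per column, so a query on `sales` reads only that list.
column_store = {col: [row[col] for row in row_store] for col in row_store[0]}


def run_length_encode(values) -> list:
    """Compress a column into [value, run_length] pairs; repeated values compress well."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


sales = column_store["sales"]
print(sum(sales) / len(sales), min(sales), max(sales))  # AVG, MIN, MAX over one column
print(run_length_encode(column_store["city"]))  # [['Pune', 3], ['Delhi', 1]]
```

Inserting one new record, by contrast, means appending to every column list, which is one reason row-oriented layouts suit OLTP workloads better.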
Disadvantages:
For online transaction processing (OLTP) applications, row-oriented databases are more appropriate than columnar databases.
CAP Theorem
Consistency:
The data should remain consistent even after the execution of an operation. This means that once data is written, any future read request should return that data. For example, after the order status is updated, all clients should see the same data.
Availability:
The database should always be available and responsive. It should not have any downtime. Important: availability has a particular meaning in the context of CAP. It means that if you can talk to a node in the cluster, it can read and write data.
Partition Tolerance:
Partition tolerance means that the system should continue to function even if communication among the servers is not stable. For example, the servers can be partitioned into multiple groups that may not communicate with each other. Here, if part of
Partition tolerance means that the cluster can survive communication breakages in
the cluster that separate the cluster into multiple partitions unable to communicate
with each other.
A single-server system is the obvious example of a CA system: a system that has Consistency and Availability but not Partition tolerance. A single machine can’t partition, so it does not have to worry about partition tolerance.
Real-life example of the CAP theorem
Let us consider a company that has two stocks and one offer. The company's stocks can support three different operations:
1. Place orders
2. Check a product's total available quantity
Stock with consistency design: if the stock has chosen a consistent design, the branch will say: “I can’t accept the sale right now because I can’t update the quantity in the other stock.”
Stock with availability design: “I will allow you to place the order and will keep track of what happened; later, when the partition with the other stock heals, the quantity will be updated in the other stock.”
Stock with a degree of consistency and availability design: for example, when a partition happens, the stocks can still provide information about the quantity, but only tentative quantity information. This means we are not sure that it is the correct quantity, but it is probably right if you haven’t been running between stocks.
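The two stock designs can be simulated with two objects that each hold a copy of the available quantity. In the sketch below, a "CP" stock refuses orders while partitioned, and an "AP" stock accepts them tentatively and replays them on the other stock once the partition heals. The class, the helper functions, and the mode labels are illustrative assumptions, not a real system's API:

```python
class Stock:
    """One stock location holding a copy of a product's available quantity.
    mode="CP" refuses orders during a partition; mode="AP" accepts them tentatively."""

    def __init__(self, name: str, quantity: int, mode: str):
        self.name = name
        self.quantity = quantity  # this location's view of the available quantity
        self.mode = mode
        self.peer = None  # the other stock location
        self.partitioned = False  # True while the two locations cannot communicate
        self.pending = []  # orders accepted during a partition

    def place_order(self, amount: int) -> str:
        if amount > self.quantity:
            return "rejected: not enough quantity"
        if self.partitioned:
            if self.mode == "CP":
                # Consistent design: refuse rather than let the two copies diverge.
                return "rejected: cannot update the other stock right now"
            # Available design: accept now and remember to sync later.
            self.quantity -= amount
            self.pending.append(amount)
            return "accepted (tentative)"
        # No partition: update both copies together.
        self.quantity -= amount
        self.peer.quantity -= amount
        return "accepted"


def connect(a: Stock, b: Stock) -> None:
    a.peer, b.peer = b, a


def set_partition(a: Stock, b: Stock, partitioned: bool) -> None:
    a.partitioned = b.partitioned = partitioned


def heal(a: Stock, b: Stock) -> None:
    """End the partition and replay each side's pending orders on the other side."""
    set_partition(a, b, False)
    for side in (a, b):
        for amount in side.pending:
            side.peer.quantity -= amount
        side.pending.clear()


cp_a, cp_b = Stock("A", 10, "CP"), Stock("B", 10, "CP")
connect(cp_a, cp_b)
set_partition(cp_a, cp_b, True)
print(cp_a.place_order(2))  # rejected: cannot update the other stock right now

ap_a, ap_b = Stock("A", 10, "AP"), Stock("B", 10, "AP")
connect(ap_a, ap_b)
set_partition(ap_a, ap_b, True)
print(ap_a.place_order(4))  # accepted (tentative)
heal(ap_a, ap_b)
print(ap_a.quantity, ap_b.quantity)  # 6 6
```

If both AP stocks sell during the same partition, the two copies agree again after healing but can drop below zero (an oversold product). This is why the quantity reported during a partition is only tentative.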
How does the CAP Theorem affect making database decisions?
It all depends on the context in which your database is operating, the needs of the business, and the expectations and needs of users.
These are ultimately user experience questions. To answer them properly, you need to be sensitive to the overall goals of the project and, as said above, to the context in which your database solution is operating. (Is it powering an internal analytics dashboard? Or is it supporting a widely used, external-facing website or application?)