0% found this document useful (0 votes)
2 views5 pages

Big Data Technology

The document discusses the various types of NoSQL databases and their suitability for different data structures, emphasizing their flexibility and scalability compared to traditional SQL databases. It provides real-world examples, such as e-commerce platforms like Amazon benefiting from NoSQL databases for dynamic data, while banking systems require the reliability of relational databases. Additionally, it highlights the advantages of MapReduce in big data analytics, particularly its scalability and fault tolerance, along with an example of its application in log analysis.

Uploaded by

wambuadominic025
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views5 pages

Big Data Technology

The document discusses the various types of NoSQL databases and their suitability for different data structures, emphasizing their flexibility and scalability compared to traditional SQL databases. It provides real-world examples, such as e-commerce platforms like Amazon benefiting from NoSQL databases for dynamic data, while banking systems require the reliability of relational databases. Additionally, it highlights the advantages of MapReduce in big data analytics, particularly its scalability and fault tolerance, along with an example of its application in log analysis.

Uploaded by

wambuadominic025
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

1

Big Data Technology

Student’s Name

Institutional Affiliation

Professor’s Name

Course Names

Submission Due Date


2

Big Data Technology

Why do we have many different NoSQL database types?

We have many different NoSQL database types because modern data comes in various

forms and structures, and a single database model cannot efficiently accommodate all use cases.

Basic relational databases optimize data that has a predefined schema. But big data—semi

structured or unstructured—has become very prevalent and developers and organizations need

more flexible, scalable and high-performance alternatives. The four categories of NoSQL

databases are document stores, key value stores, column family stores, and graph databases.

Each of these is aimed at meeting a particular data storage and retrieval necessity (Tripathi,

2025). For instance, MongoDB works very well when you are working with semi structured data

such as JSON documents, where as when you are working with relationship data in social

networks or recommendation engine, graph databases such as Neo4j are good. NoSQL types are

so diverse that organizational needs result in the usage of a best fit solution suitable for their

peculiar data architecture, performance and scalability requirements.

Can you provide a real-world example of where a NoSQL database is more suitable than a

SQL database and why?

A real-world example of a scenario where a NoSQL database is more suitable than a SQL

database is an e-commerce website such as Amazon. In such an application, information on the

product is very different depending on categories. Since it supports storing dynamic, schema-less

documents, it’s great to use a document-oriented NoSQL database such as MongoDB. Suppose

the book entry has fields like “author” and “ISBN”, but the shoe entry has fields such as “size”

and “color”. When trying to represent this in a relational database, the tables would be sparse,
3

consisting of many null values or it would require complex table designs. In addition to this,

NoSQL databases allow Amazon to scale horizontally to deal with great volumes of concurrent

users and massive data throughput. This is because this scalability and flexibility make NoSQL

the better choice for this scenario.

Can you provide a real-world example of where a relational database is more suitable than

a nonrelational database and why?

A real-world example where a relational database is more suitable is in banking systems.

Banks need to ensure that financial transactions are consistent, reliable, and meet ACID

(Atomicity, Consistency, Isolation, Durability) compliance. The other reason is that SQL based

relational databases such as Oracle or PostgreSQL are better fit for this purpose since they imply

strong schema definition, data integrity and transactional reliability (Tripathi, 2025). For

example, the system is responsible for ensuring that a debit and a corresponding credit operation

are completed successfully and in a single step when a user transfers money from one account to

another. There were major financial discrepancies, but any inconsistency would cause them. For

mission critical applications such as banking, eventual consistency is unacceptable, however,

nonrelational databases are more scalable.

In your opinion, what are the top two advantages of using MapReduce in big data

analytics?

In my opinion, the top two advantages of using MapReduce in big data analytics are its

scalability and fault tolerance. First, scalability is crucial because MapReduce allows the

processing of vast amounts of data across distributed computing resources. It reduces the size of

a problem by dividing it into separate chunks, processes them in parallel, and aggregates the
4

results just right (Abdalla et al., 2025). This makes it possible for organizations to conveniently

deal with terabytes or even petabytes of data. Second, the MapReduce framework is built with

fault tolerance. In case any node in the cluster goes down, system will automatically

reassignments the task to another node and this will not disturb the process of data processing.

MapReduce is highly reliable and therefore can be used in production for large scale data

analysis.

Can you provide an example of how MapRedcue enables sequential programming on a

computing cluster?

An example of how MapReduce enables sequential programming on a computing cluster

is in log analysis for a web application. To count the number of hits per URL for a server logs,

suppose a company desired this analysis. In the "Map" phase, the log files are read separately in

different nodes, and each URL along with count of 1 is extracted from each (Abdalla et al.,

2025). The "Reduce" phase sums all acts of each unique URL across the cluster. Map, shuffle,

reduce, is a logical, sequential flow of process, but these steps are run in parallel across many

machines. With the abstraction, simple sequential code can be written and the developers do not

need to explicitly manage complex parallelism.


5

References

Abdalla, H. B., Kumar, Y., Zhao, Y., & Tosi, D. (2025). A Comprehensive Survey of MapReduce

Models for Processing Big Data. Big Data and Cognitive Computing, 9(4), 77.

https://www.mdpi.com/2504-2289/9/4/77

Tripathi, N. (2025). NoSQL database education: A review of models, tools and teaching methods.

Journal of Systems and Software, 112391.

https://oulurepo.oulu.fi/bitstream/handle/10024/55103/nbnfioulu-202504142608.pdf?

sequence=1

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy