Big Data Technology
Student’s Name
Institutional Affiliation
Professor’s Name
Course Name
We have many different NoSQL database types because modern data comes in various
forms and structures, and no single database model can efficiently accommodate every use case.
Relational databases are optimized for data with a predefined schema, but big data, which is often
semi-structured or unstructured, has become very prevalent, and developers and organizations need
more flexible, scalable, and high-performance alternatives. The four categories of NoSQL
databases are document stores, key-value stores, column-family stores, and graph databases.
Each is aimed at meeting a particular data storage and retrieval need (Tripathi,
2025). For instance, MongoDB works very well with semi-structured data such as JSON
documents, whereas graph databases such as Neo4j are a good fit for highly connected data in
social networks or recommendation engines. Because NoSQL types are so diverse, organizations
can select the best-fit solution for their specific requirements.
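The point about graph databases suiting relationship-heavy data can be sketched with a small hypothetical example (the user names and "follows" edges are invented). A friends-of-friends query is a natural traversal in a graph model, whereas a relational table of follower pairs would need self-joins:

```python
# Hypothetical sketch: a social-network "follows" graph as an adjacency
# list. Finding friends-of-friends is a simple traversal here, the kind
# of query graph databases like Neo4j are optimized for.
from collections import defaultdict

follows = defaultdict(set)
for src, dst in [("ana", "ben"), ("ben", "cara"), ("ben", "dan"), ("ana", "dan")]:
    follows[src].add(dst)

def friends_of_friends(user):
    """People followed by the accounts `user` follows, excluding
    `user` and accounts `user` already follows."""
    result = set()
    for friend in follows[user]:
        result |= follows[friend] - {user} - follows[user]
    return result

print(sorted(friends_of_friends("ana")))  # ['cara']
```

In a relational schema the same query would join the follower table to itself once per hop, which grows expensive as the traversal deepens.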
Can you provide a real-world example of where a NoSQL database is more suitable than a
SQL database?
A real-world example of a scenario where a NoSQL database is more suitable than a SQL
database is an e-commerce product catalog, such as Amazon's, where the attributes of a product
differ widely across categories. Because it supports storing dynamic, schema-less documents, a
document-oriented NoSQL database such as MongoDB is a great fit. Suppose a book entry has
fields like "author" and "ISBN", while a shoe entry has fields such as "size" and "color".
Representing this in a relational database would produce sparse tables with many null values, or
would require complex table designs. In addition, NoSQL databases allow Amazon to scale
horizontally to handle huge volumes of concurrent users and massive data throughput. This
scalability and flexibility make NoSQL the better choice for such a catalog.
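A minimal sketch of the document model described above (the SKUs and field values are invented, and the in-memory list stands in for a real MongoDB collection): documents in the same collection can carry different fields, and a simple match-style query still works across them.

```python
# Hypothetical product catalog: a book and a shoe share a collection
# even though their fields differ, which a schema-less store accepts as-is.
catalog = [
    {"sku": "B-101", "type": "book", "title": "Dune",
     "author": "F. Herbert", "isbn": "978-0441172719"},
    {"sku": "S-202", "type": "shoe", "brand": "Acme",
     "size": 42, "color": "black"},
]

def find(collection, **criteria):
    """Minimal MongoDB-style query: return documents whose fields
    match all of the given criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(catalog, type="book")[0]["author"])  # F. Herbert
```

A relational design would need either a wide table full of nulls or separate tables per category plus joins to answer the same query.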
Can you provide a real-world example of where a relational database is more suitable than
a NoSQL database?
Banks need to ensure that financial transactions are consistent, reliable, and ACID-compliant
(Atomicity, Consistency, Isolation, Durability). SQL-based relational databases such as Oracle or
PostgreSQL are a better fit for this purpose because they enforce strong schema definitions, data
integrity, and transactional reliability (Tripathi, 2025). For example, when a user transfers money
from one account to another, the system must ensure that the debit and the corresponding credit
complete successfully as a single step. Any inconsistency would cause major financial
discrepancies. For these reasons, banking systems rely on relational databases.
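The transfer scenario above can be sketched with Python's built-in sqlite3 module (the account names and balances are invented). The `with conn:` block wraps the debit and the credit in a single transaction, so if either fails, both roll back:

```python
# Illustrative sketch of an atomic money transfer. The CHECK constraint
# rejects overdrafts, and the transaction guarantees all-or-nothing.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    try:
        with conn:  # the `with` block is one transaction
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except sqlite3.IntegrityError:
        pass  # overdraft: the CHECK fails and both updates roll back

transfer("alice", "bob", 30)   # succeeds: both updates commit together
transfer("alice", "bob", 999)  # fails atomically: balances unchanged
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 70, 'bob': 80}
```

Without the transaction, a crash between the debit and the credit would leave money missing, which is exactly the inconsistency ACID guarantees prevent.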
In your opinion, what are the top two advantages of using MapReduce in big data
analytics?
In my opinion, the top two advantages of using MapReduce in big data analytics are its
scalability and fault tolerance. First, scalability is crucial because MapReduce allows the
processing of vast amounts of data across distributed computing resources. It breaks a problem
into separate chunks, processes them in parallel, and then aggregates the results (Abdalla et al.,
2025). This makes it possible for organizations to handle terabytes or even petabytes of data
conveniently. Second, the MapReduce framework is built for fault tolerance: if any node in the
cluster goes down, the system automatically reassigns its task to another node without disrupting
data processing. This reliability makes MapReduce suitable for production use in large-scale data
analysis.
Can you provide a real-world example of how MapReduce works on a computing cluster?
A real-world example of how MapReduce works on a computing cluster is log analysis for
a web application. Suppose a company wants to count the number of hits per URL in its server
logs. In the "Map" phase, the log files are read separately on different nodes, and each URL is
emitted with a count of 1 (Abdalla et al., 2025). The "Reduce" phase then sums the counts for
each unique URL across the cluster. Map, shuffle, reduce is a logical, sequential flow, but these
steps run in parallel across many machines. With this abstraction, developers can write simple,
sequential-looking code and do not need to manage the complexity of distribution themselves.
References
Abdalla, H. B., Kumar, Y., Zhao, Y., & Tosi, D. (2025). A Comprehensive Survey of MapReduce
Models for Processing Big Data. Big Data and Cognitive Computing, 9(4), 77.
https://www.mdpi.com/2504-2289/9/4/77
Tripathi, N. (2025). NoSQL database education: A review of models, tools and teaching methods.
https://oulurepo.oulu.fi/bitstream/handle/10024/55103/nbnfioulu-202504142608.pdf?sequence=1