
Yasir f29 Ass1 Bigdata

The document outlines prominent tools used in Big Data Analytics, including Apache Hadoop, Apache Spark, and Apache Kafka, detailing their features such as distributed processing, fault tolerance, and scalability. It also discusses the application of Big Data Analytics across various engineering fields, highlighting case studies in manufacturing, civil, environmental, electrical, and mechanical engineering that demonstrate its importance in solving complex problems. The significance of Big Data Analytics is emphasized through examples of predictive maintenance, infrastructure monitoring, and smart grid optimization.

Uploaded by

laoshisun69
Copyright
© © All Rights Reserved

NAME YASIR KHAN

ID F20603029
ASSIGNMENT # 1
SUBJECT BIG DATA ANALYTICS

Question#1 [Marks=10]

You are required to list down the prominent tools used in the field of Big Data Analytics
by exploring the books, internet sources and published research works. Furthermore,
provide the prominent features of each tool.

ANSWER:

1) Apache Hadoop:

Distributed storage and processing: Hadoop splits large datasets into smaller chunks and
distributes them across multiple computers in a cluster. This allows for faster processing since
each computer works on a smaller part of the data.

Fault tolerance: Hadoop is designed to tolerate computer failures. HDFS stores several copies
of each data block on different computers (three by default). If one computer stops working,
the data is read from another copy, the missing copies are re-created elsewhere in the
cluster, and failed tasks are rerun so the processing can continue.

Scalability: As your data grows, you can easily add more computers to your Hadoop cluster to
handle the extra load. This makes it easy to scale your data infrastructure as your business
grows.

MapReduce: MapReduce is a way of splitting up a big data processing task into smaller parts,
then combining the results. Hadoop uses this programming model to process large datasets in
parallel, making it faster and more efficient.
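
The map, shuffle, and reduce steps described above can be sketched in plain Python. This is a minimal single-machine illustration, not Hadoop's API: the function names are hypothetical, and in a real cluster each chunk would be mapped on a different computer.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    # Each chunk is mapped independently, which is what allows parallelism.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))

counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Because each chunk is mapped without reference to the others, the map step can be spread across as many computers as there are chunks.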

2) Apache Spark:

In-memory processing: Spark keeps data in memory as much as possible, which often makes it
faster than Hadoop MapReduce, which writes intermediate results to disk between steps. The
difference is most noticeable for iterative and interactive workloads that reuse the same data.

Unified analytics engine: Spark is a versatile tool that can handle a wide range of analytics
tasks, including SQL queries, machine learning, graph processing, and real-time streaming
analytics. This means you can use Spark for many different types of data analysis without
needing separate tools.

Fault tolerance: Just like Hadoop, Spark is designed to keep working even if some computers
in the cluster fail. It does this by recording the sequence of transformations (the lineage)
used to build each dataset, so that lost partitions can be recomputed from the original data.

Compatibility: Spark can run on different cluster managers, including its own standalone
manager, Hadoop YARN, and Kubernetes (Apache Mesos is also supported, although it is
deprecated in recent releases). This flexibility makes it easy to integrate Spark into your
existing data infrastructure.
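
The idea of recovering lost in-memory data by recomputing it can be illustrated with a toy class. This is a rough sketch and not Spark's API: `ToyDataset` and `lose_cache` are hypothetical names, and everything runs on one machine.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that remembers how it was built."""

    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # original input, set only on the root
        self.parent = parent        # dataset this one was derived from
        self.transform = transform  # function applied to the parent's items
        self.cache = None           # in-memory copy of the computed result

    def map(self, fn):
        return ToyDataset(parent=self, transform=lambda items: [fn(x) for x in items])

    def filter(self, pred):
        return ToyDataset(parent=self, transform=lambda items: [x for x in items if pred(x)])

    def collect(self):
        # Compute on first use, then serve later calls from memory.
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.transform(self.parent.collect())
        return self.cache

    def lose_cache(self):
        """Simulate losing the in-memory copy, for example when a node fails."""
        self.cache = None

root = ToyDataset(source=[1, 2, 3, 4])
even_squares = root.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
first = even_squares.collect()
even_squares.lose_cache()
recovered = even_squares.collect()  # rebuilt from the recorded transformations
print(first, recovered)
```

No extra copy of the result is kept; the recorded chain of transformations is enough to rebuild it.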

3) Apache Kafka:

Distributed messaging system: Kafka is a tool for storing and streaming data between
different systems or applications. It's designed to handle large volumes of data with high
throughput and low latency.

Horizontal scalability: Kafka can handle more data by adding more computers (called brokers)
to the cluster. This makes it easy to scale Kafka as your data needs grow.

Durability: Kafka writes messages to disk and replicates each partition across several
brokers, so data survives broker restarts and the loss of individual machines. This makes it
a reliable option for storing important data.

Stream processing: Kafka ships with a client library called Kafka Streams that lets you
process data in real-time as it flows through the system. This can be useful for tasks like
monitoring, fraud detection, or real-time analytics.
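
A topic can be pictured as a set of append-only logs (partitions) in which every message gets a position (offset). The sketch below is a toy model under stated assumptions, not the Kafka client API: `ToyTopic` is a hypothetical class, and CRC32 is used only as a convenient deterministic hash for choosing a partition.

```python
import zlib

class ToyTopic:
    """Hypothetical in-memory model of a partitioned, append-only topic."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # The same key always maps to the same partition, which keeps
        # messages for that key in order.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        offset = len(self.partitions[index]) - 1
        return index, offset

    def read(self, partition, offset):
        """Return all messages in a partition starting at the given offset."""
        return self.partitions[partition][offset:]

topic = ToyTopic(num_partitions=3)
p1, o1 = topic.send("sensor-7", "21.5")
p2, o2 = topic.send("sensor-7", "21.9")
print(topic.read(p1, 0))
```

Adding partitions (and brokers to host them) is what lets this design scale horizontally, since different keys can be written and read in parallel.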

4) Apache Flink:

Stream processing: Flink is designed to process streaming data in real-time, making it a good
choice for applications that need to react quickly to changing data.

Event time processing: Flink can handle data that arrives out of order or late, which is
common in streaming applications. It does this by using the time at which each event occurred,
together with watermarks that track progress in event time, rather than just processing
events in the order they arrive.

Stateful computations: Flink lets you keep track of state across multiple events, which can be
useful for tasks like aggregating data or detecting patterns over time.

Fault tolerance: Flink automatically takes checkpoints of its state, so it can recover quickly if
something goes wrong. This makes it a reliable option for mission-critical applications.
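
Event-time windowing with tolerance for late data can be sketched in plain Python. This is a simplified illustration and not Flink's API: the function name, the fixed-size (tumbling) windows, and the rule "watermark = largest event time seen minus an allowed lateness" are assumptions chosen for the example.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size, max_lateness):
    """Count events per event-time window.

    events: (event_time, key) pairs listed in ARRIVAL order, which may
    differ from the order in which the events actually occurred.
    """
    open_windows = defaultdict(int)  # window start -> running count (the state)
    results = {}
    watermark = float("-inf")
    for event_time, _key in events:
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            continue  # window already closed: the event is too late and is dropped
        open_windows[start] += 1
        watermark = max(watermark, event_time - max_lateness)
        # Close every window that ends at or before the watermark.
        for w in sorted(open_windows):
            if w + window_size <= watermark:
                results[w] = open_windows.pop(w)
    results.update(open_windows)  # flush remaining windows at end of stream
    return results

arrivals = [(1, "a"), (4, "a"), (12, "a"), (3, "a"), (25, "a"), (2, "a")]
window_counts = tumbling_window_counts(arrivals, window_size=10, max_lateness=5)
print(window_counts)
```

In this run the event at time 3 arrives after the event at time 12 but is still counted in the first window, while the event at time 2 arrives after that window has closed and is dropped.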

5) Apache Cassandra:

Distributed NoSQL database: Cassandra is a database designed to handle large amounts of
data spread across multiple computers, without any single point of failure.

Linear scalability: Cassandra can handle more data by adding more computers to the cluster.
This makes it easy to scale as your data grows.

Tunable consistency: Cassandra lets you choose how consistent you want your data to be.
You can prioritize consistency (making sure all copies of the data are the same) or availability
(making sure you can always access the data), depending on your needs.

Built-in replication: Cassandra automatically copies data to multiple computers in the cluster,
so even if one computer fails, you can still access your data. This makes it a reliable option for
storing important data.
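
The consistency trade-off can be expressed as simple arithmetic: if the number of replicas that must acknowledge a read plus the number that must acknowledge a write exceeds the replication factor, every read overlaps with the latest write. The helper names below are hypothetical and the sketch only checks this rule; it does not talk to a database.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
q = quorum(rf)
quorum_both = reads_see_latest_write(q, q, rf)  # QUORUM reads and writes
one_both = reads_see_latest_write(1, 1, rf)     # ONE for reads and writes
print(q, quorum_both, one_both)
```

Lower levels such as ONE respond faster and stay available with more nodes down, at the cost of possibly reading stale data.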

6) MongoDB:

Document-oriented database: MongoDB stores data in flexible, JSON-like documents, making
it easy to work with complex data structures.

Scalability: MongoDB can handle more data by spreading it across multiple computers, a
process called sharding. This makes it easy to scale as your data grows.

Indexing: MongoDB supports various types of indexes, which can speed up queries by telling
MongoDB where to look for data.

Replication and high availability: MongoDB automatically copies data to multiple computers
in the cluster, so even if one computer fails, you can still access your data. It also includes
features for automatic failover, so if one computer goes down, another one can take its place
without interrupting service.
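
Why an index speeds up queries can be shown with a toy example. This is a conceptual sketch in plain Python, not MongoDB's API or its actual index structure: the documents and function names are made up for illustration, and a dictionary stands in for the index.

```python
from collections import defaultdict

# Hypothetical sample documents.
documents = [
    {"_id": 1, "city": "Lahore", "temp": 31},
    {"_id": 2, "city": "Karachi", "temp": 29},
    {"_id": 3, "city": "Lahore", "temp": 33},
]

def build_index(docs, field):
    """Map each value of a field to the positions of documents that contain it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index

def find_with_scan(docs, field, value):
    """Without an index, every document has to be examined."""
    return [doc for doc in docs if doc.get(field) == value]

def find_with_index(docs, index, value):
    """With an index, only the matching documents are touched."""
    return [docs[position] for position in index.get(value, [])]

city_index = build_index(documents, "city")
scanned = find_with_scan(documents, "city", "Lahore")
indexed = find_with_index(documents, city_index, "Lahore")
print(indexed)
```

Both lookups return the same documents; the indexed version avoids reading documents that cannot match, which matters as the collection grows.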

7) Elasticsearch:

Distributed search and analytics engine: Elasticsearch is built on Apache Lucene and offers
a distributed search and analytics engine for handling large volumes of structured and
unstructured data across clusters.

Near real-time indexing and search: Newly ingested data becomes searchable shortly after it
is indexed (by default within about one second, after the next index refresh), rather than
strictly immediately.

Scalability and fault tolerance: Elasticsearch's distributed architecture ensures scalability by
allowing users to add more nodes to the cluster. It also provides fault tolerance through data
replication and automatic failover mechanisms.

Full-text search capabilities: Elasticsearch offers powerful full-text search capabilities,
including support for complex queries, aggregations, and filtering.
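
Full-text search engines built on Lucene rely on an inverted index, which maps each term to the documents that contain it. The sketch below shows the idea in plain Python; it is not the Elasticsearch API, and the tokenizer, function names, and sample documents are simplifying assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

logs = {
    1: "Disk failure on node 3",
    2: "Node 3 restarted",
    3: "Disk usage normal",
}
inverted = build_inverted_index(logs)
disk_node = search_all_terms(inverted, "disk node")
node_three = search_all_terms(inverted, "Node 3")
print(disk_node, node_three)
```

A search only touches the entries for the query terms, so it does not need to read every document.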

8) Splunk:

Platform for machine-generated big data: Splunk is a comprehensive platform designed for
searching, monitoring, and analyzing large volumes of machine-generated data.

Real-time data ingestion: It offers real-time data ingestion capabilities with high throughput
and low latency, enabling organizations to quickly access and analyze data as it's generated.

Indexing and search: Splunk indexes many kinds of machine data, such as logs, events, and
metrics, so that it can be searched and analyzed quickly.

Monitoring and alerting: Splunk provides robust monitoring and alerting capabilities, allowing
users to set up alerts based on predefined conditions and receive proactive insights into system
performance and potential issues.
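
An alert based on a predefined condition can be as simple as "raise an alert when a time window contains at least N error events". The following is a generic sketch of that rule in plain Python; it does not use Splunk's search language or API, and the event format and function name are assumptions.

```python
def alert_windows(events, window_seconds, threshold, level="ERROR"):
    """Return start times of windows with at least `threshold` matching events.

    events: (timestamp_in_seconds, level, message) tuples.
    """
    counts = {}
    for timestamp, event_level, _message in events:
        if event_level == level:
            start = (timestamp // window_seconds) * window_seconds
            counts[start] = counts.get(start, 0) + 1
    return sorted(start for start, n in counts.items() if n >= threshold)

# Hypothetical log events.
events = [
    (0, "ERROR", "disk timeout"),
    (10, "INFO", "heartbeat"),
    (20, "ERROR", "disk timeout"),
    (59, "ERROR", "disk timeout"),
    (61, "ERROR", "disk timeout"),
    (130, "ERROR", "disk timeout"),
]
alerts = alert_windows(events, window_seconds=60, threshold=3)
print(alerts)
```

Only the first minute contains three errors, so it is the only window that triggers an alert.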

Question#2 [Marks=10]

Provide the application areas of Big Data Analytics in various Engineering fields to solve
the complex engineering problems. Furthermore, discuss the importance of Big Data
Analytics in Engineering domain with the help of different case studies.

ANSWER:

Big Data Analytics is now used across many engineering sectors and has changed how complex
engineering problems are approached. The applications below are grouped by engineering
domain, each with a case study that shows why it matters:
1. Manufacturing Engineering:
 • Predictive Maintenance: By analyzing sensor data, Big Data Analytics makes it
possible to predict equipment failures before they happen. General Electric (GE),
for example, used IoT sensors and analytics to anticipate and prevent machinery
failures, which substantially reduced downtime and maintenance expenses.
 • Quality Control: Analyzing production data in real time helps identify defects
and optimize processes. Ford, for example, used big data analytics on warranty
claims to reduce defects and improve customer satisfaction.
2. Civil Engineering:
 • Infrastructure Monitoring: Big Data Analytics is used to monitor the health of
infrastructure. The Hong Kong government, for example, implemented a real-time
bridge monitoring system that checks structural safety by continuously analyzing
data from sensors, cameras, and other sources.
 • Traffic Management: Analyzing traffic data from sensors and GPS devices helps
optimize traffic flow and reduce congestion. The city of Los Angeles, for
example, used big data analytics to refine traffic signal timing, which reduced
travel times and fuel consumption.
3. Environmental Engineering:
 • Climate Modeling: Climate scientists use Big Data Analytics to model climate
change and its effects. The European Centre for Medium-Range Weather Forecasts
(ECMWF), for example, applies big data analytics to weather prediction and
climate research to improve the accuracy of its climate models.
 • Natural Disaster Management: Analyzing historical data helps predict natural
disasters and mitigate their impact. NASA's Earth Science Division, for example,
uses big data analytics to monitor natural disasters and issue early warnings,
which helps reduce disaster-related risks.
4. Electrical Engineering:
 • Smart Grids: Analyzing consumption patterns and grid performance data helps
optimize energy distribution in smart grids. Pacific Gas and Electric (PG&E), for
example, uses big data analytics to improve grid reliability and efficiency.
 • Power Systems Optimization: Analyzing data from power plants and transmission
grids helps optimize energy generation and distribution. Siemens, for example,
implemented a big data analytics platform for energy management, which reduced
costs and improved grid stability.
5. Mechanical Engineering:
 • Product Design and Optimization: Engineers use Big Data Analytics to optimize
product designs by analyzing simulation data and performance metrics. Boeing,
for example, applies big data analytics to aircraft design in order to develop
more fuel-efficient and environmentally friendly aircraft.
 • Supply Chain Management: Analyzing supply chain data supports inventory
management, production scheduling, and logistics optimization. General Motors
(GM), for example, uses big data analytics to optimize its supply chain, which
has reduced costs and improved efficiency.

Importance of big data analytics:

1. Predictive Maintenance in Manufacturing:
 • Case Study: General Electric (GE) - GE implemented predictive
maintenance using IoT sensors and Big Data Analytics in its jet engines.
By analyzing sensor data to anticipate potential failures, GE reduced
unplanned downtime and maintenance costs significantly. This proactive
approach not only improved operational efficiency but also enhanced
safety by preventing unexpected equipment failures.
2. Infrastructure Monitoring in Civil Engineering:
 • Case Study: Hong Kong Bridge Monitoring System - The Hong Kong
government implemented a real-time monitoring system for bridges using
Big Data Analytics. By analyzing data from sensors and cameras,
engineers can monitor the structural health of bridges and ensure timely
maintenance to prevent accidents. This proactive monitoring approach
enhances public safety and prolongs the lifespan of critical infrastructure.
3. Traffic Management in Transportation Engineering:
 • Case Study: Los Angeles Traffic Signal Optimization - Los Angeles
utilized Big Data Analytics to optimize traffic signal timing, reducing travel
times and fuel consumption. By analyzing traffic data from sensors and
GPS devices, the city improved traffic flow and minimized congestion,
resulting in a more efficient transportation network and enhanced quality
of life for residents.
4. Climate Modeling in Environmental Engineering:
 • Case Study: European Centre for Medium-Range Weather Forecasts
(ECMWF) - ECMWF utilizes Big Data Analytics for weather prediction and
climate research. By analyzing large datasets of weather and
environmental data, scientists can model climate change and its impact
more accurately. This data-driven approach provides valuable insights for
policymakers and helps mitigate the adverse effects of climate change.
5. Smart Grid Optimization in Electrical Engineering:
 • Case Study: Pacific Gas and Electric (PG&E) - PG&E implemented a
smart grid system that utilizes Big Data Analytics to optimize energy
distribution. By analyzing consumption patterns and grid performance
data, PG&E improves grid reliability and efficiency, leading to cost savings
for both utility companies and consumers. This data-driven approach also
supports the integration of renewable energy sources and promotes
sustainable energy practices.
6. Product Design Optimization in Mechanical Engineering:
 • Case Study: Boeing Aircraft Design Optimization - Boeing leverages
Big Data Analytics to optimize aircraft design and performance. By
analyzing simulation data and performance metrics, engineers can identify
areas for improvement and iterate on designs more efficiently.
This data-driven approach results in the development of fuel-efficient and
environmentally friendly aircraft, contributing to sustainability efforts in the
aviation industry.
7. Supply Chain Management in Engineering:
 • Case Study: General Motors (GM) - GM utilizes Big Data Analytics to
optimize its supply chain operations. By analyzing supply chain data, GM
improves inventory management, production scheduling, and logistics,
resulting in cost reductions and efficiency improvements. This data-driven
approach enhances the company's competitiveness and ensures timely
delivery of high-quality products to customers.
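
As a rough illustration of the predictive maintenance idea in the first case study, the sketch below flags sensor readings that deviate sharply from recent history using a rolling z-score. It is a generic technique chosen for illustration, not a description of any company's actual system; the readings, window size, and threshold are made-up assumptions.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window, z_threshold):
    """Return indexes of readings that deviate strongly from the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip flat histories, where a z-score is undefined.
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings with one spike at index 6.
vibration = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 14.5, 10.0]
suspect = flag_anomalies(vibration, window=5, z_threshold=3.0)
print(suspect)
```

A flagged reading would prompt an inspection before the equipment fails, which is how this kind of analysis reduces unplanned downtime.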
