Yasir f29 Ass1 Bigdata
ID F20603029
ASSIGNMENT # 1
SUBJECT BIG DATA ANALYTICS
Question#1 [Marks=10]
You are required to list down the prominent tools used in the field of Big Data Analytics
by exploring the books, internet sources and published research works. Furthermore,
provide the prominent features of each tool.
ANSWER:
1) Apache Hadoop:
Distributed storage and processing: Hadoop splits large datasets into smaller chunks and
distributes them across multiple computers in a cluster. This allows for faster processing since
each computer works on a smaller part of the data.
Fault tolerance: Hadoop is designed to survive computer failures. It keeps multiple copies (replicas) of each data block on different computers in the cluster, so if one computer stops working, processing continues using another replica, and Hadoop re-replicates the lost blocks so the data stays protected.
Scalability: As your data grows, you can easily add more computers to your Hadoop cluster to
handle the extra load. This makes it easy to scale your data infrastructure as your business
grows.
MapReduce: MapReduce is a way of splitting up a big data processing task into smaller parts,
then combining the results. Hadoop uses this programming model to process large datasets in
parallel, making it faster and more efficient.
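The MapReduce model described above can be sketched in plain Python. This is a single-process illustration of the idea, not Hadoop's actual Java API; in a real cluster, the map and reduce tasks run on different machines and the shuffle moves data over the network:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document chunk.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine the values for each key into a final result.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data analytics", "big data tools"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts == {"big": 2, "data": 2, "analytics": 1, "tools": 1}
```

The same three-stage structure (map, shuffle, reduce) is what Hadoop parallelizes across the cluster.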
2) Apache Spark:
In-memory processing: Spark keeps data in memory as much as possible, which makes it faster than disk-based engines such as Hadoop MapReduce, which write intermediate results to disk between stages. This speed boost can make a big difference when working with large datasets, especially iterative workloads.
Unified analytics engine: Spark is a versatile tool that can handle a wide range of analytics
tasks, including SQL queries, machine learning, graph processing, and real-time streaming
analytics. This means you can use Spark for many different types of data analysis without
needing separate tools.
Fault tolerance: Just like Hadoop, Spark is designed to keep working even if some computers in the cluster fail. It does this by recording the lineage of each dataset (the chain of transformations that produced it), so any lost partition can be recomputed from the source data rather than being lost.
Compatibility: Spark can run on different types of cluster management systems, including
Hadoop YARN, Apache Mesos, or Kubernetes. This flexibility makes it easy to integrate Spark
into your existing data infrastructure.
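Spark's execution style, lazy transformations plus optional in-memory caching, can be imitated with a small toy class. This is an illustration of the concept only, not the PySpark API (real Spark also distributes the data and caches on the first action rather than eagerly):

```python
class LazyDataset:
    """Toy stand-in for a Spark RDD/DataFrame: transformations are lazy,
    and cache() keeps the computed result in memory for reuse."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the data
        self._cached = None

    def map(self, fn):
        # Lazy: nothing is computed until collect() is called.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def filter(self, pred):
        return LazyDataset(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        # Materialize immediately and keep in memory
        # (real Spark caches on the first action instead).
        self._cached = self._compute()
        return self

    def collect(self):
        return self._cached if self._cached is not None else self._compute()

nums = LazyDataset(lambda: list(range(10)))
evens = nums.filter(lambda x: x % 2 == 0).cache()  # computed once, held in memory
squares = evens.map(lambda x: x * x).collect()
# squares == [0, 4, 16, 36, 64]
```

Because `evens` is cached, any further transformations built on it reuse the in-memory result instead of re-reading and re-filtering the source, which is the core of Spark's speed advantage.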
3) Apache Kafka:
Distributed messaging system: Kafka is a tool for storing and streaming data between
different systems or applications. It's designed to handle large volumes of data with high
throughput and low latency.
Horizontal scalability: Kafka can handle more data by adding more computers (called brokers)
to the cluster. This makes it easy to scale Kafka as your data needs grow.
Durability: Kafka persists data to disk and replicates each partition across multiple brokers, so messages survive individual broker failures and even a full cluster restart. This makes it a reliable option for storing important data.
Stream processing: Kafka includes a feature called Kafka Streams that lets you process data
in real-time as it flows through the system. This can be useful for tasks like monitoring, fraud
detection, or real-time analytics.
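Kafka's core abstraction is an append-only log that consumers read by offset. The toy class below illustrates that idea in plain Python (it is not the Kafka client API, and omits partitions, brokers, and replication):

```python
class Topic:
    """Toy append-only log: producers append records, and each consumer
    tracks its own offset, so many readers can consume the same data
    independently and replay it from any point."""

    def __init__(self):
        self.log = []   # records persist in arrival order

    def produce(self, record):
        self.log.append(record)
        return len(self.log) - 1   # offset assigned to the new record

    def consume(self, offset, max_records=10):
        # Return a batch starting at `offset`, plus the next offset to read.
        batch = self.log[offset:offset + max_records]
        return batch, offset + len(batch)

topic = Topic()
for event in ["login", "click", "purchase"]:
    topic.produce(event)

batch, next_offset = topic.consume(offset=0, max_records=2)
# batch == ["login", "click"], next_offset == 2
```

Because the log is never mutated in place, a slow consumer and a fast consumer can read the same topic at different offsets without interfering with each other.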
4) Apache Flink:
Stream processing: Flink is designed to process streaming data in real-time, making it a good
choice for applications that need to react quickly to changing data.
Event time processing: Flink can handle data that arrives out of order or late, which is
common in streaming applications. It does this by keeping track of when each event occurred,
rather than just processing them in the order they arrive.
Stateful computations: Flink lets you keep track of state across multiple events, which can be
useful for tasks like aggregating data or detecting patterns over time.
Fault tolerance: Flink automatically takes checkpoints of its state, so it can recover quickly if
something goes wrong. This makes it a reliable option for mission-critical applications.
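Flink's event-time idea, assigning each record to a window based on when it happened rather than when it arrived, can be sketched as follows. This is a conceptual illustration, not the Flink API; real Flink uses watermarks generated by the source and supports many window types:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size, allowed_lateness):
    """Toy event-time windowing: events are (timestamp, value) pairs that
    may arrive out of order; each event lands in the window containing its
    own timestamp, and events later than the allowed lateness are dropped."""
    windows = defaultdict(int)
    max_ts = 0   # highest timestamp seen so far (a crude watermark)
    for ts, _value in events:
        max_ts = max(max_ts, ts)
        if ts < max_ts - allowed_lateness:
            continue   # too late: the window has already closed
        window_start = (ts // window_size) * window_size
        windows[window_start] += 1
    return dict(windows)

# Out-of-order stream: t=12 arrives after t=17, and t=1 arrives last.
events = [(3, "a"), (7, "b"), (17, "c"), (12, "d"), (1, "e")]
counts = tumbling_window_counts(events, window_size=10, allowed_lateness=5)
# t=12 is accepted despite arriving late; t=1 is beyond the lateness
# bound (17 - 5 = 12) and is dropped, so counts == {0: 2, 10: 2}
```

The key point the sketch shows: results are grouped by when events occurred, so a late-but-tolerable event still lands in the correct window.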
5) Apache Cassandra:
Linear scalability: Cassandra can handle more data by adding more computers to the cluster.
This makes it easy to scale as your data grows.
Tunable consistency: Cassandra lets you choose how consistent you want your data to be.
You can prioritize consistency (making sure all copies of the data are the same) or availability
(making sure you can always access the data), depending on your needs.
Built-in replication: Cassandra automatically copies data to multiple computers in the cluster,
so even if one computer fails, you can still access your data. This makes it a reliable option for
storing important data.
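Cassandra's placement and consistency rules can be illustrated with a toy partitioner: hash the key to pick a primary node, place replicas on the following nodes, and require a quorum of replicas for a consistent read or write. This is a simplified sketch of the idea, not Cassandra's actual token-ring implementation:

```python
import hashlib

def replica_nodes(key, nodes, replication_factor=3):
    """Toy partitioner: hash the row key to choose a primary node, then
    place the remaining replicas on the next nodes around the ring."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

def quorum(replication_factor):
    # Tunable consistency: QUORUM means a majority of replicas must
    # acknowledge a read or write (e.g. 2 of 3).
    return replication_factor // 2 + 1

nodes = ["node-a", "node-b", "node-c", "node-d"]
replicas = replica_nodes("user:42", nodes, replication_factor=3)
# 3 distinct nodes hold copies; quorum(3) == 2 of them must respond
```

Choosing weaker levels (one replica) favors availability and latency, while QUORUM or ALL favors consistency, which is exactly the trade-off "tunable consistency" refers to.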
6) MongoDB:
Scalability: MongoDB can handle more data by spreading it across multiple computers, a
process called sharding. This makes it easy to scale as your data grows.
Indexing: MongoDB supports various types of indexes, which can speed up queries by telling
MongoDB where to look for data.
Replication and high availability: MongoDB automatically copies data to multiple computers
in the cluster, so even if one computer fails, you can still access your data. It also includes
features for automatic failover, so if one computer goes down, another one can take its place
without interrupting service.
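The indexing feature above can be illustrated with a toy single-field index: instead of scanning every document, a lookup table maps each field value straight to the matching documents. This is a conceptual sketch, not MongoDB's B-tree implementation or the PyMongo API:

```python
def build_index(docs, field):
    """Toy single-field index: maps each value of `field` to the positions
    of the documents containing it, so queries avoid a full collection scan."""
    index = {}
    for i, doc in enumerate(docs):
        index.setdefault(doc.get(field), []).append(i)
    return index

def find(docs, index, value):
    # Equality query served entirely from the index.
    return [docs[i] for i in index.get(value, [])]

docs = [
    {"name": "ada", "city": "lahore"},
    {"name": "bo", "city": "karachi"},
    {"name": "cy", "city": "lahore"},
]
city_index = build_index(docs, "city")
lahore_docs = find(docs, city_index, "lahore")
# returns the two documents whose city is "lahore"
```

The same principle explains why indexes speed up reads but add a small cost to writes: every insert must also update the lookup table.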
7) Elasticsearch:
Distributed search and analytics engine: Elasticsearch is built on Apache Lucene and offers
a distributed search and analytics engine for handling large volumes of structured and
unstructured data across clusters.
Real-time indexing and search: It enables real-time indexing and search, allowing users to
ingest data and search for it immediately after ingestion.
Scalability and fault tolerance: Elasticsearch's distributed architecture ensures scalability by
allowing users to add more nodes to the cluster. It also provides fault tolerance through data
replication and automatic failover mechanisms.
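The data structure underlying Elasticsearch (and Lucene) is the inverted index: each term maps to the set of documents containing it, so search becomes a dictionary lookup plus a set intersection. A minimal sketch of that idea, not the Elasticsearch query API:

```python
from collections import defaultdict

def index_documents(docs):
    """Toy inverted index: each term maps to the IDs of the documents
    that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted

def search(inverted, query):
    # AND semantics: return documents containing every query term.
    term_sets = [inverted.get(t, set()) for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "distributed search engine",
    2: "real-time search and analytics",
    3: "distributed analytics cluster",
}
hits = search(index_documents(docs), "distributed analytics")
# only document 3 contains both terms
```

Because the index is built at ingestion time, queries never scan raw text, which is what makes search fast even over very large document sets.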
8) Splunk:
Platform for machine-generated big data: Splunk is a comprehensive platform designed for
searching, monitoring, and analyzing large volumes of machine-generated data.
Real-time data ingestion: It offers real-time data ingestion capabilities with high throughput
and low latency, enabling organizations to quickly access and analyze data as it's generated.
Indexing and search: Splunk indexes and searches various types of machine data, including
logs, events, and other types of machine-generated data, facilitating rapid data retrieval and
analysis.
Monitoring and alerting: Splunk provides robust monitoring and alerting capabilities, allowing
users to set up alerts based on predefined conditions and receive proactive insights into system
performance and potential issues.
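The parse-index-alert workflow that Splunk applies to machine data can be sketched in a few lines of plain Python. The log format and threshold here are made up for illustration; this is the general pattern, not Splunk's SPL language or API:

```python
import re

def parse_log_line(line):
    """Toy log parser: split a line of the assumed form
    '<timestamp> <LEVEL> <message>' into fields."""
    m = re.match(r"(\S+) (\w+) (.+)", line)
    return {"ts": m.group(1), "level": m.group(2), "msg": m.group(3)} if m else None

def check_alert(events, level="ERROR", threshold=3):
    # Fire an alert when the number of matching events reaches a threshold,
    # mimicking a predefined alerting condition.
    count = sum(1 for e in events if e and e["level"] == level)
    return count >= threshold

lines = [
    "10:00:01 INFO service started",
    "10:00:05 ERROR db timeout",
    "10:00:06 ERROR db timeout",
    "10:00:09 ERROR db timeout",
]
events = [parse_log_line(l) for l in lines]
alert = check_alert(events, level="ERROR", threshold=3)
# three ERROR events meet the threshold, so the alert fires
```

Splunk does the same thing at scale: extract fields from raw machine data at ingestion, index them, and evaluate alerting conditions continuously over the indexed events.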
Question#2 [Marks=10]
Provide the application areas of Big Data Analytics in various Engineering fields to solve
the complex engineering problems. Furthermore, discuss the importance of Big Data
Analytics in Engineering domain with the help of different case studies.
ANSWER:
Big Data Analytics has been adopted across diverse engineering sectors, reshaping how complex engineering problems are approached. Below are its main application areas in different engineering domains, along with case studies that illustrate its importance:
1. Manufacturing Engineering:
Predictive Maintenance: Leveraging sensor data analysis, Big Data Analytics
empowers predictive maintenance, as demonstrated by General Electric (GE)
who successfully implemented IoT sensors and analytics to anticipate and
prevent machinery failures. This initiative resulted in substantial reductions in
downtime and maintenance expenses.
Quality Control: Real-time production data analysis proves instrumental in defect
identification and process optimization, as evidenced by Ford's utilization of big
data analytics to enhance vehicle quality. By scrutinizing warranty claims, Ford
achieved diminished defects and heightened customer satisfaction.
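The predictive-maintenance idea described above, analyzing sensor data to catch failures before they happen, can be reduced to a simple statistical check: flag readings that deviate sharply from recent behavior. The sensor values and thresholds below are invented for illustration; real systems like GE's use far richer models:

```python
from statistics import mean, stdev

def anomalous_readings(readings, window=5, k=3.0):
    """Toy predictive-maintenance check: flag any sensor reading that
    deviates more than k standard deviations from the rolling mean of
    the previous `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma > 0 and abs(readings[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged

# Stable vibration levels with one sudden spike at index 8.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 5.0, 1.0]
spikes = anomalous_readings(vibration, window=5, k=3.0)
# only the spike at index 8 is flagged
```

Catching such deviations early is what lets maintenance be scheduled before a bearing or motor actually fails, which is the source of the downtime savings cited above.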
2. Civil Engineering:
Infrastructure Monitoring: Big Data Analytics plays a pivotal role in monitoring
infrastructure health, exemplified by the Hong Kong government's real-time bridge
monitoring system, which continuously analyzes data from sensors, cameras, and
other sources to ensure structural safety.
Traffic Management: Effective traffic flow optimization and congestion reduction
are achieved through the analysis of traffic data from sensors and GPS devices,
as illustrated by the city of Los Angeles. By employing big data analytics to refine
traffic signal timing, Los Angeles achieved noteworthy reductions in travel times
and fuel consumption.
3. Environmental Engineering:
Climate Modeling: Climate scientists leverage Big Data Analytics to model
climate change and its repercussions, as demonstrated by the European Centre
for Medium-Range Weather Forecasts (ECMWF). By harnessing big data
analytics for weather prediction and climate research, ECMWF enhances climate
modeling accuracy and insights.
Natural Disaster Management: Historical data analysis forms the backbone of
predicting and mitigating the impact of natural disasters, a practice exemplified
by NASA's Earth Science Division. By utilizing big data analytics to monitor
natural disasters and issue early warnings, NASA effectively mitigates disaster-
related risks.
4. Electrical Engineering:
Smart Grids: Optimizing energy distribution in smart grids is achieved through
consumption pattern analysis and grid performance data scrutiny, as showcased
by Pacific Gas and Electric (PG&E). By leveraging big data analytics, PG&E
enhances grid reliability and efficiency, fostering sustainable energy distribution.
Power Systems Optimization: Data analysis from power plants and transmission
grids fuels energy generation and distribution optimization, exemplified by
Siemens. Through the implementation of a big data analytics platform for energy
management, Siemens achieved cost reductions and improved grid stability.
5. Mechanical Engineering:
Product Design and Optimization: Engineers harness Big Data Analytics to
optimize product designs through simulation data and performance metrics
analysis, epitomized by Boeing's aircraft design optimization efforts. By
leveraging big data analytics, Boeing achieves the development of fuel-efficient
and environmentally friendly aircraft.
Supply Chain Management: Supply chain data analysis facilitates inventory
management, production scheduling, and logistics optimization, as demonstrated
by General Motors (GM). By employing big data analytics, GM optimizes its
supply chain, resulting in cost reductions and efficiency improvements.