BDA QB Answers 8 To 15
Uploaded by Raghu Nayak

8. Explain the following: 1. Data Sources 2. Data Quality 3. Data Preprocessing

8.1 Data Sources:

Applications, programs and tools use data. Sources can be external, such as sensors, trackers, web logs, computer system logs and feeds. Sources can be machines, which source data from data-creating programs. A source can also be internal. Sources can be data repositories, such as a database, relational database, flat file, spreadsheet, mail server, web server, directory services, or even text or files such as comma-separated values (CSV) files. A source may also be a data store for applications.
Types of Data Source: structured, semi-structured, multi-structured or unstructured
Structured Data Source
 Data source for ingestion, storage and processing can be a file, database or streaming data.
 The source may be on the same computer running a program or on a networked computer.
 Structured data sources include SQL Server and MySQL.

Unstructured Data Source

 Unstructured data sources are distributed over high-speed networks.
 The data needs high-velocity processing. Sources are from distributed file systems.
 The sources are of file types, such as .txt (text file) and .csv (comma-separated values).
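For instance, a .csv file source like the ones named above can be ingested with a few lines of Python; the sensor records here are hypothetical stand-ins for a real file:

```python
import csv
import io

# Hypothetical CSV content standing in for a file-type data source.
raw = "sensor_id,reading\ns1,21.5\ns2,19.8\n"

# DictReader parses the header row and yields one dict per record.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows)
```

In practice the `io.StringIO` wrapper would be replaced by `open("source.csv")`.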

8.2 Data Quality

High-quality data enables all the required operations, analysis, decisions, planning and knowledge discovery to be done correctly. Data quality is characterized by the five R's:

 Relevancy
 Recency
 Range
 Robustness
 Reliability

Factors Affecting Data Quality:

 Data Noise: Noise in data refers to data giving additional meaningless information besides the true (actual/required) information. Noise is random in character.
 Outlier: An outlier refers to data which appears not to belong to the dataset. For example, data that is outside an expected range.
 Missing Value: A missing value implies data not appearing in the dataset.
 Duplicate Value: A duplicate value implies the same data appearing two or more times in a dataset.
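These factors can be checked mechanically. A minimal sketch using only the standard library, where the expected range of 0 to 100 is an illustrative assumption:

```python
# Toy readings: None marks a missing value, 250.0 is outside the expected range.
readings = [21.0, 22.5, None, 21.0, 250.0]

missing = [i for i, v in enumerate(readings) if v is None]        # missing values
present = [v for v in readings if v is not None]
duplicates = {v for v in present if present.count(v) > 1}         # repeated values
outliers = [v for v in present if not (0 <= v <= 100)]            # out of range

print(missing, duplicates, outliers)
```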

8.3 Data Pre-processing:

Data pre-processing is an important step at the ingestion layer. Pre-processing is a must before data mining and analytics. Pre-processing is also a must before running a Machine Learning (ML) algorithm.

Pre-processing needs are:


 Dropping out of range, inconsistent and outlier values
 Filtering unreliable, irrelevant and redundant information
 Data cleaning, editing, reduction and/or wrangling
 Data validation, transformation or transcoding
 ELT processing
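A hedged sketch of the cleaning steps listed above, applied to hypothetical sensor records; the valid range of -40 to 60 is an assumption for illustration:

```python
records = [
    {"id": 1, "temp": 21.5},
    {"id": 2, "temp": None},    # missing value  -> dropped
    {"id": 3, "temp": 999.0},   # out of range   -> dropped
    {"id": 1, "temp": 21.5},    # duplicate      -> dropped
]

seen, clean = set(), []
for r in records:
    if r["temp"] is None or not (-40 <= r["temp"] <= 60):
        continue                              # drop missing / out-of-range values
    key = (r["id"], r["temp"])
    if key in seen:
        continue                              # drop duplicate records
    seen.add(key)
    r["temp_f"] = r["temp"] * 9 / 5 + 32      # transcoding: Celsius -> Fahrenheit
    clean.append(r)

print(clean)
```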

9. Discuss data store export to the cloud


The data pre-processing, data mining, analysis and visualization results go to the data store. The data then exports to cloud services, and the results integrate at the enterprise server or data warehouse.
Data store export to the cloud refers to transferring data from on-premise storage systems or local servers
to a cloud environment. Cloud storage provides scalable, secure, and cost-effective options for storing
large datasets.

The data store first pre-processes data from machine and file data sources. Pre-processing transforms the data into a table or partition schema or into supported data formats, for example JSON, CSV and AVRO. Data then exports in compressed or uncompressed data formats.
Cloud service BigQuery provides service functions such as bigquery.tables.create, bigquery.dataEditor, bigquery.dataOwner, bigquery.admin and bigquery.tables.updateData. Analytics uses Google Analytics 360. BigQuery exports data to Google Cloud or a cloud backup only.
Apache Sqoop is a key tool used for data transfers between relational databases and the Hadoop Distributed File System (HDFS). It facilitates exporting and importing data, particularly between RDBMS systems (like MySQL, PostgreSQL, and SQL Server) and cloud-based Hadoop clusters.

Steps for Exporting Data to the Cloud:

1. Establish Connection: Sqoop connects to the relational database via JDBC (Java Database
Connectivity). The system gathers metadata and examines the database for the data being transferred.
2. Data Export: Sqoop submits a map-only Hadoop job that divides the input dataset into splits and
transfers each split using individual map tasks. This ensures efficient use of resources and allows
scalable export to cloud-based HDFS.
3. Cloud Storage Destination: The exported data is usually placed in a directory in HDFS or a similar
cloud-based distributed file system. HDFS is designed to handle large datasets across multiple
servers, making it ideal for cloud environments.
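The steps above correspond to a single Sqoop invocation. Purely as an illustration, the command can be assembled in Python; the host, database, table and HDFS path are hypothetical, and in practice the command runs via a shell on a Hadoop node:

```python
# Sketch of a Sqoop import (RDBMS -> HDFS), matching the three steps above.
# All connection details below are made-up examples.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",  # step 1: JDBC connection
    "--table", "orders",                               # source RDBMS table
    "--target-dir", "/data/orders",                    # step 3: HDFS destination
    "--num-mappers", "4",                              # step 2: 4 parallel map tasks
]
print(" ".join(cmd))
```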

Benefits of Exporting Data to the Cloud

1. Scalability
2. Cost effectiveness
3. Accessibility
4. Disaster recovery
5. Security

10. List the characteristics of a Big Data platform

A Big Data platform supports large datasets and high volumes of data. The data generates at higher velocity, in more variety or with higher veracity. Managing Big Data requires large resources of MPPs, cloud, parallel processing and specialized tools. A Big Data platform should provision tools and services for:

 storage, processing and analytics,


 developing, deploying, operating and managing Big Data environment,
 reducing the complexity of multiple data sources and integration of applications into one cohesive
solution,
 custom development, querying and integration with other systems, and
 the traditional as well as Big Data techniques.

Characteristics of a Big Data Platform


 Innovative Non-Traditional Methods: Utilizes advanced techniques for storage, processing, and analytics that go beyond traditional approaches to handle complex data efficiently.
 Distributed Data Stores: Data is distributed across multiple nodes to ensure redundancy, fault
tolerance, and improved performance.
 Scalability and Elasticity: Cloud computing platforms allow seamless scalability and elasticity,
enabling the system to grow and shrink based on demand.
 High Volume Data Stores: Capable of storing and managing massive volumes of data, often running
into petabytes or more.
 Massive Parallelism: Executes multiple operations concurrently, leveraging parallel processing to
improve speed and efficiency.
 High-Speed Networks: Relies on high-speed networking infrastructure to facilitate quick data
transfer and low-latency communication between nodes.
 High-Performance Processing: Employs optimized and fine-tuned processing techniques to ensure
high performance for both batch and real-time data analytics.
 NoSQL Data Management Models: Utilizes NoSQL databases to handle unstructured and semi-
structured data efficiently, offering flexibility and scalability.
 In-Memory Processing: Uses in-memory data processing for faster transaction and query
performance, suitable for both OLAP (Online Analytical Processing) and OLTP (Online Transaction
Processing) systems.
 Comprehensive Data Analytics: Includes capabilities for data retrieval, mining, reporting,
visualization, and advanced analytics to extract insights from large datasets.
 Graph Databases: Supports graph databases for analyzing relationships and patterns within social
network data and other interconnected datasets.
 Machine Learning: Integrates machine learning algorithms and models to derive predictive and
prescriptive insights from data.
 Diverse Data Sources: Ingests data from a wide range of sources such as data storages, data
warehouses, big data solutions like Oracle Big Data, MongoDB NoSQL, Cassandra NoSQL, and more.
 Real-Time Data Sources: Captures and processes data from real-time sources including sensors,
financial transaction audit trails, web, social media, weather data, and health records.

11. How can a toy company optimize the benefits using Big Data Analytics?

Optimizing Benefits for a Toy Company Using Big Data Analytics

To leverage Big Data Analytics effectively, a toy company can follow several strategies, drawing insights from various industry practices:
1. Customer Acquisition and Retention

 Personalized Experience: By analyzing customer data, the toy company can tailor marketing campaigns to individual preferences, similar to how Amazon does it. This can be based on past purchases, browsing behavior, and demographic information.

 Loyalty Programs: Utilize data to identify patterns and trends that foster customer loyalty, ensuring that the marketing efforts are directed toward retaining valuable customers.

2. Focused and Targeted Campaigns

 Segmented Marketing: Use Big Data to identify specific customer segments and target them with customized advertising campaigns. For instance, the company can deliver ads through SMS, e-mails, WhatsApp, LinkedIn, Facebook, and Twitter.

 Ad Optimization: Real-time analytics can help in understanding the effectiveness of various advertising channels and campaigns, allowing the company to allocate resources efficiently.

3. Innovative Product Development

 Product Insights: Analyze feedback from various sources such as social media, customer reviews, and purchase data to innovate and improve products. This will ensure the toys meet customer expectations and stay competitive.

 Trend Analysis: Utilize Big Data to spot emerging trends in the market, enabling the company to develop products that are in line with current consumer interests.

4. Detection of Marketing Frauds

 Fraud Prevention: Big Data analytics can help in detecting and preventing marketing frauds by merging existing data with information from social media, websites, blogs, and emails. This enriched data set can identify suspicious activities and prevent fraudulent transactions.

5. Risk Management

 Identify Risks: Implement Big Data solutions to improve risk management models, allowing the company to develop smarter strategies for navigating high-risk environments.

 Predictive Analytics: Use predictive analytics to foresee potential issues in the supply chain, production, or market trends, enabling preemptive measures.

6. Supply Chain Optimization


 Enhanced Collaboration: By analyzing data across the supply chain, the company can foster high-level collaboration among suppliers, improving efficiency and reducing constraints.

 Performance Tracking: Track supplier performance and optimize inventory management, ensuring a smooth supply chain operation.

7. Customer Value Analytics (CVA)

 Understand Customer Needs: Use CVA to analyze what customers really want from the products, ensuring that the toys deliver both perceived and desired value.

 Consistent Customer Experience: Implement insights from CVA to provide consistent and delightful customer experiences, akin to leading marketers like Amazon.

12. Explain the usage of Big Data analytics: i) to detect marketing frauds ii) in medicine iii) in advertising

i) To detect marketing frauds


Big Data Analytics in Detection of Marketing Frauds

Big Data analytics enables fraud detection. Big Data usage has the following features for enabling detection and prevention of frauds:

 Fusing existing data at an enterprise data warehouse with data from sources such as social media, websites, blogs and e-mails, thus enriching the existing data
 Using multiple sources of data and connecting with many applications
 Providing greater insights using querying of the multiple-source data
 Analyzing data to enable structured reports and visualization
 Providing high-volume data mining and new innovative applications, leading to new business intelligence and knowledge discovery

ii) In medicine

Big Data analytics deploys large volumes of data to identify and derive intelligence, using predictive models about individuals. Big Data driven approaches help research in medicine, which can help patients. Some findings are:

 Building the health profiles of individual patients and predictive models for diagnosing better and offering better treatment.
 Aggregating a large volume and variety of information from multiple sources, from DNAs, proteins and metabolites to cells, tissues, organs, organisms and ecosystems, which can enhance the understanding of the biology of diseases.
 Creating patterns and models by data mining, which helps in better understanding and research.
 Deploying wearable device data, recorded during active as well as inactive periods, which provides better understanding of patient health and better risk profiling of the user for certain diseases.

iii)advertising

The impact of Big Data on the digital advertising industry is tremendous. The industry sends advertisements using SMS, e-mails, WhatsApp, LinkedIn, Facebook, Twitter and other media. Big Data captures data from multiple sources in large volume, velocity and variety, and the unstructured data enriches the structured data at the enterprise data warehouse.

Big Data real-time analytics provides emerging trends and patterns, and gains actionable insights for facing competition from similar products. The data helps digital advertisers discover new relationships and less competitive regions and areas. Success from advertisements depends on collection, analysis and mining of the data. The new insights enable personalization and targeting of online, social media and mobile advertisements, called hyper-localized advertising.

Advertising on digital media needs optimization. Too much usage can also have a negative effect. Phone calls, SMSs and e-mail-based advertisements can be a nuisance if sent without appropriate research on the potential targets. Analytics help in this direction. Using Big Data after appropriate filtering and elimination, with appropriate data, data forms and data handling in the right manner, is a crucial enabler of Big Data analytics.

13. How is Big Data used in i) a chocolate company ii) the automobile industry?

The chocolate industry is undergoing significant transformation through digitalization and the incorporation
of Big Data Analytics. Here's how a chocolate company can optimize benefits using Big Data:
1. Digitalization of Chocolate Manufacturing

 IoT and Smart Factories: Using Internet of Things (IoT) technologies, such as digital twins, data analytics, and AI, to create interconnected smart factories. This real-time data integration aids in decision-making and streamlines processes, leading to increased productivity and reduced operational costs.

 Automated Processes: Machines can switch automatically between different production stages (dosing, mixing, refining, conching) and adapt to various recipes and product types, enhancing flexibility and efficiency.

2. Monitoring and Improving Food Safety

 Data-Driven Risk Assessment: By analyzing data related to raw materials, companies can identify and mitigate potential food safety risks. Big Data allows for the identification of prevalent risks associated with raw materials, comparative evaluation of risks based on origin, and critical evaluation of suppliers.

 Real-Time Monitoring: Utilizing digital tools to monitor nearly 400 unique chocolate-related food safety incidents globally. This enables proactive measures and timely actions to prevent incidents.

3. Optimizing Supply Chain and Inventory Management

 Demand Forecasting: By analyzing consumer data and market trends, chocolate companies can predict demand more accurately, ensuring optimal inventory levels and reducing waste.

 Supplier Evaluation: Big Data helps in selecting the best suppliers by evaluating their risk profiles and performance, ensuring consistent quality and reliability in raw materials.

4. Enhancing Customer Experience

 Personalized Marketing: Leveraging data analytics to understand customer preferences and behavior, enabling targeted and personalized marketing campaigns. This helps in building stronger customer relationships and loyalty.

 Product Innovation: Using insights from customer feedback and market trends to innovate new products and improve existing ones, ensuring that offerings meet customer expectations.

5. Efficient Operations and Cost Reduction


 Automation and Robotics: Integrating robotic technology to handle various production tasks, which reduces labor costs and increases efficiency. Flexible configurations allow for quick adjustments to produce different products or cater to seasonal demands.

 Energy Management: Using data analytics to optimize energy consumption in the manufacturing process, leading to cost savings and sustainable practices.

6. Fraud Detection and Prevention

 Comprehensive Data Analysis: Merging data from various sources (social media, websites, emails) with enterprise data to detect and prevent marketing frauds. This enriched data set provides deeper insights into potential fraud risks.

References :
Digital Bytes Make Better Bites: The Digitalization of Chocolate Manufacturing | News & Insights | Gray

Chocolate and Big Data: The Recipe for Food Safety Is Changing - FoodSafetyTech

ii)Automobile industry

The automobile industry is leveraging Big Data to drive innovation, enhance customer experiences, and improve operational efficiencies. Here's how:

1. Product Development and Innovation

 R&D Activities: Automobile companies use Big Data analytics to streamline research and development processes. By analyzing vast datasets, companies can identify emerging trends, customer preferences, and technological advancements.

 Strategic Partnerships: Collaborations, such as National Instruments Corporation (NIC) acquiring Heinzinger GmbH's electronic vehicle systems division, enhance capabilities in electrification, battery testing, and sustainable energy.

2. Supply Chain and Manufacturing

 Optimized Manufacturing: Big Data helps in optimizing manufacturing processes by providing real-time insights into production lines, reducing downtime, and improving efficiency.

 Inventory Management: Predictive analytics ensures optimal inventory levels, reducing waste and cost.

3. Connected Vehicles and Intelligent Transportation


 Telematics Data: Collecting and analyzing data from connected vehicles to enhance safety, performance, and user experience. This includes monitoring vehicle health, driving patterns, and real-time navigation.

 Intelligent Transportation Systems: Improving traffic management and reducing congestion through data-driven insights.

4. Customer Behavior Analytics

 Customer Retention: Using Big Data to understand customer behavior, preferences, and satisfaction levels. This helps in creating personalized marketing strategies and improving customer retention rates.

 Customer Experience: Analyzing data from multiple customer interactions to enhance the overall customer experience, which is crucial for long-term loyalty and competitive edge.

5. OEM Warranty and Aftersales/Dealers

 Predictive Maintenance: Analyzing data from vehicle sensors to predict potential issues before they occur, reducing downtime and improving customer satisfaction.

 Aftermarket Services: Providing tailored services and solutions based on data-driven insights to enhance customer satisfaction and loyalty.

6. Sales, Marketing, and Other Applications

 Targeted Marketing: Utilizing Big Data to design and execute highly targeted marketing campaigns that resonate with specific customer segments.

 Sales Forecasting: Predicting sales trends based on historical data and market analysis to inform strategic decisions.

7. Risk Management

 Fraud Detection: Combining data from various sources (social media, websites, blogs) with enterprise data to detect and prevent fraud.

 Risk Assessment: Using Big Data to evaluate and manage risks associated with raw materials, suppliers, and market conditions.

8. Global and Regional Market Insights


 Market Segmentation: The Big Data market in the automotive industry is segmented by application and geography, allowing companies to tailor strategies based on regional and application-specific insights.

 Competitive Edge: Companies like IBM, Microsoft, and SAP are leading the charge, offering advanced Big Data solutions tailored to the automotive industry.
References : https://www2.deloitte.com/content/dam/Deloitte/ch/Documents/manufacturing/deloitte-ch-auto-
automotive-news-supplement.pdf

14. What is Hadoop? Explain the core components of Hadoop.

Hadoop is a computing environment in which input data is stored and processed, and the results are stored. The environment consists of clusters, which distribute at the cloud or across a set of servers. Each cluster consists of a string of data files constituting data blocks. The name came from a toy, a stuffed elephant; the Hadoop system cluster stuffs files into data blocks.

The complete system consists of a scalable distributed set of clusters. The infrastructure consists of cloud for the clusters. A cluster consists of sets of computers or PCs. The Hadoop platform provides a low-cost Big Data platform, which is open source and uses cloud services. Terabytes of data processing takes just a few minutes. Hadoop enables distributed processing of large datasets (above 10 million bytes) across clusters of computers using a programming model called MapReduce.

The system characteristics are: scalable, self-manageable, self-healing and a distributed file system. Scalable means it can be scaled up (enhanced) by adding storage and processing units as per requirements.
The Hadoop core components of the framework are:

 Hadoop Common - The common module contains the libraries and utilities that are required by the other modules of Hadoop. For example, Hadoop Common provides various components and interfaces for the distributed file system and general input/output. This includes serialization, Java RPC (Remote Procedure Call) and file-based data structures.
 Hadoop Distributed File System (HDFS) - A Java-based distributed file system which can store all kinds of data on the disks at the clusters.
 MapReduce v1 - Software programming model in Hadoop 1 using Mapper and Reducer. v1 processes large sets of data in parallel and in batches.
 YARN - Software for managing resources for computing. The user application tasks or sub-tasks run in parallel at Hadoop, using scheduling and handling the requests for the resources in distributed running of the tasks.
 MapReduce v2 - Hadoop 2 YARN-based system for parallel processing of large datasets and distributed processing of the application tasks.
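The Mapper/Reducer model named above can be sketched in a few lines of single-process Python: the map phase emits (word, 1) pairs, the framework sorts and shuffles them by key, and the reduce phase sums the counts per word. The input lines are illustrative; real MapReduce distributes these phases across the cluster.

```python
from itertools import groupby

def map_phase(line):
    # Emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Aggregate all counts seen for one key.
    return word, sum(counts)

lines = ["big data big clusters", "data blocks"]
pairs = sorted(pair for line in lines for pair in map_phase(line))  # map + shuffle
result = dict(
    reduce_phase(word, [c for _, c in group])
    for word, group in groupby(pairs, key=lambda p: p[0])           # reduce
)
print(result)  # word counts, e.g. 'big' appears twice
```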

15. Explain the Hadoop Ecosystem with a neat diagram


The Hadoop Ecosystem is a collection of open-source components and tools that work
together to enable the storage, processing, and analysis of large datasets in a distributed
environment. This ecosystem is built around the Hadoop framework and provides a variety
of solutions for managing Big Data. Here’s a detailed explanation of the core components
and tools of the Hadoop ecosystem:

Core Components of Hadoop

1. Hadoop Distributed File System (HDFS):

o A distributed file system designed to store large data sets across multiple
nodes.

o It provides fault-tolerant storage by replicating data blocks across different nodes, ensuring data availability even in the event of hardware failures.

o HDFS stores data in large blocks (default is 128 MB), which are distributed
across different nodes in a cluster, enabling parallel processing.

2. MapReduce:
o A programming model used for processing large datasets in parallel across a
Hadoop cluster.

o It consists of two main functions: Map, which processes and filters data, and
Reduce, which aggregates the output of the Map phase to provide final results.

o It allows the distribution of tasks across many nodes, enabling efficient processing of large data.

3. YARN (Yet Another Resource Negotiator):

o YARN is the resource management layer of Hadoop.

o It manages and schedules jobs by allocating system resources for the execution
of tasks across the nodes in a cluster.

o It allows multiple applications to run simultaneously and handles large-scale distributed data processing.
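The HDFS block model above can be made concrete with a small calculation; the 1 GB file size is a hypothetical example, and 128 MB blocks with a replication factor of 3 are the usual defaults:

```python
import math

BLOCK_MB = 128                            # default HDFS block size
file_mb = 1024                            # hypothetical 1 GB file
blocks = math.ceil(file_mb / BLOCK_MB)    # blocks the file is split into
replicas = blocks * 3                     # stored copies at replication factor 3

print(blocks, replicas)
```

Each of the 8 blocks can be processed by a separate task, which is what enables the parallelism described above.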

Key Benefits of the Hadoop Ecosystem:

 Scalability: The ecosystem allows for easy scaling by adding more nodes to handle increasing
amounts of data and processing.
 Fault Tolerance: Data is replicated across multiple nodes, ensuring that it is available even if
hardware fails.
 Cost-Effective: Built on open-source technologies and designed to run on commodity hardware,
Hadoop offers a cost-efficient solution for handling big data.
 Flexibility: The ecosystem provides tools for various tasks, from data storage and processing to
machine learning and real-time analytics.

Hadoop Ecosystem Tools:

1. ZooKeeper:
o A coordination service for distributed applications, ensuring synchronization across clusters.
o ZooKeeper handles configuration management, name service, and failure recovery in a
distributed environment.
o It manages the distributed systems by controlling access to shared resources and resolving
issues like race conditions and deadlocks.
2. Oozie:
o A workflow scheduler system designed to manage and run Hadoop jobs.
o It can orchestrate complex job workflows by chaining multiple tasks and handling
dependencies between them.
o It uses Directed Acyclic Graphs (DAGs) to represent workflows and supports time-based
triggers for running recurrent jobs.
3. Sqoop:
o A tool used to transfer data between Hadoop and relational databases (such as MySQL,
Oracle, PostgreSQL).
o It supports both import and export functions, enabling data movement from databases to
HDFS, and from Hadoop back into relational systems.
4. Flume:
o A tool for collecting, aggregating, and transporting large volumes of streaming data to HDFS.
o Often used to ingest log data from various sources, such as social media, web servers, and
sensor networks, into Hadoop.
o It provides fault-tolerance and reliable data flow mechanisms, ensuring efficient handling of
large data streams.
5. HBase:
o A non-relational, distributed, column-oriented database that runs on top of HDFS.
o HBase is used for real-time read/write access to large datasets, offering random access to
billions of rows and millions of columns.
o It is modeled after Google’s Bigtable and provides scalability, fault tolerance, and
consistency.
6. Hive:
o A data warehouse software that facilitates querying and managing large datasets stored in
HDFS using a SQL-like query language called HiveQL.
o Hive translates SQL queries into MapReduce jobs, making it easier for users familiar with
SQL to work with large datasets in Hadoop.
o It supports batch processing and is optimized for read-heavy workloads.
7. Pig:
o A high-level platform for creating MapReduce programs using a scripting language called
Pig Latin.
o Pig is designed for processing large datasets and simplifies the writing of complex data
transformations compared to Java MapReduce.
o It allows users to focus on the data flow without worrying about the underlying MapReduce
details.
8. Mahout:
o A machine learning library that provides scalable algorithms for clustering, classification, and
collaborative filtering on large datasets.
o Mahout leverages Hadoop and MapReduce to handle data mining and machine learning
tasks, enabling pattern discovery in big data.
9. Ambari:
o A web-based management tool that simplifies the provisioning, monitoring, and management
of Hadoop clusters.
o Ambari provides an intuitive user interface and REST APIs for managing cluster health,
configuring security, and monitoring various Hadoop components.
o It is widely used to automate the management of Hadoop clusters, making it easier to
administer and maintain large distributed systems.
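As a loose, single-machine analogy to Hive's SQL-on-Hadoop model described above, SQL-style aggregation over raw records looks like this (Python's built-in sqlite3 is used purely for illustration; Hive itself would translate similar HiveQL into MapReduce jobs over HDFS tables, and the page-view data is hypothetical):

```python
import sqlite3

# In Hive, this table would be an HDFS-backed table defined in HiveQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (page TEXT, hits INTEGER)")
conn.executemany("INSERT INTO views VALUES (?, ?)",
                 [("home", 3), ("about", 1), ("home", 2)])

# A HiveQL-like aggregate query; Hive compiles such queries into MapReduce.
rows = conn.execute(
    "SELECT page, SUM(hits) FROM views GROUP BY page ORDER BY page"
).fetchall()
print(rows)
```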
