
Chapter 3545

Business analytics is the practice of exploring and analyzing an organization's data through statistical methods to gain insights that inform business decisions. It allows companies to make data-driven decisions and optimize processes. There are different categories of analytical methods like descriptive, predictive, and prescriptive analytics. Descriptive analytics summarizes past performance, predictive analytics forecasts future trends, and prescriptive analytics recommends actions. Business analytics has grown in many industries as more companies treat data as a valuable asset. Successful business analytics depends on data quality, skilled analysts, and commitment to data-driven decision making.

Uploaded by vishwanath286699
Copyright © All Rights Reserved

BUSINESS ANALYTICS
As per New CBCS Syllabus for BBA, 3rd Year, 6th Semester for
All the Universities in Telangana State w.e.f. 2018-19

Prem Sagar Jakkula
M.Sc., M.Phil., M.Tech., APSET, TSSET (Ph.D.)
Faculty, Department of Computer Science,
St. Mary’s Centenary Degree College,
St. Francis Street, Secunderabad.

Sandeep Agarwalla
MCA, M.Tech. (CSE)
Faculty, Department of Computer Science,
Indian Institute of Management & Commerce (IIMC),
Khairatabad, Hyderabad.

V.Karuna Sree
MBA, M.Com, PGD(Maths) (PhD)
Academic Head,
Ethemes College of Commerce & Business Management,
Panjagutta, Hyderabad.

ISO 9001:2015 CERTIFIED


© Authors
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the
authors and the publisher.

First Edition : 2019

Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd.,
“Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004.
Phone: 022-23860170, 23863863; Fax: 022-23877178
E-mail: himpub@vsnl.com; Website: www.himpub.com

Branch Offices :

New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002.
Phone: 011-23270392, 23278631; Fax: 011-23256286

Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018.
Phone: 0712-2738731, 3296733; Telefax: 0712-2721216

Bengaluru : Plot No. 91-33, 2nd Main Road, Seshadripuram, Behind Nataraja Theatre,
Bengaluru - 560 020. Phone: 080-41138821; Mobile: 09379847017, 09379847005

Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda,
Hyderabad - 500 027. Phone: 040-27560041, 27550139

Chennai : New No. 48/2, Old No. 28/2, Ground Floor, Sarangapani Street, T. Nagar,
Chennai - 600 012. Mobile: 09380460419

Pune : “Laksha” Apartment, First Floor, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre),
Pune - 411 030. Phone: 020-24496323, 24496333; Mobile: 09370579333

Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj,
Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549

Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura,
Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847

Ernakulam : 39/176 (New No. 60/251), 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682 011.
Phone: 0484-2378012, 2378016; Mobile: 09387122121

Bhubaneswar : Plot No. 214/1342, Budheswari Colony, Behind Durga Mandap, Bhubaneswar - 751 006.
Phone: 0674-2575129; Mobile: 09338746007

Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010.
Phone: 033-32449649; Mobile: 07439040301

DTP by : Nilima

Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.
Preface
The objective of this book is to provide an understanding of basic concepts of Business Analytics
like Descriptive, Predictive and Prescriptive Analytics, Data Mining Techniques and an overview of
Programming using R.
The material has been written in a simple and lucid style, in language that even below-average
students can easily understand, with sufficient support from real business information.
This book covers five units:
Unit-I: Business Analytics, Categories of Business Analytical Methods and Models, Business
Analytics in Practice, Big Data – Overview of Using Data, Types of Data.
Unit-II: Descriptive Statistics (Central Tendency, Variability), Data Visualization – Definition,
Visualization Techniques – Tables, Cross Tabulations, Charts, Data Dashboards using MS-Excel or
SPSS.
Unit-III: Trend Lines, Regression Analysis – Linear and Multiple, Forecasting Techniques, Data
Mining – Definition, Approaches in Data Mining – Data Exploration and Reduction, Classification,
Association, and Cause and Effect Modeling.
Unit-IV: Linear Optimization, Non-linear Programming, Integer Optimization, Cutting Plane
Algorithm and Other Methods, Decision Analysis – Risk and Uncertainty Methods.
Unit-V: Programming using R, R Environment, R Packages, Reading and Writing Data in R, R
Functions, Control Statements, Frames and Subsets, Managing and Manipulating Data in R.
MCQs are also provided.
We believe this book will be useful to students and professionals in understanding the subject. It
lays down the framework defining Descriptive, Predictive and Prescriptive Analytics.
Our wholehearted gratitude to Fr. Rev. Allam Arogya Reddy, Principal, St. Mary’s Centenary
Degree College, and Shri K. Raghuveer, Principal, IIMC, Hyderabad, and also to our dear colleagues
and family members.
We offer our gratitude to Himalaya Publishing House Pvt. Ltd., a leader in Commerce and
Management publications. Our sincere regards to Shri Niraj Pandey (Director), Vijay Pandey (General
Manager – Marketing) and especially Mr. G. Anil Kumar (Sales Manager), Hyderabad, for his sincere
efforts, interest and patience in bringing out this book in time.
Any queries or suggestions regarding improvements or errors will be gratefully acknowledged
and incorporated in subsequent editions.
You can contact us at jakkula.premsagar@gmail.com.
January 2019
Hyderabad Authors
Syllabus
Objective: The objective of the course is to provide an understanding of basic concepts of
Business Analytics like Descriptive, Predictive and Prescriptive Analytics and an overview of
Programming using R.

UNIT-I: INTRODUCTION TO BUSINESS ANALYTICS


Definition of Business Analytics, Categories of Business Analytical Methods and Models,
Business Analytics in Practice, Big Data – Overview of Using Data, Types of Data.

UNIT-II: DESCRIPTIVE ANALYTICS


Overview of Descriptive Statistics (Central Tendency, Variability), Data Visualization –
Definition, Visualization Techniques – Tables, Cross Tabulations, Charts, Data Dashboards
Using MS-Excel or SPSS.

UNIT-III: PREDICTIVE ANALYTICS


Trend Lines, Regression Analysis – Linear and Multiple, Forecasting Techniques, Data
Mining – Definition, Approaches in Data Mining – Data Exploration and Reduction,
Classification, Association, Cause and Effect Modelling.

UNIT-IV: PRESCRIPTIVE ANALYTICS


Overview of Linear Optimization, Non-linear Programming, Integer Optimization, Cutting
Plane Algorithm and Other Methods, Decision Analysis – Risk and Uncertainty Methods.

UNIT-V: PROGRAMMING USING R


R Environment, R Packages, Reading and Writing Data in R, R Functions, Control Statements,
Frames and Subsets, Managing and Manipulating Data in R.
Contents
Sr. No. Topic Page No.

1 Introduction to Business Analytics 1 – 22

2 Descriptive Analytics 23 – 89

3 Predictive Analytics 90 – 151

4 Prescriptive Analytics 152 – 175

5 Programming using R 176 – 218


Unit-I
Chapter 1

Introduction to Business Analytics

Chapter Outline
1.1 Definition of Business Analytics
1.2 Categories of Business Analytical Methods and Models
1.3 Business Analytics in Practice
1.4 Big Data – Overview of Using Data
1.5 Types of Data
1.6 Questions

1.1 Definition of Business Analytics


• The use of business analytics has grown exponentially in all areas, including healthcare,
government, retail, e-commerce, media, manufacturing and the service industry.
• The result is an increased need for employees with an analytical approach to management
who can utilize data, understand statistical and quantitative models, and make
better data-driven business decisions.
Definition: Business Analytics (BA) is the practice of iterative, methodical exploration of an
organization’s data, with an emphasis on statistical analysis. Business analytics is used by companies
committed to data-driven decision-making.
BA is used to gain insights that inform business decisions and can be used to automate and
optimize business processes. Data-driven companies treat their data as a corporate asset and leverage it
for a competitive advantage. Successful business analytics depends on data quality, skilled analysts
who understand the technologies and the business, and an organizational commitment to data-driven
decision-making.

Business Analytics Examples


Business analytics techniques break down into two main areas. The first is basic business
intelligence. This involves examining historical data to get a sense of how a business department, team
or staff member performed over a particular period. This is a mature practice that most enterprises are
fairly accomplished at using.

Sources of data: business reports, cloud storage, business database or data set
1. Descriptive analytic analysis: What happened? Find possible business-related opportunities.
2. Predictive analytic analysis: What’s happening, why is it happening, and what will happen?
Predict opportunities of which the firm can take advantage.
3. Prescriptive analytic analysis: How shall it be handled? Allocate resources to take advantage
of the predicted opportunities.
Outcome of the entire BA analysis: Measurable increase in business value and performance.

The second area of business analytics involves deeper statistical analysis. This may mean doing
predictive analytics by applying statistical algorithms to historical data to make a prediction about
future performance of a product, service or website design change. Or, it could mean using other
advanced analytics techniques like cluster analysis, to group customers based on similarities across
several data points. This can be helpful in targeted marketing campaigns, for example.
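As a rough illustration of cluster analysis, the Python sketch below groups customers with the k-means algorithm, which is one common clustering method (the text does not prescribe a particular one). The two attributes per customer (annual spend in $000s and visits per month), the sample values, k = 2 and the function name `kmeans` are hypothetical choices made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend in $000s, visits per month).
customers = [(1.0, 2), (1.2, 3), (0.8, 2), (8.5, 12), (9.0, 11), (8.0, 13)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

In practice the attributes would usually be normalized first so that no single variable dominates the distance calculation; the sample values here are already on comparable scales.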

Relationship of BA Process and Organization Decision-making Process


The BA process can solve problems and identify opportunities to improve business performance.
In the process, organizations may also determine strategies to guide operations and help achieve
competitive advantages. Typically, solving problems and identifying strategic opportunities to follow
are organization decision-making tasks. The latter, identifying opportunities, can be viewed as a
problem of strategy choice requiring a solution. It should come as no surprise that the BA process
described here closely parallels classic organization decision-making processes. As depicted in the
following figure, the business analytics process has an inherent relationship to the steps in typical
organization decision-making processes.

Business Analytics vs. Data Science


The more advanced areas of business analytics can start to resemble data science, but there is a
distinction. Even when advanced statistical algorithms are applied to data sets, it does not necessarily
mean data science is involved. There are a host of business analytics tools that can perform these kinds
of functions automatically, requiring few of the special skills involved in data science.

Business analytics process:
Source of data
1. Descriptive analytic analysis
2. Predictive analytic analysis
3. Prescriptive analytic analysis

Organization decision-making process:
1. Perception of disequilibrium: Observe and become aware of potential problem (or opportunity)
situations.
2. Diagnostic process: Attempt to understand what is happening in a particular situation.
3. Problem statement: Identify and state problems and solution strategies in relation to
organization goals and objectives.
4. Solution strategy selection: Select the optimal course of action for the organization from the
strategies determined previously.
5. Implementation: Implement the strategy.

Outcome of both of these processes: Measurable increase in business value and performance.

True data science involves more custom coding and more open-ended questions. Data scientists
generally do not set out to solve a specific question, as most business analysts do. Rather, they will
explore data using advanced statistical methods and allow the features in the data to guide their
analysis.

Business Analytics Applications


Business analytics tools come in several different varieties:
1. Data visualization tools
2. Business intelligence reporting software
3. Self-service analytics platforms
4. Statistical analysis tools
5. Big data platforms
Self-service has become a major trend among business analytics tools. Users now demand
software that is easy to use and does not require specialized training. This has led to the rise of simple-
to-use tools from companies such as Tableau and Qlik, among others. These tools can be installed on
a single computer for small applications or in server environments for enterprise-wide deployments.
Once they are up and running, business analysts and others with less specialized training can use them
to generate reports, charts and web portals that track specific metrics in data sets.
Once the business goal of the analysis is determined, an analysis methodology is selected and
data is acquired to support the analysis. Data acquisition often involves extraction from one or more
business systems, data cleansing and integration into a single repository, such as a data warehouse or
data mart. The analysis is typically performed against a smaller sample set of data.
Analytics tools range from spreadsheets with statistical functions to complex data mining and
predictive modeling applications. As patterns and relationships in the data are uncovered, new
questions are asked, and the analytical process iterates until the business goal is met.
Deployment of predictive models involves scoring data records – typically in a database – and
using the scores to optimize real-time decisions within applications and business processes. BA also
supports tactical decision-making in response to unforeseen events. And, in many cases, the decision-
making is automated to support real-time responses.

Discussion Questions
1. What is the difference between analytics and business analytics?
2. What is the difference between business analytics and business intelligence?
3. Why are the steps in the business analytics process sequential?
4. How is the business analytics process similar to the organization decision-making process?

1.2 Categories of Business Analytical Methods and Models


Four Types of Business Analytics
For different stages of business analytics, a huge amount of data is processed at various steps.
Depending on the stage of the workflow and the requirement of data analysis, there are four main
kinds of analytics: descriptive, diagnostic, predictive and prescriptive. Together, these four types
answer the key questions a company has, from what is going on in the company to which solutions
should be adopted to optimize its functions.
The four types of analytics are usually implemented in stages, and no one type of analytics is said
to be better than another. They are interrelated, and each offers a different insight. With data
being important to so many diverse sectors, from manufacturing to energy grids, most companies
rely on one or all of these types of analytics. With the right choice of analytical techniques,
Big Data can deliver richer insights for companies.
The Hierarchy of Business Analytics
1. Descriptive Analytics: Describes or summarizes existing data using existing business
intelligence tools to better understand what is going on or what has happened.
2. Diagnostic Analytics: Focuses on past performance to determine what happened and why.
The result of the analysis is often an analytic dashboard.
3. Predictive Analytics: Emphasizes predicting possible outcomes using statistical
models and machine learning techniques.
4. Prescriptive Analytics: Builds on predictive analytics to recommend one or more
courses of action based on analysis of the data.

1. Descriptive Analytics = asset, operation, environmental and diagnostic information


2. Diagnostic Analytics = identifies patterns of behavior (importance and urgency)
3. Predictive Analytics = suggests a timeframe for an action
4. Prescriptive Analytics = recommends specific actions

Descriptive: explains what happened.
Diagnostic: explains why it happened.
Predictive: forecasts what might happen.
Prescriptive: recommends an action based on the forecast.

1. Descriptive Analytics
This can be termed the simplest form of analytics. The sheer size of big data is beyond
human comprehension, so the first stage involves crunching the data into understandable
chunks. The purpose of this type of analytics is simply to summarize the findings and understand what
is going on.
Much of what people call advanced analytics or business intelligence is basically the use of
descriptive statistics (arithmetic operations, mean, median, max, percentage, etc.) on existing data.
It is said that 80% of business analytics mainly involves descriptions based on aggregations of past
performance. This is an important step in making raw data understandable to investors, shareholders
and managers, because it makes it easier to identify and address areas of strength and weakness,
which helps in strategizing.
The two main techniques involved are data aggregation and data mining; this method is used purely
for understanding the underlying behavior, not for making estimates. By mining historical data,
companies can analyze consumer behavior and engagement with their businesses, which can help in
targeted marketing, service improvement, etc. The tools used in this phase include MS Excel,
MATLAB, SPSS, STATA, etc.
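The descriptive statistics mentioned above (mean, median, max, percentage) can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a replacement for the tools listed; the monthly sales figures are invented for the example.

```python
import statistics

# Hypothetical monthly sales (in $000s) for one product line.
monthly_sales = [120, 135, 128, 150, 142, 160, 155, 170, 165, 180, 175, 190]

summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "max": max(monthly_sales),
    "min": min(monthly_sales),
    "std_dev": statistics.stdev(monthly_sales),  # sample standard deviation
}
# Percentage change from the first month to the last month.
summary["pct_change"] = 100 * (monthly_sales[-1] - monthly_sales[0]) / monthly_sales[0]

for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

Each figure only describes what already happened; none of them estimates future sales, which is what separates this stage from predictive analytics.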
2. Diagnostic Analytics
Diagnostic Analytics is used to determine why something happened in the past. It is characterized
by techniques such as drill-down, data discovery, data mining and correlations. Diagnostic analytics
takes a deeper look at data to understand the root causes of the events. It is helpful in determining what
factors and events contributed to the outcome. It mostly uses probabilities, likelihoods and the
distribution of outcomes for the analysis.
In time series data of sales, diagnostic analytics would help you understand why sales
decreased or increased in a specific year. However, this type of analytics has a limited ability to
give actionable insights. It only provides an understanding of causal relationships and sequences while
looking backward.
A few techniques used in diagnostic analytics include attribute importance, principal
component analysis, sensitivity analysis and conjoint analysis. Training algorithms for classification
and regression also fall under this type of analytics.
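A drill-down, one of the diagnostic techniques named earlier, can be sketched as breaking a year-over-year sales change into its regional components to see which region accounts for it. The records, region names and the helper `drill_down` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sales records: (year, region, amount in $000s).
records = [
    (2017, "North", 400), (2017, "South", 350), (2017, "East", 300),
    (2018, "North", 410), (2018, "South", 220), (2018, "East", 305),
]


def drill_down(rows, year_a, year_b):
    """Return the total change and the change per region between two years."""
    totals = defaultdict(float)  # (year, region) -> sales
    for year, region, amount in rows:
        totals[(year, region)] += amount
    regions = sorted({region for _, region, _ in rows})
    by_region = {r: totals[(year_b, r)] - totals[(year_a, r)] for r in regions}
    return sum(by_region.values()), by_region


total_change, by_region = drill_down(records, 2017, 2018)
print("Total change:", total_change)
# List regions from the largest decline to the largest increase.
for region, change in sorted(by_region.items(), key=lambda kv: kv[1]):
    print(f"{region:>6}: {change:+.0f}")
```

Here the total falls by 115 while one region falls by 130, so the drill-down points the analyst to that region. It still does not prove why that region declined, which is the limited actionability noted above.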
3. Predictive Analytics
Predictive Analytics is used to predict future outcomes. However, it is important to note that
it cannot say for certain whether an event will occur in the future; it forecasts the probability that
the event will occur. A predictive model builds on the preliminary descriptive analytics stage to
derive the likelihood of the outcomes.
The essence of predictive analytics is to devise models that use existing data to extrapolate future
occurrences or, simply, to predict future data. One common application of predictive analytics is
sentiment analysis, where opinions posted on social media are collected and analyzed (existing text
data) to predict a person’s sentiment on a particular subject as positive, negative or neutral (future
prediction).
Hence, predictive analytics includes building and validating models that provide accurate
predictions. Predictive analytics relies on machine learning algorithms such as random forests and
SVMs, and on statistics, for learning from and testing the data. Usually, companies need trained data
scientists and machine learning experts to build these models. The most popular tools for predictive
analytics include Python, R, RapidMiner, etc.
Predictions of future data rely on existing data, since they cannot be obtained otherwise. If the
model is properly tuned, it can support complex forecasts in sales and marketing. It goes a step
beyond standard BI by providing accurate predictions.
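Building and validating a predictive model can be sketched as follows: fit the model on part of the history, then measure its error on a held-out part the model has not seen. As a simplifying assumption, the sketch fits a straight-line trend by least squares to hypothetical quarterly sales and reports mean absolute error (MAE) on the last four quarters. A real project would use richer models, such as the random forests or SVMs mentioned above.

```python
# Hypothetical quarterly sales (in $000s); index 0 is the oldest quarter.
sales = [100, 104, 110, 113, 118, 121, 127, 131, 136, 139, 145, 149]


def fit_trend(ys):
    """Least-squares straight line y = a + b*t for t = 0, 1, ..., len(ys)-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    return y_mean - b * t_mean, b


# Train on the first 8 quarters; hold out the last 4 for validation.
train, test = sales[:8], sales[8:]
a, b = fit_trend(train)
predictions = [a + b * t for t in range(len(train), len(sales))]
mae = sum(abs(p - y) for p, y in zip(predictions, test)) / len(test)
print(f"trend: {a:.2f} + {b:.2f}*t, holdout MAE = {mae:.2f}")
```

The holdout error, rather than the fit on the training quarters, is what indicates whether the model is tuned well enough to support forecasts.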
4. Prescriptive Analytics
The basis of this type of analytics is predictive analytics, but it goes beyond the three types
mentioned above to suggest future courses of action. It can show the favorable outcomes of a specified
course of action and also suggest various courses of action to reach a particular outcome. Hence, it
uses a strong feedback system that constantly learns and updates the relationship between action and
outcome.
The computations include optimizing functions that are related to the desired outcome.
For example, when you call for a cab online, the application uses GPS to connect you to the most
suitable driver from among the drivers found nearby, optimizing the distance for a faster arrival
time. Recommendation engines also use prescriptive analytics.
The other approach is simulation, in which all the key performance areas are combined to
design the correct solution while ensuring that the key performance metrics are included in it. The
optimization model then works on the impact of the previously made forecasts.
Because of its power to suggest favorable solutions, prescriptive analytics is, in today’s terms, the
final frontier of advanced analytics or data science.
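The cab-booking example reduces to a small optimization: among the available drivers nearby, pick the one that minimizes the distance to the rider. The sketch assumes straight-line (Euclidean) distance on flat x/y coordinates, hypothetical driver IDs and a hypothetical `assign_driver` helper. A real dispatch system would use road-network travel times from GPS data and many more constraints.

```python
import math

# Hypothetical driver positions (x, y) in kilometres on a flat city grid.
drivers = {
    "D1": (2.0, 3.0),
    "D2": (0.5, 1.0),
    "D3": (4.0, 0.5),
}
# Drivers already on a trip cannot be assigned.
busy = {"D2"}


def assign_driver(rider, drivers, busy):
    """Return (driver_id, distance) of the closest available driver, or None."""
    available = {d: pos for d, pos in drivers.items() if d not in busy}
    if not available:
        return None
    best = min(available, key=lambda d: math.dist(rider, available[d]))
    return best, math.dist(rider, available[best])


print(assign_driver((1.0, 1.0), drivers, busy))
```

With D2 busy, the rider at (1, 1) is matched to D1 even though D2 is closer. The recommendation is the optimal action under the stated constraint, not merely a forecast.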

What is an Analytical Model?


Analytical models are mathematical models that have a closed form solution, i.e., the solution to
the equations used to describe changes in a system can be expressed as a mathematical analytic
function.

Analytical Models
An analytical model is simply a mathematical equation that describes relationships among
variables in a historical data set. The equation either estimates or classifies data values. In essence,
a model draws a “line” through a set of data points that can be used to predict outcomes. For example,
a linear regression draws a straight line through data points on a scatterplot that shows the impact of
advertising spend on sales for various ad campaigns. The model’s formula, in this case “Sales =
17.813 + (.0897 * advertising spend)”, enables executives to estimate sales if they spend a
specific amount on advertising (See Figure 1).
Estimation Model (Linear Regression)
Sales = 17.813 + (.0897 * Advertising Spend)

Advertising   Sales
$120          $1,503
$160          $1,755
$205          $2,971
$210          $1,682
$225          $3,497
$230          $1,998
$290          $4,528
$315          $2,937
$375          $3,622
$390          $4,402

(Scatterplot of Sales, $0 to $6,000, against Advertising, $0 to $600.)
Courtesy: Tony Rathburn and the Modeling Agency

Figure 1
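How a regression "draws a line" through data points can be sketched with ordinary least squares, using the ten (advertising, sales) pairs listed in Figure 1 as input. The helper name `fit_line` is hypothetical. On these ten points the coefficients come out at roughly 387 and 9.93, not the ones printed in Figure 1, so the sketch illustrates the fitting method rather than reproducing that equation.

```python
# (advertising spend, sales) pairs as listed in Figure 1, in dollars.
data = [
    (120, 1503), (160, 1755), (205, 2971), (210, 1682), (225, 3497),
    (230, 1998), (290, 4528), (315, 2937), (375, 3622), (390, 4402),
]


def fit_line(pairs):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


intercept, slope = fit_line(data)
print(f"Sales = {intercept:.3f} + ({slope:.4f} * Advertising Spend)")
# Use the fitted line to estimate sales for a planned spend of $300.
print(f"Estimated sales at $300: {intercept + slope * 300:,.0f}")
```

Least squares chooses the line that minimizes the sum of squared vertical distances to the points. As a result, the residuals of the fitted line sum to zero.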

Algorithms that create analytical models (or equations) come in all shapes and sizes.
Classification algorithms such as neural networks, decision trees, clustering and logistic regression use
a variety of techniques to create formulas that segregate data values into groups. Online retailers often
use these algorithms to create target market segments or determine which products to recommend to
buyers based on their past and current purchases (See Figure 2).
Classification of Algorithms (panels: Decision Tree, Cluster, Logistic Regression, Neural Net)
Figure 2: Classification Models Separate Data Values into Logical Groups


Trusting Models. Unfortunately, some models are more opaque than others, i.e., it is hard to
understand the logic the model used to identify relevant patterns and relationships in the data. The
problem with these “black box” models is that businesspeople often have a hard time trusting them
until they see quantitative results, such as reduced costs or higher revenues. Getting business users to
understand and trust the output of analytical models is perhaps the biggest challenge in data mining.
To earn trust, analytical models have to validate a businessperson’s intuitive understanding of
how the business operates. In reality, most models do not uncover brand new insights; rather they
unearth relationships that people understand as true but are not looking at or acting upon. The models
simply refocus people’s attention on what is important and true and dispel assumptions (whether
conscious or unconscious) that are not valid.
Modeling Process
Given the power of analytical models, it is important that analytical modelers take a disciplined
approach. Analytical modelers need to adhere to a methodology to work productively and generate
accurate models. The modeling process consists of six distinct tasks:
1. Define the project
2. Explore the data
3. Prepare the data
4. Create the model
5. Deploy the model
6. Manage the model
Interestingly, preparing the data is the most time-consuming part of the process, and if not done
right, can torpedo the analytical model and project. “[Data preparation] can easily be the difference
between success and failure, between usable insights and incomprehensible murk, between worthwhile
predictions and useless guesses,” writes Dorian Pyle in his book “Data Preparation for Data Mining.”
Figure 3 shows a breakdown of the time required for each of these six steps. Data preparation
consumes one-quarter (25%) of an analytical modeler’s time, followed by model creation (23%), data
exploration (18%), project definition (13%), scoring and deployment (12%), and model management
(9%). Thus, almost half of an analytical modeler’s time (43%) is spent exploring and preparing data,
although this varies based on the condition and availability of data. Analytical modelers are like house
painters who must spend lots of time preparing a paint surface to ensure a long-lasting paint finish.
Analytical Modeling Tasks
1. Project definition 13%
2. Data exploration 18%
3. Data preparation 25%
4. Model creation 23%
5. Scoring and deployment 12%
6. Model management 9%
Other 8%
Figure 3

From Wayne Eckerson, “Predictive Analytics: Extending the Value of Your Data Warehousing
Investment,” 2007. Based on 166 respondents, who have a predictive modeling practice.
Project Definition. Although defining an analytical project does not take as long as some of the
other steps, it is the most critical task in the process. Modelers who do not know explicitly what they
are trying to accomplish will not be able to create useful analytical models. Thus, before they start,
good analytical modelers spend a lot of time defining objectives, impact and scope.
Project Objectives. Project Objectives consist of the assumptions or hypotheses that a model
will evaluate. Often, it helps to brainstorm hypotheses and then prioritize them based on business
requirements. Project impact defines the model output (e.g., a report, a chart, or scoring program), how
the business will use that output (e.g., embedded in a daily sales report or operational application or
used in strategic planning), and the projected return on investment. Project scope defines who, what,
where, when, why and how of the project, including timelines and staff assignments.
For example, a project objective might be: “Reduce the number of false positives when scanning
credit card transactions for fraud.” The output might be: “A computer model capable of running
on a server and measuring 7,000 transactions per minute, scoring each with probability and confidence,
and routing transactions above a certain threshold to an operator for manual intervention.”
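The output described in this example (score each transaction, then route those above a threshold to an operator) can be sketched as below. The scoring rule is a hand-weighted logistic function of two hypothetical features, and it produces only a probability, not a confidence. The weights, the 0.8 threshold, the field names and the helpers `fraud_score` and `route` are all assumptions. A real model would learn its weights from labelled historical transactions.

```python
import math

THRESHOLD = 0.8  # hypothetical cut-off for manual review

# Hypothetical weights; in practice these come from a trained model.
WEIGHTS = {"bias": -6.0, "amount_ratio": 1.5, "foreign": 2.0}


def fraud_score(txn):
    """Return a fraud probability in (0, 1) using a logistic function."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["amount_ratio"] * txn["amount"] / txn["avg_amount"]
        + WEIGHTS["foreign"] * txn["foreign"]
    )
    return 1 / (1 + math.exp(-z))


def route(transactions):
    """Split transactions into auto-approved and manual-review queues."""
    approved, review = [], []
    for txn in transactions:
        score = fraud_score(txn)
        (review if score > THRESHOLD else approved).append((txn["id"], round(score, 3)))
    return approved, review


txns = [
    {"id": "T1", "amount": 40.0, "avg_amount": 50.0, "foreign": 0},
    {"id": "T2", "amount": 900.0, "avg_amount": 60.0, "foreign": 1},
]
approved, review = route(txns)
print("approved:", approved)
print("review:", review)
```

Raising the threshold sends fewer transactions to operators, which reduces false positives at the cost of missing more fraud. Tuning that trade-off is exactly what the project objective asks for.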
Data Exploration. Data exploration or data discovery involves sifting through various sources of
data to find the data sets that best fit the project. During this phase, the analytical modeler will
document each potential data set with the following items:
• Access methods: Source systems, data interfaces, machine formats (e.g., ASCII or
EBCDIC), access rights and data availability.
• Data characteristics: Field names, field lengths, content, format, granularity and statistics
(e.g., counts, mean, mode, median, and min/max values).
• Business rules: Referential integrity rules, defaults and other business rules.
• Data pollution: Data entry errors, misused fields and bogus data.
• Data completeness: Empty or missing values and sparsity.
• Data consistency: Labels and definitions.
Typically, an analytical modeler will compile all this information into a document and use it to
help prioritize which data sets to use for which variables (See Figure 4). A data warehouse with well-
documented metadata can greatly accelerate the data exploration phase because it also maintains
much of this information. However, analytical modelers often want to explore external data and other
data sets that do not exist in the data warehouse and must compile this information manually.
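Several of the data characteristics and completeness items listed above can be computed automatically when compiling such a document. The sketch profiles a list of records (Python dictionaries) and reports, for each field, the non-missing count, the percentage missing, the number of distinct values and the min/max. The field names, the fake alumni records and the helper `profile_fields` are hypothetical.

```python
def profile_fields(records, fields):
    """Return per-field counts, missing share, distinct values and min/max."""
    report = {}
    total = len(records)
    for field in fields:
        # Treat absent keys, None and blank strings as missing.
        values = [r.get(field) for r in records]
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "count": len(present),
            "missing_pct": 100 * (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
            # Values are strings here, so min/max compare lexicographically.
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return report


# Hypothetical records with deliberately missing and duplicated values.
alumni = [
    {"ssn": "111111111", "home_phone": "6175550101", "birth_date": "010280"},
    {"ssn": "", "home_phone": "6175550102", "birth_date": None},
    {"ssn": "222222222", "home_phone": None, "birth_date": "121575"},
    {"ssn": "111111111", "home_phone": "6175550104", "birth_date": "070790"},
]
report = profile_fields(alumni, ["ssn", "home_phone", "birth_date"])
for field, stats in report.items():
    print(field, stats)
```

A distinct count lower than the non-missing count on a field that should be unique (here, two identical SSNs) is a quick signal of data pollution.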
Data Profile Document
Source: State University Alumni List. Kathleen Manx. 617 653-4733
Transmission: One-time transmission of membership list
Connectivity: 1GB Flash drive shipped via UPS
Format: ASCII text file. Fixed-width fields

Name        Begin/End    Length/Type   Description            Valid Values          Count
Filler      00001/0089   89/A          Blanks                 Null                  1000
SSN         0090/0098    9/N           Pre-approved SSN       0-9 rt justified      898
Phone       0099/0101    10/N          Home phone             0-9 rt justified      643
Phone       0109/0016    10/N          Business phone         0-9 rt justified      641
Birth date  0019/0024    6/N           Birth date             MMDDYY                695
Spouse      0025/0036    15/AN         Pre-app. Add’l Card    First, middle, last   333
Filler      0037/0039    89/A          Blanks                 Null                  1000

Comments: Missing 21% of values of SSN plus 3% are invalid values
A Data Profile Document describes the properties of a potential data set.
Data Preparation. Once analytical modelers have documented and selected their data sets, they must
standardize and enrich the data. First, this means correcting any errors in the data and
standardizing the machine format (e.g., ASCII vs. EBCDIC). Then it involves merging and flattening
the data into a single wide table, which may consist of hundreds of variables (i.e., columns). Finally, it
means enriching the data with third-party data, such as demographic, psychographic or behavioral data,
that can enhance the models.
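The format standardization and enrichment steps can be sketched in a few lines of Python. This is an illustration under stated assumptions: the mainframe extract is taken to use EBCDIC code page 037 (Python's built-in cp037 codec), and the account keys, customer names and demographic fields are hypothetical stand-ins for real third-party data.

```python
# Standardize the machine format, then enrich records with third-party data.


def decode_ebcdic(raw: bytes) -> str:
    """Decode bytes from an EBCDIC (code page 037) extract into a Python str."""
    return raw.decode("cp037")


def enrich(customers, third_party, key="acct"):
    """Left-join third-party attributes onto customer rows by a shared key.

    Customers with no third-party match keep only their own fields, so no
    records are lost during enrichment.
    """
    lookup = {row[key]: row for row in third_party}
    merged = []
    for row in customers:
        extra = {k: v for k, v in lookup.get(row[key], {}).items() if k != key}
        merged.append({**row, **extra})
    return merged


# Simulate a mainframe extract by encoding a (hypothetical) name to EBCDIC first.
raw_name = "PAT JONES".encode("cp037")
customers = [
    {"acct": "6651", "name": decode_ebcdic(raw_name)},
    {"acct": "2894", "name": "LEE WONG"},
]
demographics = [{"acct": "6651", "age_band": "35-44", "income_band": "high"}]

for row in enrich(customers, demographics):
    print(row)
```

A left join is used so that customers without third-party data are kept rather than silently dropped; whether that is appropriate depends on the model being built.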
From there, analytical modelers transform the data so that it is in an optimal form to address project
objectives and meet the processing requirements of specific machine learning techniques. Common
transformations include summarizing data using reverse pivoting (See Figure 5), transforming
categorical values into numerical values, normalizing numeric values so they range from 0 to 1,
consolidating continuous data into a finite set of bins or categories, removing redundant variables and
filling in missing values.
Modelers try to eliminate irrelevant variables and values and to fill in empty fields
with estimated or default values. In some cases, modelers may want to increase the bias or skew in
a data set by duplicating outliers, giving them more weight in the model output. These are just some of
the many data preparation techniques that analytical modelers use.
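Several of the common transformations listed above can be sketched without any libraries. The particular choices here (mean imputation, equal-width bins, one-hot encoding) are typical options picked for illustration, and the balance values are borrowed loosely from the bank example that follows; a real project would usually rely on a data preparation library.

```python
from statistics import mean


def fill_missing(values):
    """Mean imputation: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Consolidate continuous values into bins numbered 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(categories):
    """Turn a categorical column into one 0/1 indicator per category level."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]


balances = [950, None, 825, 655, 980]
filled = fill_missing(balances)
print(filled)                       # None replaced by the mean, 852.5
print(min_max(filled))
print(equal_width_bins(filled, 3))
print(one_hot(["ATM", "CK", "SAV", "CK"]))
```

Mean imputation is only one option; medians, model-based estimates or an explicit "missing" flag are often preferable when values are not missing at random.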

Reverse Pivoting

Bank Transactions by Account
Acct # | Date | Balance | Branch | Product
6651 | 3/17 | $950 | ELA | ATM
2894 | 3/2 | $850 | ELA | CK
6651 | 2/12 | $825 | WLA | ATM
2894 | 2/11 | $655 | WLA | CK
6651 | 2/4 | $980 | ELA | ATM
2894 | 2/3 | $970 | ELA | SAV

Summarized Customer Activity
Acct # | Period 1 - # | Period 1 - $ | Period 2 - # | Period 2 - $ | Period 3 - # | Period 3 - $ | Period 4 - # | Period 4 - $
6651 | 2 | $65 | 3 | $115 | 1 | $100 | 5 | $250
2894 | 1 | $14 | 8 | $95 | 0 | $0 | 2 | $80

Figure 4

To model a banking “customer” rather than bank transactions, analytical modelers use a technique
called reverse pivoting, which summarizes banking transactions to show customer activity by period.
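A reverse pivot of the transaction table can be sketched in Python as follows. It uses the six transactions from the left-hand table of the reverse pivoting figure and assumes, for illustration only, that the dollar column is a transaction amount and that a period is a calendar month; the summarized figures in the right-hand table are illustrative, so this sketch is not meant to reproduce them.

```python
from collections import defaultdict

# One row per transaction: (account, "month/day", dollar amount, branch, product).
transactions = [
    ("6651", "3/17", 950, "ELA", "ATM"),
    ("2894", "3/2", 850, "ELA", "CK"),
    ("6651", "2/12", 825, "WLA", "ATM"),
    ("2894", "2/11", 655, "WLA", "CK"),
    ("6651", "2/4", 980, "ELA", "ATM"),
    ("2894", "2/3", 970, "ELA", "SAV"),
]


def reverse_pivot(rows, periods):
    """Collapse transaction rows into one wide row per account.

    For each requested period (here: calendar month) the result holds a
    transaction count ("P<m> #") and a dollar total ("P<m> $"), with zeros
    when the account had no activity in that period.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    accounts = []  # keep the first-seen order of accounts
    for acct, date, amount, _branch, _product in rows:
        month = int(date.split("/")[0])
        counts[(acct, month)] += 1
        totals[(acct, month)] += amount
        if acct not in accounts:
            accounts.append(acct)

    wide = {}
    for acct in accounts:
        row = {}
        for p in periods:
            row[f"P{p} #"] = counts[(acct, p)]
            row[f"P{p} $"] = totals[(acct, p)]
        wide[acct] = row
    return wide


for acct, row in reverse_pivot(transactions, periods=[2, 3]).items():
    print(acct, row)
```

Passing the list of periods explicitly guarantees that every account gets the same set of columns, which is what a single wide modeling table needs.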
Analytical Modeling. Analytical modeling is as much art as science. Much of the craft involves
knowing what data sets and variables to select and how to format and transform the data for specific
data models. Often, a modeler will start with 100+ variables and then, through data transformation and
experimentation, winnow them down to 12 to 20 variables that are most predictive of the desired
outcome.
In addition, an analytical modeler needs to select historical data that has enough of the “answers”
built into it with a minimal amount of noise. Noise consists of patterns and relationships that have no
business value, such as the relationship between a person’s birth date and age, which are 100% correlated.
A data modeler will eliminate one of those variables to reduce noise. In addition, modelers validate their
models by testing them against random subsets of the data that they set aside in advance. If the scores
remain consistent across the training, testing and validation data sets, they can be fairly confident
that the model is accurate and relevant.
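Both ideas in this paragraph, removing one of two perfectly correlated variables and holding out random subsets for testing and validation, can be sketched as follows. The column values, the 2024 reference year linking age to birth year, the 0.99 correlation cut-off and the 60/20/20 split are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.99):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def three_way_split(records, train=0.6, test=0.2, seed=0):
    """Shuffle once, then cut into training, testing and validation subsets."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_test = round(len(rows) * test)
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]


columns = {
    "birth_year": [1960, 1975, 1982, 1990, 2001],
    "age": [64, 49, 42, 34, 23],  # age = 2024 - birth_year
    "balance": [950, 850, 825, 655, 980],
}
print(drop_redundant(columns))  # age is dropped as redundant

train_set, test_set, valid_set = three_way_split(range(100))
print(len(train_set), len(test_set), len(valid_set))
```

Age and birth year have a correlation of -1, so taking the absolute value matters; a fixed seed makes the split reproducible across runs.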
Finally, the modeler must choose the right analytical techniques and algorithms or combinations
of techniques to apply to a given hypothesis. This is where modelers’ knowledge of business processes,
project objectives, corporate data and analytical techniques comes into play. They may need to try
many combinations of variables and techniques before they generate a model with sufficient predictive
value.
Every analytical technique and algorithm has its strengths and weaknesses, as summarized in the
tables below. The goal is to pick the right modeling technique so that you have to do as little preparation
and transformation as possible, according to Michael Berry and Gordon Linoff in their book, “Data
Mining Techniques: For Marketing, Sales and Customer Support.”
Table 1: Analytical Models

Task | Use | Techniques
Classification | Assign new records to a predefined class based on their features; used to predict an outcome (yes/no; high/medium/low). | Logistic Regression, Decision Trees, Neural Networks, Link Analysis
Forecasting | Predicting a numerical outcome. | Linear Regression, Neural Networks
Prediction | Uses estimation or classification to predict future behavior or value. | Neural Networks, Decision Trees, Link Analysis, Genetic Algorithms, Market Basket Analysis
Affinity Grouping | Finds rules that define which items go together; good for market basket, cross-selling and root cause analysis. | Market Basket Analysis, Memory-Based Reasoning, Link Analysis
Clustering | Finds natural groupings of things that are more like each other than like members of another cluster. | Neural Networks, Decision Trees, Cluster Detection, Market Basket Analysis, Memory-Based Reasoning

Table 2: Analytical Techniques

Technique | Strengths | Considerations
Neural Networks | Flexible; mimics the interactions of neurons in the human brain; can handle time-based inputs; can model multiple variables at once. | Models are not easily explained; values must be between 0 and 1 with no nulls; not great for categorical variables or lots of variables.
Decision Trees | Models are easy to explain; good for categorical and numeric data; good for creating a subset of fields as input to another technique. | Models can get “bushy” with sparse data and have to be “pruned”.
Memory-Based Reasoning | Finds the stored records that most resemble a new record and uses them to make a prediction; little preparation; adapts to new inputs without training; works with text. | Does not work with numeric variables, only categorical variables; does not work well with lots of variables.
Market Basket Analysis | A form of clustering that creates rules about which items are purchased together. | Does not work with numeric variables, only categorical variables.
Genetic Algorithms | Uses natural selection; tests candidate predictions against each other to determine the best one. | Not for classification.
Clustering | Undirected learning; finds natural groups; a good way to start. | Not predictive.
Deploy the Model. Model deployment takes many forms, as mentioned above. Executives can
simply look at the model, absorb its insights, and use it to guide their strategic or operational planning.
But models can also be operationalized. The most basic way to operationalize a model is to embed it
in an operational report. For example, a daily sales report for a telecommunications company might
list each sales representative’s customers by their propensity to churn. Or a model might be applied at
the point of customer interaction, whether at a branch office or at an online checkout counter.
To apply models, you first have to score all the relevant records in your database. This involves
converting the model into SQL or some other program that can run inside the database that holds the
records that you want to score. Scoring involves running the model against each record and generating
a numeric value, usually between 0 and 1, which is then appended to the record as an additional
column. A higher score generally means a higher propensity to exhibit the desired or predicted
behavior. Scoring is usually a batch process that happens at night or on the weekend depending on the
volume of records that need to be scored. However, scoring can also happen in real-time, which is
essentially what online retailers do when they make real-time recommendations based on purchases a
customer just made.
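The scoring step can be sketched with a hypothetical logistic churn model. The intercept, coefficients and field names are invented; the sketch appends a score between 0 and 1 to each record, ranks the records as a daily report might, and renders the same formula as a SQL expression to mirror the idea of running the model inside the database. EXP() is available in many SQL dialects, but the exact syntax should be checked against the target database.

```python
import math

# Hypothetical churn model: the intercept and coefficients are made up.
INTERCEPT = -3.0
COEFFICIENTS = {"support_calls": 0.8, "months_as_customer": -0.05}


def score(record):
    """Logistic score in (0, 1); higher means a higher propensity to churn."""
    z = INTERCEPT + sum(w * record[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(records):
    """Batch scoring: append the score to each record as an extra column."""
    return [{**r, "churn_score": round(score(r), 3)} for r in records]


def to_sql(table="customers"):
    """Render the same model as a SELECT for in-database scoring."""
    terms = [str(INTERCEPT)] + [f"{w} * {name}" for name, w in COEFFICIENTS.items()]
    linear = " + ".join(terms)
    return f"SELECT *, 1.0 / (1.0 + EXP(-({linear}))) AS churn_score FROM {table}"


customers = [
    {"id": 1, "support_calls": 6, "months_as_customer": 4},
    {"id": 2, "support_calls": 0, "months_as_customer": 60},
]
# Rank customers for an operational report, most likely churners first.
for row in sorted(score_batch(customers), key=lambda r: r["churn_score"], reverse=True):
    print(row)
print(to_sql())
```

The same function serves both batch and real-time use: a nightly job can call score_batch over all records, while an online application can call score on a single record as it arrives.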
Model Management. Once the model is built and deployed, it must be maintained. Models
become obsolete over time, as the market or environment in which they operate changes. This is
particularly true for volatile environments, such as customer marketing or risk management. Also,
complex models that deliver high business value usually require a team of people to create, modify,
update and certify the models.
In such an environment, it is critical to have a model repository that can track versions, audit
usage and manage a model through its lifecycle. Once an organization has more than one operational
model, it is imperative that it implement model management utilities, which most data mining vendors
now support.
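A model repository can be as elaborate as a vendor product or as simple as the toy registry below. This sketch only illustrates the three duties named above (tracking versions, auditing usage and managing lifecycle status); the class and method names and the status values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelVersion:
    version: int
    artifact: object                # the fitted model, or a path to it
    status: str = "development"     # e.g. development -> certified -> retired


@dataclass
class ModelRepository:
    """In-memory registry: version history per model name plus an audit log."""
    versions: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, name, artifact):
        history = self.versions.setdefault(name, [])
        entry = ModelVersion(version=len(history) + 1, artifact=artifact)
        history.append(entry)
        self._audit("register", name, entry.version)
        return entry.version

    def set_status(self, name, version, status):
        self.versions[name][version - 1].status = status
        self._audit(f"status:{status}", name, version)

    def latest_certified(self, name):
        """Return the newest certified version, recording the access."""
        certified = [v for v in self.versions.get(name, []) if v.status == "certified"]
        if not certified:
            raise LookupError(f"no certified version of {name!r}")
        entry = certified[-1]
        self._audit("use", name, entry.version)
        return entry

    def _audit(self, action, name, version):
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, action, name, version))


repo = ModelRepository()
v1 = repo.register("churn", {"intercept": -3.0})
v2 = repo.register("churn", {"intercept": -2.5})  # retrained, not yet certified
repo.set_status("churn", v1, "certified")
print(repo.latest_certified("churn").version)     # 1
for entry in repo.audit_log:
    print(entry)
```

Serving only certified versions means a retrained model can be registered and tested without affecting production until someone signs it off.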

1.3 Business Analytics in Practice


Business Analytics is a set of tools and techniques that can be used to improve business
performance through fact-based decision-making. Data Exploration, Business Intelligence and Data
Mining have been around for a while and have helped businesses create data discipline within their organizations.

Business Analytics is commonly defined as skills, technologies, applications and practices for
continuous iterative investigation of past business performance to gain insight and drive business
planning (Beller and Barnett, 2009). To identify the skillset commonly expected for business analytics
practitioners, we conducted a search of open position announcements using Indeed.com, a specialized
search engine which indexes job postings across numerous company websites as well as job posting
aggregators. We used the keyword “business analytics” to identify open positions in the New York
City metro area. We examined the job listings which were returned by the Indeed search engine and
after iterative evaluation decided to retain a relatively short list of positions which: (1) were offered at
large established companies and (2) exemplified the skillset commonly expected in the industry for
similar positions. Our rationale for focusing on the large established corporations is grounded in the
expectation that large companies have more established business processes and more clearly defined
job functions compared to smaller, less established companies (Humphrey, 1988). Our decision to
focus on a limited number of representative positions stems from the observation that while specific
industries and companies may have very distinct job requirements, our goal is to identify a common
set of skills that is frequently required across different companies and industries. The positions
selected for our analysis include the following:
• Data Visualization Consultant (Accenture)
• Data Analytics Manager (Deloitte)
• Business Intelligence Analyst (UBS)
• Compliance Office Analyst (Citibank)
• Data and Analytics Consultant (Accenture)
• Loan Operations Business Analyst (Capital One)
• Business Intelligence Architect (Nike)
• Customer Intelligence Analyst (PSEG)
Job descriptions posted by companies follow various formats, but they generally list the required
skills. In order to develop a matrix representation of common skills required by each position, we draw
on an often-cited view of business analytics in practice, which suggests that the business analytics skillset
lies at the intersection of expertise from three domains: (1) the specific business domain, (2) technical
data management and programming expertise and (3) applied statistics.

[Diagram: three overlapping domains labeled Business Domain Expertise, Applied Statistics and Technical Data Management Skills]

Figure 5

Our evaluation of the job requirements along the three domains in Figure 6 suggests that applied
statistical skills required by the companies encompass both a theoretical understanding of statistical
methods, as well as practical knowledge of software packages commonly used for statistical analysis –
primarily SAS and R software. The job descriptions commonly require familiarity with regression
modeling techniques. Application of regression analysis requires understanding of inherent
assumptions underlying the regressions, and necessitates foundational statistical knowledge of
distributions, sampling and statistical inference. Though not all job postings explicitly stated this
requirement, we inferred the need for foundational statistical knowledge wherever the position
required regression analysis expertise.
Data mining is a broad concept that encompasses many data model design and analytical
techniques which generally include regression analysis among them (Fayyad, Piatetsky-Shapiro and
Smyth, 1996). In our analysis, we separated regression skills from the more advanced data mining
methodologies, e.g., decision trees, neural networks, support vector machines as well as ensemble
modeling techniques. Further, we also separately evaluated job requirements for text analytical skills,
because analysis of textual data is a unique domain within data mining practice with specialized
expertise related to processing and modeling of textual data. Considering that an estimated 80% of the world’s data
today is unstructured, these skills are becoming extremely important.
The ability to locate, extract and prepare data for analysis is foundational for business analytics in
practice. The technical data management skills stated as required in the reviewed job postings span
the range from basic structured query language (SQL) competency in popular relational database
management systems (RDBMS) to proficiency with large data set analysis leveraging Hadoop
infrastructure. While SQL, RDBMS and data warehousing skills are nearly universally required across
the positions we reviewed, a growing number of positions also require competency with key-value
stores, most commonly exemplified by Hadoop implementations in practice. Data warehousing
job requirements often specifically call for experience with data extraction, transformation and loading
(ETL) and cleaning. Further, two of the eight positions in our sample explicitly required expertise with
the Python programming language as the development platform for performing data processing and
analysis.
Data visualization expertise was nearly universally required by the positions we included
in our analysis. Data visualization represents an important area of practice. Virtually all positions in
our set listed Tableau software as the dominant tool for data visualization, but several positions also
suggested QlikView as another potential software choice for data visualization. All positions
emphasized the importance of soft skills: effective communication and presentation, as well as the
ability to work in groups, highlighting the fact that effective business analytics in practice often
requires group collaboration and effective communication of insights across the enterprise. These
skills are important in persuading decision-makers to implement the results of an analytics exercise
or an analytics team’s work.
In addition to specific knowledge of statistical methods and technical skills, every position also
included business domain specific expertise which qualified an ideal job candidate. These
requirements are detailed in Table 3.
Table 3

Position (Company) | Industry-Specific Requirements
Data Visualization Consultant (Accenture) | Industry experience: financial services, healthcare and government
Data Analytics Manager (Deloitte) | Enterprise risk management, risk reporting, financial and regulatory reporting
Business Intelligence Analyst (UBS) | Securities research
Compliance Office Analyst (Citibank) | Anti-money laundering regulation and compliance
Data and Analytics Consultant (Accenture) | Industry experience: financial services, healthcare, high tech and government
Loan Operations Business Analyst (Capital One) | Financial auditing and risk management
Business Intelligence Architect (Nike) | High-volume consumer data
Customer Intelligence Analyst (PSEG) | Customer operations/experience

1.4 Big Data – Overview of Using Data


Big Data is a term for data sets that are so large or complex that traditional data processing
applications are inadequate to handle them. Big Data basically involves analyzing and capturing data, data
creation, searching, sharing, storage capacity, transfer, visualization, querying and information
privacy.
The volume of data that one has to deal with has exploded to unimaginable levels in the past decade,
and at the same time, the price of data storage has steadily fallen. Private companies and
research institutions capture terabytes of data about their users’ interactions, business and social media,
and also from sensors in devices such as mobile phones and automobiles. The challenge of this era is to
make sense of this sea of data. This is where big data analytics comes into the picture.
Big Data Analytics largely involves collecting data from different sources, munging it so
that it becomes available to be consumed by analysts, and finally delivering data products useful to the
organization’s business.
The process of converting large amounts of unstructured raw data, retrieved from different
sources to a data product useful for organizations, forms the core of Big Data Analytics.

Big Data Life Cycle


In today’s big data context, the previous approaches are either incomplete or suboptimal. For
example, the SEMMA methodology completely disregards data collection and pre-processing of
different data sources. These stages normally constitute most of the work in a successful big data
project.
A Big Data Analytics Cycle can be described by the following stages:


1. Business Problem Definition
2. Research
3. Human Resources Assessment
4. Data Acquisition
5. Data Munging
6. Data Storage
7. Exploratory Data Analysis
8. Data Preparation for Modeling and Assessment
9. Modeling
10. Implementation
1. Business Problem Definition: This stage is common to both the traditional BI and the big data
analytics life cycles. Normally, defining the problem and correctly evaluating how much potential
gain it may bring to an organization is a non-trivial stage of a big data project. It may seem obvious,
but the expected gains and costs of the project have to be evaluated.
2. Research: Analyze what other companies have done in the same situation. This involves
looking for solutions that are reasonable for your company, even if this involves adapting
other solutions to the resources and requirements that your company has. In this stage,
a methodology for the future stages should be defined.
3. Human Resources Assessment: Once the problem is defined, it is reasonable to continue by
analyzing whether the current staff is able to complete the project successfully. Traditional BI
teams might not be capable of delivering an optimal solution to all the stages, so before starting
the project it should be considered whether there is a need to outsource a part of the project or
hire more people.
4. Data Acquisition: This section is key in a big data life cycle; it defines which type of
profiles would be needed to deliver the resultant data product. It is a non-trivial step of the
process; it normally involves gathering unstructured data from different sources. To give an
example, it could involve writing a crawler to retrieve reviews from a website. This involves
dealing with text, perhaps in different languages, and normally requires a significant amount of
time to complete.
5. Data Munging: Once the data is retrieved, for example, from the web, it needs to be stored
in an easy-to-use format. To continue with the review example, let’s assume the data is
retrieved from different sites, each of which displays the data differently.
Suppose one data source gives reviews as star ratings. It is then possible to read this as a
mapping for the response variable y ∈ {1, 2, 3, 4, 5}. Another data source gives reviews using
a two-arrow system, one arrow for up-voting and the other for down-voting.
This would imply a response variable of the form y ∈ {positive, negative}.
To combine both data sources, a decision has to be made to make these two response
representations equivalent. This can involve converting the first data source’s response
representation to the second form, considering one star as negative and five stars as positive.
Doing this with good quality often requires a large time allocation.
6. Data Storage: Once the data is processed, it sometimes needs to be stored in a database. Big
data technologies offer plenty of alternatives regarding this point. The most common
alternative is using the Hadoop Distributed File System (HDFS) for storage together with Apache
Hive, which provides users a limited version of SQL known as Hive Query Language (HiveQL). From
the user’s perspective, this allows most analytics tasks to be done in ways similar to those used in
traditional BI data warehouses. Other storage options to be considered are MongoDB, Redis and Spark.
This stage of the cycle is related to the human resources knowledge in terms of their abilities
to implement different architectures. Modified versions of traditional data warehouses are
still being used in large-scale applications. For example, Teradata and IBM offer SQL
databases that can handle terabytes of data; open source solutions such as PostgreSQL and
MySQL are still being used for large-scale applications.
Even though there are differences in how the different storage systems work in the background,
from the client side, most solutions provide a SQL API. Hence, having a good
understanding of SQL is still a key skill to have for big data analytics.
This stage a priori seems to be the most important topic; in practice, this is not true. It is not
even an essential stage. It is possible to implement a big data solution that would be working
with real-time data. So, in this case, we only need to gather data to develop the model and
then implement it in real time. So, there would not be a need to formally store the data at all.
7. Exploratory Data Analysis: Once the data has been cleaned and stored in a way that
insights can be retrieved from it, the data exploration phase is mandatory. The objective of
this stage is to understand the data. This is normally done with statistical techniques and also
plotting the data. This is a good stage to evaluate whether the problem definition makes
sense or is feasible.
8. Data Preparation for Modeling and Assessment: This stage involves reshaping the
cleaned data retrieved previously and using statistical pre-processing for missing values
imputation, outlier detection, normalization, feature extraction and feature selection.
9. Modeling: The prior stage should have produced several data sets for training and testing a
model, for example, a predictive model. This stage involves trying different models with the aim of
solving the business problem at hand. In practice, it is normally desired that the model
give some insight into the business. Finally, the best model or combination of models
is selected by evaluating its performance on a left-out data set.
10. Implementation: In this stage, the data product developed is implemented in the data
pipeline of the company. This involves setting up a validation scheme while the data product
is working in order to track its performance. For example, when implementing a
predictive model, this stage would involve applying the model to new data and, once the actual
response is available, evaluating the model.
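The munging and storage stages (5 and 6) can be sketched with the review example. The 1-star and 5-star mappings come from the text; treating 2 stars as negative, 4 stars as positive and dropping 3-star reviews is an assumption, as are the sample reviews. SQLite stands in for whatever SQL-speaking store (Hive, PostgreSQL and so on) a real project uses; an aggregate query like this one should carry over with little change, though dialects differ.

```python
import sqlite3

# Stage 5, munging: put both sources on one scale, y in {"positive", "negative"}.
# Assumed cut-offs: 1-2 stars negative, 4-5 stars positive, 3 stars dropped.
ARROW_TO_LABEL = {"up": "positive", "down": "negative"}


def stars_to_label(stars):
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None  # 3 stars: ambiguous, excluded from the combined data


def harmonize(star_reviews, arrow_reviews):
    """Return (source, body, label) rows from both sites on a common scale."""
    rows = [("site_a", body, stars_to_label(s)) for body, s in star_reviews]
    rows += [("site_b", body, ARROW_TO_LABEL[v]) for body, v in arrow_reviews]
    return [row for row in rows if row[2] is not None]


# Stage 6, storage: SQLite stands in for any store that exposes a SQL API.
def store(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (source TEXT, body TEXT, label TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
    return conn


site_a = [("Great product", 5), ("Broke in a week", 1), ("It's fine", 3)]
site_b = [("Love it", "up"), ("Waste of money", "down")]
conn = store(harmonize(site_a, site_b))

query = """
    SELECT source,
           COUNT(*) AS n_reviews,
           SUM(CASE WHEN label = 'positive' THEN 1 ELSE 0 END) AS n_positive
    FROM reviews
    GROUP BY source
    ORDER BY source
"""
per_source = conn.execute(query).fetchall()
print(per_source)
```

Dropping 3-star reviews keeps the combined response variable clean but discards data; mapping them to a third "neutral" class is an equally defensible design choice.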

1.5 Types of Data


Structured
By structured data, we mean data that can be processed, stored and retrieved in a fixed format.
It refers to highly organized information that can be readily and seamlessly stored and accessed from
a database by simple search engine algorithms. For instance, the employee table in a company
database will be structured, as the employee details, their job positions, their salaries, etc., will be
present in an organized manner.

Unstructured
Unstructured data refers to the data that lacks any specific form or structure whatsoever. This
makes it very difficult and time-consuming to process and analyze unstructured data. Email is an
example of unstructured data.
Aspect | Structured Data | Unstructured Data
Characteristics | Pre-defined data models; usually text only; easy to search | No pre-defined data model; may be text, images, sound, video or other formats; difficult to search
Resides in | Relational databases; data warehouses | Applications; NoSQL databases; data warehouses; data lakes
Generated by | Humans or machines | Humans or machines
Typical applications | Airline reservation systems; inventory control; CRM systems; ERP systems | Word processing; presentation software; email clients; tools for viewing or editing media
Examples | Dates; phone numbers; social security numbers; credit card numbers; customer names; addresses; product names and numbers; transaction information | Text files; presentation software; email messages; audio files; video files; images; surveillance imagery

Semi-structured
Semi-structured data pertains to data containing both of the formats mentioned above, i.e.,
structured and unstructured data. More precisely, it refers to data that, although it has not been
classified under a particular repository (database), contains vital information or tags that segregate
individual elements within the data.
[Figure: The Data Landscape, showing unstructured data (flat text), semi-structured data (XML, HTML, RDF) and structured data (databases). Semi-structured data lacks a fixed, rigid schema; there is no separation between the data and the schema, and the structure is self-describing (tags or other markers).]
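Reading semi-structured data mostly means letting the tags or keys define each element. The sketch below uses Python's standard xml.etree.ElementTree and json modules on two hypothetical snippets; note that the records do not share one fixed schema.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical semi-structured sources: the records do not share a fixed
# schema, but tags and keys say what each element is.
XML_DOC = """
<employees>
  <employee id="1"><name>Asha</name><title>Analyst</title></employee>
  <employee id="2"><name>Ben</name><email>ben@example.com</email></employee>
</employees>
"""
JSON_DOC = '[{"id": 3, "name": "Chen", "skills": ["SQL", "R"]}]'


def records_from_xml(text):
    """Turn each <employee> element into a dict keyed by its child tags."""
    root = ET.fromstring(text.strip())
    records = []
    for emp in root.findall("employee"):
        record = {"id": int(emp.get("id"))}
        for child in emp:
            record[child.tag] = child.text
        records.append(record)
    return records


records = records_from_xml(XML_DOC) + json.loads(JSON_DOC)
for record in records:
    print(record)
# Fields vary from record to record; collect the union to see the "schema".
print(sorted(set().union(*records)))
```

Because the structure travels with the data, a new field (such as email or skills) can appear in one record without any change to a table definition; the cost is that every consumer must cope with fields that may be absent.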

Characteristics of Big Data


Back in 2001, Gartner analyst Doug Laney listed the three ‘V’s of Big Data: Variety, Velocity
and Volume. These three characteristics alone are enough to identify what big data is.
1. Variety
2. Velocity
3. Volume
1. Variety: Variety of Big Data refers to structured, unstructured and semi-structured data that
is gathered from multiple sources. While in the past, data could only be collected from
spreadsheets and databases, today data comes in an array of forms such as emails, PDFs,
photos, videos, audios, social media posts, and so much more.
2. Velocity: Velocity essentially refers to the speed at which data is being created in real-time.
More broadly, it comprises the rate of change, the linking of incoming data sets arriving at
varying speeds, and activity bursts.
3. Volume: We already know that Big Data indicates huge ‘volumes’ of data being
generated on a daily basis from various sources like social media platforms, business
processes, machines, networks, human interactions, etc. Such large amounts of data are
stored in data warehouses.
Advantages of Big Data
One of the biggest advantages of Big Data is predictive analysis. Big Data Analytics tools can help
predict outcomes more accurately, thereby allowing businesses and organizations to make better decisions,
while simultaneously optimizing their operational efficiencies and reducing risks.
By harnessing data from social media platforms using Big Data analytics tools, businesses around
the world are streamlining their digital marketing strategies to enhance the overall consumer
experience. Big Data provides insights into the customer pain points and allows companies to improve
upon their products and services.
By combining relevant data from multiple sources, Big Data can produce highly
actionable insights. Almost 43% of companies lack the necessary tools to filter out irrelevant data,
and separating the useful data from the bulk eventually costs them millions of dollars. Big Data tools
can help reduce this, saving you both time and money.
Big Data Analytics could help companies generate more sales leads which would naturally mean
a boost in revenue. Businesses are using Big Data Analytics tools to understand how well their
products/services are doing in the market and how the customers are responding to them. Thus, they
can understand better where to invest their time and money.
With Big Data insights, you can always stay a step ahead of your competitors. You can screen the
market to know what kind of promotions and offers your rivals are providing, and then you can come
up with better offers for your customers. Also, Big Data insights allow you to learn customer behavior
to understand the customer trends and provide a highly ‘personalized’ experience to them.
Who is Using Big Data?
Those who use Big Data know best what Big Data is. Let’s look at some of these
industries:
• Healthcare: Big Data has already started to make a huge difference in the healthcare sector.
With the help of predictive analytics, healthcare professionals (HCPs) are now able to
provide personalized healthcare services to individual patients. Apart from that, fitness
wearables, telemedicine and remote monitoring, all powered by Big Data and AI, are helping
change lives for the better.
• Academia: Big Data is also helping enhance education today. Education is no longer limited
to the physical bounds of the classroom; there are numerous online educational courses to
learn from. Academic institutions are investing in digital courses powered by Big Data
technologies to aid the all-round development of budding learners.
• Banking: The banking sector relies on Big Data for fraud detection. Big Data tools can
efficiently detect fraudulent acts in real time, such as misuse of credit/debit cards, archival of
inspection tracks and faulty alteration of customer stats.
• Manufacturing: According to the TCS 2013 Global Trend Study, the most significant benefit
of Big Data in manufacturing is improving supply strategies and product quality. In the
manufacturing sector, Big Data helps create a transparent infrastructure, thereby predicting
uncertainties and inefficiencies that can adversely affect the business.
• IT: As one of the largest users of Big Data, IT companies around the world are using Big Data
to optimize their functioning, enhance employee productivity and minimize risks in business
operations. By combining Big Data technologies with ML and AI, the IT sector is
continually powering innovation to find solutions even for the most complex of problems.

1.6 Questions
I. Essay Type Questions
1. Explain Business Analytics.
2. Describe the categories of business analytical methods.
3. How does Big Data help in business data analysis?
4. Explain the types of data.
II. Multiple Choice Questions


1. According to analysts, for what can traditional IT systems provide a foundation when they are
integrated with Big Data technologies like Hadoop?
(a) Big Data management and data mining (b) Data warehousing and business intelligence
(c) Management of Hadoop clusters (d) Collecting and storing unstructured data
2. All of the following accurately describe Hadoop, EXCEPT:
(a) Open Source (b) Real-time
(c) Java-based (d) Distributed computing approach
3. __________ has the world’s largest Hadoop cluster.
(a) Apple (b) Datamatics
(c) Facebook (d) None of the mentioned
4. What are the five V’s of Big Data?
(a) Volume (b) Velocity
(c) Variety (d) All the above
5. What are the main components of Big Data?
(a) Map Reduce (b) HDFS
(c) YARN (d) All of these
6. What are the different features of Big Data Analytics?
(a) Open Source (b) Scalability
(c) Data Recovery (d) All the above
7. Facebook Tackles Big Data with __________ based on Hadoop
(a) Project Prism (b) Prism
(c) Project Data (d) Project Bid
Ans.: 1. (a), 2. (b), 3. (c), 4. (d), 5. (d), 6. (d), 7. (a).


