BA1 Introduction 2025
Business Analytics
INTRODUCTION
Analytics overview
Business Analytics in practice
Measurement and scaling
Calculations
Mean, median, mode, geometric mean
Standard deviation and CV, variance, range
Z-score – identifying outliers
Scatter charts – covariance – correlation coefficient
Analytics overview
BUSINESS ANALYTICS IN PRACTICE
Definition
Business analytics:
◦ Scientific process of transforming data into insight for making
better decisions.
◦ Used for data-driven or fact-based decision making, which is often
seen as more objective than other alternatives for decision making.
The Spectrum of Business Analytics
Data
Types of data
Discrete Interval
Continuous
Ratio
Data, Data Sets, Elements,
Variables, and Observations
Statistics is the science that deals with the collection, preparation, analysis, interpretation, and presentation of data.
Structured data
◦ Reside in a pre-defined, row-column format.
◦ Spreadsheet or database applications.
◦ Enter, store, query, and analyze.
◦ Numerical information that is objective and not open to interpretation.
Structured/Unstructured
Today, only about 20% of all data used in business decisions is structured.
Unstructured data
◦ Do not conform to the pre-defined, row-column format required by most database systems.
◦ Textual and multimedia content.
◦ May have some implied structure, but are still considered unstructured.
◦ Example: social media data such as Twitter, YouTube, Facebook, and blogs.
Timeseries data
Data organized sequentially according to the time at which each observation occurred is termed a time series.
https://www.analyticssteps.com/blogs/introduction-time-series-analysis-time-series-forecasting-machine-learning-methods-models
Cross sectional data
The key difference between time series and cross-sectional data is that time series data follows the same variable over a period of time, while cross-sectional data covers several variables at the same point in time.
Big Data
Maruti      5   1   5   11
Hyundai     4   3   2   14
Mahindra    3   2   4   9
Toyota      1   4   3   8
Kia         2   5   1   8

Brand       Overall   Rank
Maruti      35        5
Hyundai     27        3
Mahindra    31        4
Toyota      24        2
Kia         22        1
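The ranks in the table can be reproduced with a short sketch. It assumes that rank 1 goes to the lowest overall score, which is what the listed values imply; the variable names are illustrative only.

```python
# Overall scores as listed in the table above.
scores = {"Maruti": 35, "Hyundai": 27, "Mahindra": 31, "Toyota": 24, "Kia": 22}

# Assumption: rank 1 goes to the lowest score, matching the ranks in the table.
ordered = sorted(scores, key=scores.get)
ranks = {brand: position for position, brand in enumerate(ordered, start=1)}

for brand in scores:
    print(f"{brand:<10}{scores[brand]:>4}{ranks[brand]:>4}")
```

A rank only records order: the gap between rank 1 and rank 2 (22 vs 24) need not equal the gap between rank 4 and rank 5 (31 vs 35).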
Interval Scale
The standard survey rating scale is an interval scale.
E.g.: the factors that were ranked on the ordinal scale can instead be rated on an interval scale, as follows (semantic).
The 5 brands can be rated on a scale of 1 to 5 for the aesthetics factor, where 1 is the lowest and 5 the highest.
Affordability (semantic)    Affordability (numerical)
Maruti                      Maruti     1  2  3  4  5
Hyundai                     Hyundai    1  2 [3] 4  5
Tata                        Tata       1  2  3  4  5
Chevy                       Chevy      1  2  3 [4] 5
Fiat                        Fiat       1 [2] 3  4  5
Ratio scale
The factor that clearly defines a ratio scale is that it has a true zero point.
Any numerical data on actuals, such as sales, is on a ratio scale.
In this approach, each respondent is shown the five different types of cars and asked “how much would you be willing to pay for this brand of car?”
At the end of the research the data is collated, and we might find that respondents are willing to pay 10% more for Maruti over Hyundai, and 15% more for Hyundai over Fiat.

Market share, hatchback cars (100)
Maruti          40/100
Hyundai         20/100
Tata            20/100
VW Polo         10/100
Renault Kwid    10/100
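Because a ratio scale has a true zero, ratios between values are meaningful. The sketch below illustrates this with the market-share figures listed above; the variable names are illustrative only.

```python
# Market shares of hatchback cars, out of 100, as listed above.
share = {"Maruti": 40, "Hyundai": 20, "Tata": 20, "VW Polo": 10, "Renault Kwid": 10}

# A share of zero means "no sales at all", so dividing one share by another
# gives a meaningful "times as much" statement.
maruti_vs_hyundai = share["Maruti"] / share["Hyundai"]
print(f"Maruti has {maruti_vs_hyundai:.1f} times the share of Hyundai")
```

The same division would not be meaningful for the 1 to 5 interval ratings, where a rating of 4 is not "twice" a rating of 2.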
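Levels of measurement
Nominal
◦ Basic operations: determination of equality.
◦ Number system: unique definition (1, 2).
◦ Typical usage: classifications of any kind.
◦ Descriptive statistics: percentages, mode.
◦ Inferential statistics: chi-square.
Ordinal
◦ Basic operations: determination of greater or less.
◦ Number system: order of numerals (0 < 1 < 2 … < 9).
◦ Typical usage: rankings.
◦ Descriptive statistics: median.
◦ Inferential statistics: Mann-Whitney test, Friedman, two-way ANOVA, rank-order correlation.
Interval
◦ Basic operations: determination of equality of intervals.
◦ Number system: equality of differences.
◦ Typical usage: index numbers, attitude measures, opinions.
◦ Descriptive statistics: mean, range, standard deviation.
◦ Inferential statistics: t-test, factor analysis, ANOVA.
Ratio
◦ Basic operations: determination of equality of ratios.
◦ Number system: equality of ratios.
◦ Typical usage: sales, units produced, number of customers, costs.
◦ Descriptive statistics: all arithmetic operations.
◦ Inferential statistics: coefficient of variation.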
Descriptives and
Visualisations
CHAPTER 2 & 3
Visualisations
CHAPTER 2
Visualisations
Categorical
• Construct frequency distribution
• Bar chart
• Pie chart
Numeric
• Construct frequency distribution
• Line chart
• Histogram
• Scatterplot – 2 variables
• Box plot
Data Preparation
We often spend a considerable amount of time inspecting and
preparing the data for the subsequent analysis.
◦ Counting and sorting
◦ Handling missing values
◦ Subsetting
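The three preparation steps listed above can be sketched in plain Python. The records below are hypothetical, invented only for illustration, and dropping incomplete records is just one of several ways to handle missing values.

```python
from collections import Counter

# Hypothetical records, invented only for illustration; None marks a missing value.
records = [
    {"brand": "Maruti", "rating": 4},
    {"brand": "Hyundai", "rating": None},
    {"brand": "Maruti", "rating": 5},
    {"brand": "Kia", "rating": 3},
]

# Counting: how many records there are per brand.
counts = Counter(r["brand"] for r in records)

# Handling missing values: here we simply drop records with no rating
# (imputing a value is another common choice).
complete = [r for r in records if r["rating"] is not None]

# Sorting: highest rating first.
by_rating = sorted(complete, key=lambda r: r["rating"], reverse=True)

# Subsetting: keep only the records for one brand.
maruti_only = [r for r in complete if r["brand"] == "Maruti"]

print(counts["Maruti"], len(complete), by_rating[0]["rating"], len(maruti_only))
```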
• If $x_1, x_2, \ldots, x_n$ is a sample of n observations:
$z_i = \frac{x_i - \bar{x}}{s}$
◦ $z_i$ = z-score for $x_i$
◦ $\bar{x}$ = sample mean
◦ $s$ = sample standard deviation
Calculate z score
No. of students in the class
46
54
42
46
32
z-Scores for the Class Size Data
• Identifying outliers:
◦ Outliers: Extreme values in a data set.
◦ They can be identified using standardized values (z-scores).
◦ Any data value with a z-score less than –3 or greater than +3 is an outlier.
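As a sketch of the calculation, the class-size data above can be standardized with Python's standard library. It assumes the sample standard deviation (n − 1 divisor) and applies the ±3 rule stated above; the variable names are illustrative only.

```python
from statistics import mean, stdev

class_sizes = [46, 54, 42, 46, 32]

x_bar = mean(class_sizes)   # sample mean
s = stdev(class_sizes)      # sample standard deviation (n - 1 divisor)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - x_bar) / s for x in class_sizes]

# Outlier rule from the slide: z-score below -3 or above +3.
outliers = [x for x, z in zip(class_sizes, z_scores) if abs(z) > 3]

for x, z in zip(class_sizes, z_scores):
    print(f"{x:>3}  z = {z:+.2f}")
print("outliers:", outliers)
```

For this sample the mean is 44 and the standard deviation is 8, so the largest z-score in absolute value is −1.5 (class size 32) and no value is flagged as an outlier.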
Analysis of Relative Location
Measures of association
Correlation
BIVARIATE – TWO VARIABLES
CHAPTER 3
Scatter Plot
The first step in determining whether there is a relationship between two
variables is to examine the graph of the observed (or known) data. This
graph, or chart, is called a scatter diagram.
Refer to the data and the plotted scatter diagram.
A scatter diagram can give us two types of information. Visually, we can
look for patterns that indicate that the variables are related.
Then, if the variables are related, we can see what kind of line, or
estimating equation, describes this relationship.
Scatter diagram
An instructor is interested in finding out how the number of students absent on a given day is
related to the mean temperature that day. A random sample of 10 days was used for the study.
The following data indicate the number of students absent (ABS) and the mean temperature
(TEMP) for each day.
ABS 8 7 5 4 2 3 5 6 8 9
TEMP 10 20 25 30 40 45 50 55 59 60
(a) State the dependent (Y) variable and the independent (X) variable.
(b) Draw a scatter diagram of these data.
(c) Does the relationship between the variables appear to be linear or curvilinear?
(d) What type of curve could you draw through the data?
(e) What is the logical explanation for the observed relationship?
Measures of Association Between
Two Variables
• Scatter Charts: Useful graph for analyzing the relationship between two
variables.
• Correlation coefficient: Measures the relationship between two variables.
◦ Not affected by the units of measurement for x and y.
◦ Sample correlation coefficient denoted by $r_{xy}$.
◦ $r_{xy} = \frac{s_{xy}}{s_x s_y}$
◦ $s_{xy}$ = sample covariance = $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
◦ $s_x$ = sample standard deviation of x = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
◦ $s_y$ = sample standard deviation of y = $\sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$

r value    Relationship between the x and y variables
< 0        Negative linear
Near 0     No linear relationship
> 0        Positive linear
Calculate the correlation using
covariance method
X Y
Marks in accounts Marks in QT
1 48 45
2 35 20
3 17 40
4 23 25
5 47 45
Xbar= y bar =
Solution
xbar = 34, ybar = 35
X (Marks in accounts)   Y (Marks in QT)   x-xbar   y-ybar   (x-xbar)^2   (y-ybar)^2   (x-xbar)(y-ybar)
48                      45                 14       10      196          100           140
35                      20                  1      -15        1          225           -15
17                      40                -17        5      289           25           -85
23                      25                -11      -10      121          100           110
47                      45                 13       10      169          100           130
Total                                                       776          550           280
Sxy = 70
Sx = 13.92839
Sy = 11.72604
rxy = 0.428594
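The solution can be checked with a short sketch of the covariance method, using the n − 1 divisor that the worked figures imply. The variable names are illustrative only.

```python
from math import sqrt

accounts = [48, 35, 17, 23, 47]   # X: marks in accounts
qt = [45, 20, 40, 25, 45]         # Y: marks in QT
n = len(accounts)

x_bar = sum(accounts) / n
y_bar = sum(qt) / n

# Sample covariance and sample standard deviations (n - 1 divisor).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(accounts, qt)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in accounts) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in qt) / (n - 1))

# Correlation coefficient: covariance scaled by both standard deviations.
r_xy = s_xy / (s_x * s_y)

print(f"s_xy={s_xy:.0f}  s_x={s_x:.5f}  s_y={s_y:.5f}  r_xy={r_xy:.6f}")
```

The same function body can be reused for the practice data by swapping in the two lists.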
Data for Bottled Water Sales at Queensland
Amusement Park for a Sample of 14 Summer Days
Chart Showing the Positive Linear Relation Between
Sales and High Temperatures
Sample Covariance Calculations for Daily High
Temperature and Bottled Water Sales at Queensland Amusement Park
Computation of Correlation
Coefficient
Illustration - To determine the sample correlation coefficient for bottled
water sales at Queensland Amusement Park:
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{12.8}{(4.36)(3.15)} = 0.93$
• There is a very strong linear relationship between high temperature and
sales.
Practice – correlation
X Y
20 1
22 0
25 2
30 5
38 2
40 4
42 6
45 5
47 7
51 8