02-03 ASAP Business Analytics-2 Descriptive Statistics

Uploaded by

George Mathew

FOUNDATION TO DATA SCIENCE

Business Analytics

Unit 1: BASIC STATISTICS REFRESHER AND HOW TO EXPLORE DATA - 2

Prof. Dr. George Mathew

B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
Measures of Location
1. Mean (Arithmetic Mean)
2. Median
3. Mode
4. Geometric Mean
5. Percentiles
6. Quartiles
1. Mean (Arithmetic Mean)
The most commonly used measure of location
is the mean (arithmetic mean), or average
value, for a variable. The mean provides a
measure of central location for the data. If the
data are for a sample (typically the case), the
mean is denoted by (x-bar) x̄. The sample
mean is a point estimate of the (typically
unknown) population mean for the variable of
interest. If the data for the entire population are
available, the population mean is computed in
the same manner, but denoted by the Greek
letter μ.
Home Sale Data

Practical: Excel File:02-03 ASAP Discriptive statistics_Excel Solver.xlsx

1. Mean (Arithmetic Mean)
Median
The median, another measure of central
location, is the value in the middle when the
data are arranged in ascending order (smallest
to largest value). With an odd number of
observations, the median is the middle value.
An even number of observations has no
single middle value. In this case, we follow
convention and define the median as the
average of the values for the middle two
observations.
Median
Mode
A third measure of location, the mode, is the
value that occurs most frequently in a data
set. To illustrate the identification of the
mode, consider the following sample of eight
class sizes.
32 34 42 46 46 54 56 67
Here 46 occurs twice and every other value
occurs only once. Hence, mode = 46.
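As a quick check of the three measures discussed so far, the sketch below applies Python's standard `statistics` module to the class sizes listed above. The variable names are illustrative and are not taken from the slides or the Excel workbook.

```python
from statistics import mean, median, multimode

# Class sizes listed on the slide above
class_sizes = [32, 34, 42, 46, 46, 54, 56, 67]

sample_mean = mean(class_sizes)        # sum of the values divided by their count
sample_median = median(class_sizes)    # even n: average of the two middle values
sample_modes = multimode(class_sizes)  # list of the most frequent value(s)

print("mean:", sample_mean)
print("median:", sample_median)
print("mode(s):", sample_modes)
```

`multimode` returns a list, so a data set with several equally frequent values reports all of them rather than only the first.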
Mean, Median, Mode
Geometric Mean
The geometric mean is a measure of location
that is calculated by finding the nth root of the
product of n values. The general formula for
the sample geometric mean, denoted x̄_g,
follows.
\[ \bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n} \]

The geometric mean is often used in analyzing
growth rates in financial data. In these types of
situations, the arithmetic mean or average
value will provide misleading results.
Geometric Mean
To illustrate the use of the geometric mean,
consider Table 2.10 which shows the percentage
annual returns, or growth rates, for a mutual fund
over the past ten years. Suppose we want to
compute how much $100 invested in the fund at
the beginning of year 1 would be worth at the end
of year 10.
Geometric Mean
\[ \$100\,[(0.779)(1.287)(1.109)(1.049)(1.158)(1.055)(0.630)(1.265)(1.151)(1.021)] = \$100(1.3345) = \$133.45 \]
\[ \bar{x}_g = \sqrt[10]{1.3345} = 1.029 \]

The geometric mean tells us that annual returns grew
at an average annual rate of (1.029 - 1)100, or 2.9
percent. In other words, with an average annual
growth rate of 2.9 percent, a $100 investment in the
fund at the beginning of year 1 would grow to
$100(1.029)^10 = $133.09 at the end of ten years.
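The arithmetic above can be reproduced with a short sketch. It uses the ten growth factors from the product shown earlier; the variable names are illustrative, and `statistics.geometric_mean` (Python 3.8+) is included only as a cross-check.

```python
from math import prod
from statistics import geometric_mean

# Annual growth factors (1 + return) taken from the product shown above
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

total_growth = prod(growth_factors)             # ten-year growth of $1
gm = total_growth ** (1 / len(growth_factors))  # nth root of the product
gm_check = geometric_mean(growth_factors)       # library cross-check
ending_value = 100 * total_growth               # value of $100 after ten years

print(f"product of factors: {total_growth:.4f}")
print(f"geometric mean:     {gm:.3f}")
print(f"average growth:     {(gm - 1) * 100:.1f}%")
print(f"ending value:       ${ending_value:.2f}")
```

Compounding the rounded factor 1.029 gives $133.09, while the unrounded product gives about $133.45; the gap is due only to rounding the geometric mean to three decimals.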
Geometric Mean
We can use Excel to calculate the geometric
mean for the data in Table 3 by using the
function GEOMEAN. In Figure 10, the value
for the geometric mean is found using
the formula =GEOMEAN(C4:C13).
Geometric Mean
Percentiles
A percentile is the value of a variable at
which a specified (approximate) percentage
of observations are below that value. The
pth percentile tells us the point in the data
where approximately p percent of the
observations have values less than the pth
percentile; hence, approximately (100 – p)
percent of the observations have values
greater than the pth percentile.
Percentiles
Therefore, $305,912.50 represents the 85th
percentile of the home sales data. The pth
percentile can also be calculated in Excel
using the function PERCENTILE.EXC.
Figure 12 shows the Excel calculation for the
85th percentile of the home sales data. The
value in cell E13 is calculated using the
formula =PERCENTILE.EXC(B2:B13,0.85);
B2:B13 defines the data set for which we are
calculating a percentile, and 0.85 defines the
percentile of interest.
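The worked calculation behind the $305,912.50 figure is not reproduced in this text. The sketch below assumes the common (n + 1) positioning rule: locate position p/100 × (n + 1) in the sorted data and interpolate between the neighbouring values. The function name `percentile_exc` is hypothetical, and the data are the twelve home sale prices listed in the Quartiles section.

```python
def percentile_exc(values, p):
    """Return the p-th percentile using position p/100 * (n + 1).

    Assumed rule: take the value at the whole part of the position and
    interpolate linearly toward the next value by the fractional part.
    """
    data = sorted(values)
    n = len(data)
    position = p / 100 * (n + 1)   # 1-based position in the sorted data
    if position < 1 or position > n:
        raise ValueError("percentile is outside the range this rule supports")
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part used for interpolation
    if lower == n:                 # position falls exactly on the last value
        return float(data[-1])
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


# Home sale prices from the slides, in dollars
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

p85 = percentile_exc(home_sales, 85)
print(f"85th percentile: ${p85:,.2f}")
```

With n = 12 the position is 0.85 × 13 = 11.05, so the result lies 5 percent of the way from the 11th value (298,000) to the 12th (456,250).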
CALCULATING VARIABILITY MEASURES FOR
THE HOME SALES DATA IN EXCEL
Quartiles
It is often desirable to divide data into four
parts, with each part containing
approximately one-fourth, or 25 percent, of
the observations. These division points are
referred to as the quartiles and are defined
as:
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (also the median)
Q3 = third quartile, or 75th percentile
Quartiles
To demonstrate quartiles, the home sales
data are again arranged in ascending order.
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
We already identified Q2, the second quartile
(median), as 203,750. To find Q1 and Q3, we must
find the 25th and 75th percentiles.
Quartiles
Inter Quartile Range
The difference between the third and first
quartiles is often referred to as the
interquartile range, or IQR. For the home
sales data, IQR = Q3 - Q1 = 256,625 -
139,000 = 117,625. Because it excludes the
smallest and largest 25 percent of values in
the data, the IQR is a useful measure of
variation for data that have extreme values
or are badly skewed.
Quartile Using Excel
A quartile can be computed in Excel using
the function QUARTILE.EXC. Figure 12
shows the calculations for first, second, and
third quartiles for the home sales data. The
formula used in cell E15 is
=QUARTILE.EXC(B2:B13,1). The range
B2:B13 defines the data set, and 1 indicates
that we want to compute the 1st quartile.
Cells E16 and E17 use similar formulas to
compute the second and third quartiles.
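Outside Excel, the same quartiles can be sketched with Python's `statistics.quantiles`. The `method="exclusive"` option positions quartiles at multiples of (n + 1)/4, which reproduces the values quoted for the home sales data; whether it matches QUARTILE.EXC in every edge case is an assumption here, not something the slides state.

```python
from statistics import quantiles

# Home sale prices from the slides, in dollars
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# n=4 splits the data into four parts and returns the three cut points
q1, q2, q3 = quantiles(home_sales, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range

print(f"Q1 = {q1:,.0f}, Q2 = {q2:,.0f}, Q3 = {q3:,.0f}")
print(f"IQR = {iqr:,.0f}")
```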
Compare the Spread of Two Data Sets
Range
The simplest measure of variability is the
range. The range can be found by
subtracting the smallest value from the
largest value in a data set. Let us return to
the home sales data set to demonstrate the
calculation of range. Refer to the data from
home sales prices in Table 2. The largest
home sales price is $456,250, and the
smallest is $108,000. The range is $456,250
- $108,000 = $348,250.
Variance
The variance is a measure of variability of the
data. The variance is based on the deviation
about the mean, which is the difference
between the value of each observation (x_i)
and the mean. For a sample, a deviation of an
observation about the mean is written (x_i - x̄).
In the computation of the variance, the
deviations about the mean are squared.
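The formula slide is not reproduced in this text. For reference, the standard textbook definitions of the sample variance (for n observations) and the population variance (for N observations) are given below; the symbols s², σ², n, and N follow common convention and are assumed rather than copied from the slides.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
```

Dividing by n - 1 rather than n makes the sample variance an unbiased estimate of the population variance.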
Variance
Standard Deviation
The standard deviation is defined to be the
positive square root of the variance. We use
s to denote the sample standard deviation
and σ to denote the population standard
deviation. The sample standard deviation, s,
is a point estimate of the population
standard deviation, σ, and is derived from the
sample variance in the following way:
\[ s = \sqrt{s^2} \]
Coefficient of Variation
In some situations we may be interested in a
descriptive statistic that indicates how large
the standard deviation is relative to the
mean. This measure is called the coefficient
of variation and is usually expressed as a
percentage.
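The variability measures described above can be tied together in one sketch on the home sales data. It uses Python's `statistics` module, whose `variance` and `stdev` functions apply the sample (n - 1) divisor; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Home sale prices from the slides, in dollars
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

data_range = max(home_sales) - min(home_sales)  # largest minus smallest value
x_bar = mean(home_sales)                        # sample mean
s_squared = variance(home_sales)                # sample variance (n - 1 divisor)
s = stdev(home_sales)                           # positive square root of the variance
cv = s / x_bar * 100                            # coefficient of variation, in percent

print(f"range:              ${data_range:,.0f}")
print(f"mean:               ${x_bar:,.2f}")
print(f"sample variance:    {s_squared:,.2f}")
print(f"sample std. dev.:   ${s:,.2f}")
print(f"coeff. of variation: {cv:.2f}%")
```

Because the coefficient of variation is unit-free, it can be used to compare the spread of variables measured on different scales.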
Identifying Outliers
Sometimes a data set will have one or more
observations with unusually large or unusually
small values. These extreme values are called
outliers. They are often removed during analysis to
get better results, but each should be reviewed first,
since an outlier may be a valid observation.
Standardized values (z-scores) can be used to
identify outliers. For data with a bell-shaped
distribution, almost all the data values will be within
three standard deviations of the mean. Hence, in
using z-scores to identify outliers, we recommend
treating any data value with a z-score less than -3
or greater than +3 as an outlier.
z-scores
A z-score allows us to measure the relative
location of a value in the data set. More
specifically, a z-score helps us determine how
far a particular value is from the mean relative
to the data set’s standard deviation. Suppose
we have a sample of n observations, with the
values denoted by x_1, x_2, . . . , x_n. In
addition, assume that the sample mean, x̄, and
the sample standard deviation, s, are already
computed. Associated with each value, x_i, is
another value called its z-score.
Normal Distribution
z-scores
The z-score is often called the standardized
value. The z-score, z_i, can be interpreted as
the number of standard deviations x_i is from
the mean. For example, z_1 = 1.2 indicates
that x_1 is 1.2 standard deviations greater than
the sample mean.
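The z-score formula itself is not reproduced in this text; the sketch below assumes the standard definition z_i = (x_i - x̄) / s and applies it to the home sales data, then flags values beyond ±3 as described in the outlier discussion. The variable names are illustrative.

```python
from statistics import mean, stdev

# Home sale prices from the slides, in dollars
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

x_bar = mean(home_sales)  # sample mean
s = stdev(home_sales)     # sample standard deviation

# Assumed standard definition: distance from the mean in units of s
z_scores = [(x - x_bar) / s for x in home_sales]

# Flag any value more than three standard deviations from the mean
outliers = [x for x, z in zip(home_sales, z_scores) if abs(z) > 3]

for x, z in zip(home_sales, z_scores):
    print(f"{x:>9,}  z = {z:+.2f}")
print("outliers by the +/-3 rule:", outliers)
```

On these data the largest price, 456,250, has a z-score of about 2.49, so the ±3 rule flags no outliers.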
z-scores
Box Plots
A box plot is a graphical summary of the
distribution of data. A box plot is developed
from the quartiles for a data set. Figure 14 is
a box plot for the home sales data. Here are
the steps used to construct the box plot:
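The construction steps are not reproduced in this text. As a sketch, assuming the common convention of drawing the box from Q1 to Q3, marking the median inside it, placing limits 1.5 × IQR beyond the box, and extending whiskers to the most extreme values inside those limits, the quantities for the home sales data could be computed as follows. The 1.5 multiplier and the variable names are assumptions.

```python
from statistics import quantiles

# Home sale prices from the slides, in dollars
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# Box edges and median line (same (n + 1) positioning as the quartiles above)
q1, med, q3 = quantiles(home_sales, n=4, method="exclusive")
iqr = q3 - q1

# Assumed convention: limits sit 1.5 * IQR beyond the box edges
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers reach the most extreme observations inside the limits
inside = [x for x in home_sales if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Observations outside the limits are plotted individually as outliers
outliers = [x for x in home_sales if x < lower_limit or x > upper_limit]

print(f"box: {q1:,.0f} to {q3:,.0f}, median {med:,.0f}")
print(f"limits: {lower_limit:,.1f} to {upper_limit:,.1f}")
print(f"whiskers: {whisker_low:,} to {whisker_high:,}")
print("outliers:", outliers)
```

Under this convention the 456,250 sale falls above the upper limit and would be drawn as an outlier, even though its z-score is below 3; the two rules need not agree.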
Box Plots
What can we learn from these box plots?
The most expensive houses appear to be in Shadyside and
the cheapest houses in Hamilton. The median home selling
price in Groton is about the same as the median home
selling price in Irving. However, home sales prices in Irving
have much greater variability. Homes appear to be selling
in Irving for many different prices, from very low to very
high. Home selling prices have the least variation in Groton
and Hamilton. Unusually expensive home sales (relative to
the respective distribution of home sales values) have
occurred in Fairview, Groton, and Irving, which appear as
outliers. Groton is the only location with a low outlier, but
note that most homes sell for very similar prices in Groton,
so the selling price does not have to be too far from the
median to be considered an outlier.
Statistical Inference
Statistics uses data from a sample to make estimates
and test hypotheses about the characteristics of a
population through a process referred to as statistical
inference.
Example: To evaluate the advantages of a new
filament, Norris Electronics tested a sample of 200
bulbs manufactured with the new filament. The data
collected from this sample showed the number of
hours each lightbulb operated before filament
burnout. The data are given in Table 7.
Figure 16 provides a graphical summary of the
statistical inference process for Norris
Electronics.
Inferential statistics
Inferential statistics is used to make inferences, or
to project, from a sample to an entire population. For
example, when a firm test-markets a new product in
two cities in the United States, it is concerned not only
with how customers in these two cities feel; it also
wants to make an inference from these
sample markets to predict what will happen
throughout the United States. So, two applications of
statistics exist:
(1) descriptive statistics, which describe
characteristics of the population or sample, and
(2) inferential statistics, which are used to generalize
from a sample to a population.
Population Parameters and Sample
Statistics
Population parameters are measured
characteristics of a specific population, in
other words, information about the entire
universe of interest. Sample statistics are
used to make inferences (guesses) about
population parameters based on sample data.
In our notation, we will generally represent
population parameters with lowercase Greek
letters, for example μ or σ, and sample
statistics with English letters, such as X or S.
Correlation
Correlation Using Excel
To find the correlations between each pair of
stocks, click Data Analysis in the Analysis
group on the Data tab and then select
Correlation. You must install the Analysis
ToolPak (see “Summarizing data by using
histograms” and “Summarizing data by
using descriptive statistics”) before you can
use this feature. Click OK and then fill in the
Correlation dialog box as shown in Figure
Compare the Spread of Two Data Sets
Measures of Variability
In descriptive statistics we mainly analyse the
data for its central tendency (average) and
its spread (variability, or dispersion).
The spread is commonly measured as
variation about the mean.
We measure the spread of data using various
measures such as range, percentiles, mean
absolute deviation, variance, standard
deviation, and quartile deviation.
