Descriptive Statistics - Numerical Measure

Data Science for Managerial Decisions

Jasashwi Mandal
NITIE Mumbai
Descriptive Statistics: Numerical Measures
▪ Measures of Location
▪ Measures of Variability
Measures of Location
▪ Mean
▪ Median
▪ Mode
▪ Weighted Mean
▪ Percentiles
▪ Quartiles
Mean
▪ The mean is the most important measure of location.
▪ It provides a measure of central location.
▪ The mean of a data set is the average of all the data values.
▪ The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.
Sample Mean $\bar{x}$
$\bar{x} = \frac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.
Population Mean $\mu$
$\mu = \frac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.
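As a minimal sketch of the formulas above, the Python snippet below computes a mean as the sum of the values divided by their count. The seven values are borrowed from the median example in these notes purely for illustration; treating them as a sample is an assumption.

```python
from statistics import fmean

# Illustrative data (assumption): the seven observations from the median example.
sample = [26, 18, 27, 12, 14, 27, 19]

# Sample mean: sum of the observations divided by the number of observations n.
n = len(sample)
x_bar = sum(sample) / n

# The standard library's fmean should agree with the hand computation.
library_mean = fmean(sample)

print(f"n = {n}, sum = {sum(sample)}, mean = {x_bar:.2f}")
```

The population mean uses the same arithmetic; only the interpretation changes, because the list would then hold all N values in the population.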
Sample Mean $\bar{x}$
▪ Example: Apartment Rents
▪ Seventy apartments were randomly sampled in a small college town. The
monthly rent prices for these apartments are listed below.
Median
▪ The median of a data set is the value in the middle when the data items are
arranged in ascending order.
▪ Whenever a data set has extreme values, the median is the preferred measure
of central location.
▪ The median is the measure of location most often reported for annual income
and property value data.
▪ A few extremely large incomes or property values can inflate the mean.
Median
▪ Here we have an odd number of observations:
7 observations:
26, 18, 27, 12, 14, 27, and 19.
Rewritten in ascending order:
12, 14, 18, 19, 26, 27, and 27.

▪ The median is the middle value in this list, so the median = 19.
Median
▪ Here we have an even number of observations:
8 observations:
26, 18, 27, 12, 14, 27, 19, and 30.
Rewritten in ascending order:
12, 14, 18, 19, 26, 27, 27, and 30.

▪ The median is the average of the two middle values in this list, so the
median = (19 + 26)/2 = 22.5.
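The odd and even cases above can be sketched in a few lines of Python. The function name `median_by_position` is hypothetical and chosen only for this illustration; the data are the two lists from the examples.

```python
def median_by_position(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    data = sorted(values)          # arrange in ascending order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # single middle value
    return (data[mid - 1] + data[mid]) / 2   # average of the two middle values

odd_data = [26, 18, 27, 12, 14, 27, 19]
even_data = [26, 18, 27, 12, 14, 27, 19, 30]

print(median_by_position(odd_data))   # 19
print(median_by_position(even_data))  # 22.5
```

Python's built-in `statistics.median` follows the same rule, so it can be used as a cross-check.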
Median
▪ Example: Apartment Rents
Notice that there are 70 values provided which are in ascending order.
▪ Averaging the 35th and 36th values: Median = (575 + 575)/2 = 575.
Mode
▪ The mode of a data set is the value that occurs with the greatest frequency.
▪ The greatest frequency can occur at two or more different values.
▪ If the data have exactly two modes, the data are bimodal.
▪ If the data have more than two modes, the data are multimodal.

For the apartment rents example, the mode is 550.
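A short sketch of finding every value that shares the greatest frequency, which covers the unimodal, bimodal, and multimodal cases described above. The helper name `modes` is hypothetical, and the data are the seven observations from the median example, reused here as an assumption.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the greatest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [26, 18, 27, 12, 14, 27, 19]

print(modes(data))          # [27]      -> one mode
print(modes(data + [18]))   # [18, 27]  -> two modes, so the data are bimodal
```

If no value repeats, this helper returns every value, so checking the top frequency before reporting a mode is a sensible extra step.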


Weighted Mean

▪ In some instances the mean is computed by giving each observation a weight that
reflects its relative importance.
▪ The choice of weights depends on the application.
▪ The weights might be the number of credit hours earned for each grade, as in GPA.
▪ In other weighted mean computations, quantities such as pounds, dollars, or volume are
frequently used.
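Weighted Mean
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where $x_i$ is the value of observation i and $w_i$ is the weight for observation i.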
Weighted Mean
▪ Ron Butler, a home builder, is looking over the expenses he incurred for a house he just
built.
▪ For the purpose of pricing future projects, he would like to know the average wage ($/hour)
he paid the workers he employed.
▪ Listed below are the categories of workers he employed, along with their respective wage
and total hours worked.

Worker        Wage ($/hr)   Total Hours
Carpenter     21.60         520
Electrician   28.72         230
Laborer       11.80         410
Painter       19.75         270
Plumber       24.16         160

Weighted Mean
▪ Example: Construction Wages

Weighted mean (weights = total hours) = 31,873.70/1,590 = $20.05

Equally-weighted (simple) mean = $21.21
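The construction-wage figures can be checked with a short Python sketch. It uses the wages and hours from the table, with total hours as the weights; the variable names are chosen for this illustration only.

```python
# (worker, wage in $/hr, total hours) taken from the table above
crew = [
    ("Carpenter", 21.60, 520),
    ("Electrician", 28.72, 230),
    ("Laborer", 11.80, 410),
    ("Painter", 19.75, 270),
    ("Plumber", 24.16, 160),
]

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
total_pay = sum(wage * hours for _, wage, hours in crew)
total_hours = sum(hours for _, _, hours in crew)
weighted_mean = total_pay / total_hours

# Simple mean: every wage counts equally, regardless of hours worked.
simple_mean = sum(wage for _, wage, _ in crew) / len(crew)

print(f"weighted mean = ${weighted_mean:.2f}")   # $20.05
print(f"simple mean   = ${simple_mean:.2f}")     # $21.21
```

The weighted mean is lower than the simple mean because the lowest-paid category (laborers) accounts for a large share of the hours.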


Percentiles
▪ A percentile provides information about how the data are spread over the
interval from the smallest value to the largest value.
▪ Admission test scores for colleges and universities are frequently reported in
terms of percentiles.
▪ The 𝑝th percentile of a data set is a value such that at least p percent of the
items take on this value or less and at least (100 – 𝑝) percent of the items take
on this value or more.
Percentiles
i. Arrange the data in ascending order (smallest value to largest value).
ii. Compute an index i:
$i = \left(\frac{p}{100}\right) n$
where p is the percentile of interest and n is the number of observations.
iii. (a) If 𝑖 is not an integer, round up. The next integer greater than 𝑖 denotes the
position of the 𝑝𝑡ℎ percentile.
(b) If 𝑖 is an integer, the 𝑝𝑡ℎ percentile is the average of the values in positions
𝑖 and 𝑖 + 1.
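The three steps above can be sketched as a small Python function. The name `percentile` and the restriction to whole-number p strictly between 0 and 100 are assumptions made for this illustration; statistical packages often use other interpolation rules and may return slightly different values.

```python
def percentile(values, p):
    """p-th percentile via the index rule i = (p/100) * n, for integer 0 < p < 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)            # step i: ascending order
    n = len(data)
    scaled = p * n                   # equals 100 * i; kept as an integer to avoid float error
    if scaled % 100 != 0:
        position = scaled // 100 + 1     # step iii(a): i is not an integer, round up
        return data[position - 1]        # positions are 1-based, lists are 0-based
    i = scaled // 100
    return (data[i - 1] + data[i]) / 2   # step iii(b): average positions i and i + 1

# Illustrative data (assumption): the eight observations from the median example.
data = [26, 18, 27, 12, 14, 27, 19, 30]

print(percentile(data, 50))   # i = 4, so average of 4th and 5th values: 22.5
print(percentile(data, 80))   # i = 6.4, round up to the 7th value: 27
```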
80th Percentile
Example: Apartment Rents (There are 70 values provided which are in ascending order.)

i = (p/100)n = (80/100)70 = 56
Because i is an integer, average the 56th and 57th data values:
80th Percentile = (635 + 649)/2 = 642
80th Percentile
Example: Apartment Rents (There are 70 values provided which are in ascending order.)

“At least 80% of the items take on a value of 642 or less.”
“At least 20% of the items take on a value of 642 or more.”
Quartiles

▪ Quartiles are specific percentiles.


▪ First Quartile = 25th Percentile
▪ Second Quartile = 50th Percentile = Median
▪ Third Quartile = 75th Percentile
Third Quartile
Example: Apartment Rents (There are 70 values provided which are in ascending order.)
▪ Third quartile = 75th percentile
▪ i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
▪ Third quartile = value in the 53rd position = 625
Measures of Variability

▪ Range
▪ Interquartile Range
▪ Variance
▪ Standard Deviation
▪ Coefficient of Variation
Range
▪ The range of a data set is the difference between the largest and smallest data value.
▪ It is the simplest measure of variability.
▪ It is very sensitive to the smallest and largest data values.

▪ Range = largest value – smallest value = 715 – 525 = 190.


Interquartile Range
▪ The interquartile range of a data set is the difference between the third
quartile and the first quartile.
▪ It is the range for the middle 50% of the data.
▪ It overcomes the sensitivity to extreme data values.
Interquartile Range
▪ 3rd Quartile (Q3) = 625
▪ 1st Quartile (Q1) = 545

▪ IQR = 625 – 545 = 80
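A minimal sketch of the range and interquartile range in Python. It reuses the percentile index rule from these notes in a compact hypothetical helper, `pct`, and applies it to the eight observations from the median example rather than to the rent data; both choices are assumptions for illustration.

```python
def pct(values, p):
    """Percentile by the index rule i = (p/100) * n, for integer 0 < p < 100."""
    data = sorted(values)
    scaled = p * len(data)               # 100 * i, kept as an integer
    if scaled % 100 != 0:
        return data[scaled // 100]       # round i up; 0-based index of that position
    i = scaled // 100
    return (data[i - 1] + data[i]) / 2   # average of positions i and i + 1

data = [26, 18, 27, 12, 14, 27, 19, 30]

data_range = max(data) - min(data)       # largest value minus smallest value
q1 = pct(data, 25)                       # first quartile = 25th percentile
q3 = pct(data, 75)                       # third quartile = 75th percentile
iqr = q3 - q1                            # spread of the middle 50% of the data

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Replacing the largest value with a far larger one changes the range but leaves the IQR untouched, which is the sensitivity difference noted above.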


Variance
▪ The variance is a measure of variability that utilizes all the data.
▪ It is based on the difference between the value of each observation ($x_i$)
and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
▪ The variance is useful in comparing the variability of two or more
variables.
Sum of deviations about the mean?
Sum of squared deviations about the mean?
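Both questions can be worked out from the definition of the sample mean, $\bar{x} = \sum x_i / n$. The short derivation below is a sketch for the sample case; the population case is identical with $\mu$ and N in place of $\bar{x}$ and n.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad
\sum_{i=1}^{n} (x_i - \bar{x})^2 \ge 0 .
```

The deviations always cancel, so their sum carries no information about spread. The squared deviations cannot cancel, and their sum is zero only when every observation equals the mean.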
Variance
▪ The variance is the average of the squared deviations between each
data value and the mean.
▪ The variance of a sample is:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
▪ The variance for a population is:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$


Standard Deviation
▪ The standard deviation of a data set is the positive square root of the
variance.
▪ It is measured in the same units as the data, making it more easily interpreted
than the variance.
▪ The standard deviation of a sample is:
$s = \sqrt{s^2}$
▪ The standard deviation of a population is:
$\sigma = \sqrt{\sigma^2}$


Coefficient of Variation
▪ The coefficient of variation indicates how large the standard deviation is
relative to the mean.
▪ The coefficient of variation of a sample is:
$\left(\frac{s}{\bar{x}} \times 100\right)\%$
▪ The coefficient of variation of a population is:
$\left(\frac{\sigma}{\mu} \times 100\right)\%$
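The variance, standard deviation, and coefficient of variation can be computed step by step in Python. This is a sketch using the standard textbook definitions (divisor n − 1 for a sample, N for a population) and the seven observations from the median example, which are assumed here to be a sample.

```python
import math

# Illustrative data (assumption): the seven observations from the median example.
sample = [26, 18, 27, 12, 14, 27, 19]

n = len(sample)
x_bar = sum(sample) / n

# Sum of squared deviations about the mean.
ss = sum((x - x_bar) ** 2 for x in sample)

sample_var = ss / (n - 1)        # sample variance: divide by n - 1
pop_var = ss / n                 # population variance: divide by N (if these were all the data)
s = math.sqrt(sample_var)        # sample standard deviation, in the same units as the data
cv = s / x_bar * 100             # coefficient of variation, as a percentage of the mean

print(f"s^2 = {sample_var:.2f}, s = {s:.2f}, CV = {cv:.1f}%")
```

The standard library functions `statistics.variance`, `statistics.pvariance`, and `statistics.stdev` implement the same definitions and can serve as a cross-check.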


Sample Variance, Standard Deviation, and Coefficient of Variation
Example: Apartment Rents

• The variance is:

• The standard deviation is:

• The coefficient of variation is:
