
Describing Data Numerical

The document discusses measures of central tendency, including mean, median, and mode, explaining their definitions, calculations, and properties. It also covers the concepts of dispersion and variability, detailing measures such as range, variance, and standard deviation. Additionally, it introduces the geometric mean and its applications, emphasizing the importance of selecting appropriate measures based on data characteristics.

for Business and Economics

CHAPTER 3

DESCRIBING DATA: NUMERICAL


Central Tendency

In raw data, a frequency distribution, or a chart, some values occur frequently and others occur less frequently. The value that appears most frequently is more or less a central value, and the data are heavily concentrated around it. Thus the central tendency of a data set can be defined as the tendency of the values to cluster around a central value that is representative of all the other values in the data.
Describing Data Numerically

Describing Data Numerically
Central Tendency: Arithmetic Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency

Overview
Central Tendency

Mean: arithmetic average
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Median: midpoint of ranked values
Mode: most frequently observed value
The Arithmetic Mean is the most widely used measure of location and shows the central value of the data. It is calculated by summing the values and dividing by the number of values.

The major characteristics of the mean are:

It requires the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.

Characteristics of the Mean


For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:

\[ \mu = \frac{\sum X}{N} \]

where
µ is the population mean
N is the total number of observations.
X is a particular value.
Σ indicates the operation of adding.

Population Mean
A Parameter is a measurable characteristic of a
population.

The Kiers family owns four cars. The following is the current mileage on each of the four cars: 56,000; 42,000; 23,000; 73,000.
Find the mean mileage for the cars.

\[ \mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500 \]
Example 1
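
The calculation in Example 1 can be checked with a short Python sketch. The variable names are illustrative and not part of the original slides.

```python
from statistics import mean

# Current mileage on each of the four Kiers family cars (from Example 1)
mileages = [56_000, 42_000, 23_000, 73_000]

# Population mean: sum of all values divided by the number of values
mu = sum(mileages) / len(mileages)

print(mu)              # 48500.0
print(mean(mileages))  # the standard-library helper gives the same value
```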
For ungrouped data, the Sample Mean is the sum of all the sample values divided by the number of sample values:

\[ \bar{X} = \frac{\sum X}{n} \]

where n is the total number of values in the sample.

Sample Mean
A statistic is a measurable characteristic of a sample.

A sample of five executives received the following bonus last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0

\[ \bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4 \]

Example 2
Properties of the Arithmetic Mean
Every set of interval-level and ratio-level data has a
mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small
data values. (Example- 3, 4, 5, 7, 8, 45)
The arithmetic mean is the only measure of location
where the sum of the deviations of each value from
the mean is zero.
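
The last two properties can be illustrated with the example values 3, 4, 5, 7, 8, 45 given above. This is a minimal Python sketch; the variable names are assumptions made for illustration.

```python
from statistics import mean, median

values = [3, 4, 5, 7, 8, 45]   # example values from the slide; 45 is unusually large

print(mean(values))        # 12, pulled upward by 45
print(median(values))      # 6.0, the average of the two middle values 5 and 7
print(mean(values[:-1]))   # 5.4, the mean once 45 is left out

# The deviations of each value from the mean sum to zero
deviations = [x - mean(values) for x in values]
print(sum(deviations))     # 0
```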

Properties of the Arithmetic Mean


Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:

\[ \sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0 \]

Example 3
Mean as a balance point
The Weighted Mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:

\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} \]

Weighted Mean
During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.

\[ \bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89 \]
Example 4
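
The weighted mean in Example 4 can be reproduced with a few lines of Python. This sketch uses the $1.15 price that appears in the worked computation; the function name `weighted_mean` is an illustrative choice.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [0.50, 0.75, 0.90, 1.15]   # price per drink, in dollars
drinks_sold = [5, 15, 15, 15]       # number of drinks sold at each price (the weights)

print(round(weighted_mean(prices, drinks_sold), 2))  # 0.89
```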
The Median is the
midpoint of the values after
they have been ordered from
the smallest to the largest.

There are as many values above the median as below it in the data array.

The Median
Finding the Median
The location of the median:

\[ \text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data} \]

If the number of values is odd, the median is the middle number.
If the number of values is even, the median is the average of the two middle numbers.

Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data.
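
The odd and even cases can be sketched in Python as follows. The function name `median_of` is hypothetical; the data are the bonuses and mileages from the earlier examples.

```python
def median_of(values):
    """Median via the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a single middle value
        return ordered[mid]
    # Even n: position (n + 1) / 2 falls between the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]      # n = 5, position 3
mileages = [56_000, 42_000, 23_000, 73_000]   # n = 4, position 2.5

print(median_of(bonuses))    # 15.0
print(median_of(mileages))   # 49000.0
```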
Properties of the
Median

There is a unique median for each data set.


It is not affected by extremely large or small
values and is therefore a valuable measure of
location when such values occur.
It can be computed for ratio-level, interval-level,
and ordinal-level data.

Properties of the Median


The Mode is another measure of
location and represents the value of
the observation that appears most
frequently.
Example 6: The exam scores for ten students
are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87.
Because the score of 81 occurs the most often, it
is the mode.

The Mode: Example 6


Properties of the
Mode
Not affected by extreme values.
Used for either numerical or
categorical data.
There may be no mode.
 Data can have more than one mode.
If it has two modes, it is referred to as
bimodal, three modes, tri-modal, and
the like.

The Mode: Example 6
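
The mode of the exam scores in Example 6 can be found by counting occurrences. This is a small Python sketch; the second data set is made up purely to show a bimodal case.

```python
from collections import Counter
from statistics import multimode

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

# Count how often each score appears and take the most frequent one
print(Counter(scores).most_common(1))   # [(81, 3)]

# multimode returns every value that shares the highest frequency
print(multimode(scores))                # [81]
print(multimode([1, 1, 2, 2, 3]))       # [1, 2], a bimodal (made-up) data set
```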


Mean of Grouped Data
Median of Grouped Data

\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f} \times i \]

where
L is the lower limit of the class containing the median.
n is the total number of frequencies.
f is the frequency in the median class.
CF is the cumulative number of frequencies in all the classes preceding the class containing the median.
i is the width of the class in which the median lies.
Mode of Grouped Data
Symmetric distribution: A distribution having the
same shape on either side of the center

Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution.

Can be positively or negatively skewed, or bimodal


The Relative Positions of the Mean, Median, and Mode
Zero skewness

Mean = Median = Mode

The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Positively Skewed: Mean and median are to the right of the mode.

Mean > Median > Mode

The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution
Negatively Skewed: Mean and median are to the left of the mode.

Mean < Median < Mode

The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution
Which measure of location is the “best”?

Mean is generally used, unless extreme values (outliers) exist.
Then median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region because the median is less sensitive to outliers.
The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:

\[ GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)} \]

The geometric mean is used to average percents, indexes, and relatives.
Geometric Mean
The increases in salary over two years were 5 and 15 percent.
The arithmetic mean is (5 + 15)/2 = 10.0 percent.
The geometric mean is:

\[ GM = \sqrt{(1.05)(1.15)} = 1.09886 \]

That is an average increase of about 9.886 percent per year. The GM gives a more conservative figure because it is not heavily weighted by the rate of 15 percent.
Example 7
Example: The profits earned by Atkins Construction Company on four recent projects were 30%, 20%, -40%, and 200%.

Example 7
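
Both geometric mean examples can be worked through in Python. This sketch assumes, as the salary example does, that each percentage is first converted to a growth factor (5 percent becomes 1.05); the function name `geometric_mean_rate` is hypothetical.

```python
import math

def geometric_mean_rate(percent_changes):
    """Average rate of change: nth root of the product of the growth factors, minus 1."""
    factors = [1 + p / 100 for p in percent_changes]
    return math.prod(factors) ** (1 / len(factors)) - 1

salary_increases = [5, 15]          # percent, from the salary example
atkins_profits = [30, 20, -40, 200] # percent, from the Atkins Construction example

print(f"{geometric_mean_rate(salary_increases):.3%}")  # 9.886%
print(f"{geometric_mean_rate(atkins_profits):.1%}")
```

The arithmetic mean of the Atkins figures is 52.5 percent, so comparing it with the second printed value shows how strongly the single 200 percent project pulls the arithmetic mean upward.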
Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.

[Chart: Growth in Sales 1999-2004; vertical axis: Sales in Millions ($); horizontal axis: Year]

\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \]

Geometric Mean continued


The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000. The geometric mean rate of increase is 1.27% per year:

\[ GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = 0.0127 \]

Example 8
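
The rate in Example 8 can be verified with the following Python sketch. The function name `average_rate_of_increase` is an illustrative assumption; n = 8 is the number of years from 1992 to 2000.

```python
def average_rate_of_increase(begin_value, end_value, periods):
    """nth root of (end value / beginning value), minus 1."""
    return (end_value / begin_value) ** (1 / periods) - 1

rate = average_rate_of_increase(755_000, 835_000, periods=2000 - 1992)
print(f"{rate:.2%}")  # 1.27%
```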
Dispersion refers to the deviation of the values in a set of data from a central value.

Marks obtained by A: 63, 74, 56, 44, 66, 65, 80, 43
Marks obtained by B: 61, 54, 56, 57, 60, 59, 55, 62

Measures of Dispersion
Why study dispersion?

It indicates how close to or far from the average the individual values are, and therefore how reliable and representative the average is.
Little deviation means the mean is highly representative, and vice versa.
It allows the spread of two or more distributions to be compared.
Measures of Variability

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Measures of variation give information on the spread or variability of the data values.

[Figure: two distributions with the same center but different variation]
Range

Simplest measure of variation
Difference between the largest and the smallest observations:

Range = X_largest - X_smallest

Example (data plotted on a number line from 0 to 14, with smallest value 1 and largest value 14):
Range = 14 - 1 = 13
The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.

-8.1   3.2   5.9   8.1  12.3
-5.1   4.1   6.3   9.2  13.3
-3.1   4.6   7.9   9.5  14.0
-1.4   4.8   7.9   9.7  15.0
 1.2   5.7   8.0  10.3  22.1

Highest value: 22.1   Lowest value: -8.1

Range = Highest value - lowest value
= 22.1 - (-8.1)
= 30.2
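
The range of the 25 Return on Equity figures can be computed directly. This is a short Python sketch with illustrative variable names; the result is rounded because of floating-point arithmetic.

```python
# Return on Equity of the 25 companies, as listed in the table
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
     1.2, 5.7, 8.0, 10.3, 22.1,
]

# Range = largest observation minus smallest observation
value_range = max(roe) - min(roe)
print(max(roe), min(roe), round(value_range, 1))  # 22.1 -8.1 30.2
```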
Disadvantages of the Range

Ignores the way in which data are distributed
[Figure: two dot plots over the values 7 to 12 with different distributions; in both, Range = 12 - 7 = 5]

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.

The main features of the mean deviation are:
All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are difficult to manipulate.

\[ MD = \frac{\sum |X - \bar{X}|}{n} \]
Mean Deviation
The weights of a sample of crates containing books for the bookstore (in pounds) are:
103, 97, 101, 106, 103
Find the mean deviation.

\[ \bar{X} = 102 \]

The mean deviation is:

\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4 \]
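
The mean deviation of the crate weights can be reproduced with a short Python sketch. The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    x_bar = sum(values) / len(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

crate_weights = [103, 97, 101, 106, 103]   # pounds

print(sum(crate_weights) / len(crate_weights))  # 102.0
print(mean_deviation(crate_weights))            # 2.4
```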
OC Airport: 20, 40, 50, 60, 80
LAX Airport: 20, 49, 50, 51, 80
Advantage:
It uses all the values in the computations.
Easy to understand.

Drawback:
Absolute values are difficult to work with.

Example 7
Variance: the arithmetic mean of the squared
deviations from the mean. It is non-negative
and is zero if all observations are the same.

Standard deviation: The square root of the variance.

Variance and standard Deviation


The major characteristics of the Population Variance are:

It is influenced by extreme values, because each deviation from the mean is squared.
The units are awkward, the square of the original units.
All values are used in the calculation.

Population Variance
Population Variance formula:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

X is the value of an observation in the population
μ is the arithmetic mean of the population
N is the number of observations in the population

Traffic citations issued during the last five months: 38, 26, 13, 41, 22
Population Standard Deviation formula:

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data

Variance and standard deviation
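
The population formulas can be applied to the traffic citation counts (38, 26, 13, 41, 22). The slides do not show a worked answer, so the Python sketch below is offered only as one way to carry out the calculation, treating the five months as the whole population.

```python
from statistics import pvariance, pstdev

citations = [38, 26, 13, 41, 22]   # traffic citations in each of the last five months

N = len(citations)
mu = sum(citations) / N                              # population mean
sigma_sq = sum((x - mu) ** 2 for x in citations) / N # mean of the squared deviations
sigma = sigma_sq ** 0.5                              # square root of the variance

print(mu, round(sigma_sq, 1), round(sigma, 2))

# The standard library's population helpers should agree with the manual steps
print(pvariance(citations), pstdev(citations))
```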
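Sample variance (s²)

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

Why is this change made in the denominator? Dividing by n - 1 instead of n corrects the tendency of a sample to understate the variation in the population.

Sample standard deviation (s)

\[ s = \sqrt{s^2} \]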
Sample variance and standard deviation
The hourly wages earned by a sample of five students are:
$7, $5, $11, $8, $6.
Find the sample variance and standard deviation.

\[ \bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40 \]

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30 \]

\[ s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \]
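
The sample variance and standard deviation of the hourly wages can be checked in Python. This sketch computes them step by step with the n - 1 denominator and then compares against the standard library's sample helpers.

```python
from statistics import variance, stdev

wages = [7, 5, 11, 8, 6]   # hourly wages in dollars

n = len(wages)
x_bar = sum(wages) / n                                 # sample mean
s_sq = sum((x - x_bar) ** 2 for x in wages) / (n - 1)  # sample variance, n - 1 denominator
s = s_sq ** 0.5                                        # sample standard deviation

print(round(x_bar, 2), round(s_sq, 2), round(s, 2))    # 7.4 5.3 2.3

# statistics.variance and statistics.stdev also divide by n - 1
print(round(variance(wages), 2), round(stdev(wages), 2))  # 5.3 2.3
```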
Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least:

\[ 1 - \frac{1}{k^2} \]

where k is any constant greater than 1.

Chebyshev’s theorem
1. A population data set of size N = 500 has mean μ = 5.2 and standard deviation σ = 1.1. Find the minimum number of observations:
between 3 and 7.4;
between 1.9 and 8.5.

2. A sample data set of size n = 30 has mean x̄ = 6 and standard deviation s = 2. Find the maximum number of observations that can lie outside the interval (2, 10).

Chebyshev’s theorem
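
One way to approach these exercises is sketched below in Python. The function names are hypothetical, and k is passed in directly: 3 to 7.4 is μ ± 2σ and 1.9 to 8.5 is μ ± 3σ for the first data set, and (2, 10) is x̄ ± 2s for the second.

```python
import math

def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the share of values within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def min_count_within(n, k):
    """Smallest whole number of observations guaranteed to lie within k SDs."""
    return math.ceil(n * chebyshev_min_proportion(k))

def max_count_outside(n, k):
    """Largest whole number of observations that can lie outside k SDs."""
    return math.floor(n / k ** 2)

print(min_count_within(500, 2))   # 375  (at least 75% of 500)
print(min_count_within(500, 3))   # 445  (at least 8/9 of 500, rounded up)
print(max_count_outside(30, 2))   # 7    (at most 25% of 30, rounded down)
```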
The Empirical Rule

If the data distribution is bell-shaped, then the interval:
μ ± 1σ contains about 68% of the values in the population or the sample
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains about 99.7% of the values in the population or the sample
The Empirical Rule

Sample mean = 500, SD = 20. Calculate

Remember, for bell-shaped data, SD ≈ Range/6
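
The slide does not say exactly what is to be calculated. One plausible reading is the three Empirical Rule intervals for a sample mean of 500 and SD of 20, which the following Python sketch works out; the function name and the returned layout are assumptions.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals mean ± k*sd for k = 1, 2, 3 with their approximate coverage."""
    coverage = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share) for k, share in coverage.items()}

intervals = empirical_rule_intervals(500, 20)
for k, (low, high, share) in intervals.items():
    print(f"mean ± {k} SD: {low} to {high}, about {share:.1%} of values")
```

The widest interval, 440 to 560, spans six standard deviations, which is where the Range/6 rule of thumb comes from.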
Mean & SD of Grouped Data
