
Chapter 02-Describing Distributions With Numbers

The document discusses various methods for numerically describing data distributions including measures of center such as the mean and median, and measures of variability such as range, interquartile range, and standard deviation. These numerical summaries are used to characterize distributions and can be combined in tools like five-number summaries and boxplots.

Chapter 02: Describing Distributions with Numbers

Graphical displays provide a visual impression of the data and often reveal features that summary numerical descriptive measures cannot. On the other hand, numerical summaries provide information that graphical displays cannot easily provide. Thus, we pair numerical descriptions with graphical displays to describe data more fully.

Learning Objectives:

• Find the mean and median of a set of observations and interpret them as the center of the set.

• Compare the mean and median values of a data set and distinguish their meanings.

• Calculate and use quartiles to describe the variation of a data set.

• Use the five-number summary (the minimum, the maximum, the quartiles, and the median) and the boxplot to characterize a distribution.

• Calculate and use the standard deviation to describe the variation of a data set.

• Choose between the five-number summary and the mean and standard deviation for describing the distribution of data, depending on features of the data set.

• Interpret descriptive statistics output produced by Minitab.

Example: The American Community Survey asks, among much else, about workers’ travel times to work. Here are the travel times in minutes for 15 workers in North Carolina, chosen at random by the U.S. Census Bureau:

20, 35, 8, 70, 5, 15, 25, 30, 40, 35, 10, 12, 40, 15, 20

We aren’t surprised that most people estimate their travel time in multiples of five minutes. Here is a stemplot of these data:

[Stemplot of the travel times]

The distribution is single-peaked and right-skewed. The longest travel time (70 minutes) may be an outlier. Our goal in this chapter is to describe with numbers the center and variability of this and other distributions.
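The stemplot can be rebuilt from the 15 listed values. The following is a minimal Python sketch, assuming the tens digit is used as the stem and the ones digit as the leaf; the function name `stemplot` is illustrative and not part of the original notes.

```python
from collections import defaultdict

# Travel times (minutes) for the 15 North Carolina workers listed above.
travel_times = [20, 35, 8, 70, 5, 15, 25, 30, 40, 35, 10, 12, 40, 15, 20]


def stemplot(values):
    """Return stemplot rows: stem = tens digit, leaf = ones digit.

    Assumes non-negative whole numbers, which holds for these data.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Keep empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(row.rstrip())
    return rows


for row in stemplot(travel_times):
    print(row)
```

The empty stems 5 and 6 separate the 70-minute travel time from the rest of the data, which is why it stands out as a possible outlier.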

Measure of Center

Two numbers can be used to describe the center: the mean and the median.

• The (sample) mean (x̄) is the average. To find the mean of a variable (x), add its values and divide by the number of observations, n (the sample size).

• The median, M, is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger (i.e. 50% on each side).

Because the mean cannot resist the influence of extreme observations, it is not a resistant measure of center. The median, by contrast, is resistant.

To find the median of a distribution:

1. Arrange all observations in order of size, from smallest to largest.

2. If the number of observations n is odd, the median M is the center observation in the ordered list. If n is even, the median M is midway between the two center observations in the ordered list.

Note: You can always locate the median in the ordered list of observations by counting up (n + 1)/2 observations from the start of the list. When n is even, (n + 1)/2 is not a whole number, and the median lies midway between the two observations on either side of that position.
Eg: Calculate the mean and median of the weight (in lbs) differences (before - after) for a random sample of gym-goers: 5, 3, 0, -1, 3

Eg: Calculate the mean and median of the weight (in lbs) differences (before - after) for a random sample of gym-goers: 5, 3, 0, -1, 3, -2
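One way to check these two examples is to code the definitions directly. The sketch below is a plain-Python version of the mean and of the two-step median procedure; the function and variable names are chosen for illustration only.

```python
def mean(values):
    """Add the values and divide by the number of observations n."""
    return sum(values) / len(values)


def median(values):
    """Median by the two-step procedure: sort, then take the center."""
    ordered = sorted(values)          # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2, n odd: the center observation
        return ordered[mid]
    # step 2, n even: midway between the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2


sample_odd = [5, 3, 0, -1, 3]         # n = 5
sample_even = [5, 3, 0, -1, 3, -2]    # n = 6

print(mean(sample_odd), median(sample_odd))
print(mean(sample_even), median(sample_even))
```

For the first sample the ordered list is -1, 0, 3, 3, 5, so the mean is 10/5 = 2 lbs and the median is 3 lbs. For the second, ordered as -2, -1, 0, 3, 3, 5, the mean is 8/6 ≈ 1.33 lbs and the median is (0 + 3)/2 = 1.5 lbs.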

Eg: Find the median.

Eg: Let’s consider the following graphical summaries of a random sample of n students’ marks in a quiz. Find the mean and median approximately.
Comparing the Mean and the Median

The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than the median.
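The right-skewed travel-time data from the start of the chapter illustrate this. The sketch below uses Python's standard `statistics` module to compare the mean and median with and without the 70-minute observation; dropping that value is done here only to show its influence, not as a recommended way to handle outliers.

```python
from statistics import mean, median

travel_times = [20, 35, 8, 70, 5, 15, 25, 30, 40, 35, 10, 12, 40, 15, 20]
# Same data with the 70-minute travel time removed.
without_longest = [t for t in travel_times if t != 70]

for label, data in [("all 15 workers", travel_times),
                    ("without 70 min", without_longest)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

With all 15 workers the mean (about 25.33 minutes) sits above the median (20 minutes), pulled toward the long right tail. Removing the 70-minute value lowers the mean to about 22.14 minutes while the median stays at 20, which is what resistance means in practice.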

Measure of Variability

• A measure of center alone can be misleading.

Eg: Consider the following two data sets. Both have mean 0 and median 0, yet their spreads are very different.

– Data set 1: 2, -1, -2, 0, 1

– Data set 2: -500, 0, 300, 500, -300

• A useful numerical description of a distribution requires both a measure of center and a measure of spread. We could look at the largest and smallest values, but like the mean, they are (obviously) affected by extreme values, so we will examine other percentiles.

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. The pth percentile of a data set is a value such that at least p% of the items take on this value or less (and at least (100 − p)% of the items take on this value or more).

– 25th percentile (P25) ≡ first quartile (Q1)

– 75th percentile (P75) ≡ third quartile (Q3)
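The percentile definition can be turned into a direct check. The following sketch, with the hypothetical helper `is_pth_percentile`, tests whether a candidate value satisfies both "at least" conditions for a given p, using the travel-time data from the opening example.

```python
def is_pth_percentile(data, value, p):
    """Check the definition: at least p% of the items are <= value
    and at least (100 - p)% of the items are >= value."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= value)
    at_or_above = sum(1 for x in data if x >= value)
    # Compare counts with integer arithmetic to avoid rounding issues.
    return at_or_below * 100 >= p * n and at_or_above * 100 >= (100 - p) * n


travel_times = [20, 35, 8, 70, 5, 15, 25, 30, 40, 35, 10, 12, 40, 15, 20]

print(is_pth_percentile(travel_times, 12, 25))   # candidate for Q1
print(is_pth_percentile(travel_times, 35, 75))   # candidate for Q3
print(is_pth_percentile(travel_times, 15, 25))   # fewer than 75% are >= 15
```

Here 12 qualifies as a 25th percentile (4 of 15 values are at or below it, 12 of 15 at or above it) and 35 qualifies as a 75th percentile, while 15 does not, because only 11 of the 15 values are at or above it.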

Three statistics that describe variability (i.e. spread): range, interquartile range (IQR), and standard deviation.

1. Range = max − min.

2. Interquartile range (IQR) = Q3 − Q1. This is the spread of the middle 50% of the data.

Note: To calculate the quartiles (by hand, for a smaller data set):

• Arrange the observations in increasing order and locate the median, M.

• The first quartile, Q1, is the median of the observations located to the left of the median in the ordered list.

• The third quartile, Q3, is the median of the observations located to the right of the median in the ordered list.

Five-Number Summary

• The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution.

• To get a quick summary of both center and spread, combine all five numbers.

The five-number summary consists of

Min, Q1, median, Q3, Max
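As an illustration, the five-number summary of the travel-time data can be computed with the by-hand quartile rule (the median of the observations on each side of the overall median). This is a sketch under that rule; statistical software may use slightly different quartile conventions, and the function names below are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the by-hand quartile rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations left of the median
    upper = ordered[n - half:]        # observations right of the median
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


travel_times = [20, 35, 8, 70, 5, 15, 25, 30, 40, 35, 10, 12, 40, 15, 20]
minimum, q1, m, q3, maximum = five_number_summary(travel_times)
print(minimum, q1, m, q3, maximum)
print("range =", maximum - minimum, " IQR =", q3 - q1)
```

For these data the summary is 5, 12, 20, 35, 70, giving a range of 65 minutes and an IQR of 23 minutes.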

3. The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation (s). The variance, s², of a set of observations is an average of the squared deviations of the observations from their mean. Note that the standard deviation, s, is the square root of the variance, s².

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

Properties of s.

• s is always zero or greater than zero. s = 0 only when there is no variability. This happens only when all observations have the same value. Otherwise, s > 0.

• As the observations become more variable about their mean, s gets larger.

• s has the same units of measurement as the original observations.

• Like the mean x̄, s is not resistant. A few outliers can make s very large.

• The sum of the deviations (x_i − x̄) is always zero. This acts as a “restriction”: once we know (n − 1) of the data values and the value of the sample mean, we can calculate the one data value we do not know. For this reason, (n − 1) is called the degrees of freedom.
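As a rough check on the variance formula and on these properties, the sketch below computes s for the two data sets from the Measure of Variability section, which share the same mean but differ greatly in spread. The function names are illustrative.

```python
import math


def variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def std_dev(values):
    """Sample standard deviation s, the square root of the variance."""
    return math.sqrt(variance(values))


data_set_1 = [2, -1, -2, 0, 1]
data_set_2 = [-500, 0, 300, 500, -300]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    x_bar = sum(data) / len(data)
    deviations = [x - x_bar for x in data]
    print(f"{name}: mean = {x_bar}, sum of deviations = {sum(deviations)}, "
          f"s = {std_dev(data):.2f}")
```

Both data sets have mean 0 and deviations that sum to zero, but s is about 1.58 for the first and about 412.31 for the second. A list of identical values, such as 7, 7, 7, gives s = 0.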

Boxplots

The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.

How to Make a Boxplot:

• A central box spans the quartiles Q1 and Q3.

• A line in the box marks the median M.

• Lines extend from the box out to the smallest (min) and largest (max) observations.

Note: Use Minitab to draw a boxplot.

Spotting suspected outliers and modified boxplots

In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.

The 1.5 × IQR Rule for Outliers*

Call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile, i.e. below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
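The rule can be applied to the travel-time data from the opening example, where the 70-minute value was described as a possible outlier. The sketch below assumes the by-hand quartile rule from this chapter; other quartile conventions could move the fences slightly. The name `suspected_outliers` is illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def suspected_outliers(values):
    """Return (low fence, high fence, flagged values) for the 1.5 x IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])        # median of the lower half
    q3 = median(ordered[n - half:])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in ordered if x < low_fence or x > high_fence]
    return low_fence, high_fence, flagged


travel_times = [20, 35, 8, 70, 5, 15, 25, 30, 40, 35, 10, 12, 40, 15, 20]
print(suspected_outliers(travel_times))
```

With Q1 = 12, Q3 = 35 and IQR = 23, the fences are -22.5 and 69.5, so only the 70-minute travel time is flagged, and only barely.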

Choosing a graph: histogram vs. stemplot vs. boxplot

• Use a histogram when you have a larger data set.

• Stemplots (and dotplots) are only for smaller data sets (not as popular as the histogram, but useful).

• A boxplot is useful for comparing data across groups (i.e. one quantitative variable and one categorical variable).

Choosing measures of center and variability

We now have a choice between two descriptions for center and variability:

1. mean and standard deviation for symmetric distributions

2. median and interquartile range for skewed distributions
