PDF Notes
Chapter 2
Purpose
In this chapter we study several ways to summarize
data, focusing on three complementary aspects of data
description: frequency distributions, measures of center,
and measures of dispersion. Together they help us “paint
a picture” of our data by describing its shape, center,
and spread.
Chapter Two
Types of Data
Quantitative (numeric) variables have measurements that are
recorded on a naturally occurring numerical scale.
Discrete variables arise from a counting process.
Continuous variables arise from a measuring process.
Graphical Methods
Frequency Distribution
A frequency distribution is a table that displays the number of
occurrences (frequency) of each category or class in a data set.
$$\text{Relative frequency} = \frac{\text{class frequency}}{n}$$
where $n$ is the total number of observations.
Obs.  Type          Obs.  Type
7     Conduction    18    Conduction
8     Anomic        19    Broca’s
9     Conduction    20    Anomic
10    Anomic        21    Conduction
11    Conduction    22    Anomic
Bar Chart
Bar chart – a series of bars, with each bar representing the class
frequency, class relative frequency, or class percentage.
• Can be used for two or three variables simultaneously
Class        Frequency   Relative frequency   Percentage
Anomic       10          0.455                45.5
Broca's      5           0.227                22.7
Conduction   7           0.318                31.8
TOTALS       22          1.000                100.0
[Figure: bar chart of Percentage (0 to 60) by Type: Anomic, Broca's, Conduction]
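The relative frequencies in the table can be reproduced with a short Python sketch. The list `types` is rebuilt from the table's class counts (the full raw listing is not shown here), and `frequency_table` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Rebuilt from the class counts in the table above (22 observations).
types = ["Anomic"] * 10 + ["Broca's"] * 5 + ["Conduction"] * 7

def frequency_table(values):
    """Return {class: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    n = len(values)
    return {
        cls: (freq, round(freq / n, 3), round(100 * freq / n, 1))
        for cls, freq in sorted(counts.items())
    }

table = frequency_table(types)
for cls, (freq, rel, pct) in table.items():
    print(f"{cls:<12}{freq:>4}{rel:>8.3f}{pct:>7.1f}")
```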
Pie Chart
Pie chart – uses sections of a circle to represent the class
frequency/class relative frequency/class percentage.
[Figure: pie chart of the class percentages by Type; the Broca's slice is labeled 22.7%]
Dotplot
A dotplot is a graph that is used to show the distribution of a
numeric variable when the sample size is small.
Example: A group of thirty-six 2-year-old sows of the same breed were
bred to Yorkshire boars. The number of piglets surviving to 21 days of
age was recorded for each sow.
Histogram
A histogram is a graphical display that results when we
replace the dots of a dotplot with bars.
In a histogram the bars usually touch. A gap between bars is not
arbitrary, as it is in a bar chart: it marks a class with no observations.
Example: The same data as in the dotplot example: the number of piglets
surviving to 21 days of age for each of the thirty-six sows.
Example: CK Serum
[Figure: histograms of the CK serum data drawn with 25 classes and with 5 classes]
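The CK serum example shows that the number of classes changes how a histogram looks. A minimal Python sketch of the counting behind a histogram, assuming equal-width classes; `class_frequencies` and the sample values are hypothetical and are not the CK serum data.

```python
def class_frequencies(data, k):
    """Count observations in k equal-width classes spanning min..max.

    Classes are half-open [lower, upper), except the last one, which
    also includes the maximum so every observation is counted once.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for y in data:
        idx = int((y - lo) / width) if width > 0 else 0
        counts[min(idx, k - 1)] += 1  # the maximum goes in the last class
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

# Hypothetical values, for illustration only.
sample = [12, 15, 15, 18, 21, 22, 22, 23, 27, 30, 34, 41]
for k in (2, 6):
    edges, counts = class_frequencies(sample, k)
    print(k, "classes:", counts)
```

With few classes the shape is hidden; with too many, most classes hold only one or two observations.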
Boxplots
Terminology
PERCENTILE: the pth percentile is a value such that p% of
the observations fall at or below that value and (100 − p)% fall
at or above that value.
Terminology
INTERQUARTILE RANGE: describes the spread of the middle 50% of the
data.
A robust measure of variability (resistant to extreme values).
IQR = Q3 − Q1
Terminology
Outlier: a data point that differs markedly from the rest of
the data.
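The definition above is informal. A widely used rule (Tukey's fences, the usual basis for boxplot whiskers) flags values more than 1.5 × IQR below Q1 or above Q3. A minimal Python sketch, assuming that rule and quartiles computed with `statistics.quantiles(..., method="inclusive")`; the helper names are hypothetical, and the data are the second data set from this chapter's Extreme Values example.

```python
import statistics

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(data, k=1.5):
    """Values lying outside the fences."""
    lower, upper = iqr_fences(data, k)
    return [y for y in data if y < lower or y > upper]

data = [1, 2, 10, 11, 13, 19, 100]
print(iqr_fences(data))     # (-9.0, 31.0)
print(flag_outliers(data))  # [100]
```

Other quartile conventions can move the fences slightly.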
STAT 205
$$Q_1 = 69, \qquad Q_3 = 77, \qquad \mathrm{IQR} = Q_3 - Q_1 = 77 - 69 = 8$$
Boxplot from R
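[Figure: boxplot produced in R, annotated with the IQR and the 5-number summary]
5-number summary (minimum, Q1, median, Q3, maximum): 62, 69, 74, 77, 80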
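A 5-number summary can be computed as follows. This Python sketch follows the position rule of R's `fivenum()` (Tukey's hinges), which to our understanding is what R's `boxplot()` uses for the box; other quartile conventions can give slightly different Q1 and Q3. The raw data behind the R boxplot are not shown, so the sketch uses the waiting-time data from the Range example later in this chapter. `five_number_summary` is a hypothetical name.

```python
import math

def five_number_summary(data):
    """Min, lower hinge, median, upper hinge, max (Tukey's hinges).

    Positions are 1-based, following the rule used by R's fivenum().
    """
    x = sorted(data)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the values at floor(d) and ceil(d); they coincide when d is whole.
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]

waiting = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
print(five_number_summary(waiting))  # [29.0, 35.0, 39.5, 44.0, 52.0]
```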
Box Plot
[Figure: three box plots, each with Q1, Q2, and Q3 marked]
DESCRIPTIVE STATISTICS:
MEASURES OF CENTER
Section 2.3
Definitions
Statistic: a numerical measure that is calculated from
the sample data.
Mean
The mean of a variable is the sum of all the values of the variable in the
data set divided by the number of observations.
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n}$$
where
$y_i$ is the $i$th value of variable Y
$n$ is the sample size
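The formula translates directly into code. A minimal Python sketch (`sample_mean` is a hypothetical name), applied to the first data set used later in this chapter:

```python
def sample_mean(y):
    """ybar = (y_1 + y_2 + ... + y_n) / n"""
    n = len(y)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(y) / n

y = [1, 2, 10, 11, 13, 19]
print(round(sample_mean(y), 2))  # 9.33
```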
Median
Median: the middle value of the data set (at most 50% of the data are
greater than M and at most 50% of the data are less than M).
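Steps to calculate M:
o Order the n data values from smallest to largest.
o The observation in position (n + 1)/2 of the ordered list is the median M. If n is even, this position falls halfway between the two middle observations, and M is their average.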
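The two steps above can be sketched in Python as follows (`median` is a hypothetical helper; the standard library's `statistics.median` gives the same results). The data are the two data sets used later in this chapter, listed out of order to show the sorting step.

```python
def median(values):
    """Median via the (n + 1)/2 position rule (positions are 1-based)."""
    y = sorted(values)                 # step 1: order the data
    n = len(y)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    pos = (n + 1) / 2                  # step 2: locate the middle position
    if pos.is_integer():               # n odd: one middle observation
        return y[int(pos) - 1]
    lower, upper = y[int(pos) - 1], y[int(pos)]
    return (lower + upper) / 2         # n even: average the two middle values

print(median([19, 1, 13, 2, 11, 10]))       # 10.5
print(median([19, 1, 13, 2, 11, 10, 100]))  # 11
```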
Mode
The mode of a variable is the value that occurs most frequently
in the data set.
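The mode is the natural measure of center for categorical data. A minimal Python sketch using `collections.Counter`; `modes` is a hypothetical name, and returning an empty list when no value repeats ("no mode") is an assumed convention (by contrast, `statistics.multimode` returns every value in that case). The categorical example reuses the Type counts from the frequency table.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency.

    Assumed convention: if no value repeats, return [] ("no mode").
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]

types = ["Anomic"] * 10 + ["Broca's"] * 5 + ["Conduction"] * 7
print(modes(types))                    # ['Anomic']
print(modes([1, 2, 10, 11, 13, 19]))   # []
```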
Find the mean, median, and mode of this dataset (by hand).
1 2 10 11 13 19
1 2 10 11 13 19 100
Extreme Values
MEAN is STRONGLY AFFECTED by extreme
values
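Using the two data sets listed above, a short Python sketch makes the point numerically: adding the single value 100 moves the mean by almost 13 units but the median by only 0.5.

```python
import statistics

without_outlier = [1, 2, 10, 11, 13, 19]
with_outlier = without_outlier + [100]

for label, data in [("without 100", without_outlier), ("with 100", with_outlier)]:
    print(f"{label:>12}: mean = {statistics.mean(data):6.2f}, "
          f"median = {statistics.median(data):5.1f}")

# How far each measure of center moves when 100 is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(without_outlier)
median_shift = statistics.median(with_outlier) - statistics.median(without_outlier)
```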
Shapes of Distributions
Which To Use?
The most appropriate measure of central tendency depends
on the data set:
o Skewed: median
o Categorical: mode
MEASURES OF DISPERSION
Sections 2.4 & 2.6
Measures of Variation
Measures of dispersion give us an idea of the
spread of a distribution: are the observations all
nearly equal, or do they differ substantially from each
other?
Measures of Dispersion
Range
Standard deviation & Variance
IQR
Range
Simplest measure of variation.
RANGE = largest value – smallest value
Does not consider how the values cluster or distribute between the
extremes.
Example: The data below represents the waiting time at a local urban
outpatient facility. Waiting time is measured from the time when the patient
registered to the time when he or she received the care service. Data was
collected for a sample of 10 patients. Determine the range.
Values 29 31 35 39 39 40 43 44 44 52
Ranks 1 2 3 4 5 6 7 8 9 10
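For the waiting-time data the range is 52 − 29 = 23 (the source does not state the time unit). The same arithmetic in Python:

```python
waiting = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]

# RANGE = largest value - smallest value
data_range = max(waiting) - min(waiting)
print(data_range)  # 23
```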
The sample variance ($s^2$) is
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
where,
$\bar{y}$ is the sample mean
$y_i$ is the $i$th value of variable Y
$n$ is the sample size
deviation $= y_i - \bar{y}$ (difference between the observation and the sample mean)
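Applied to the waiting-time data from the Range example, the formula gives $\bar{y} = 39.6$, $\sum (y_i - \bar{y})^2 = 412.4$, $s^2 = 412.4/9 \approx 45.82$, and $s = \sqrt{s^2} \approx 6.77$. A minimal Python sketch of the same steps (`sample_variance` is a hypothetical name), cross-checked against `statistics.variance`, which also divides by $n - 1$:

```python
import math
import statistics

def sample_variance(y):
    """s^2 = sum((y_i - ybar)^2) / (n - 1)"""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = sum(y) / n
    deviations = [yi - ybar for yi in y]       # y_i - ybar
    return sum(d * d for d in deviations) / (n - 1)

waiting = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]
s2 = sample_variance(waiting)
s = math.sqrt(s2)                              # standard deviation
print(round(s2, 2), round(s, 2))               # 45.82 6.77
print(math.isclose(s2, statistics.variance(waiting)))  # True
```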
INTERQUARTILE RANGE:
Describes the middle 50% of data.
IQR = Q3 - Q1
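Quartile conventions differ between textbooks and software, so the IQR depends slightly on the method. This Python sketch uses `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics) on the waiting-time data; taking the median of each half by hand instead gives Q1 = 35, Q3 = 44, and IQR = 9.

```python
import statistics

waiting = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]

# Three cut points split the ordered data into four parts: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(waiting, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 36.0 39.5 43.75 7.75
```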
Extreme Values
Range and Standard Deviation are AFFECTED by
extreme values
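The two data sets from the Extreme Values example show the contrast. In this Python sketch (quartiles again via `method="inclusive"`; `spread` is a hypothetical helper), adding 100 raises the range from 18 to 99 and the standard deviation from about 6.8 to about 34.8, while the IQR only moves from 8.5 to 10.

```python
import statistics

def spread(data):
    """Range, sample standard deviation, and IQR of one data set."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),
        "iqr": q3 - q1,
    }

without_outlier = [1, 2, 10, 11, 13, 19]
with_outlier = without_outlier + [100]

for label, data in [("without 100", without_outlier), ("with 100", with_outlier)]:
    s = spread(data)
    print(f"{label:>12}: range = {s['range']}, "
          f"sd = {s['sd']:.2f}, IQR = {s['iqr']:.2f}")
```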
Example
The Health and Nutrition Examination Study of 1976-1980 (HANES)
found that the distribution of heights of adults aged 18-24 is bell-shaped, with:
Women: mean (ȳ) = 65.0 inches, standard deviation (s) = 2.5 inches
Men: mean (ȳ) = 70.0 inches, standard deviation (s) = 2.8 inches
For men:
Approximately 68%: between 67.2 and 72.8 inches (ȳ ± s)
Approximately 95%: between 64.4 and 75.6 inches (ȳ ± 2s)
Approximately 99.7%: between 61.6 and 78.4 inches (ȳ ± 3s)
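The empirical rule intervals are simply ȳ ± k·s for k = 1, 2, 3, valid only as approximations for bell-shaped data. A short Python sketch (`empirical_rule` is a hypothetical name) computes them for both groups; the men's intervals match the values above.

```python
def empirical_rule(mean, sd):
    """Intervals mean ± k*sd for k = 1, 2, 3 (about 68%, 95%, 99.7%)."""
    return {
        pct: (round(mean - k * sd, 1), round(mean + k * sd, 1))
        for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]
    }

for group, mean, sd in [("Women", 65.0, 2.5), ("Men", 70.0, 2.8)]:
    print(group, empirical_rule(mean, sd))
```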
Summary
The End!!