Math 101 Statistics
Introduction
Definition of Statistics
• In its plural sense, statistics is a set of numerical data (e.g., vital statistics in a
beauty contest, monthly sales of a company, daily P-$ exchange rate).
• In its singular sense, Statistics is that branch of science which deals with the
collection, presentation, analysis, and interpretation of data.
• provides comparison
• explains action that has taken place
• justifies a claim or assertion
• predicts future outcome
• estimates unknown quantities
• In the social sciences, it can guide and help researchers support theories and
models that cannot stand on rationale alone.
2 CHAPTER 1. INTRODUCTION
Fields of Statistics
• Descriptive: A bowler wants to find his bowling average for the past 12 games.
• Inferential: A bowler wants to estimate his chance of winning a game based on his
current season averages and the averages of his opponents.
Example: In order to estimate the true proportion of students at a certain college who
smoke cigarettes, the administration polled a sample of 200 students and
determined that the proportion of students from the sample who smoke
cigarettes is 0.12. Identify the parameter and the statistic.
CHAPTER 2
2.1 PRELIMINARIES
Classification of Variables
1. Discrete vs Continuous
2. Qualitative vs Quantitative
Levels of Measurement
Examples:
The ordinal level of measurement contains the properties of the nominal level,
and in addition, the numbers assigned to categories of any variable may be
ranked or ordered in some low-to-high-manner.
Examples:
3. Interval Level
The interval level is that which has the properties of the nominal and ordinal
levels, and in addition, the distances between any two numbers on the scale
are of known sizes. An interval scale must have a common and constant unit
of measurement. Furthermore, the unit of measurement is arbitrary and there
is no “true zero” point.
Examples:
IQ
Temperature (in Celsius)
4. Ratio Level
The ratio level of measurement contains all the properties of the interval level,
and in addition, it has a “true zero” point.
Examples:
Classification of Data
Example: The publications of the National Statistics Office are primary sources and
all subsequent publications of other agencies are secondary sources.
a. Internal data - information that relates to the operations and functions of the
organization collecting the data
Example: The sales data of SM is internal data for SM but external data for any other
organization such as Robinson’s.
4. Use of existing studies - e.g., census, health statistics, and weather bureau
reports
Two types:
Definition. Survey sampling is the process of obtaining information from the units in
the selected sample.
• reduced cost
• greater speed
• greater scope
• greater accuracy
Definition. A sampling procedure that gives every element of the population a (known)
nonzero chance of being selected in the sample is called probability
sampling. Otherwise, the sampling procedure is called non-probability
sampling.
Definition. The target population is the population from which information is desired.
Definition. The sampled population is the collection of elements from which the sample
is actually taken.
Definition. The population frame is a listing of all the individual units in the
population.
ELEMENTARY STATISTICS 9
1. purposive sampling - sets out to make a sample agree with the profile of
the population based on some pre-selected
characteristic
3. Systematic sampling
4. Cluster sampling
5. Multistage sampling
Simple random sampling (SRS) is a method of selecting n units out of the N units in
the population in such a way that every distinct sample of size n has an equal chance
of being drawn. The process of selecting the sample must give an equal chance of
selection to any one of the remaining elements in the population at any one of the n
draws.
Step 1: Make a list of the sampling units and number them from 1 to N.
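Step 2: Select n numbers from 1 to N using some random process, for example, the
table of random numbers. The n numbers must be distinct for simple random
sampling without replacement (SRSWOR), but need not be distinct for simple
random sampling with replacement (SRSWR).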
Step 3: The sample consists of the units corresponding to the selected random
numbers.
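The three steps above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `simple_random_sample`, the `seed` argument, and the population size N = 50 are assumptions chosen for the example, and a pseudorandom generator stands in for the table of random numbers.

```python
import random

def simple_random_sample(N, n, replace=False, seed=None):
    """Return n unit numbers drawn from 1..N by simple random sampling."""
    rng = random.Random(seed)      # seed is used only to make the draw reproducible
    units = range(1, N + 1)        # Step 1: sampling units numbered 1 to N
    if replace:
        # SRSWR: a unit number may be selected more than once
        return [rng.choice(units) for _ in range(n)]
    # SRSWOR: the n selected unit numbers are all distinct
    return rng.sample(units, n)

# Hypothetical population of N = 50 units, sample of n = 5
srswor = simple_random_sample(N=50, n=5, seed=1)
srswr = simple_random_sample(N=50, n=5, replace=True, seed=1)
print(srswor, srswr)
```

The sample itself (Step 3) consists of the units whose numbers appear in the returned list.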
Advantages
• The theory involved is much easier to understand than the theory behind other
sampling designs.
Disadvantages
• The sample chosen may be widely spread, thus entailing high transportation costs.
Step 1: Divide the population into strata. Ideally, each stratum must consist of more
or less homogeneous units.
Step 2: After the population has been stratified, a simple random sample is selected
from each stratum.
Advantages
• It allows for more comprehensive data analysis since information is provided for
each stratum.
• It is administratively convenient.
Disadvantages
• The stratification of the population may require additional prior information about
the population and its strata.
12 CHAPTER 2. COLLECTION & PRESENTATION OF DATA
Method A
Method B
Advantages
• It is easier to draw the sample and often easier to execute without mistakes than
simple random sampling.
• It is possible to select a sample in the field without a sampling frame.
• The systematic sample is spread evenly over the population.
Disadvantages
• If periodic regularities are found in the list, a systematic sample may consist only
of similar types. (Example: Store sales over seven days of the week – estimating
total sales based on a systematic sample every Tuesday would be unwise.)
• Knowledge of the structure of the population is necessary for its most effective
use.
• Cluster Sampling
Clusters may be of equal or unequal size. When all of the clusters are of the same
size, the number of elements in a cluster will be denoted by M while the number of
clusters in the population will be denoted by N.
Sample-Selection Procedure
Advantages
Disadvantages
• Multistage Sampling
Advantages
Disadvantages
• Estimation procedure is difficult, especially when the primary stage units are not
of the same size.
Textual Presentation
Example
At last count, 38 airlines were operating Boeing 707’s, 720’s, and 727’s over the
world’s airlines. The far-flung Boeing fleet has now logged an estimated
1,803,704,000 miles (22,855,948,000 kms.) and has massed approximately
4,096,000 revenue flight hours. Passenger totals stand at upwards of 71.6 million.
Advantages
Disadvantages
• When a large mass of quantitative data are included in a text or paragraph, the
presentation becomes almost incomprehensible
• Paragraphs can be tiresome to read especially if the same words are repeated so
many times
Tabular Presentation
Advantages
2. Box Head - the portion of the table that contains the column heads which
describe the data in each column, together with the needed
classifying and qualifying spanner heads.
3. Stub - the portion of the table usually comprising the first column on
the left, in which the stubhead and row captions, together with
the needed classifying and qualifying center head and subheads
are located. The stubhead describes the stub listing as a whole
in terms of the classification presented. The row caption is a
descriptive title of the data on the given line.
4. Field - main part of the table; contains the substance or the figures of
one’s data
5. Source note - an exact citation of the source of data presented in the table
(should always be placed when the figures are not original)
heading Table 4.4 – CRIME VOLUME AND RATE BY TYPE: 1991 – 1993
(Rate per 100,000 population)
Guidelines
• The title should be concise, written in telegraphic style, not in complete sentence.
• Column labels should be precise. Stress differences rather than similarities between
adjacent columns. As much as possible, two or more adjacent columns should not
begin nor end with the same phrase. This is frequently a signal that a spanner head is
needed.
• The arrangement of lines in the stub depends on the nature of classification, purpose
of presentation or limitations of space.
• Indicate if the data were taken from another publication by including a source note.
Graphical Presentation
Advantages
[Figure: % Shares (vertical axis, 0 to 50) by Year (1989 to 1995) for Coca-cola and Pepsi]
2. Pie Chart - a circular graph that is useful in showing how a total quantity is
distributed among a group of categories. The “pieces of the pie” represent the
proportions of the total that fall into each category.
[Figure: pie chart with slices Coca-Cola 40%, Pepsi 30%, Others 12%, 7-up 8%, Sarsi 5%, Sprite 5%]
3. Bar Chart - consists of a series of rectangular bars where the length of the bar
represents the quantity or frequency for each category if the bars are arranged
horizontally. If the bars are arranged vertically, the height of the bar
represents the quantity.
[Figure: horizontal bar chart of Market Shares (in %, axis 0 to 40) for Coca-Cola, Pepsi, 7-up, Sarsi, Sprite, and Others]
Definition. The raw data is the set of data in its original form.
82 82 83 79 72 71 84 59 77 50 87
83 82 63 75 50 85 76 79 68 69 62
79 69 74 53 73 71 50 76 57 81 62
72 88 84 80 68 50 74 84 71 73 68
71 80 72 60 81 89 94 80 84 81 50
84 76 75 82 76 53 91 69 60 89 79
59 62 79 82 72 81 60 84 68 66 94
77 78 87 75 86 82 74 73 72 84 51
50 69 75 70 77 87 86 77 75 96 66
87 73 84 68 85 62 87 92 69 52 65
50 57 63 69 72 74 77 80 82 84 87
50 59 65 69 72 75 77 80 82 84 87
50 59 66 69 72 75 77 80 82 85 88
50 60 66 69 72 75 77 81 83 85 89
50 60 68 70 73 75 78 81 83 86 89
50 60 68 71 73 75 79 81 84 86 91
51 62 68 71 73 76 79 81 84 87 92
52 62 68 71 73 76 79 82 84 87 94
53 62 68 71 74 76 79 82 84 87 94
53 62 69 72 74 76 79 82 84 87 96
Advantages:
In the construction of a frequency distribution, the various items of a series are classified
into groups. The frequency distribution table shows the number of items falling into
each group.
Definition of terms
Examples:
2. Determine the approximate class size. Whenever possible, all classes should be
of the same size. The following steps can be used to determine the class size.
3. Determine the lowest class limit. The first class must include the smallest value
in the data set.
4. Determine all class limits by adding the class size, C, to the limit of the previous
class.
5. Tally the frequencies for each class. Sum the frequencies and check against the
total number of observations.
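Steps 3 to 5 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name `frequency_distribution` is hypothetical, the data are integer-valued, and the small data set is the first row of the raw grades listed in these notes, with a lowest class limit of 50 and a class size of 5.

```python
def frequency_distribution(data, lowest_limit, class_size, num_classes):
    """Tally integer-valued data into equal-size classes.

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    classes = []
    for j in range(num_classes):
        lower = lowest_limit + j * class_size   # add the class size C to the previous limit
        upper = lower + class_size - 1          # upper limit for integer-valued data
        freq = sum(1 for x in data if lower <= x <= upper)
        classes.append((lower, upper, freq))
    return classes

grades = [82, 82, 83, 79, 72, 71, 84, 59, 77, 50, 87]
table = frequency_distribution(grades, lowest_limit=50, class_size=5, num_classes=10)
for lower, upper, freq in table:
    print(f"{lower} - {upper}: {freq}")

# Step 5 check: the frequencies must sum to the number of observations
total = sum(freq for _, _, freq in table)
print("Total:", total)
```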
Greater than CFD – shows the no. of observations greater than the LCB
Less than CFD – shows the no. of observations less than the UCB
Example:
1. Frequency Histogram - a bar graph that displays the classes on the horizontal axis and
the frequencies of the classes on the vertical axis; the vertical lines of the bars are
erected at the class boundaries and the heights of the bars correspond to the class
frequencies
[Figure: frequency histogram; vertical axis: No. of Students (0 to 25); horizontal axis: Grades, with class boundaries 49.5, 54.5, ..., 99.5]
2. Relative Frequency Histogram - a graph that displays the classes on the horizontal
axis and the relative frequencies on the vertical axis
Note: The relative frequency histogram has the same shape as the frequency
histogram but has a different vertical axis.
[Figure: relative frequency histogram; vertical axis: Relative Freq. (0 to 0.25); horizontal axis: Grades, with class boundaries 49.5, 54.5, ..., 99.5]
[Figure: No. of students (vertical axis, 0 to 25) plotted against Grades (horizontal axis, 47, 52, ..., 102)]
[Figure: plot with Grades on the horizontal axis (44.5, 54.5, ..., 94.5)]
In creating a stem-and-leaf display, we divide each observation into two parts, the
stem and the leaf. For example, we could divide the observation 244 as follows:
Stem Leaf
2 | 44
Alternatively, we could choose the point of division between the units and tens,
whereby
Stem Leaf
24 | 4
The choice of the stem and leaf coding depends on the nature of the data set.
3. For each observation, record the leaf portion of that observation in the row
corresponding to the appropriate stem
4. Reorder the leaves from lowest to highest within each stem row. Maintain
uniform spacing for the leaves so that the stem with the largest number of
observations has the longest line.
5. If the number of leaves appearing in each row is too large, divide the stem into
two groups, the first corresponding to leaves beginning with digits 0 through 4
and the second corresponding to leaves beginning with digits 5 through 9. This
subdivision can be increased to five groups if necessary.
6. Provide a key to your stem-and-leaf coding so that the reader can recreate the
actual measurements from your display.
Example: Typing speeds (net words per minute) for 20 secretarial applicants
68 72 91 47
52 75 63 55
65 35 84 45
58 61 69 22
46 55 66 71
Note: The stem-and-leaf display should include a reminder indicating the units
of the data value.
Example:
Unit = 0.1 1 | 2 represents 1.2
Unit = 1 1 | 2 represents 12
Unit = 10 1 | 2 represents 120
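The construction steps for a stem-and-leaf display can be sketched in Python using the typing speeds of the 20 secretarial applicants. The function name `stem_and_leaf` is an assumption for illustration; the tens digit is taken as the stem and the units digit as the leaf, which is one of several possible codings.

```python
from collections import defaultdict

speeds = [68, 72, 91, 47, 52, 75, 63, 55, 65, 35,
          84, 45, 58, 61, 69, 22, 46, 55, 66, 71]

def stem_and_leaf(data):
    """Group two-digit observations by stem (tens) with sorted leaves (units)."""
    rows = defaultdict(list)
    for x in sorted(data):              # sorting first keeps leaves in order
        stem, leaf = divmod(x, 10)
        rows[stem].append(leaf)
    return dict(rows)

display = stem_and_leaf(speeds)

# Key to the coding, so the reader can recreate the measurements
print("Unit = 1    2 | 2 represents 22")
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```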
CHAPTER 3
Definition. A measure of central tendency is any single value that is used to identify
the “center” or the typical value of a data set. It is often referred to as the
average.
1. easily understood
- not a distant mathematical abstraction
3. stable
- not affected materially by minor variations in the groups of items
Suppose that a variable X is the variable of interest, and that n measurements are
taken. The notation X1, X2, . . . ,Xn will be used to represent the n observations.
Let the Greek letter Σ indicate the “summation of”; thus, we can write the sum of
n observations as
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n .$$
The numbers 1 and n are called the lower and the upper limits of summation,
respectively.
$$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} (a_i + b_i + \dots + z_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i + \dots + \sum_{i=1}^{n} z_i$$
2. If c is a constant, then
$$\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$$
3. If c is a constant, then
$$\sum_{i=1}^{n} c = nc$$
Examples
Given:
i 1 2 3 4
Xi 2 4 6 8
Yi 1 2 1 2
Show:
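1. $\sum_{i=2}^{3} X_i = 10$
2. $\sum_{i=2}^{3} (X_i + Y_i) = 13$
3. $\sum_{i=1}^{4} X_i Y_i = 32$
4. $\sum_{i=1}^{4} X_i \sum_{i=1}^{4} Y_i = 120$
5. $\sum_{i=1}^{4} \frac{X_i}{Y_i} = 14$
6. $\frac{\sum_{i=1}^{4} X_i}{\sum_{i=1}^{4} Y_i} = \frac{20}{6} = 3\tfrac{1}{3}$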
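The six results can be checked numerically. The following Python sketch uses the given values of Xi and Yi; the variable names `s1` to `s6` are introduced only for this illustration. Note that Python lists are indexed from 0, so i = 2 to 3 corresponds to the slice `[1:3]`.

```python
from fractions import Fraction

X = [2, 4, 6, 8]
Y = [1, 2, 1, 2]

s1 = sum(X[1:3])                                   # sum of X_i for i = 2..3
s2 = sum(x + y for x, y in zip(X[1:3], Y[1:3]))    # sum of (X_i + Y_i) for i = 2..3
s3 = sum(x * y for x, y in zip(X, Y))              # sum of X_i * Y_i
s4 = sum(X) * sum(Y)                               # (sum of X_i) times (sum of Y_i)
s5 = sum(Fraction(x, y) for x, y in zip(X, Y))     # sum of X_i / Y_i
s6 = Fraction(sum(X), sum(Y))                      # (sum of X_i) / (sum of Y_i)

print(s1, s2, s3, s4, s5, s6)
```

Items 3 and 4, and items 5 and 6, show that the sum of products (or quotients) is generally not the product (or quotient) of the sums.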
30 CHAPTER 3. MEASURES OF CENTRAL TENDENCY
AND MEASURES OF LOCATION
The population mean for a finite population with N elements, denoted by the Greek
letter µ (mu), is computed as
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}.$$
The sample mean $\bar{X}$ (read as “X bar”) of n observations is computed as
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}.$$
Examples:
1. The number of employees at 5 different drug stores are 10, 12, 6, 8, and 4.
Treating the data as a population, find the mean number of employees for the 5
stores.
2. Scores in the Statistics 101 first exam for a sample of 10 students are as follows:
60, 55, 30, 90, 88, 79, 45, 66, 93, and 80. Find the mean.
3. Refer to the example on the final grades of 110 Statistics 101 students. The
sample mean is given by
$$\bar{X} = \frac{\sum_{i=1}^{110} X_i}{110} = 74.1 .$$
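As a quick check of the mean formulas, the first two examples can be worked in Python. The variable names are chosen for illustration only; the same arithmetic (sum divided by count) applies whether the data are treated as a population or as a sample.

```python
# Example 1: number of employees at 5 drug stores (treated as a population)
employees = [10, 12, 6, 8, 4]
mu = sum(employees) / len(employees)

# Example 2: first-exam scores for a sample of 10 students
scores = [60, 55, 30, 90, 88, 79, 45, 66, 93, 80]
x_bar = sum(scores) / len(scores)

print(mu, x_bar)
```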
Definition. The weighted mean is a modification of the usual mean that assigns
weights (or measures of relative importance) to the observations to be
averaged. If each observation Xi is assigned a weight Wi, i = 1, 2,…, n,
the weighted mean is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}.$$
Examples:
Assignment 15%
Project 25%
Midterm Exam 20%
Final Exam 40%
The maximum score a student may obtain for each component is 100. Jeffry
obtains marks of 83 for assignments, 72 for the project, 41 for the midterm exam,
and 47 for the final exam. Find his mean mark for the course.
History 1.0
Humanities 1.0
Math 19 3.0
Math 53 3.0
Philosophy 1.0
Math 53 is a 5-unit course and all others are 3-unit courses. Find Alex’s GWA for
the semester.
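Both weighted-mean examples can be sketched in Python. The helper name `weighted_mean` is hypothetical. For the second example, the sketch assumes that the course units serve as the weights (5 for Math 53 and 3 for each of the others), which is the usual reading of a general weighted average.

```python
def weighted_mean(values, weights):
    """Sum of W_i * X_i divided by the sum of W_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Jeffry's marks: assignment, project, midterm exam, final exam
marks = [83, 72, 41, 47]
percent_weights = [15, 25, 20, 40]
jeffry_mean = weighted_mean(marks, percent_weights)

# Alex's grades: History, Humanities, Math 19, Math 53, Philosophy
grades = [1.0, 1.0, 3.0, 3.0, 1.0]
units = [3, 3, 3, 5, 3]          # assumption: units are the weights
alex_gwa = weighted_mean(grades, units)

print(round(jeffry_mean, 2), round(alex_gwa, 2))
```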
1. It is the most familiar measure used, and it employs all available information.
3. Since the mean is a calculated number, it may not be an actual number in the data
set.
Example:
Given 5 temperature readings measured in Fahrenheit: 98, 100, 107, 90, 92. The
mean temperature is $\bar{X}_F = 97.4$.
The mean temperature in centigrade is $\bar{X}_C = \frac{5}{9}(97.4 - 32) = 36.3$.
- possible only when the class mark can be assumed to be representative of all
the values in that class. If the assumption holds, the following equation may
be used to approximate the mean from a frequency distribution.
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
$$\bar{X} = \frac{\sum_{i=1}^{10} f_i X_i}{\sum_{i=1}^{10} f_i} = \frac{8145}{110} = 74.0$$
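The approximation of the mean from a frequency distribution can be reproduced in Python using the class marks and frequencies of the 110 final grades. The variable names are chosen for illustration.

```python
# Class marks and frequencies for the classes 50-54, 55-59, ..., 95-99
class_marks = [52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

sum_fx = sum(f * x for f, x in zip(freqs, class_marks))   # sum of f_i * X_i
n = sum(freqs)                                            # sum of f_i
grouped_mean = sum_fx / n

print(sum_fx, n, round(grouped_mean, 1))
```

The result differs slightly from the mean of the raw data (74.1) because each observation is replaced by its class mark.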
Remarks:
The first step in calculating the median, denoted as Md, is to arrange the data in an
array.
If n is odd, the median position equals (n+1)/2, and the value of the (n+1)/2 th
observation in the array is taken as the median, i.e.,
$$Md = X_{([n+1]/2)}$$
If n is even, the mean of the two middle values in the array is the median, i.e.,
$$Md = \frac{X_{(n/2)} + X_{((n/2)+1)}}{2}$$
Examples :
1. Given the following heights ( in inches ): 71, 72, 75, 75, and 67 . Find the median
height.
3. Refer to the example on the grades of 110 Statistics 101 students. The median is
given by
$$Md = \frac{X_{(55)} + X_{(56)}}{2} = \frac{75 + 75}{2} = 75 .$$
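The two cases (n odd and n even) translate directly into code. The following Python sketch uses a hypothetical function name `median`; positions in the array are 1-based in the formulas and 0-based in Python, hence the `- 1` offsets.

```python
def median(data):
    """Median of ungrouped data using the odd/even position rules."""
    arr = sorted(data)                  # first step: arrange the data in an array
    n = len(arr)
    if n % 2 == 1:
        return arr[(n + 1) // 2 - 1]    # the ((n+1)/2)th observation
    # mean of the (n/2)th and ((n/2)+1)th observations
    return (arr[n // 2 - 1] + arr[n // 2]) / 2

heights = [71, 72, 75, 75, 67]
print(median(heights))
```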
2. The median is affected by the position of each item in the series but not by the
value of each item. This means that extreme values affect the median less than
the arithmetic mean.
- possible only when the values of the observations falling in the median class can
be assumed to be evenly spaced throughout the class. (The median class is the
class containing the median.)
$$Md = LCB_{md} + c\left(\frac{n/2 - {<}CF_{md-1}}{f_{md}}\right)$$
Example:
Refer to the example on the final grades of 110 Statistics 101 students.
Class      Freq.   <CF
50 – 54     10      10
55 – 59      3      13
60 – 64      8      21
65 – 69     13      34
70 – 74     17      51
75 – 79     19      70   (median class: <CF is greater than n/2 = 55 for the first time)
80 – 84     22      92
85 – 89     13     105
90 – 94      4     109
95 – 99      1     110
$$Md = 74.5 + 5\left(\frac{(110/2) - 51}{19}\right) = 75.6$$
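The grouped-data median can be sketched in Python as follows. The function name `grouped_median` is hypothetical, and the sketch assumes integer class limits so that the lower class boundary is the lower class limit minus 0.5.

```python
def grouped_median(lower_limits, freqs, class_size):
    """Approximate the median from a frequency distribution."""
    n = sum(freqs)
    cum_before = 0                         # <CF of the class preceding the current one
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= n / 2:        # <CF equals or exceeds n/2 for the first time
            lcb = lower - 0.5              # lower class boundary of the median class
            return lcb + class_size * (n / 2 - cum_before) / f
        cum_before += f
    raise ValueError("empty frequency distribution")

lower_limits = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]
md = grouped_median(lower_limits, freqs, class_size=5)
print(round(md, 1))
```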
The mode is determined by counting the frequency of each value and finding the
value with the highest frequency of occurrence.
Examples:
1. 2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2
2. 2, 5, 5, 2, 2, 5, 1, 3, 5, 4, 2, 5, 5, 2, 2, 5, 5, 2, 2, 1
3. 1, 2, 3, 3, 2, 1, 2, 3, 1, 4, 4, 5, 5, 1, 2, 3, 4, 5, 4, 5
4. Refer to the example on the final grades of 110 Statistics 101 students. The mode
is Mo=84.
1. It does not always exist; and if it does, it may not be unique. A data set is said to
be unimodal if there is only one mode, bimodal if there are two modes, trimodal if
there are three modes, and so on.
Step 1: Locate the modal class. The modal class is the class with the highest
frequency.
Step 2: Approximate the mode using the following formula:
$$Mo = LCB_{mo} + c\left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)$$
Example :
Refer to the example on the final grades of 110 Statistics 101 students.
Class Freq.
50 – 54 10
55 – 59 3
60 – 64 8
65 – 69 13
70 – 74 17
75 – 79 19
Modal class 80 – 84 22
85 – 89 13
90 – 94 4
95 – 99 1
$$Mo = 79.5 + 5\left(\frac{22 - 19}{2(22) - 19 - 13}\right) = 80.8$$
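Steps 1 and 2 for the grouped-data mode can be sketched in Python. Consistent with the worked example, f1 is taken as the frequency of the class preceding the modal class and f2 as the frequency of the class following it. The function name `grouped_mode` and the treatment of a missing neighbouring class as frequency 0 are assumptions of this sketch, as is the use of integer class limits.

```python
def grouped_mode(lower_limits, freqs, class_size):
    """Approximate the mode from a frequency distribution."""
    m = max(range(len(freqs)), key=lambda j: freqs[j])   # Step 1: modal class index
    f_mo = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                    # class before the modal class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0       # class after the modal class
    lcb = lower_limits[m] - 0.5                          # lower class boundary
    # Step 2: apply the approximation formula
    return lcb + class_size * (f_mo - f1) / (2 * f_mo - f1 - f2)

lower_limits = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]
mo = grouped_mode(lower_limits, freqs, class_size=5)
print(mo)
```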
Definition. Percentiles are values that divide a set of observations in an array into 100
equal parts. Thus,
P1, read as first percentile, is the value below which 1% of the values fall.
P2, read as second percentile, is the value below which 2% of the values fall.
•
•
•
P99, read as ninety-ninth percentile, is the value below which 99% of the
values fall.
$$P_i = \text{the value of the } \left(\frac{i(n+1)}{100}\right)\text{th observation in the array}$$
The Pi th class is the class where the less than cumulative frequency is equal
to, or exceeds for the first time, in/100.
1. Deciles
D1, read as first decile, is the value below which 10% of the values fall.
D2, read as second decile, is the value below which 20% of the values fall.
•
•
•
D9, read as ninth decile, is the value below which 90% of the values fall.
2. Quartiles
Q1, read as first quartile, is the value below which 25% of the values fall.
Q2, read as second quartile, is the value below which 50% of the values fall.
Q3, read as third quartile, is the value below which 75% of the values fall.
2. D3 = 69 2. D3 = 69.1
3. Q2 = 75 3. Q2 = 75
CHAPTER 4
Measures of Dispersion
and
Measures of Skewness
• to determine the extent of the scatter so that steps may be taken to control the
existing variation
• The Range
Definition. The range of a set of measurements is the difference between the largest
and the smallest values.
Examples:
1. The IQ’s of 5 members of a certain family are 108, 112, 127, 116, and 113. Find
the range.
2. Refer to the example on the final grade of 110 Statistics 101 students. The range
is Range = 96 – 50 = 46.
1. It uses only the extreme values. It fails to communicate any information about the
clustering or the lack of clustering of the values between the extremes.
2. A weakness of the range is that an outlier can greatly alter its value.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
Remarks:
Examples:
1. The following scores were given by 6 judges for a gymnast’s performance in the
vault: 7, 5, 9, 7, 8, and 6. Find the standard deviation.
$\mu = 7$, $\sigma = \sqrt{\frac{10}{6}} = 1.3$
3. Refer to the example on the final grades of 110 Statistics 101 students. The
sample standard deviation is given by
$$s = \sqrt{\frac{\sum_{i=1}^{110} (X_i - 74.11)^2}{109}} = \sqrt{\frac{13798.69}{109}} = 11.25$$
Computational formula:
$$s^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$$
$$s^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1}$$
$$s^2 = \frac{n\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}$$
Example:
Class      f_i   X_i   f_i X_i   f_i X_i^2
50 – 54     10    52      520       27040
55 – 59      3    57      171        9747
60 – 64      8    62      496       30752
65 – 69     13    67      871       58357
70 – 74     17    72     1224       88128
75 – 79     19    77     1463      112651
80 – 84     22    82     1804      147928
85 – 89     13    87     1131       98397
90 – 94      4    92      368       33856
95 – 99      1    97       97        9409
Total      110           8145      616265
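The column totals in the table can be substituted into the computational formula for grouped data. The following Python sketch recomputes the totals from the frequencies and class marks and then evaluates the variance and standard deviation; the variable names are chosen for illustration.

```python
import math

class_marks = [52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))        # total of the f*X column
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, class_marks))  # total of the f*X^2 column

# Computational formula for the sample variance of grouped data
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)

print(n, sum_fx, sum_fx2, round(s2, 2), round(s, 2))
```

The grouped value of s is close to, but not the same as, the value 11.25 obtained from the raw data, since the class marks only approximate the individual grades.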
3. If each observation of a set of data is transformed to a new set by the addition (or
subtraction) of a constant c, the standard deviation of the new set of data is the
same as the standard deviation of the original data set.
Measures of relative dispersion are unitless and are used when one
wishes to compare the scatter of one distribution with another distribution.
Definition. The coefficient of variation, CV, is the ratio of the standard deviation to
the mean and is usually expressed in percentage. It is computed as
$$CV = \frac{\sigma}{\mu} \times 100\%$$
and its sample counterpart is
$$CV = \frac{s}{\bar{X}} \times 100\%$$
Examples.
1. The foreign exchange rate is an indicator of the stability of the peso and is also an
indicator of the economic performance. In 1992 Bangko Sentral ng Pilipinas
(BSP) put the peso on a floating rate basis. Market forces and not government
policy have determined the level of the peso since. Government intervenes
through the BSP, only when there are speculative elements in the market. Given
below are the means and standard deviations of the quarterly P-$ exchange rate
for the periods 1989 to 1991 and 1992 to 1994. Which of the two periods is more
stable?
Mean s.d.
$$CV_{89-91} = \frac{1.84}{22.4} \times 100\% = 8.21\%$$
$$CV_{92-94} = \frac{1.15}{26.4} \times 100\% = 4.36\%$$
2. Two of the quality criteria in processing butter cookies are the weight and color
development in the final stages of oven browning. Individual pieces of cookies
are scanned by a spectrophotometer calibrated to reflect yellow-brown light. The
readout is expressed in per cent of a standard yellow-brown reference plate and a
value of 41 is considered optimal (golden-yellow). The cookies were also
weighed in grams at this stage. The means and standard deviations of 30 sample
cookies are presented below.
Mean s.d.
Color 41.1 10
Weight 17.7 3.2
$$CV_{color} = \frac{10}{41.1} \times 100\% = 24.33\%$$
$$CV_{weight} = \frac{3.2}{17.7} \times 100\% = 18.08\%$$
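The coefficient of variation computations in the two examples can be reproduced with a one-line function. In this Python sketch the function name `coefficient_of_variation` is hypothetical; the means and standard deviations are the values used in the worked computations.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Example 1: quarterly P-$ exchange rate
cv_89_91 = coefficient_of_variation(1.84, 22.4)
cv_92_94 = coefficient_of_variation(1.15, 26.4)

# Example 2: butter cookies
cv_color = coefficient_of_variation(10, 41.1)
cv_weight = coefficient_of_variation(3.2, 17.7)

print(round(cv_89_91, 2), round(cv_92_94, 2))
print(round(cv_color, 2), round(cv_weight, 2))
```

A smaller CV indicates less scatter relative to the mean, so the 1992 to 1994 period shows the more stable exchange rate, and weight is less variable than color.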
$$Z = \frac{X - \mu}{\sigma}$$
$$Z = \frac{X - \bar{X}}{s}$$
Remarks:
1. The standard score is not a measure of relative dispersion per se but is somewhat
related.
2. It is useful for comparing two values from different series, especially when these
two series differ with respect to the mean or standard deviation or both, or are
expressed in different units.
Examples:
1. Robert got a grade of 75% in Stat 101 and a grade of 90% in Econ 11. The mean
grade in Stat 101 is 70% and the standard deviation is 10%, whereas in Econ 11,
the mean grade is 80% and the standard deviation is 20%. Relative to the other
students, where did he perform better?
$$Z_{Stat101} = \frac{75 - 70}{10} = 0.5$$
$$Z_{Econ11} = \frac{90 - 80}{20} = 0.5$$
2. In problem (1), if the mean grade in Stat 101 is 65%, in which subject did Robert
perform better?
$$Z_{Stat101} = \frac{75 - 65}{10} = 1.0$$
3. Different typing skills are required for secretaries depending on whether one is
working in a law office, an accounting firm, or for a mathematical research group
at a major university. In order to evaluate candidates for these positions, an
agency administers 3 distinct standardized typing samples. A time penalty has
been incorporated into the scoring of each sample based on the number of typing
errors. The mean and standard deviation for each test, together with the scores
achieved by Nancy, an applicant, are given in the following table.
$$Z_L = \frac{141 - 180}{30} = -1.3 \qquad Z_A = \frac{7 - 10}{2} = -1.5 \qquad Z_S = \frac{33 - 26}{5} = 1.4$$
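The standard scores in the examples above can be verified with a short Python sketch. The function name `standard_score` is an assumption for illustration; the values for Nancy are the score, mean, and standard deviation that appear in the three computations.

```python
def standard_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example 1: Robert's grades
z_stat = standard_score(75, 70, 10)
z_econ = standard_score(90, 80, 20)

# Example 2: mean grade in Stat 101 changed to 65
z_stat_2 = standard_score(75, 65, 10)

# Example 3: Nancy's three typing tests
z_l = standard_score(141, 180, 30)
z_a = standard_score(7, 10, 2)
z_s = standard_score(33, 26, 5)

print(z_stat, z_econ, z_stat_2, z_l, z_a, z_s)
```

In Example 1 the two standard scores are equal, so Robert's relative standing is the same in both subjects; in Example 2 the Stat 101 score is higher.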
1. $Sk = \frac{\bar{X} - Mo}{s}$
where $\bar{X}$ = mean
Mo = mode
s = standard deviation
2. $Sk = \frac{3(\bar{X} - Md)}{s}$
where $\bar{X}$ = mean
Md = median
s = standard deviation
Remarks:
$\bar{X} = 74.1$, Md = 75, Mo = 84, s = 11.25
$$Sk = \frac{74.1 - 84}{11.25} = -0.88$$
$$Sk = \frac{3(74.1 - 75)}{11.25} = -0.24$$
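Both skewness coefficients can be computed from the summary values of the 110 final grades. The function names in this Python sketch are hypothetical and chosen only to distinguish the mode-based and median-based formulas.

```python
def skewness_from_mode(mean, mode, s):
    """Sk = (mean - mode) / s"""
    return (mean - mode) / s

def skewness_from_median(mean, median, s):
    """Sk = 3 * (mean - median) / s"""
    return 3 * (mean - median) / s

sk_mode = skewness_from_mode(74.1, 84, 11.25)
sk_median = skewness_from_median(74.1, 75, 11.25)

print(round(sk_mode, 2), round(sk_median, 2))
```

Both coefficients are negative here because the mean lies below the mode and the median.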
Definition. The boxplot is a graph that is very useful for displaying the following
features of the data:
• location
• spread
• symmetry
• extremes
• outliers
1. Construct a rectangle with one end at the first quartile and the other end at the third
quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), lower fence (FL) and upper fence (FU)
given by:
IQR = Q3 - Q1
FL = Q1 - 1.5 IQR
FU = Q3 + 1.5 IQR
4. Locate the smallest value contained in the interval [FL , Q1]. Draw a line from this
value to Q1.
5. Locate the largest value contained in the interval [Q3,FU]. Draw a line from this value
to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by
“x”.
Remarks:
1. The height of the rectangle is usually arbitrary and has no specific meaning. If several
boxplots appear together, however, the height is sometimes made proportional to the
different sample sizes.
Examples:
1. Set A: 1 15 21 22 24
10 18 22 23 25
14 20 22 24 28
Q1 = 15 IQR = 9
Q3 = 24 FL = 1.5
Md = 22 FU = 37.5
Set B: 3 10 11 12 19
8 10 12 16 19
9 10 12 16 30
Q1 = 10 IQR = 6
Q3 = 16 FL = 1
Md = 12 FU = 25
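The quantities needed for the two boxplots can be computed with a short Python sketch. The function names are hypothetical. Quartile positions are taken as (n+1)/4, (n+1)/2, and 3(n+1)/4 in the sorted array; when a position is not a whole number the sketch interpolates linearly between neighbouring observations, which is an assumption (for n = 15 the positions 4, 8, and 12 are whole numbers, so no interpolation occurs). At least 3 observations are assumed.

```python
def value_at(arr, pos):
    """Value at 1-based position pos in a sorted list (linear interpolation assumed)."""
    lo = int(pos)
    frac = pos - lo
    if frac == 0:
        return arr[lo - 1]
    return arr[lo - 1] + frac * (arr[lo] - arr[lo - 1])

def boxplot_summary(data):
    """Quartiles, IQR, fences, and outliers for a boxplot (assumes n >= 3)."""
    arr = sorted(data)
    n = len(arr)
    q1 = value_at(arr, (n + 1) / 4)
    md = value_at(arr, (n + 1) / 2)
    q3 = value_at(arr, 3 * (n + 1) / 4)
    iqr = q3 - q1
    fl = q1 - 1.5 * iqr          # lower fence
    fu = q3 + 1.5 * iqr          # upper fence
    outliers = [x for x in arr if x < fl or x > fu]
    return {"Q1": q1, "Md": md, "Q3": q3, "IQR": iqr,
            "FL": fl, "FU": fu, "outliers": outliers}

set_a = [1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]
set_b = [3, 10, 11, 12, 19, 8, 10, 12, 16, 19, 9, 10, 12, 16, 30]

summary_a = boxplot_summary(set_a)
summary_b = boxplot_summary(set_b)
print(summary_a)
print(summary_b)
```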
[Figure: boxplots of Set A and Set B on a common horizontal axis from 0 to 35; one outlier in each set is marked “x”]
[Figure: boxplot on a horizontal axis from 50 to 100]
CHAPTER 5
Probability
Definition of Terms
5. Null space/Empty space a subset of the sample space that contains no elements
and denoted by the symbol φ.
6. Simple event an event which contains only one element of the sample
space
8. Mutually exclusive events Two events A and B are mutually exclusive if A∩B=φ;
that is, A and B have no elements in common
Remarks:
• An event is said to have occurred if the outcome of the experiment is one of the
sample points in the event.
• The empty space can be viewed as an event that will never happen. It is called the
impossible event.
• The sample space S, as an event, always occurs, and is referred to as the certain or
sure event.
Examples:
a. The probability that it will rain tomorrow is 0.40 and the probability that it will
not rain tomorrow is 0.52.
d. On a single draw from a deck of playing cards the probability of selecting a heart
is 1/4, the probability of selecting a black card is 1/2, and the probability of
selecting both a heart and a black card is 1/8.
2. a. In tossing a fair coin, what is the probability of getting a head? Of either a head or
tail? Of neither a head nor tail?
b. In tossing a fair die, what is the probability of getting a 3? Of getting an even
number? Of getting a number greater than 6?
3. A coin is biased so that a head is twice as likely to occur as a tail. If the coin is tossed
once, what is the probability of getting a head?
Theorem. If an operation can be performed in n1 ways, and for each of these a second
operation can be performed in n2 ways, then the two operations can be
performed in n1n2 ways.
Example: How many sample points are there in the sample space when a pair of
balanced dice is thrown once?
Examples:
1. How many even three-digit numbers can be formed from the digits 1, 2, 5, 6, and
9 if each digit can be used only once?
n (n-1)(n-2) . . . (2)(1) = n!
Note. 0! = 1.
Example: How many different orders or sequences can we arrange the letters A, B,
C, and D?
Examples:
1. Two lottery tickets are drawn from 20 for the first and second prize. Find the
number of sample points in the space S.
2. In how many ways can the 5 starting positions on a basketball team be filled with
8 men who can play any position?
Theorem. The number of distinct permutations of n things of which n1 are of one kind,
n2 are of a second kind, . . . , nk of a kth kind is
$$\frac{n!}{n_1!\,n_2!\cdots n_k!}$$
where $\sum_{i=1}^{k} n_i = n$.
Examples:
2. In how many different ways can 3 red, 4 yellow, and 2 blue bulbs be arranged in a
string of Christmas tree lights with 9 sockets?
Examples:
1. In a Stat 101 exam, a student has a choice of 8 questions out of 10. In how many
ways can he choose a set of 8 questions if he chooses arbitrarily?
2. Find the number of ways of selecting the 6 winning numbers in the original
version of the game of lotto.
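Several of the counting examples in this chapter can be checked with Python's standard library. This is a sketch for verification only; the variable names are illustrative, and `math.perm` and `math.comb` require Python 3.8 or later. The lotto example is left out because the number of balls in the game is not stated here.

```python
import math
from itertools import permutations

# Multiplication rule: a pair of dice thrown once
dice_points = 6 * 6

# Even three-digit numbers from the digits 1, 2, 5, 6, 9 without repetition,
# counted by listing every arrangement and keeping those ending in an even digit
even_numbers = [p for p in permutations([1, 2, 5, 6, 9], 3) if p[-1] % 2 == 0]

# Arrangements of the letters A, B, C, D: n!
letter_orders = math.factorial(4)

# Ordered selections: first and second prize from 20 tickets; 5 positions from 8 men
lottery_points = math.perm(20, 2)
lineups = math.perm(8, 5)

# Distinct permutations of 3 red, 4 yellow, and 2 blue bulbs: n! / (n1! n2! n3!)
bulb_arrangements = math.factorial(9) // (
    math.factorial(3) * math.factorial(4) * math.factorial(2))

# Unordered selection: 8 questions out of 10
question_sets = math.comb(10, 8)

print(dice_points, len(even_numbers), letter_orders, lottery_points,
      lineups, bulb_arrangements, question_sets)
```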
P(A) + P(Ac) = 1.
Examples:
1. The probability that a student passes Stat 101 is 0.60, and the probability that he
passes Comm II is 0.85. If the probability that he passes at least one of the two
courses is 0.95, what is the probability that he will pass both courses? fail both Stat
101 and Comm II?
2. An oil-prospecting firm plans to drill two exploratory wells. Past evidence shows that
the probability that neither well produces oil is 0.8; the probability that exactly one
well produces oil is 0.18; and, the probability that both wells produce oil is 0.02.
What is the probability that at most one well produces oil? At least one?
3. In the toss of a fair coin 4 times, what is the probability of no head in the toss? At
least one head?
Definition. The probability of an event B occurring when it is known that some event A
has occurred is called a conditional probability. It is defined by the
equation
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \quad \text{if } P(A) > 0$$
Examples:
1. A random sample of 100 insurance claims is classified below according to the type
of policy and whether the claim is fraudulent or not.
a. Find the probability of a fraudulent claim given that such a claim is for a fire
policy.
b. Find the probability that a claim for a fire policy is selected given that such a
claim is fraudulent.
Type of Policy
Category Fire Auto Others Total
Fraudulent 6 1 3 10
Nonfraudulent 14 29 47 90
Total 20 30 50 100
2. The probability that a student passes Stat 101 is 0.60, the probability that he passes
Comm II is 0.85, the probability that he passes both subjects is 0.5. If the student
passes Stat 101, what is the probability that the student will pass Comm II?
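The definition of conditional probability can be applied to both examples in a few lines of Python. The sketch uses exact fractions, and the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Example 1: counts from the table of 100 insurance claims
counts = {
    ("Fraudulent", "Fire"): 6, ("Fraudulent", "Auto"): 1, ("Fraudulent", "Others"): 3,
    ("Nonfraudulent", "Fire"): 14, ("Nonfraudulent", "Auto"): 29, ("Nonfraudulent", "Others"): 47,
}
total = sum(counts.values())

p_fire = Fraction(sum(c for (cat, pol), c in counts.items() if pol == "Fire"), total)
p_fraud = Fraction(sum(c for (cat, pol), c in counts.items() if cat == "Fraudulent"), total)
p_fraud_and_fire = Fraction(counts[("Fraudulent", "Fire")], total)

p_fraud_given_fire = p_fraud_and_fire / p_fire     # part (a)
p_fire_given_fraud = p_fraud_and_fire / p_fraud    # part (b)

# Example 2: P(pass Comm II | pass Stat 101) = P(both) / P(pass Stat 101)
p_comm_given_stat = Fraction(50, 100) / Fraction(60, 100)

print(p_fraud_given_fire, p_fire_given_fraud, p_comm_given_stat)
```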
Definition. Two events A and B are said to be independent if any one of the following
conditions is satisfied:
Examples:
2. The probability that Robert will correctly answer the toughest question in an exam is
1/4. The probability that Ana will correctly answer the same question is 4/5. Find the
probability that both will answer the question correctly, assuming that they do not
copy from each other.
CHAPTER 6
Probability Distributions
Definition. A function whose value is a real number determined by each element in the
sample space is called a random variable.
Remark. We shall use an uppercase letter, say X, to denote a random variable and its
corresponding lowercase letter, x in this case, for one of its values.
Examples:
Sample Points x y
HHH 3 3
HHT 2 1
HTH 2 1
HTT 1 -1
THH 2 1
THT 1 -1
TTH 1 -1
TTT 0 -3
2. (Experiment No. 2) A hatcheck girl returns 3 hats at random to 3 customers who had
previously checked them. If Jason, Charlie, and Ohmar, in that order, each receive one of
the hats, list the sample points for the possible orders of returning the hats and find
the values m of the random variable M that represents the number of correct matches.
Definition. A random variable defined over a discrete sample space is called a discrete
random variable.
Definition. A table or formula listing all possible values that a discrete random variable
can take on, along with the associated probabilities, is called a discrete
probability distribution.
Remark. The probabilities associated with all possible values of a discrete random
variable must sum to 1.
Examples:
1. For Experiment No. 1, the discrete probability distributions of the random variables X
and Y are
x 0 1 2 3
P(X=x) 1/8 3/8 3/8 1/8
y -3 -1 1 3
P(Y=y) 1/8 3/8 3/8 1/8
2. Construct the discrete probability distribution for the random variable M defined in
Experiment No. 2.
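One way to build the distribution of M is to enumerate the sample space directly. The sketch below assumes that all 3! orders of returning the hats are equally likely; the helper name `matches` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

owners = ("Jason", "Charlie", "Ohmar")

# Each permutation is one equally likely way of handing back the hats:
# position i holds the owner of the hat given to owners[i].
outcomes = list(permutations(owners))

def matches(order):
    """Number of customers who receive their own hat."""
    return sum(hat == person for hat, person in zip(order, owners))

counts = Counter(matches(order) for order in outcomes)
dist_m = {m: Fraction(c, len(outcomes)) for m, c in sorted(counts.items())}

for m, p in dist_m.items():
    print(m, p)   # 0 1/3, then 1 1/2, then 3 1/6
```

Note that M = 2 is impossible: if two customers have the correct hats, the third must as well.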
Definition. The function with values f(x) is called a probability density function for
the continuous random variable X, if
• the total area under its curve and above the horizontal axis is equal to 1; and
• the area under the curve between any two ordinates x = a and x = b gives the
probability that X lies between a and b.
Remarks:
1. A continuous random variable has a probability of zero of assuming exactly any of its
values, that is, if X is a continuous random variable, then P(X=x) = 0 for all real
numbers x.
Example:
A continuous random variable X that can assume values between 0 and 2 has a
density function given by
[Figure: graph of the density f(x), with height 1/2 marked on the vertical axis and 0, 1, 2 marked on the x-axis]
x x1 x2 ... xn
P(X=x) f(x1) f(x2) ... f(xn)
Examples:
X 0 1 2 3
P(X=x) 1/8 3/8 3/8 1/8
Y -3 -1 1 3
P(Y=y) 1/8 3/8 3/8 1/8
3. In a gambling game a man is paid P50 if he gets all heads or all tails when 3 coins are
tossed, and he pays out P30 if either 1 or 2 heads show. What is his expected gain?
x x1 x2 ... xn
P(X=x) f(x1) f(x2) ... f(xn)
Example: A used car dealer finds that in any day, the probability of selling no car is
0.4, one car is 0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is
0.06 and six cars is 0.01. Let g(X) = 500 + 1500X represent the salesman’s
daily earnings, where X is the number of cars sold. Find the salesman’s
expected daily earnings.
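The gambling example and the car dealer example can both be worked with one small helper. This is a sketch that assumes the usual definition of expectation for a discrete random variable, E[g(X)] = Σ g(x) f(x); the function name `expected_value` is illustrative.

```python
from fractions import Fraction

def expected_value(dist, g=lambda x: x):
    """E[g(X)]: sum of g(x) * f(x) over all values x of X."""
    return sum(g(x) * p for x, p in dist.items())

# Gambling game: X = number of heads in 3 tosses of a fair coin.
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
# Gain is +50 for all heads or all tails, -30 for 1 or 2 heads.
expected_gain = expected_value(heads, lambda x: 50 if x in (0, 3) else -30)

# Used car dealer: probabilities of selling 0, 1, ..., 6 cars as given.
cars = {x: Fraction(p) for x, p in
        enumerate(["0.4", "0.2", "0.15", "0.10", "0.08", "0.06", "0.01"])}
expected_cars = expected_value(cars)
expected_earnings = expected_value(cars, lambda x: 500 + 1500 * x)

print(expected_gain)                              # -10
print(float(expected_cars), expected_earnings)    # 1.48 2720
```

The last line also illustrates that E(500 + 1500X) = 500 + 1500 E(X) = 500 + 1500(1.48).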
\[ \sigma^2 = \operatorname{Var}(X) = E\left[(X - \mu)^2\right] \]
X x1 x2 ... xn
P(X=x) f(x1) f(x2) ... f(xn)
The variance of X is
\[ \sigma^2 = \operatorname{Var}(X) = E\left[(X - \mu)^2\right] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i) \]
Example :
E(X) = 1.5
\[ \operatorname{Var}(X) = \sum_{i=1}^{4} (x_i - 1.5)^2 f(x_i) = 3 - (1.5)^2 = 0.75 \]
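The arithmetic in this example can be reproduced as follows. The sketch assumes that X is the number of heads in Experiment No. 1 (values 0 to 3 with probabilities 1/8, 3/8, 3/8, 1/8), which is consistent with E(X) = 1.5 and the four-term sum; it computes the variance both from the definition and from the shortcut E(X²) − µ².

```python
from fractions import Fraction

dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in dist_x.items())                       # E(X)
var_definition = sum((x - mu) ** 2 * p for x, p in dist_x.items())
e_x_squared = sum(x * x * p for x, p in dist_x.items())          # E(X^2)
var_shortcut = e_x_squared - mu ** 2

print(mu, e_x_squared, var_definition, var_shortcut)   # 3/2 3 3/4 3/4
```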
Let X and Y be random variables (discrete or continuous) and let a and b be constants.
1. E(aX + b) = a E(X) + b
Special Cases:
a. if b = 0, then E(aX) = a E(X).
b. if a = 0, then E(b) = b.
4. E[ X - E(X) ] = 0.
5. Var(aX + b) = a²Var(X).
Special Cases:
a. if b = 0, then Var(aX) = a²Var(X).
b. if a = 0, then Var(b) = 0.
Example :
a. E(3X + 5)
b. Var(3X +5)
c. E(XY)
d. Var(3X - 2Y)
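Parts (a) and (b) illustrate properties 1 and 5. The setup of the example is not shown here, so the sketch below assumes, purely for illustration, that X is the number of heads in three tosses of a fair coin. It builds the distribution of W = 3X + 5 directly and compares its mean and variance with a E(X) + b and a²Var(X).

```python
from fractions import Fraction

# Assumed distribution of X (number of heads in 3 tosses); illustrative only.
dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    var = sum((v - m) ** 2 * p for v, p in dist.items())
    return m, var

a, b = 3, 5
dist_w = {a * x + b: p for x, p in dist_x.items()}   # distribution of W = 3X + 5

mean_x, var_x = mean_var(dist_x)
mean_w, var_w = mean_var(dist_w)

print(mean_w, a * mean_x + b)      # 19/2 19/2
print(var_w, a ** 2 * var_x)       # 27/4 27/4
```

Parts (c) and (d) also require information about Y and its relationship to X, so they are not attempted here.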
for -∞ < x < ∞ and for constants µ and σ, where -∞ < µ < ∞ , σ>0 and
e≈2.71828 and π ≈ 3.14159.
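The constants above are those of the normal density. For reference, its standard textbook form (stated here as a reminder, not quoted from these notes) is

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \]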
Properties:
1. The curve is bell-shaped and symmetric about a vertical axis through the mean µ.
3. The total area under the curve and above the horizontal axis is equal to 1.
Definition. The distribution of a normal random variable with mean zero and standard
deviation equal to 1 is called a standard normal distribution.
\[ Z = \frac{X - \mu}{\sigma} \]
Hence, whenever X is between the values x1 and x2, the random variable Z will
fall between the corresponding values
\[ z_1 = \frac{x_1 - \mu}{\sigma} \quad \text{and} \quad z_2 = \frac{x_2 - \mu}{\sigma} \]
Examples :
2. Given the normally distributed random variable X with mean 18 and standard
deviation 2.5, find
3. The achievement scores for a college entrance examination are normally distributed
with mean 75 and standard deviation equal to 10. What fraction of the scores would
one expect to lie between 70 and 90?
4. A softdrink machine is regulated so that it dispenses an average of 200 ml. per cup. If
the amount of drink dispensed is normally distributed with a standard deviation equal
to 15 ml.,
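Example 3 can be worked by standardizing the endpoints and taking a difference of standard normal areas. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed Z table; the function name `prob_between` is illustrative.

```python
from statistics import NormalDist

def prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for a normal X, computed through z1 and z2."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    std_normal = NormalDist()            # mean 0, standard deviation 1
    return std_normal.cdf(z2) - std_normal.cdf(z1)

# Achievement scores: mean 75, standard deviation 10, so z1 = -0.5 and z2 = 1.5.
fraction_70_to_90 = prob_between(70, 90, mu=75, sigma=10)
print(round(fraction_70_to_90, 4))   # about 0.6247
```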
• Binomial Distribution
Examples:
1. A multiple-choice quiz has 15 questions, each with 4 possible answers of which only
1 is the correct answer. What is the probability that sheer guesswork yields
2. Suppose that airplane engines operate independently in flight and fail with probability
1/5. Assuming that a plane makes a safe flight if at least one-half of its engines run,
which between a 4-engine plane and a 2-engine plane has the higher probability for a
successful flight?
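Example 2 can be settled by summing binomial probabilities. The sketch below assumes the usual binomial probability function, C(n, x) pˣ qⁿ⁻ˣ, with "success" meaning that an engine keeps running (p = 4/5); the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bi(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p_run = 1 - Fraction(1, 5)     # an engine fails with probability 1/5

def p_safe_flight(engines):
    """P(at least one-half of the engines run)."""
    needed = (engines + 1) // 2          # 2 of 4 engines, or 1 of 2 engines
    return sum(binom_pmf(x, engines, p_run) for x in range(needed, engines + 1))

four_engine = p_safe_flight(4)
two_engine = p_safe_flight(2)
print(four_engine, float(four_engine))   # 608/625 0.9728
print(two_engine, float(two_engine))     # 24/25 0.96
```

Under these assumptions the 4-engine plane has the slightly higher probability of a successful flight.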
Remark: If n is small relative to N the probability of “success” for each draw will
change only slightly. Hence, the hypergeometric distribution can be
approximated by the binomial distribution with p = k/N.
Examples:
1. What is the probability that a person’s 6 number bet wins the second prize in a game
of lotto?
2. A lot of 20 personal computers was delivered to the Statistical Center. Ten computers
were selected at random without replacement and tested for defects. If at least 2 of
these 10 are defective, the entire lot of 20 computers will be returned. What is the
probability that the lot will be returned if 5 of the 20 computers are indeed defective?
3. A production lot of 2000 units contains 50 units that do not meet the specifications.
What is the probability that a random sample of 20 units without replacement will
contain no nonconforming item?
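Examples 2 and 3 can be sketched as follows, assuming the usual hypergeometric probability function with N items, k "successes" among them, and a sample of size n drawn without replacement. Example 3 also illustrates the remark about the binomial approximation with p = k/N. The function name is illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Example 2: the lot is returned if at least 2 of the 10 tested are defective.
p_return = 1 - hypergeom_pmf(0, 20, 10, 5) - hypergeom_pmf(1, 20, 10, 5)
print(round(p_return, 4))    # about 0.8483

# Example 3: no nonconforming unit in a sample of 20 from 2000 (50 nonconforming).
p_none_exact = hypergeom_pmf(0, 2000, 20, 50)
p_none_binomial = (1 - 50 / 2000) ** 20      # binomial approximation, p = k/N
print(p_none_exact, p_none_binomial)
```

Because n = 20 is small relative to N = 2000, the two values in the last line agree closely.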
\[ P(X = x) = f(x) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots \]
Examples:
1. On the average a certain intersection results in 3 traffic accidents per month. Suppose
that the number of accidents per month follows a Poisson distribution, what is the
probability that in any given month at this intersection,
2. The probability that a person dies from a certain respiratory infection is 0.002. Find
the probability that fewer than 5 in a random sample of 2000 so infected will die.
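Example 2 can be sketched with the Poisson formula above. The sketch assumes that the number of deaths is treated as Poisson with mean µ = np = 2000(0.002) = 4, which is the usual approximation when n is large and p is small; the function name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu ** x / factorial(x)

mu = 2000 * 0.002                       # mean number of deaths, np = 4
p_fewer_than_5 = sum(poisson_pmf(x, mu) for x in range(5))   # x = 0, 1, 2, 3, 4
print(round(p_fewer_than_5, 4))         # about 0.6288
```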
Theorem. If X~Bi(n, p) with mean np and variance npq, then the distribution of
\[ Z = \frac{X - np}{\sqrt{npq}} \]
approaches the standard normal distribution as n → ∞.
Remarks:
1. The normal distribution gives a very good approximation of the Binomial distribution
when n is large and p is close to 1/2.
2. Since a continuous distribution (in this case, the Normal) is used to approximate a
discrete distribution, then we must adjust for continuity. For example:
\[ P(X = a) \approx P\left( \frac{(a - 0.5) - np}{\sqrt{npq}} < Z < \frac{(a + 0.5) - np}{\sqrt{npq}} \right) \]
Example:
A certain pharmaceutical company knows that, on the average, 45% of a certain type of
pill has an ingredient that is below the minimum strength and thus unacceptable. What is
the probability that fewer than 10 in a sample of 200 pills will be unacceptable?
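The continuity correction can be sketched in code using the figures exactly as printed in the example (n = 200, p = 0.45). "Fewer than 10" means X ≤ 9, so the corrected boundary is 9.5. The exact binomial sum is included only for comparison; the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_less_than(a, n, p):
    """Approximate P(X < a) = P(X <= a - 1) for X ~ Bi(n, p), using the
    continuity-corrected boundary (a - 1) + 0.5 = a - 0.5."""
    z = ((a - 0.5) - n * p) / sqrt(n * p * (1 - p))
    return z, NormalDist().cdf(z)

def binomial_less_than(a, n, p):
    """Exact P(X < a) for X ~ Bi(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(a))

z, approx = normal_approx_less_than(10, 200, 0.45)
exact = binomial_less_than(10, 200, 0.45)
print(round(z, 2), approx, exact)
```

With these figures the cutoff lies more than 11 standard deviations below the mean np = 90, so both the approximation and the exact value are effectively zero.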
CHAPTER 7
Sampling Distributions
• The sampling distribution of a statistic will depend on the size of the population,
the size of the sample, and the method of choosing the sample.
• The standard deviation of the sampling distribution is called the standard error
of the statistic. It tells us the extent to which we expect the values of the statistic
to vary from different possible samples.
f(x) = 1/4, x = 0, 1, 2, 3
Suppose we list all possible samples of size 2, with replacement, and for each sample
compute the value of the sample mean, X̄:
x̄      f(x̄)
0      1/16
0.5    2/16
1.0    3/16
1.5    4/16
2.0    3/16
2.5    2/16
3.0    1/16
Theorems:
1. If all possible random samples of size n are drawn with replacement from a finite
population of size N with mean µ and standard deviation σ, then the sample mean
will have mean and variance given by:
E(X̄) = µ and Var(X̄) = σ²/n.
2. If all possible random samples of size n are drawn without replacement from a finite
population of size N with mean µ and standard deviation σ, then the sample mean
will have mean and variance given by:
\[ E(\bar{X}) = \mu \quad \text{and} \quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \]
• The factor (N − n)/(N − 1) in the formula of the variance of X̄ is called the finite
population correction factor. For large N relative to the sample size n, this
factor will be close to 1 and the variance of X̄ is approximately equal to σ²/n.
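Both theorems can be checked by brute force on the small population used above (values 0, 1, 2, 3, each with probability 1/4). The sketch enumerates every sample of size 2, with and without replacement, and compares the variance of the sample mean with σ²/n and with σ²/n times the finite population correction factor. The helper name is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

population = [0, 1, 2, 3]
N, n = len(population), 2

def mean_and_var(values):
    """Mean and variance of a list of equally likely values."""
    m = Fraction(sum(values), len(values))
    v = sum((Fraction(x) - m) ** 2 for x in values) / len(values)
    return m, v

mu, sigma2 = mean_and_var(population)                       # 3/2 and 5/4

# Sample means over all equally likely ordered samples of size n.
means_with = [Fraction(sum(s), n) for s in product(population, repeat=n)]
means_without = [Fraction(sum(s), n) for s in permutations(population, n)]

mean_with, var_with = mean_and_var(means_with)
mean_without, var_without = mean_and_var(means_without)

print(mean_with, var_with, sigma2 / n)                                   # 3/2 5/8 5/8
print(mean_without, var_without, sigma2 / n * Fraction(N - n, N - 1))    # 3/2 5/12 5/12
```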
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
• If n < 30, the approximation is good only if the population is not too different
from the normal.
• If the distribution of the population is normal then the sampling distribution will
also be exactly normal, no matter how small the size of the sample.
Example:
An electrical firm manufactures electric light bulbs that have a length of life
which is normally distributed with mean and standard deviation equal to 500 and
50 hours, respectively. Find the probability that a random sample of 15 bulbs
will have an average life of less than 475 hours.
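A sketch of the bulb example: since the population is normal, the sample mean is exactly normal with standard error σ/√n, so the probability follows from the standardized value. `statistics.NormalDist` stands in for the Z table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 50, 15

standard_error = sigma / sqrt(n)               # standard deviation of the sample mean
z = (475 - mu) / standard_error                # about -1.94
p_below_475 = NormalDist().cdf(z)

print(round(z, 2), round(p_below_475, 4))      # z is about -1.94, probability about 0.026
```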
4. The t-distribution.
If X̄ and S² are the mean and variance, respectively, of a random sample of size n
taken from a population which is normally distributed with mean µ and variance σ²,
then
\[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
has a t-distribution with v = n − 1 degrees of freedom.
• Notation: T ~ t(v), v = n − 1
3. When the sample size is large, i.e. n ≥ 30, the t-distribution can be well
approximated by the standard normal distribution.
Just like any continuous probability distribution, the probability that a random
sample produces a t-value falling between any two specified values is equal to the
area under the curve of the t-distribution between any two ordinates corresponding
to the specified values.
Examples:
2. Find k such that P(k < T < 2.807) = 0.945 when T ~ t(23)
3. A manufacturing firm claims that the batteries used in their electronic games will last
an average of 30 hours. To maintain this average, 16 batteries are tested each month.
If the computed t-value falls between -t0.025 and t0.025, the firm is satisfied with its
claim. What conclusion should the firm draw from a sample that has mean X̄ = 27.5
hours and standard deviation S = 5 hours? Assume the distribution of battery lives to
be approximately normal.
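Example 3 can be sketched as follows. The standard library has no t quantile function, so the critical value t0.025 with v = 15 degrees of freedom is entered by hand as 2.131, the value found in a standard t table (an input assumed here, not computed).

```python
from math import sqrt

n, x_bar, s, mu_claimed = 16, 27.5, 5, 30

t_value = (x_bar - mu_claimed) / (s / sqrt(n))
t_crit = 2.131            # t_{0.025} with v = n - 1 = 15, taken from a t table

firm_satisfied = -t_crit < t_value < t_crit
print(t_value, firm_satisfied)    # -2.0 True
```

Since −2.131 < −2.0 < 2.131, the computed t-value falls inside the interval and the firm would be satisfied with its claim.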
CHAPTER 8
Estimation
1. Estimation
- point estimation
- interval estimation
2. Hypothesis Testing
Point Estimation
Remarks:
Examples: Under random sampling, the sample mean is an unbiased estimator of the
population mean, that is, E(X̄) = µ.
2. A parameter can have more than one unbiased estimator. We would naturally choose
the unbiased estimator with the smallest variance.
Interval Estimation
Example. The running times (in minutes) of a sample of films produced by Star-Regal
Theater are as follows: 103 94 110 87 98.
A 95% confidence interval for the mean running time of films produced by
Star-Regal Theater is (87.6, 109.2).
• The number 0.95 in the example is called the confidence coefficient or the
degree of confidence.
• The endpoints 87.6 and 109.2 are called the lower and upper confidence
limits.
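The interval quoted in the example can be reproduced with the small-sample formula X̄ ± t(α/2) S/√n. The sketch assumes a normal population and takes t0.025 with 4 degrees of freedom as 2.776 from a standard t table (entered by hand, since the standard library does not provide t quantiles).

```python
from math import sqrt
from statistics import mean, stdev

times = [103, 94, 110, 87, 98]
n = len(times)

x_bar = mean(times)          # sample mean
s = stdev(times)             # sample standard deviation (divisor n - 1)
t_crit = 2.776               # t_{0.025} with v = n - 1 = 4, from a t table

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 1), round(upper, 1))    # 87.6 109.2
```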
Remarks:
If we take repeated samples of size n and if for each one of these samples we compute
the (1-α)100% confidence interval then (1-α)100% of the resulting confidence
intervals will contain the unknown value of the parameter.
3. The confidence coefficient is not “the probability that the true value of the parameter
falls in the interval estimate” since once a sample is drawn and a confidence interval
constructed, the resulting interval estimate either encloses the true value of the
parameter or it doesn’t. Rather, the confidence coefficient is “the probability that the
interval estimator encloses the true value of the parameter.”
4. A good confidence interval is one that is as narrow as possible and has a large
confidence coefficient, near 1. The narrower the interval, the more exactly we have
located the parameter; whereas, the larger the confidence coefficient, the more
confidence we have that a particular interval encloses the true value of the parameter.
However, for a fixed sample size, as the confidence coefficient increases, the length
of the interval also increases.
a. when σ is known
\[ \left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) \]
b. when σ is unknown
\[ \left( \bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}} \right) \]
Remarks:
1. The above formulas hold strictly for random samples from a normal distribution.
However, they provide good approximate (1-α)100% confidence intervals when the
distribution is not normal provided the sample size is large, i.e. n > 30.
\[ \left( \bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}} \right) \]
Examples:
1. An electrical firm manufactures light bulbs that have a length of life that is normally
distributed, with a standard deviation of 40 hours. If a random sample of 25 bulbs has
a mean life of 780 hours, find a 95% confidence interval for the population mean of
all bulbs produced by this firm.
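Example 1 uses formula (a), since σ is known. A minimal sketch, with `NormalDist().inv_cdf` supplying z(α/2) instead of a table; the function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = z_interval(x_bar=780, sigma=40, n=25, confidence=0.95)
print(round(lower, 2), round(upper, 2))    # about 764.32 and 795.68
```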
If we have two populations with means µ1 and µ2 and standard deviations σ1 and
σ2, respectively, a point estimator of the difference between µ1 and µ2 is the
statistic X̄1 − X̄2.
Types of Sampling:
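a. σ1² and σ2² known
\[ \left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right) \]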
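b. σ1² = σ2² but unknown
\[ \left( (\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}(v)\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}(v)\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right) \]
where
\[ S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} \quad \text{and} \quad v = n_1 + n_2 - 2 \]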
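c. σ1² ≠ σ2² but unknown
\[ \left( (\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}(v)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\ (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}(v)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right) \]
where
\[ v = \frac{\left( S_1^2/n_1 + S_2^2/n_2 \right)^2}{\dfrac{\left( S_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( S_2^2/n_2 \right)^2}{n_2 - 1}} \]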
Remarks:
1. These formulas hold strictly for independent samples selected from Normal
populations. However, they provide good approximate (1-α)100% confidence
intervals when the distributions are not Normal provided both n1 and n2 are
greater than 30.
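2. If σ1² and σ2² are unknown but n1 and n2 are greater than 30, use
\[ \left( (\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\ (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right) \]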
3. Even if the population variances are considerably different, formula (b) will
still provide a good estimate provided that n1=n2 and both populations are
normal. Therefore, in a planned experiment, one should make every effort to
equalize the size of the samples.
Examples:
1. A statistics test was given to a random sample of 50 girls and another random
sample of 75 boys. The mean score of the girls is 80 with a standard deviation
of 4 and the mean score of the boys is 86 with a standard deviation of 6. Find
a 95% confidence interval for the difference µB - µG.
2. Students may choose between a 3-unit course in Physics without lab and a 4-
unit course with lab. The final written examination is the same for each
section. The mean score of a random sample of 12 students in the section
with lab is 84 with a standard deviation of 4, and the mean score of another
random sample of 18 students in the section without lab is 77 with a standard
deviation of 6. Find a 99% confidence interval for the difference between the
mean grades for the two courses. Assume the populations to be approximately
normally distributed with equal variances.
3. The following data represent the running time of a random sample of films
produced by two motion picture companies:
Time (minutes)
Compute a 90% confidence interval for the difference between the mean
running time of films produced by the two companies. Assume that the
running times for each of the companies are approximately normally
distributed with unequal variances.
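Examples 1 and 2 can be sketched as follows. Example 1 uses the large-sample z interval with the sample standard deviations (both samples exceed 30), and Example 2 uses the pooled-variance formula (b). The t critical value t0.005 with v = 28 degrees of freedom is entered by hand as 2.763 from a standard t table.

```python
from math import sqrt
from statistics import NormalDist

# Example 1: 95% interval for mu_B - mu_G with large samples.
n_g, mean_g, s_g = 50, 80, 4       # girls
n_b, mean_b, s_b = 75, 86, 6       # boys
z = NormalDist().inv_cdf(0.975)    # z_{0.025}
se = sqrt(s_b ** 2 / n_b + s_g ** 2 / n_g)
diff = mean_b - mean_g
ci_large = (diff - z * se, diff + z * se)

# Example 2: 99% interval, equal but unknown variances (pooled estimate).
n1, mean1, s1 = 12, 84, 4          # section with lab
n2, mean2, s2 = 18, 77, 6          # section without lab
v = n1 + n2 - 2                    # 28 degrees of freedom
s_p = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / v)
t_crit = 2.763                     # t_{0.005} with v = 28, from a t table
margin = t_crit * s_p * sqrt(1 / n1 + 1 / n2)
ci_pooled = (mean1 - mean2 - margin, mean1 - mean2 + margin)

print([round(x, 2) for x in ci_large])     # about [4.25, 7.75]
print([round(x, 2) for x in ci_pooled])    # about [1.54, 12.46]
```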
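\[ \left( \bar{d} - t_{\alpha/2}(v)\frac{S_d}{\sqrt{n}},\ \bar{d} + t_{\alpha/2}(v)\frac{S_d}{\sqrt{n}} \right) \]
where di = xi - yi,
\[ \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \quad \text{and} \quad S_d = \sqrt{\frac{n\sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2}{n(n - 1)}} \]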
Examples:
1. It is claimed that a new diet will reduce a person’s weight by 4.5 kilograms on
the average in a period of 2 weeks. The weights of a random sample of 7
women who followed this diet were recorded before and after a 2-week
period:
Woman
1 2 3 4 5 6 7
Compute a 95% confidence interval for the mean difference in the weight.
Assume the distribution of weights to be approximately normal.
2. Twenty college freshmen were divided into 10 pairs, each member of the pair
having approximately the same IQ. One of each pair was selected at random
and assigned to a mathematics section using programmed materials only. The
other member of each pair was assigned to a section in which the professor
lectured. At the end of the semester each group was given the same
examination and the following results were recorded.
Pair                   1   2   3   4   5   6   7   8   9  10
Programmed Materials  76  60  85  58  91  75  82  64  79  88
Lectures              81  52  87  70  86  77  90  63  85  83
Find a 98% confidence interval for the mean difference in scores of the two
learning procedures. Assume normality.
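Example 2 can be sketched with the paired-difference formulas above, taking di as the programmed-materials score minus the lecture score. The critical value t0.01 with v = n − 1 = 9 degrees of freedom is entered by hand as 2.821 from a standard t table.

```python
from math import sqrt

programmed = [76, 60, 85, 58, 91, 75, 82, 64, 79, 88]
lectures = [81, 52, 87, 70, 86, 77, 90, 63, 85, 83]

d = [x - y for x, y in zip(programmed, lectures)]     # paired differences
n = len(d)

d_bar = sum(d) / n
s_d = sqrt((n * sum(di ** 2 for di in d) - sum(d) ** 2) / (n * (n - 1)))

t_crit = 2.821                # t_{0.01} with v = 9, from a t table
margin = t_crit * s_d / sqrt(n)
ci = (d_bar - margin, d_bar + margin)

print(d_bar, round(s_d, 3))            # -1.6 6.381
print([round(x, 2) for x in ci])       # about [-7.29, 4.09]
```

The interval contains 0, so at this confidence level the data do not show a difference between the two learning procedures.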
In a binomial experiment a point estimator of the proportion p is p̂ = X/n, where X
represents the number of successes in n trials.
\[ \left( \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \right) \]
Example:
In a random sample of 200 students who enrolled in Math 17, 138 passed on their
first take. Construct a 95% confidence interval for the population proportion of
students who passed Math 17 on their first take.
where X is the number of successes in n1 trials (first sample) and Y is the number
of successes in n2 trials (second sample).
Example:
In a random sample of 200 students, 78 of the 120 females and 60 of the 80 males
passed Math 17 on their first take. Construct a 95% confidence interval for p1- p2,
where p1 and p2 are the true proportions of females and males, respectively, who
passed Math 17 on their first take.
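Both proportion examples can be sketched together. The one-sample interval follows the formula p̂ ± z(α/2) √(p̂q̂/n). For the difference p1 − p2, the sketch assumes the usual large-sample interval with standard error √(p̂1q̂1/n1 + p̂2q̂2/n2), since the formula itself is not shown above. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)          # z_{0.025} for 95% confidence

def proportion_ci(x, n):
    """Large-sample interval for a single proportion."""
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def difference_ci(x, n1, y, n2):
    """Large-sample interval for p1 - p2 (assumed standard form)."""
    p1, p2 = x / n1, y / n2
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

ci_pass = proportion_ci(138, 200)              # 138 of 200 students passed
ci_diff = difference_ci(78, 120, 60, 80)       # females minus males

print([round(v, 3) for v in ci_pass])    # about [0.626, 0.754]
print([round(v, 3) for v in ci_diff])    # about [-0.228, 0.028]
```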
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{e} \right)^2 \]
Example:
An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed, with a standard deviation of 40 hours. How
large a sample is needed if we wish to be 95% confident that the sample mean will
be within 10 hours of the true mean?
If p̂ will be used to estimate p, then we can be (1-α)100% confident that the error
will not exceed a specified amount, e, when the sample size is
\[ n = \frac{z_{\alpha/2}^2\, p q}{e^2} \]
Example:
Use the conservative formula to determine the sample size needed if we want to
be 95% confident that our estimate of p is within 0.05 of the true value.
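Both sample-size examples can be sketched as follows. Results are rounded up to the next whole number, and "conservative formula" is taken here to mean setting p = q = 1/2, the choice that maximizes pq (an assumption, since the conservative version is not written out above).

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)          # z_{0.025} for 95% confidence

# Light bulbs: sigma = 40 hours, error e = 10 hours.
n_mean = ceil((z * 40 / 10) ** 2)

# Proportion: conservative choice p = q = 0.5, error e = 0.05.
n_proportion = ceil(z ** 2 * 0.5 * 0.5 / 0.05 ** 2)

print(n_mean, n_proportion)    # 62 385
```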
CHAPTER 9
Tests of Hypothesis
Definition of Terms
2. The null hypothesis (Ho) is the hypothesis that is being tested; it represents what the
experimenter doubts to be true.
3. The alternative hypothesis (Ha) is the operational statement of the theory that the
experimenter believes to be true and wishes to prove. It is the contradiction of the
null hypothesis.
Examples:
A two-tailed test of hypothesis is a test where the alternative hypothesis does not
specify a directional difference for the parameter of interest.
Examples:
5. A test statistic is a statistic whose value is calculated from sample measurements and
on which the statistical decision will be based.
6. The critical region or rejection region is the set of values of the test statistic for
which the null hypothesis will be rejected. The acceptance region is the set of values
of the test statistic for which the null hypothesis will not be rejected. The acceptance
and rejection regions are separated by a critical value of the test statistic.
7. The Type I error is the error made by rejecting the null hypothesis when it is true.
The probability of a Type I error is denoted by α.
The Type II error is the error made by accepting (not rejecting) the null hypothesis
when it is false. The probability of a Type II error is denoted by β.
                        Null Hypothesis
Decision             True               False
Reject Ho            Type I error       Correct decision
Do not reject Ho     Correct decision   Type II error
1. State the null hypothesis (Ho) and the alternative hypothesis (Ha).
2. Choose the level of significance α.
3. Select the appropriate test statistic and establish the critical region.
4. Collect the data and compute the value of the test statistic from the sample data.
5. Make the decision. Reject Ho if the value of the test statistic belongs in the critical
region. Otherwise, do not reject Ho.
Remarks:
1. The above tests are exact α-level tests for samples from a normal distribution.
However, they provide good approximate α-level tests when the distribution is not
normal provided that the sample size is large, i.e. n > 30.
2. If σ is unknown and n > 30, use the test in (a) replacing the test statistic by
\[ Z = \frac{\bar{X} - \mu_o}{S / \sqrt{n}} \]
Examples:
1. Test Ho: µ=50 vs. Ha: µ≠50 if a random sample of 16 subjects had mean 48 and
standard deviation of 5.8 at 0.05 level of significance. Assume that the sample was
taken from a Normal population with standard deviation of 6.
2. It is claimed that an automobile is driven on the average less than 25,000 kilometers
per year. To test this claim, a random sample of 100 automobile owners is asked to
keep a record of the kilometers they travel. Would you agree with this claim if the
random sample showed an average of 23,500 kilometers and a standard deviation of
3,900 kilometers? Use a 0.01 level of significance.
3. According to Dietary Goals for the United States (1977), high sodium intake may be
related to ulcers, stomach cancer, and migraine headaches. The human requirement
for salt is only 230 milligrams per day, which is surpassed in most single servings of
ready-to-eat cereals. A random sample of 20 similar servings of Special K had mean
sodium content of 244 milligrams of sodium and a standard deviation of 24.5
milligrams. Is there sufficient evidence to believe that the average sodium content for
single servings of Special K exceeds the human requirement for salt at α=0.025? at α
= 0.05? at α = 0.10? Assume normality.
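The three examples can be sketched as follows. Examples 1 and 2 use z statistics (σ known in Example 1, n > 30 in Example 2), and Example 3 uses a t statistic with v = 19 degrees of freedom. The t critical values are entered by hand from a standard t table; the helper name `standardized` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def standardized(x_bar, mu_o, sd, n):
    """(x_bar - mu_o) / (sd / sqrt(n)), the one-sample test statistic."""
    return (x_bar - mu_o) / (sd / sqrt(n))

z_table = NormalDist()

# Example 1: Ho: mu = 50 vs Ha: mu != 50, sigma = 6 known, alpha = 0.05.
z1 = standardized(48, 50, 6, 16)
reject1 = abs(z1) > z_table.inv_cdf(1 - 0.05 / 2)

# Example 2: Ho: mu = 25,000 vs Ha: mu < 25,000, n = 100, alpha = 0.01.
z2 = standardized(23500, 25000, 3900, 100)
reject2 = z2 < -z_table.inv_cdf(1 - 0.01)

# Example 3: Ho: mu = 230 vs Ha: mu > 230, n = 20, sigma unknown.
t3 = standardized(244, 230, 24.5, 20)
t_table_v19 = {0.025: 2.093, 0.05: 1.729, 0.10: 1.328}   # from a t table
reject3 = {alpha: t3 > t_crit for alpha, t_crit in t_table_v19.items()}

print(round(z1, 3), reject1)     # -1.333 False
print(round(z2, 3), reject2)     # -3.846 True
print(round(t3, 3), reject3)     # 2.556, rejected at all three levels
```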
1. For the same data set, as α increases the size of the critical region also increases.
Consequently, if Ho is rejected at α-level of significance then Ho will also be rejected
at a higher level of significance using the same data. For example, if Ho is rejected at
α = 0.05 then testing at α = 0.1 will also lead to the rejection of Ho. However, Ho
will not necessarily be rejected at α = 0.01.
2. The Type I error and Type II error are related. For a fixed sample size n, a decrease in
the probability of one will result in an increase in the probability of the other.
However, increasing the sample size will result in the reduction of both probabilities.
3. An alternative way to report the results of the test is to report the p-value. The p-
value is the smallest value of α for which Ho will be rejected based on sample
information. Reporting the p-value will allow the reader of the published research to
evaluate the extent to which the data disagree with Ho. In particular, it enables each
reader to choose their personal value of α.
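As an illustration of remark 3, the sketch below computes p-values for two of the one-sample z tests given earlier (the two-tailed test with z ≈ −1.333 and the left-tailed test with z ≈ −3.846). The decision rule "reject Ho when p-value < α" then reproduces the critical-region decisions.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Two-tailed test: Ho: mu = 50, x_bar = 48, sigma = 6, n = 16.
z_two_tailed = (48 - 50) / (6 / sqrt(16))
p_two_tailed = 2 * std_normal.cdf(-abs(z_two_tailed))

# Left-tailed test: Ho: mu = 25,000, x_bar = 23,500, s = 3,900, n = 100.
z_left = (23500 - 25000) / (3900 / sqrt(100))
p_left = std_normal.cdf(z_left)

print(round(p_two_tailed, 4))    # about 0.1824, larger than 0.05: do not reject Ho
print(p_left)                    # far below 0.01: reject Ho
```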
Ho: µ1 - µ2 = do
Test statistic:
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}} \]
Ha and critical region:
µ1 - µ2 < do : z < -zα
µ1 - µ2 > do : z > zα
µ1 - µ2 ≠ do : | z | > zα/2
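b. σ1² = σ2² but unknown
Ho: µ1 - µ2 = do
Test statistic:
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{S_p\sqrt{(1/n_1) + (1/n_2)}}, \quad \upsilon = n_1 + n_2 - 2, \quad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]
Ha and critical region:
µ1 - µ2 < do : t < -tα
µ1 - µ2 > do : t > tα
µ1 - µ2 ≠ do : | t | > tα/2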
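c. σ1² ≠ σ2² and unknown
Ho: µ1 - µ2 = do
Test statistic:
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{(S_1^2/n_1) + (S_2^2/n_2)}}, \quad \upsilon = \frac{\left( S_1^2/n_1 + S_2^2/n_2 \right)^2}{\dfrac{\left( S_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( S_2^2/n_2 \right)^2}{n_2 - 1}} \]
Ha and critical region:
µ1 - µ2 < do : t < -tα
µ1 - µ2 > do : t > tα
µ1 - µ2 ≠ do : | t | > tα/2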
Remark: The remarks made in Chapter 8.3 relative to the use of a given statistic apply
to the tests described here.
Examples:
1. A statistics test was given to 50 girls and 75 boys. The girls made an average of 80
with a standard deviation of 4 and the boys had an average of 86 with a standard
deviation of 6. Is there sufficient evidence at 0.05 level of significance that the
average grades of girls and boys differ?
2. A study was made to determine if the subject matter in a physics course is better
understood when a lab constitutes part of the course. Students were allowed to
choose between a 3-unit course without lab and a 4-unit course with lab. In the
section with lab, a sample of 11 students had an average grade of 85 with a standard
deviation of 4.7, and in the section without lab, a sample of 17 students had an
average grade of 79 with a standard deviation of 6.1. Would you say that the
laboratory course increases the average grade by more than 5 points? Use a 0.01 level
of significance and assume the populations to be approximately normally distributed
with equal variances.
3. The following data represent the running time of films produced by two motion
picture companies:
Time (minutes)
Company 1 103 94 110 87 98
Company 2 97 82 123 92 175 88 118
Test the hypothesis that the average running time of films produced by company 2
exceeds the average running time of films produced by company 1 by 10 minutes
against the one-sided alternative that the difference is more than 10 minutes. Use a
0.1 level of significance and assume the distributions of times to be approximately
normal with unequal variances.
4. A taxi company is trying to decide whether the use of radial tires instead of regular
belted tires improves fuel economy. Twelve cars were driven twice over a prescribed
test course, each time using a different type of tires (radial and belted) in random
order. The fuel economy figures, in kilometers per liter, were recorded as follows:
Car    Radial Tires    Belted Tires
1 4.2 4.1
2 4.7 4.9
3 6.6 6.2
4 7.0 6.9
5 6.7 6.8
6 4.5 4.4
7 5.7 5.7
8 6.0 5.8
9 7.4 6.9
10 4.9 4.7
11 6.1 6.0
12 5.2 4.9
At the 0.025 level of significance, can we conclude that cars equipped with radial tires
give better fuel economy than those equipped with belted tires? Assume the
populations to be normally distributed.
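Example 3 (unequal variances) can be sketched as follows. The hypotheses are written for µ2 − µ1 with do = 10, so company 2 plays the role of the first sample in the formulas. The approximate degrees of freedom are rounded down to a whole number, a common conservative convention assumed here, and t0.10 with 7 degrees of freedom is entered by hand as 1.415 from a standard t table.

```python
from math import floor, sqrt
from statistics import mean, variance

company1 = [103, 94, 110, 87, 98]
company2 = [97, 82, 123, 92, 175, 88, 118]
d_o = 10                                   # hypothesized value of mu2 - mu1

n1, n2 = len(company1), len(company2)
a = variance(company1) / n1                # S1^2 / n1
b = variance(company2) / n2                # S2^2 / n2

t_value = (mean(company2) - mean(company1) - d_o) / sqrt(a + b)
v = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

t_crit = 1.415                             # t_{0.10} with floor(v) = 7, from a t table
reject = t_value > t_crit

print(round(t_value, 3), round(v, 2), floor(v), reject)   # about 0.181, 7.19, 7, False
```

Under these assumptions the test statistic is well below the critical value, so Ho is not rejected at the 0.1 level.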
Consider the problem of testing the hypothesis that the proportion of successes in a
binomial experiment equals some specified value.
Example:
Example:
In a survey of 200 students, 78 of the 120 females in the sample passed Math 17
on their first take while this figure is 60 among the 80 males. Will you agree that the
proportion of males who passed Math 17 on their first take is higher than the
proportion of females who passed the same course on their first take? Test at α=0.05.
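A sketch of this example, assuming the usual large-sample z statistic for two proportions with a pooled estimate of the common proportion under Ho: p1 = p2 (the test statistic itself is not shown above, so this form is an assumption). The alternative is that the male proportion is higher.

```python
from math import sqrt
from statistics import NormalDist

x_f, n_f = 78, 120       # females who passed, females sampled
x_m, n_m = 60, 80        # males who passed, males sampled

p_f, p_m = x_f / n_f, x_m / n_m
p_pooled = (x_f + x_m) / (n_f + n_m)           # pooled estimate under Ho

se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))
z = (p_m - p_f) / se

z_crit = NormalDist().inv_cdf(1 - 0.05)        # z_{0.05}, about 1.645
reject = z > z_crit

print(round(z, 3), round(z_crit, 3), reject)   # 1.498 1.645 False
```

With this statistic, z does not exceed the critical value, so the data do not provide sufficient evidence at α = 0.05 that the male proportion is higher.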
The test for independence is used to determine whether two variables are related or
not. For example, we might test whether a person’s music preference is related to his
intelligence as measured by IQ. We then take a random sample and for each subject
determine his music preference and classify his IQ into different categories (high,
medium, low). The observed frequencies are presented in what is known as a
contingency table shown below:
Music IQ
Preference High Medium Low Total
Classical 40 26 17 83
Pop 47 59 25 131
Rock 83 104 79 266
Total 170 189 121 480
Procedure:
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
where Oij = observed number of cases in the ith row of the jth column
Eij = expected number of cases under Ho
    = (column total) × (row total) / grand total
Remarks:
1. The test is valid if at least 80% of the cells have expected frequencies of at least 5 and
no cell has an expected frequency ≤ 1.
3. For a 2x2 contingency table, a correction called Yates’ correction for continuity is
applied. The formula then becomes
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left( |O_{ij} - E_{ij}| - 0.5 \right)^2}{E_{ij}} \]
Example:
Music IQ
Preference High Medium Low Total
\[ \chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 12.38 \]
at α = 0.05 with 4 degrees of freedom, χ² = 9.488
Decision: Since 12.38 > 9.488, reject Ho. There is sufficient evidence at 0.05
level of significance that music preference and intelligence are not
independent.
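The computation in this example can be sketched from the observed contingency table. Expected frequencies are (row total × column total) / grand total, and the critical value 9.488 is the one quoted above. Carried out without rounding the expected frequencies, the statistic comes to about 12.42, slightly different from the 12.38 reported (presumably the result of rounding in the hand computation); the decision is the same.

```python
observed = [
    [40, 26, 17],     # Classical: High, Medium, Low IQ
    [47, 59, 25],     # Pop
    [83, 104, 79],    # Rock
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under Ho (independence).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (r - 1)(c - 1)
critical = 9.488                                     # chi-square, alpha = 0.05, df = 4
reject = chi2 > critical

print(round(chi2, 2), df, reject)    # about 12.42, 4, True
```

All expected frequencies exceed 5 here, so the validity condition in remark 1 is met.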
CHAPTER 10
Regression and Correlation
Remarks:
• -1 ≤ ρ ≤ 1
• A positive ρ means that the line slopes upward to the right; a negative ρ means
that it slopes downward to the right.
• When ρ is 1 or –1, there is perfect linear relationship between X and Y and all
the points (x,y) fall on a straight line. A ρ close to 1 or –1 indicates a strong
linear relationship but it does not necessarily imply that X causes Y or Y
causes X. It is possible that a third variable may have caused the change in
both x and y, producing the observed relationship.
• If ρ = 0 then there is no linear correlation between X and Y. A value of ρ = 0,
however, does not mean a lack of association. For example, a strong quadratic
relationship may exist between X and Y and still produce a correlation of zero,
because the relationship is nonlinear.
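\[ r = \frac{n\sum_{i=1}^{n} X_iY_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[ n\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right]\left[ n\sum_{i=1}^{n} Y_i^2 - \left( \sum_{i=1}^{n} Y_i \right)^2 \right]}} \]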
Remarks:
• r is used to estimate ρ based on a random sample of n pairs of measurements
(Xi, Yi), i=1,…,n.
• -1 ≤ r ≤ 1
• Just like ρ, when r = 1 or –1, all the points (xi,yi), i=1,…,n, fall on a straight
line; when r=0, they are scattered and give no evidence of a linear relationship.
Any other value of r suggests the degree to which the points tend to be linearly
related.
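The point that a zero correlation does not rule out association can be illustrated with a small made-up data set in which Y is an exact quadratic function of X. The data below are hypothetical and chosen only for illustration; `pearson_r` implements the computing formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient from the computing formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

xs = [-2, -1, 0, 1, 2]           # hypothetical values, symmetric about 0
ys = [x ** 2 for x in xs]        # Y is completely determined by X

r_quadratic = pearson_r(xs, ys)
r_linear = pearson_r(xs, [3 * x + 1 for x in xs])

print(r_quadratic, r_linear)     # 0.0 1.0
```

Here Y depends perfectly on X, yet r = 0 because the dependence has no linear component.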
94 CHAPTER 10. REGRESSION AND CORRELATION
[Figure: four scatter diagrams of y against x]
Example: Consider the data given below. Let X represent the lot size and Y
represent the man hours required.
[Figure: scatter plot of man-hours (Y) against LOT SIZE (X)]
ΣX = 500
ΣY = 1100
ΣXY = 61800
ΣX² = 28400
ΣY² = 134660
r = 0.99780
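The value of r can be recomputed from the sums listed above. The number of pairs n is not shown here, so the sketch assumes n = 10, the value for which these sums reproduce r = 0.99780.

```python
from math import sqrt

n = 10                      # assumed number of (X, Y) pairs; not stated above
sum_x, sum_y = 500, 1100
sum_xy, sum_x2, sum_y2 = 61800, 28400, 134660

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 5))          # 0.9978
```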
Y = βo + β1X + ε
Linear regression models that involve two or more explanatory variables are
called multiple regression models.
For any given value x, the response variable Y possesses a normal distribution,
with a mean value given by the equation E(Y|X=x) = βo + β1x and with a variance
of σ2. Furthermore, any one value of Y is independent of every other value.
Estimating βo and β1
The formulas for bo (estimate of βo) and b1 (estimate of β1) are derived using the
method of least squares where the “best-fitting” line is selected as the one that
minimizes the sum of squares of the deviations of the observed value of Y from
those predicted by the model. The formulas are
\[ b_1 = \frac{n\sum_{i=1}^{n} X_iY_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2} \]
\[ b_o = \bar{y} - b_1\bar{x} \]
\[ \hat{y} = b_o + b_1 x \]
Remarks:
• The calculated prediction equation is appropriate only for the relevant range of
X that includes all values of X used in developing the regression model.
Hence, when predicting Y for a given value of X, one may interpolate only
within this relevant range of the X values. Extrapolation in predicting Y for
values of X outside the relevant range would result in a serious prediction
error.
• If X = 0 is not included in the range of the sample data, then b0 will not have a
meaningful interpretation.
Coefficient of Determination
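An estimator for σ² is
\[ S^2 = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} \]
where
\[ s_{b_1} = \sqrt{\frac{s^2}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left( \sum_{i=1}^{n} X_i \right)^2}{n}}} \]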
Example:
Individual No.   GPI (X)   Starting Salary (Y)
1 2.7 17.0
2 3.1 17.7
3 3.0 18.6
4 3.3 20.5
5 3.1 19.1
6 2.4 16.4
7 2.9 19.3
8 2.1 14.5
9 2.6 15.7
10 3.2 18.6
11 3.0 19.5
12 2.2 15.0
13 2.8 18.0
14 3.2 20.0
15 2.9 19.0
16 3.0 17.4
17 2.6 17.3
18 3.3 18.1
19 2.9 18.0
20 2.4 16.2
21 2.8 17.5
22 3.7 21.3
23 3.1 17.2
24 2.8 17.0
25 3.5 19.6
26 2.7 16.6
27 2.6 15.0
28 3.2 18.4
29 2.9 17.3
30 3.0 18.5
[Figure: scatter plot of STARTING SALARY (vertical axis) against GRADE-POINT INDEX (horizontal axis)]
b0 = 6.418245
b1 = 3.928191
r = 0.865088
R2 = 0.748377
ΣX = 87.0
ΣY = 534.3
ΣXY = 1564.24
ΣX² = 256.06
ΣY² = 9593.41
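The fitted coefficients can be reproduced from the sums above with the least-squares formulas for b1 and bo and the computing formula for r. This is a sketch with n = 30 (the number of individuals in the table); the `predict` helper is illustrative and should only be used for GPI values inside the observed range.

```python
from math import sqrt

n = 30
sum_x, sum_y = 87.0, 534.3
sum_xy, sum_x2, sum_y2 = 1564.24, 256.06, 9593.41

s_xy = n * sum_xy - sum_x * sum_y
s_xx = n * sum_x2 - sum_x ** 2
s_yy = n * sum_y2 - sum_y ** 2

b1 = s_xy / s_xx                              # slope
b0 = sum_y / n - b1 * sum_x / n               # intercept: y_bar - b1 * x_bar
r = s_xy / sqrt(s_xx * s_yy)                  # sample correlation coefficient
r_squared = r ** 2                            # coefficient of determination

def predict(gpi):
    """Predicted starting salary for a GPI within the observed range (2.1 to 3.7)."""
    return b0 + b1 * gpi

print(round(b0, 6), round(b1, 6))             # 6.418245 3.928191
print(round(r, 6), round(r_squared, 6))       # 0.865088 0.748377
print(round(predict(3.0), 2))                 # about 18.2
```

An R² of about 0.75 means that roughly 75% of the variation in starting salary in this sample is explained by the linear relationship with GPI.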