
Synthesis Lectures on Mathematics and Statistics
Series ISSN: 1938-1743
Series Editor: Steven G. Krantz, Washington University in St. Louis

Probability and Statistics for STEM
A Course in One Semester

E.N. Barron, Loyola University, Chicago
J.G. Del Greco, Loyola University, Chicago

One of the most important subjects for all engineers and scientists is probability and statistics.
This book presents the basics of the essential topics in probability and statistics from a rigorous
standpoint. The basics of probability underlying all statistics are presented first, and then we cover
the essential topics in statistics: confidence intervals, hypothesis testing, and linear regression.
This book is suitable for any engineer or scientist who is comfortable with calculus and is meant
to be covered in a one-semester format.

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis Digital Library of
Engineering and Computer Science. Synthesis books provide concise, original presentations of
important research and development topics, published quickly, in digital and print formats.

Morgan & Claypool Publishers
store.morganclaypool.com
Probability and Statistics
for STEM
A Course in One Semester
Synthesis Lectures on
Mathematics and Statistics
Editor
Steven G. Krantz, Washington University, St. Louis

Probability and Statistics for STEM: A Course in One Semester


E.N. Barron and J.G. Del Greco
2020

Affine Arithmetic Based Solution of Uncertain Static and Dynamic Problems


Snehashish Chakraverty and Saudamini Rout
2020

Time Fractional Order Biological Systems with Uncertain Parameters


Snehashish Chakraverty, Rajarama Mohan Jena, and Subrat Kumar Jena
2020

Fast Start Integral Calculus


Daniel Ashlock
2019

Fast Start Differential Calculus


Daniel Ashlock
2019

Introduction to Statistics Using R


Mustapha Akinkunmi
2019

Inverse Obstacle Scattering with Non-Over-Determined Scattering Data


Alexander G. Ramm
2019

Analytical Techniques for Solving Nonlinear Partial Differential Equations


Daniel J. Arrigo
2019

Aspects of Differential Geometry IV


Esteban Calviño-Louzao, Eduardo García-Río, Peter Gilkey, JeongHyeong Park, and Ramón
Vázquez-Lorenzo
2019

Symmetry Problems. The Navier–Stokes Problem.


Alexander G. Ramm
2019
An Introduction to Partial Differential Equations
Daniel J. Arrigo
2017

Numerical Integration of Space Fractional Partial Differential Equations: Vol 2 –


Applications from Classical Integer PDEs
Younes Salehi and William E. Schiesser
2017

Numerical Integration of Space Fractional Partial Differential Equations: Vol 1 –


Introduction to Algorithms and Computer Coding in R
Younes Salehi and William E. Schiesser
2017

Aspects of Differential Geometry III


Esteban Calviño-Louzao, Eduardo García-Río, Peter Gilkey, JeongHyeong Park, and Ramón
Vázquez-Lorenzo
2017

The Fundamentals of Analysis for Talented Freshmen


Peter M. Luthy, Guido L. Weiss, and Steven S. Xiao
2016

Aspects of Differential Geometry II


Peter Gilkey, JeongHyeong Park, Ramón Vázquez-Lorenzo
2015

Aspects of Differential Geometry I


Peter Gilkey, JeongHyeong Park, Ramón Vázquez-Lorenzo
2015

An Easy Path to Convex Analysis and Applications


Boris S. Mordukhovich and Nguyen Mau Nam
2013
Applications of Affine and Weyl Geometry
Eduardo García-Río, Peter Gilkey, Stana Nikčević, and Ramón Vázquez-Lorenzo
2013

Essentials of Applied Mathematics for Engineers and Scientists, Second Edition


Robert G. Watts
2012
Chaotic Maps: Dynamics, Fractals, and Rapid Fluctuations
Goong Chen and Yu Huang
2011

Matrices in Engineering Problems


Marvin J. Tobias
2011

The Integral: A Crux for Analysis


Steven G. Krantz
2011

Statistics is Easy! Second Edition


Dennis Shasha and Manda Wilson
2010

Lectures on Financial Mathematics: Discrete Asset Pricing


Greg Anderson and Alec N. Kercheval
2010

Jordan Canonical Form: Theory and Practice


Steven H. Weintraub
2009

The Geometry of Walker Manifolds


Miguel Brozos-Vázquez, Eduardo García-Río, Peter Gilkey, Stana Nikčević, and Ramón
Vázquez-Lorenzo
2009

An Introduction to Multivariable Mathematics


Leon Simon
2008

Jordan Canonical Form: Application to Differential Equations


Steven H. Weintraub
2008

Statistics is Easy!
Dennis Shasha and Manda Wilson
2008
A Gyrovector Space Approach to Hyperbolic Geometry
Abraham Albert Ungar
2008
Copyright © 2020 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.

Probability and Statistics for STEM: A Course in One Semester


E.N. Barron and J.G. Del Greco
www.morganclaypool.com

ISBN: 9781681738055 paperback


ISBN: 9781681738062 ebook
ISBN: 9781681738079 hardcover

DOI 10.2200/S00997ED1V01Y202002MAS033

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON MATHEMATICS AND STATISTICS

Lecture #33
Series Editor: Steven G. Krantz, Washington University, St. Louis
Series ISSN
Print 1938-1743 Electronic 1938-1751
Probability and Statistics
for STEM
A Course in One Semester

E.N. Barron
Loyola University, Chicago

J.G. Del Greco


Loyola University, Chicago

SYNTHESIS LECTURES ON MATHEMATICS AND STATISTICS #33

Morgan & Claypool Publishers
ABSTRACT
One of the most important subjects for all engineers and scientists is probability and statistics.
This book presents the basics of the essential topics in probability and statistics from a rigorous
standpoint. The basics of probability underlying all statistics are presented first, and then we cover
the essential topics in statistics: confidence intervals, hypothesis testing, and linear regression.
This book is suitable for any engineer or scientist who is comfortable with calculus and is meant
to be covered in a one-semester format.

KEYWORDS
probability, random variables, sample distribution, confidence intervals, prediction
intervals, hypothesis testing, linear regression

Dedicated to Christina
– E.N. Barron

For Jim
– J.G. Del Greco

Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 The Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Appendix: Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Important Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Important Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Expectation, Variance, Medians, Percentiles . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.1 Moment-Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.2 Mean and Variance of Some Important Distributions . . . . . . . . . . . . 35
2.5 Joint Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5.1 Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.2 Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.3 The General Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.4 Chebychev’s Inequality and the Weak Law of Large Numbers . . . . . . 47
2.6 χ²(k), Student's t- and F-Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6.1 χ²(k) Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6.2 Student’s t-Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6.3 F -Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3 Distributions of Sample Mean and Sample SD . . . . . . . . . . . . . . . . . . . . . . . . . . 59


3.1 Population Distribution Known . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.1.1 The Population X ∼ N(μ, σ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.2 The Population X is not Normal but has Known Mean and Variance 63
3.1.3 The Population is Bernoulli, p Known . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Population Variance Unknown: Sampling Distribution of the Sample
Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2.1 Sampling Distribution of Differences of Two Samples . . . . . . . . . . . . 70
3.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4 Confidence and Prediction Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


4.1 Confidence Intervals for a Single Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1.1 Controlling the Error of an Estimate Using Confidence Intervals . . . 78
4.1.2 Pivotal Quantities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.1.3 Confidence Intervals for the Mean and Variance of a Normal
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.1.4 Confidence Intervals for a Proportion . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.1.5 One-Sided Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Confidence Intervals for Two Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.2.1 Difference of Two Normal Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.2.2 Ratio of Two Normal Variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.2.3 Difference of Two Binomial Proportions . . . . . . . . . . . . . . . . . . . . . . . 94
4.2.4 Paired Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3 Prediction Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


5.1 A Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 The Basics of Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3 Hypotheses Tests for One Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.1 Hypotheses Tests for the Normal Parameters, Critical Value
Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.2 The p-Value Approach to Hypothesis Testing . . . . . . . . . . . . . . . . . . 113
5.3.3 Test of Hypotheses for Proportions . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.4 Hypotheses Tests for Two Populations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.4.1 Test of Hypotheses for Two Proportions . . . . . . . . . . . . . . . . . . . . . . 120
5.5 Power of Tests of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.5.1 Factors Affecting Power of a Test of Hypotheses . . . . . . . . . . . . . . . 125
5.5.2 Power of One-Sided Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.6 More Tests of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.6.1 Chi-Squared Statistic and Goodness-of-Fit Tests . . . . . . . . . . . . . . . 129
5.6.2 Contingency Tables and Tests for Independence . . . . . . . . . . . . . . . . 135
5.6.3 Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.8 Summary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

6 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169


6.1 Introduction and Scatter Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.2 Introduction to Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.2.1 The Linear Model with Observed X . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.2.2 Estimating the Slope and Intercept from Data . . . . . . . . . . . . . . . . . 174
6.2.3 Errors of the Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.3 The Distributions of â and b̂ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.4 Confidence Intervals for Slope and Intercept and Hypothesis Tests . . . . . . . 182
6.4.1 Confidence and Prediction Bands . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.4.2 Hypothesis Test for the Correlation Coefficient . . . . . . . . . . . . . . . . 190
6.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

A Answers to Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201


A.1 Answers to Chapter 1 Problems ................................... 201
A.2 Answers to Chapter 2 Problems ................................... 203
A.3 Answers to Chapter 3 Problems ................................... 208
A.4 Answers to Chapter 4 Problems ................................... 210
A.5 Answers to Chapter 5 Problems ................................... 212
A.6 Answers to Chapter 6 Problems ................................... 225

Authors’ Biographies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

Preface
Every student anticipating a career in science and technology will require at least a working
knowledge of probability and statistics, either for use in their own work, or to understand the
techniques, procedures, and conclusions contained in scholarly publications and technical re-
ports. Probability and statistics has always been and will continue to be a significant component
of the curricula of mathematics and engineering science majors, and these two subjects have
become increasingly important in areas that have not traditionally included them in their
undergraduate courses of study, such as biology, chemistry, physics, and economics. Over the last couple
of decades, methods originating in probability and statistics have found numerous applications
in a wide spectrum of scientific disciplines, and so it is necessary to at least acquaint prospective
professionals and researchers working in these areas with the fundamentals of these important
subjects. Unfortunately, there is little time to devote to the study of probability and statistics in a
science and engineering curriculum that is typically replete with required courses. What should
be a comprehensive two-semester course in probability and statistics must, out of necessity, be
reduced to a single-semester course. This book is an attempt to provide a text that addresses both
rigor and conciseness in its treatment of the main topics of undergraduate probability and statistics.
It is intended that this book be used in a one-semester course in probability and statis-
tics for students who have completed two semesters of calculus. It is our goal that readers gain
an understanding of the reasons and assumptions used in deriving various statistical conclusions.
The presentation of the topics in the book is intended to be at an intermediate, sophomore, or
junior level. Most two-semester courses present the subject at a higher level of detail and ad-
dress a wider range of topics. On the other hand, most one-semester, lower-level courses do not
present any derivations of statistical formulas and provide only limited reasoning motivating the
results. This book is meant to bridge the gap and present a concise but mathematically rigorous
introduction to all of the essential topics in a first course in probability and statistics. If you are
looking for a book that covers all the nooks and crannies of probability or statistics, this book
is not for you. If you plan on becoming (or already are) a practicing scientist or engineer, this
book will certainly contain much of what you need to know. But, if not, it will give you the
background to know where and how to look for what you do need, and to understand what you
are doing when you apply a statistical method and reach a conclusion.
Answers to most of the problems are provided at the end of the book. While this book is
not meant to accompany a strictly computational course, calculations requiring a computer, or
at least a calculator, are inevitably necessary. Therefore, this course requires the use of a TI-
83/84/85/89 calculator or a standard statistical package such as Excel. Tables of the standard
normal, t-, and other distributions are not provided.
All experiments result in data. These data values are particular observations of underlying
random variables. To analyze the data correctly the experimenter needs to be equipped with the
tools that an understanding of probability and statistics provide. That is the purpose of this book.
We will present basic, essential statistics, and the underlying probability theory to understand
what the results mean.
The book begins with three foundational chapters on probability. The fundamental types
of discrete and continuous random variables and their basic properties are introduced. Moment-
generating functions are introduced at an early stage and used to calculate expected values and
variances, and also to enable a proof of the Central Limit Theorem, a cornerstone result. Much
of statistics is based on the Central Limit Theorem, and our view is that students should be
exposed to a rigorous argument for why it is true and why the normal distribution plays such a
central role. Distributions related to the normal distribution, like the χ², t, and F distributions,
are presented for use in the statistical methods developed later in the book.
Chapter 3 is the prelude to the statistical topics included in the remainder of the text.
This chapter includes the analysis of sample means and sample standard deviations as random
variables.
The study of statistics begins in earnest with a discussion of confidence intervals in Chap-
ter 4. Both one-sample and two-independent-samples confidence intervals are constructed as
well as confidence intervals for paired data. Chapter 5 contains the topics at the core of statis-
tics, particularly important for experimenters. We introduce tests of hypotheses for the major
categories of experiments. Throughout the chapter, the dual relationship between confidence
intervals and hypotheses tests is emphasized. The power of a test of hypotheses is discussed
in some detail. Goodness-of-fit tests, contingency tables, tests for independence, and one-way
analysis of variance are presented.
The book ends with a basic discussion of linear regression, an extremely useful tool in
statistics. Calculus students have more than enough background to understand and appreciate
how the results are derived. The probability theory introduced in earlier chapters is sufficient to
analyze the coefficients derived in the regression.
The book has been used in the Introduction to Statistics & Probability course at our uni-
versity for several years. It has been used in both two lectures per week and three lectures per
week formats. Each semester typically involves at least two midterms (usually three) and a com-
prehensive final exam. In addition, class time includes time for group work as well as in-class
quizzes. It makes for a busy semester.

E.N. Barron and J.G. Del Greco


August 2020

Acknowledgments
We gratefully acknowledge Susanne Filler at Morgan & Claypool.

E.N. Barron and J.G. Del Greco


August 2020

CHAPTER 1

Probability
There are two kinds of events: deterministic and stochastic (or, synonymously, random).
Deterministic phenomena are such that the same inputs always give the exact same outputs.
Newtonian physics deals with deterministic phenomena, but even real-world science is subject
to random effects. Engineering designs usually have criteria that are to be met with 95%
certainty, because 100% certainty is impossible and too expensive. Think of designing an
elevator in a high-rise building. Does the engineer know with certainty how many people will
get on the elevator? What happens to the components of the elevator as they age? Do we know
with certainty when they will collapse? These are examples of random events, and we need a way
to quantify them.
Statistics is based on probability, which is the mathematical theory used to make sense of
random events and phenomena. In this chapter we cover the basic concepts and techniques
of probability that we will use throughout this book.

1.1 THE BASICS


Every experiment, whether it is designed by some experimenter or not, results in a possible
outcome. If you deal five cards from a deck of cards, the hand you get is an outcome.

Definition 1.1 The sample space is the set S of all possible outcomes of an experiment.

S could be a finite set (like the set of all possible five-card hands), a countably infinite set
(like {0, 1, 2, 3, …} in a count of the number of users logging on to a computer system), or a
continuum (like an interval [a, b], as in selecting a random number from a to b).

Definition 1.2 An event A is any subset of S: A ⊆ S.


The set S is also called the sure event. The empty set ∅ is called the impossible event.
The class of all possible events is denoted by F = {A | A ⊆ S}.
If S is a finite set with N elements, we write |S| = N, and then the number of possible events
is 2^N. (Why?)

Example 1.3

• If we roll a die, the sample space is S = {1, 2, 3, 4, 5, 6}. Rolling an even number is the
event A = {2, 4, 6}.

• If we want to count the number of customers coming to a bakery, the sample space is
S = {0, 1, 2, …}, and the event that we get between 2 and 7 customers is A = {2, 3, 4, 5, 6, 7}.

• If we throw a dart randomly at a circular board of radius 2 feet, the sample space is the
set of all possible positions of the dart, S = {(x, y) | x² + y² ≤ 4}. The event that the dart
lands in the first quadrant is A = {(x, y) | x² + y² ≤ 4, x ≥ 0, y ≥ 0}.

Eventually we want to find the probability that an event will occur. We say that an event
A occurs if any outcome in the set A actually occurs when the experiment is performed.

Combinations of events:
Let A, B ∈ F be any two events. From these events we may describe the following events:

(a) A ∪ B is the event that A occurs, or B occurs, or they both occur.

(b) A ∩ B, also written as AB, is the event that A occurs and B occurs, i.e., they both occur.

(c) Aᶜ = S − A is the event that A does not occur. This is all the outcomes in S that are not in A.

(d) A ∩ Bᶜ is the event that A occurs and B does not occur.

(e) A ∩ B = ∅ means the two events cannot occur together, i.e., they are mutually exclusive.
We also say that A and B are disjoint. Mutually exclusive events cannot occur at the same
time.

(f) A ∪ Aᶜ = S means that no matter what event A we pick, either A occurs or Aᶜ occurs,
and not both. A and Aᶜ are mutually exclusive.

Many more such relations hold if we have three or more events. It is useful to recall the
following set relationships.

• A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

• (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ and (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. (De Morgan's Rules)

These relations can be checked by using Venn diagrams.
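Besides Venn diagrams, these identities can be checked computationally. Here is a minimal sketch using Python's built-in sets; the particular sets S, A, B, C are our own arbitrary choices for illustration:

```python
# Sanity-check the distributive laws and De Morgan's rules with Python sets.
S = set(range(1, 11))                 # a small universal set
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8}

def complement(E):
    """E^c, the complement of E relative to the universal set S."""
    return S - E

# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# De Morgan's rules
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)

print("all identities hold")
```

Any other choice of A, B, C within S would satisfy the same assertions, since the identities hold for all sets.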


Now we are ready to define what we mean by the probability of an event.

Definition 1.4 A probability function is a function P : F → ℝ satisfying

• P(A) ≥ 0 for all A ∈ F (probabilities cannot be negative);

• P(S) = 1 (the probability of the sure event is 1);

• P(A ∪ B) = P(A) + P(B) for all events A, B ∈ F such that A ∩ B = ∅.
This is called the disjoint event sum rule.

Whenever we write P we will always assume it is a probability function.

Remark 1.5 Immediately from the definition we can see that P(∅) = 0. In fact, since
S ∩ ∅ = ∅, the disjoint sum rule gives

P(S) = 1 = P(S ∪ ∅) = P(S) + P(∅) = 1 + P(∅) ⟹ P(∅) = 0.

Since 1 = P(S) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ), we also see that

P(Aᶜ) = 1 − P(A)

for any event A ∈ F.

It is also true that no matter what event A ∈ F we take, 0 ≤ P(A) ≤ 1. In fact, by definition
P(A) ≥ 0, and since P(Aᶜ) = 1 − P(A) ≥ 0, it must be that 0 ≤ P(A) ≤ 1.

Remark 1.6 One of the most important and useful rules is the Law of Total Probability:

P(A) = P(A ∩ B) + P(A ∩ Bᶜ), for any events A, B ∈ F. (1.1)

To see why this is true, we use some basic set theory to decompose A:

A = A ∩ S = A ∩ (B ∪ Bᶜ) = (A ∩ B) ∪ (A ∩ Bᶜ),

and A ∩ B is disjoint from A ∩ Bᶜ. Therefore, by the disjoint event sum rule,

P(A) = P((A ∩ B) ∪ (A ∩ Bᶜ)) = P(A ∩ B) + P(A ∩ Bᶜ).

A main use of this Law is that we may find the probability of an event A if we know what
happens when A ∩ B occurs and when A ∩ Bᶜ occurs. A useful form of this is P(A ∩ Bᶜ) =
P(A) − P(A ∩ B).

The next theorem gives us the sum rule when the events are not mutually exclusive.

Theorem 1.7 General Sum Rule. If A, B are any two events, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof. A union can always be written as a disjoint union. That is, A ∪ B = A ∪ (Aᶜ ∩ B). Then,
by the disjoint sum rule, P(A ∪ B) = P(A) + P(Aᶜ ∩ B). But, by the Law of Total Probability,
P(Aᶜ ∩ B) = P(B) − P(A ∩ B). Putting these together we have

P(A ∪ B) = P(A) + P(Aᶜ ∩ B) = P(A) + P(B) − P(A ∩ B).
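The complement rule, the Law of Total Probability, and the General Sum Rule can all be verified numerically on a small equally-likely sample space. The following sketch uses exact fractions and a single die roll; the events A and B are our own illustrative choices:

```python
# Numeric check of the complement rule, Law of Total Probability (1.1),
# and General Sum Rule on the equally-likely sample space of one die roll.
from fractions import Fraction

S = set(range(1, 7))                 # the six faces of a die

def P(E):
    """Equally-likely probability: |E| / |S|."""
    return Fraction(len(E), len(S))

A = {2, 4, 6}                        # roll an even number
B = {4, 5, 6}                        # roll at least a 4
Ac, Bc = S - A, S - B                # complements

assert P(Ac) == 1 - P(A)                    # complement rule
assert P(A) == P(A & B) + P(A & Bc)         # Law of Total Probability
assert P(A | B) == P(A) + P(B) - P(A & B)   # General Sum Rule
print(P(A | B))                             # 2/3, since A ∪ B = {2, 4, 5, 6}
```

Here Python's `|` and `&` play the roles of ∪ and ∩; because the rules hold for every pair of events, any other choice of A and B passes the same checks.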

The next example gives one of the most important probability functions for finite sample
spaces.

Example 1.8 When the sample space is finite, say |S| = N, and all individual outcomes in S
are equally likely, we may define a function

P(A) = n(A)/N, where n(A) = number of outcomes in A.

To see that this is a probability function we only have to verify the conditions of the definition.

• 0 ≤ P(A) ≤ 1 since 0 ≤ n(A) ≤ N for any event A ⊆ S.

• P(S) = n(S)/N = 1.

• P(A ∪ B) = n(A ∪ B)/N = (n(A) + n(B))/N = P(A) + P(B), if A ∩ B = ∅.

The requirement that individual outcomes be equally likely is essential. For example, suppose we
roll two dice and sum the numbers on each die. We take the sample space S = {2, 3, 4, …, 12}.
If we use this sample space and we assume the outcomes are equally likely, then we would get
that P(roll a 7) = 1/11, which is clearly not correct. The problem is that with this sample space,
the individual outcomes are not equally likely. If we want equally likely outcomes we need to
change the sample space to account for the result on each die:

S = {(1, 1), (1, 2), …, (1, 6), (2, 1), (2, 2), …, (2, 6), …, (6, 1), (6, 2), …, (6, 6)}. (1.2)

This sample space has 36 outcomes, and the event of rolling a 7 is

A = {(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)}.

Then P(A) = P(roll a 7) = 6/36 is the correct probability of rolling a 7.
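The enumeration behind (1.2) is easy to reproduce in code. This short sketch builds the 36-outcome sample space and confirms the count of ways to roll a 7:

```python
# Enumerate the 36 equally-likely outcomes of rolling two dice, as in (1.2),
# and count the outcomes whose coordinates sum to 7.
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))    # all ordered pairs (d1, d2)
assert len(S) == 36

A = [(d1, d2) for (d1, d2) in S if d1 + d2 == 7]
print(len(A), Fraction(len(A), len(S)))     # 6 1/6, i.e., P(roll a 7) = 6/36
```

The same pattern (enumerate, filter, divide by |S|) works for any event on an equally-likely finite sample space.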

Example 1.9 Whenever the sample space can easily be written down, doing so is often the best
way to find probabilities. As an example, we roll two dice, and we let D1 denote the number
on the first die and D2 the number on the second. Suppose we want to find P(D1 > D2). The
easiest way to solve this is to write down the sample space as we did in (1.2) and then use the
fact that each outcome is equally likely. We have

{D1 > D2} = {(2, 1), (3, 2), (3, 1), (4, 3), (4, 2), (4, 1),
(5, 4), (5, 3), (5, 2), (5, 1), (6, 5), (6, 4), (6, 3), (6, 2), (6, 1)}.

This event has 15 outcomes, which means P(D1 > D2) = 15/36.
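A script can both count the favorable outcomes exactly and estimate the same probability by simulation. The Monte Carlo check below is our own illustration, not part of the text:

```python
# Exact enumeration of the event {D1 > D2} over the 36 outcomes,
# followed by a Monte Carlo estimate for comparison.
import random
from itertools import product

favorable = [(d1, d2) for d1, d2 in product(range(1, 7), repeat=2) if d1 > d2]
print(len(favorable))          # 15, so P(D1 > D2) = 15/36

random.seed(0)                 # fix the seed for reproducible runs
n = 100_000
hits = sum(random.randint(1, 6) > random.randint(1, 6) for _ in range(n))
print(round(hits / n, 3))      # should be close to 15/36 ≈ 0.417
```

The agreement between the simulated frequency and the exact count is a preview of the long-run frequency interpretation of probability.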

1.2 CONDITIONAL PROBABILITY


It is important to take advantage of information about the occurrence of a given event in calcu-
lating the probability of a separate event. The way to do that is to use conditional probability.

Definition 1.10 The conditional probability of event A, given that event B has occurred, is

P(A|B) = P(A ∩ B)/P(B), if P(B) > 0.

If P(B) = 0, B does not occur.

One of the justifications for this definition can be seen from the case when the sample
space is finite (with equally likely individual outcomes). If |S| = N, we have

n(A ∩ B)/n(B) = (n(A ∩ B)/N) / (n(B)/N) = P(A ∩ B)/P(B) = P(A|B).

The left-most side of this string is the fraction of outcomes in A ∩ B from the event B. In other
words, it is the probability of A using the reduced sample space B. That is, if the outcomes in S
are equally likely, P(A|B) is the proportion of outcomes in both A and B relative to the number
of outcomes in B.

The introduction of conditional probability gives us the following rule, which follows by
rearranging the terms in the definition.

Multiplication Rule: P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

Example 1.11 In a controlled experiment to see if a drug is effective, 71 patients were given the
drug (event D), while 75 were given a placebo (event D^c). A patient records a response (event
R) or not (event R^c). The following table summarizes the results.

               Drug   Placebo   Subtotals   Probability
  Response       26        13          39         0.267
  No Response    45        62         107         0.733
  Subtotals      71        75         146
  Probability 0.486     0.514

This is called a two-way or contingency table.


The sample space consists of 146 outcomes of the type (Drug, Response), (Placebo, Re-
sponse), (Drug, No Response), or (Placebo, No Response), assumed equally likely. The numbers
in the table are recorded after the experiment is performed, and we estimate the probability of
each event. For instance,

P(D) = 71/146 = 0.486,   P(R) = 39/146,

and so on. For example, P(R) is obtained from the fact that 39 of the 146 equally likely chosen
patients exhibit a response (whether to the drug or the placebo).
We can also use the Law of Total Probability to calculate these probabilities. If we want
the chance that a randomly chosen patient will record a response, we use the fact that R =
(R ∩ D) ∪ (R ∩ D^c), so

P(R) = P(R ∩ D) + P(R ∩ D^c) = 26/146 + 13/146 = 39/146 = 0.267

and

P(D) = P(D ∩ R) + P(D ∩ R^c) = 26/146 + 45/146 = 71/146 = 0.486.
We may answer various questions using conditional probability.

• If we choose a patient at random and observe that this patient exhibited a response,
  what is the chance this patient took the drug? This is

  P(D|R) = P(D ∩ R)/P(R) = (26/146)/(39/146) = 26/39.

  Using the reduced sample space R gives the first equality directly.

• If we choose a patient at random and observe that this patient took the drug, what is
  the chance this patient exhibited a response? This is P(R|D) = 26/71. Notice that
  P(R|D) ≠ P(D|R).

• Find P(R^c|D) = 45/71. Observe that since P(D) = P(R ∩ D) + P(R^c ∩ D), we have

  P(R^c|D) = P(R^c ∩ D)/P(D) = (P(D) − P(R ∩ D))/P(D) = 1 − P(R|D).
Using the Law of Total Probability we get an important formula and tool for calculating
probabilities of events.

Theorem 1.12 P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).

Proof. The Law of Total Probability combined with the multiplication rule says

P(A) = P(A ∩ B) + P(A ∩ B^c) = P(A|B)P(B) + P(A|B^c)P(B^c),

which is the statement in the theorem. □


Frequently, problems arise in which we want to find the conditional probability of some
event while taking yet another event into account. The next corollary tells us
how to do that.

Corollary 1.13 Let A, B, C be three events. Then

P(A|B) = P(A|B ∩ C)P(C|B) + P(A|B ∩ C^c)P(C^c|B),

assuming each conditional probability is defined.

Proof. Simply write out each term and use the theorem.

P(A ∩ B) = P(A ∩ B ∩ C) + P(A ∩ B ∩ C^c)
         = [P(A ∩ B ∩ C)/P(B ∩ C)] P(B ∩ C) + [P(A ∩ B ∩ C^c)/P(B ∩ C^c)] P(B ∩ C^c)
         = P(A|B ∩ C)P(B ∩ C) + P(A|B ∩ C^c)P(B ∩ C^c).

Divide both sides by P(B) > 0 to get

P(A ∩ B)/P(B) = P(A|B) = P(A|B ∩ C) P(B ∩ C)/P(B) + P(A|B ∩ C^c) P(B ∩ C^c)/P(B)
              = P(A|B ∩ C)P(C|B) + P(A|B ∩ C^c)P(C^c|B).  □
Another very useful fact is that conditional probabilities are actually probabilities and
therefore all rules for probabilities apply to conditional probabilities as long as the given in-
formation remains the same.

Corollary 1.14 Let B be an event with P(B) > 0. Then Q(A) = P(A|B), A ∈ F, is a proba-
bility function.

Proof. We have to verify that Q(·) satisfies the axioms of Definition 1.4. Clearly, Q(A) ≥ 0 for any
event A, and Q(S) = P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1. Finally, let A1 ∩ A2 = ∅. Then

P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B)/P(B) = P((A1 ∩ B) ∪ (A2 ∩ B))/P(B)
              = (P(A1 ∩ B) + P(A2 ∩ B))/P(B) = P(A1|B) + P(A2|B).

This means the disjoint sum rule holds. □


Conditional probability naturally leads us to what it means when information about B
doesn’t help with the probability of A: This is an important concept and will be very helpful
throughout probability and statistics.

Definition 1.15 Two events A, B are said to be independent if the knowledge that one of the
events occurred does not affect the probability that the other event occurs. That is,

P(A|B) = P(A) and P(B|A) = P(B).

Using the definition of conditional probability, an equivalent definition is P(A ∩ B) =
P(A)P(B).

Example 1.16 1. Suppose an experiment has two possible outcomes a, b, so the sample space
is S = {a, b}. Suppose P(a) = p and P(b) = 1 − p. If we perform this experiment n ≥ 1 times
under identical conditions from experiment to experiment, then the events of individual experi-
ments are independent. We may calculate

P(n a's in a row) = p^n,   P(n a's and then b) = p^n (1 − p).

In particular, the chance of getting five straight heads in five tosses of a fair coin is (1/2)^5 = 1/32.
2. The following two-way table contains data on place of residence and political leaning.

           Moderate   Conservative   Total
  Urban         200            100     300
  Rural          75            225     300
  Total         275            325     600

Is one's political leaning independent of place of residence? To answer this question, let U =
{urban}, R = {rural}, M = {moderate}, C = {conservative}. Then P(U ∩ M) = 200/600 =
1/3, P(U) = 300/600 = 1/2, and P(M) = 275/600. Since P(U ∩ M) ≠ P(U) · P(M), they are
not independent.
1.2. CONDITIONAL PROBABILITY 9
When events are not independent, we can frequently use information about the oc-
currence of one of the events to find the probability of the other. That is the basis of conditional
probability. The next concept allows us to calculate the probability of an event when the entire sam-
ple space is split (or partitioned) into pieces, by decomposing the event we are interested in into
the parts lying in each piece. Here's the idea.
If we have events B1, …, Bn such that Bi ∩ Bj = ∅ for all i ≠ j and ∪_{i=1}^n Bi = S, then
the collection {Bi}_{i=1}^n is called a partition of S. In this case, the Law of Total Probability says

P(A) = Σ_{i=1}^n P(A ∩ Bi)   and   P(A) = Σ_{i=1}^n P(A|Bi)P(Bi)

for any event A ∈ F. We can calculate the probability of an event by using the pieces of A that
intersect each Bi. It is always possible to partition S by taking any event B and the event B^c.
Then for any other event A,

P(A) = P(A ∩ B) + P(A ∩ B^c) = P(A|B)P(B) + P(A|B^c)P(B^c).

Example 1.17 Suppose we draw the second card from the top of a well-shuffled deck. We
want to know the probability that this card is an Ace.
This seems to depend on what the first card is. Let B = {1st card is an Ace} and consider
the partition {B, B^c}. We condition on what the first card is:

P(2nd card is an Ace) = P(2nd and 1st are Aces) + P(1st is not an Ace and 2nd is an Ace)
                      = P(2nd is Ace|B)P(B) + P(2nd is Ace|B^c)P(B^c)
                      = (3/51)(4/52) + (4/51)(48/52) = 4/52.

Amazingly, the chance the second card is an Ace is the same as the chance the first card is an Ace.
This makes sense because if we don't know what the first card is, the second card should have
the same chance as the first card. In fact, the chance the 27th card is an Ace is also 4/52 as long
as we don't know any of the preceding 26 cards.
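The conditioning argument can be verified with exact arithmetic; this is a quick check, not part of the text's derivation:

```python
from fractions import Fraction

# Condition on whether the first card is an Ace.
p_B = Fraction(4, 52)                  # P(1st card is an Ace)
p_second_ace = Fraction(3, 51) * p_B + Fraction(4, 51) * (1 - p_B)
print(p_second_ace)   # 1/13, the same as 4/52
```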

The next important theorem tells us how to find P(Bk|A) if we know how to find P(A|Bi)
for each event Bi in the partition of S. It shows us how to find the probability that if A occurs,
it was due to Bk.

Theorem 1.18 Bayes' Rule. Let {Bi}_{i=1}^n be a partition of S. Then for each k = 1, 2, …, n,

P(Bk|A) = P(Bk ∩ A)/P(A) = P(A|Bk)P(Bk)/P(A)
        = P(A|Bk)P(Bk) / [P(A|B1)P(B1) + ··· + P(A|Bn)P(Bn)].

The proof is in the statement of the theorem using the definition of conditional probability
and the Law of Total Probability.

Example 1.19 This example shows the use of both the Law of Total Probability and Bayes'
Rule. Suppose there is a box with 10 coins, 9 of which are fair coins (probability of heads is 1/2)
and 1 of which has heads on both sides. Suppose a coin is picked at random and it is tossed 5
times. Given that all 5 tosses result in heads, what is the probability the 6th toss will be a head?
Let A = {toss 6 is a H}, B = {1st 5 tosses are H}, and C = {coin chosen is fair}. The
problem is that we can't calculate P(A) or P(B) until we know what kind of coin we have. We
need to condition on the type of coin. Here's what we know: P(C) = 9/10, and

P(A|C) = P(toss 6 is a H|coin chosen is fair) = 1/2,
P(A|C^c) = P(toss 6 is a H|coin chosen is not fair) = 1.

Now let's use Corollary 1.13:

P(A|B) = P(A|B ∩ C)P(C|B) + P(A|B ∩ C^c)P(C^c|B)
       = (1/2) P(C|B) + 1 · (1 − P(C|B)),

where, using Bayes' Formula,

P(C|B) = P(B|C)P(C) / [P(B|C)P(C) + P(B|C^c)P(C^c)]
       = (1/2)^5 · (9/10) / [(1/2)^5 · (9/10) + 1 · (1/10)] = 9/41.

Therefore,

P(A|B) = (1/2)(9/41) + 32/41 = 73/82 = 0.8902.
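The computation can be organized in a few lines of code with exact arithmetic; the variable names below are ours, chosen to match the events in the example:

```python
from fractions import Fraction

p_C = Fraction(9, 10)                 # P(fair coin chosen)
p_B_given_C = Fraction(1, 2) ** 5     # P(5 heads | fair)
p_B_given_Cc = Fraction(1)            # P(5 heads | two-headed)

# Bayes' rule for P(C | B), the chance the coin is fair given 5 heads.
p_C_given_B = (p_B_given_C * p_C) / (p_B_given_C * p_C
                                     + p_B_given_Cc * (1 - p_C))

# Corollary 1.13 for P(A | B), the chance the 6th toss is a head.
p_A_given_B = Fraction(1, 2) * p_C_given_B + 1 * (1 - p_C_given_B)
print(p_C_given_B, p_A_given_B)   # 9/41 and 73/82
```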

Example 1.20 Tests for a medical condition are not foolproof. To see what this implies, sup-
pose a test for a virus has sensitivity 0.95 and specificity 0.92. This means

Sensitivity = P(test positive|have the disease) = P(TP|D) = 0.95

and

Specificity = P(test negative|do not have the disease) = P(TP^c|D^c) = 0.92.

Suppose the prevalence of the disease is 5%, which means P(D) = 0.05. The question is: if
someone tests positive for the disease, what are the chances this person actually has the disease?
This is asking for P(D|TP), but what we know is P(TP|D) and P(TP^c|D^c). This is a
perfect use of Bayes' Rule. We also use Corollary 1.14:

P(D|TP) = P(D ∩ TP)/P(TP) = P(TP|D)P(D)/P(TP)
        = P(TP|D)P(D) / [P(TP|D)P(D) + P(TP|D^c)P(D^c)]
        = P(TP|D)P(D) / [P(TP|D)P(D) + (1 − P(TP^c|D^c))P(D^c)]
        = (0.95 × 0.05) / (0.95 × 0.05 + (1 − 0.92) × 0.95) = 0.3846.

This is amazing. Only about 38% of people who test positive actually have the disease.
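The same Bayes computation in code, with sensitivity, specificity, and prevalence as inputs. This is a sketch; the function name is ours:

```python
def positive_predictive_value(sens, spec, prev):
    """P(D | TP) via Bayes' rule with the Law of Total Probability."""
    p_tp = sens * prev + (1 - spec) * (1 - prev)   # P(TP)
    return sens * prev / p_tp

ppv = positive_predictive_value(0.95, 0.92, 0.05)
print(round(ppv, 4))   # 0.3846
```

Trying other inputs shows how strongly the answer depends on prevalence: with prev = 0.5 the same test gives a positive predictive value above 0.9.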

Example 1.21 Suppose there is a 1% chance of contracting a rare disease. Let D be the event
you have the disease and TP the event you test positive for the disease. We know P(TP|D) =
0.98 and P(TP^c|D^c) = 0.95. As in the previous example, we first ask: given that you test
positive, what is the probability that you really have the disease? We know how to work this out:

P(D|TP) = 0.98(0.01) / [0.98(0.01) + (1 − 0.95)(0.99)] = 0.165261.

Now suppose there is an independent repetition of the test. Suppose the second test is also
positive, and now you want to know the probability that you really have the disease given the
two positives.
To solve this let TPi, i = 1, 2, denote the event you test positive on test i. These
events are assumed conditionally independent.¹ Therefore, again by Bayes' formula we have

P(D|TP1 ∩ TP2) = P(TP1 ∩ TP2|D)P(D) / P(TP1 ∩ TP2)
  = P(TP1 ∩ TP2|D)P(D) / [P(TP1 ∩ TP2|D)P(D) + P(TP1 ∩ TP2|D^c)P(D^c)]
  = P(TP1|D)P(TP2|D)P(D) / [P(TP1|D)P(TP2|D)P(D)
      + (1 − P(TP1^c|D^c))(1 − P(TP2^c|D^c))P(D^c)]
  = 0.795099.

This says that the patient who tests positive twice now has an almost 80% chance of actually
having the disease.

Example 1.22 Simpson's Paradox. Suppose a college has two majors, A and B. There are
2000 male applicants to the college, with half applying to each major. There are 1100 female
¹ Conditional independence means independence conditioned on some event, i.e., P(A ∩ B|C) = P(A|C) · P(B|C).
In our case, conditional independence means P(TP1 ∩ TP2|D) = P(TP1|D)P(TP2|D).
applicants, with 100 applying to A and the rest to B. Major A admits 60% of applicants while
major B admits 30%. This means that the percentage of men admitted and the percentage of
women admitted must be the same, right? Wrong.
In fact, a total of 900 male applicants to the college were admitted, giving
900/2000 = 0.45, or 45% of men admitted. For women the percentage is 360/1100 = 0.327, or 33%.
Aggregating the men and women hides the fact that a larger percentage of women applied to
the major which has the lower acceptance rate. This is an example of Simpson's paradox. Here's
another example.
Two doctors have a record of success in two types of surgery, Low Risk and High Risk.
Here’s the table summarizing the results.

                   Doctor A        Doctor B
  Low Risk      93% (81/87)   87% (234/270)
  High Risk   73% (192/263)     69% (55/80)
  Total       78% (273/350)   83% (289/350)

The data show that, conditioned on either low- or high-risk surgeries, Doctor A has a
better success percentage. However, aggregating the high- and low-risk groups together pro-
duces the opposite conclusion. The explanation of this is arithmetic: for positive numbers
a, b, c, d, A, B, C, D, it is not true that A/B > a/b and C/D > c/d imply
(A + C)/(B + D) > (a + c)/(b + d). In the example,
81/87 > 234/270 and 192/263 > 55/80, but (81 + 192)/(87 + 263) < (234 + 55)/(270 + 80).
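The failed implication is easy to verify numerically for the surgery data:

```python
from fractions import Fraction

# Doctor A beats Doctor B within each risk group...
assert Fraction(81, 87) > Fraction(234, 270)     # low-risk success rates
assert Fraction(192, 263) > Fraction(55, 80)     # high-risk success rates

# ...but loses after aggregating the two groups.
a_total = Fraction(81 + 192, 87 + 263)           # 273/350
b_total = Fraction(234 + 55, 270 + 80)           # 289/350
print(a_total < b_total)   # True: Simpson's paradox
```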

1.3 APPENDIX: COUNTING


When we have a finite sample space |S| = N with equally likely outcomes, we calculate P(A) =
n(A)/N. It is sometimes a very difficult task to calculate both N and n(A). In this appendix we give
some basic counting principles to help with this.

Basic Counting Principle:


If there is a task with two steps and step one can be done in k different ways, and step two in j
different ways, then the task can be completed in k  j different ways.

Permutations:
The number of ways to arrange k objects out of n distinct objects is

n(n − 1)(n − 2) ··· (n − (k − 1)) = n!/(n − k)! = P_{n,k}.

For instance, if we have 3 distinct objects {a, b, c}, there are 6 = 3 · 2 ways to pick 2 objects out
of the 3, since there are 3 ways to pick the first object and then 2 ways to pick the second. They
are (a, b), (a, c), (b, a), (b, c), (c, a), (c, b).

Combinations:
The number of ways to choose k objects out of n when we don't care about the order of the
objects is

C_{n,k} = n!/(k!(n − k)!) = (n choose k).

For example, in the paragraph on permutations, the choices (a, b) and (b, a) are different per-
mutations but they are the same combination and so should not be counted separately. The way
to get the number of combinations is to first figure out the number of permutations, namely
n!/(n − k)!, and then divide out the number of ways to arrange the selection of k objects, namely k!.
In other words,

P_{n,k} = (n choose k) · k!  ⟹  (n choose k) = n!/((n − k)! k!).

Example 1.23 Poker Hands. We will calculate the probability of obtaining some of the
common 5-card poker hands to illustrate the counting principles. A standard 52-card deck
has 4 suits (Hearts, Clubs, Spades, Diamonds), with each suit consisting of 13 cards labeled
2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A. Five cards from the deck are chosen at random (without re-
placement). We now want to find the probabilities of various poker hands.
The sample space S is all possible 5-card hands where the order of the cards does not mat-
ter. These are combinations of 5 cards from the 52, and there are (52 choose 5) = 2,598,960 = |S| = N
possible hands, all of which are equally likely.
Probability of a Royal Flush, which is A, K, Q, J, 10, all of the same suit. Let A =
{royal flush}. How many royal flushes are there? It should be obvious there are exactly 4, one for
each suit. Therefore, P(A) = 4/(52 choose 5) = 0.00000153908, an extremely rare event.
Probability of a Full House, which is 3 of a kind and a pair. Let A = {full house}. To get
a full house we break this down into steps.

(a) Pick a card type for the 3 of a kind. There are 13 types one could choose.

(b) Choose 3 out of the 4 cards of the type chosen in the first step. There are (4 choose 3)
ways to do that.

(c) Choose another type distinct from the first type. There are 12 ways to do that.

(d) Choose 2 cards of the type chosen in the previous step. There are (4 choose 2) ways
to do that.

We conclude that the number of full house hands is n(A) = 13 · (4 choose 3) · 12 · (4 choose 2) = 3744.
Consequently, P(A) = 3744/(52 choose 5) = 0.00144.
Probability of 3 of a Kind. This is a hand of the form aaabc, where b, c are cards neither
of which has the same face value as a. Let A be the event we get 3 of a kind. The number of
hands in A is calculated using the multiplication rule with these steps:

(a) Choose a card type.

(b) Choose 3 of that type.

(c) Choose 2 more distinct types.

(d) Choose 1 card of each of those types.

The number of ways to do this is (13 choose 1) · (4 choose 3) · (12 choose 2) · (4 choose 1) · (4 choose 1) = 54,912,
and so P(A) = 54,912/(52 choose 5) = 0.0211.
Probability of a Pair. This is a hand like aabcd, where b, c, d are distinct cards without
the same face values. A is the event that we get a pair. To get one pair and make sure the other
3 cards don't match the pair is a bit tricky.

(a) Choose a card type.

(b) Choose 2 of that type.

(c) Choose 3 types from the remaining types.

(d) Choose 1 card from each of these types.

The number of ways to do that is (13 choose 1) · (4 choose 2) · (12 choose 3) · (4 choose 1)^3 = 1,098,240.
Therefore, P(A) = 0.4226. An exercise asks for the probability of getting 2 pairs.
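The hand counts above can be reproduced with `math.comb`; this sketch recomputes each count and probability:

```python
import math

n_hands = math.comb(52, 5)                                       # 2,598,960
royal_flush = 4
full_house  = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)        # 3,744
three_kind  = 13 * math.comb(4, 3) * math.comb(12, 2) * 4 * 4    # 54,912
one_pair    = 13 * math.comb(4, 2) * math.comb(12, 3) * 4**3     # 1,098,240

for name, count in [("royal flush", royal_flush), ("full house", full_house),
                    ("3 of a kind", three_kind), ("pair", one_pair)]:
    print(f"{name}: {count / n_hands:.8f}")
```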

1.4 PROBLEMS
1.1. Suppose P .A/ D p; P .B/ D 0:3; P .A [ B/ D 0:6. Find p so that P .A \ B/ D 0: Also,
find p so that A and B are independent.
1.2. When P .A/ D 1=3; P .B/ D 1=2; P .A [ B/ D 3=4; what is (a)P .A \ B/? and (b) what
is P .Ac [ B/?
1.3. Show P(AB^c) = P(A) − P(AB) and P(exactly one of A or B occurs) = P(A) + P(B)
     − 2P(A ∩ B).
1.4. 32% of Americans smoke cigarettes, 11% smoke cigars, 7% smoke both.

(a) What percent smoke neither cigars (b) What percent smoke cigars but not
nor cigarettes? cigarettes?

1.5. Let A; B; C be events. Write the expression for


(a) only A occurs (f ) none occur
(b) both A and C occur but not B (g) at most 2 occur
(c) at least one occurs (h) at most 1 occurs
(d) at least 2 occur (i) exactly 2 occur
(e) all 3 occur (j) at most 3 occur

1.6. Suppose n(A) is the number of times A occurs if an experiment is performed N times.
     Set F_N(A) = n(A)/N. Show that F_N satisfies the definition of a probability func-
     tion. This leads to the frequency definition of the probability of an event, P(A) =
     lim_{N→∞} F_N(A); i.e., the probability of an event is the long-term fraction of time the
     event occurs.

1.7. Three events A; B; C cannot occur simultaneously. Further it is known that P .A \ B/ D


P .B \ C / D P .A \ C / D 1=3: Can you determine P .A/? Hint: A  .B \ C /c .

1.8. (a) Give an example to illustrate P .A/ C P .B/ D 1 does not imply A \ B D ;: (b)
Give an example to illustrate P .A [ B/ D 1 does not imply A \ B D ;: (c) Prove that
P .A/ C P .B/ C P .C / D 1 if and only if P .AB/ D P .AC / D P .BC / D 0:

1.9. A box contains 2 white balls and an unknown amount (finite) of non-white balls. Sup-
pose 4 balls are chosen at random without replacement and suppose the probability of
the sample containing both white balls is twice the probability of the sample containing
no white balls. Find the total number of balls in the box.

1.10. Let C and D be two events for which one knows that P .C / D 0:3; P .D/ D 0:4; P .C \
D/ D 0:2: What is P .C c \ D/?

1.11. An experiment has only two possible outcomes, only one of which may occur. The first
has probability p to occur, the second probability p 2 : What is p ?

1.12. We repeatedly toss a coin. A head has probability p and a tail probability 1 − p, where
      0 < p < 1. What is the probability the first head occurs on the 5th toss? What
      is the probability it takes 5 tosses to get two heads?

1.13. Show that if A  B; then P .A/  P .B/:

1.14. Analogously to the finite sample space case with equally likely outcomes, we may define
      P(A) = area of A / area of S, where S ⊂ R² is a fixed two-dimensional set (with equally
      likely outcomes) and A ⊂ S. Suppose that we have a dart board given by S = {x² + y² ≤
      9} and A is the event that a randomly thrown dart lands in the ring with inner radius 1
      and outer radius 2. Find P(A).
1.15. Show that P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) −
      P(C ∩ B) + P(A ∩ B ∩ C).
1.16. Show that P(A ∩ B) ≥ P(A) + P(B) − 1 for all events A, B ∈ F. Use this to find a
      lower bound on the probability both events occur if the probability of each event is 0.9.
1.17. A fair coin is flipped twice. We know that one of the tosses is a Head. Find the proba-
bility the other toss is a Head. (Hint: The answer is not 1/2.).
1.18. Find the probability of two pair in a 5-card poker hand.
1.19. Show that DeMorgan’s Laws .A [ B/c D Ac \ B c and .A \ B/c D Ac [ B c hold and
then find the probability neither A nor B occur and the probability either A does not
occur or B does not but one of the two does occur. Your answer should express these in
terms of P .A/; P .B/, and P .A \ B/:
1.20. Show that if A and B are independent events, then so are A and B c as well as
Ac and B c :
1.21. If P(A) = 1/3 and P(B^c) = 1/4, is it possible that A ∩ B = ∅? Explain.
1.22. Suppose we choose one of two coins C1 or C2 in which the probability of getting a head
with C1 is 1=3; and with C2 is 2=3: If we choose a coin at random what is the probability
we get a head when we flip it?
1.23. Suppose two cards are dealt one at a time from a well-shuffled standard deck of cards.
      Cards are ranked 2 < 3 < ··· < 10 < J < Q < K < A.
      (a) Find the probability the second card beats the first card. Hint: Look at
          Σ_k P(C2 > C1 | C1 = k) P(C1 = k).
      (b) Find the probability the first card beats the second and the probability the two
          cards match.
1.24. A basketball team wins 60% of its games when it leads at the end of the first quarter,
and loses 90% of its games when the opposing team leads. If the team leads at the end
of the first quarter about 30% of the time, what fraction of the games does it win?
1.25. Suppose there is a box with 10 coins, 8 of which are fair coins (probability of heads is
1/2), and 2 of which have heads on both sides. Suppose a coin is picked at random and
it is tossed 5 times. Given that we got 5 straight heads, what are the chances the coin
has heads on both sides?
1.26. Is independence for three events A; B; C the same as: A; B are independent; B; C are
independent; and A; C are independent? Consider the example: Perform two indepen-
dent tosses of a coin. Let A Dheads on toss 1, B Dheads on toss 2, and C Dthe two
tosses are equal.
(a) Find P .A/; P .B/; P .C /; P .C jA/; P .BjA/; and P .C jB/. What do you conclude?
(b) Find P .A \ B \ C / and P .A \ B \ C c /. What do you conclude?

1.27. First show that P .A [ B/ D P .A/ C P .Ac \ B/ and then calculate.

(a) P .A [ B/ if it is given that P .A/ D 1=3 and P .BjAc / D 1=4:


(b) P .B/ if it is given that P .A [ B/ D 2=3; P .Ac jB c / D 1=2:

1.28. The events A; B; and C satisfy: P .AjB \ C / D 1=4; P .BjC / D 1=3; and P .C / D
1=2: Calculate P .Ac \ B \ C /:

1.29. Two independent events A and B are given, and P .BjA [ B/ D 2=3; P .AjB/ D 1=2:
What is P .B/?

1.30. You roll a die and a friend tosses a coin. If you roll a 6, you win. If you don’t roll a 6 and
your friend tosses a H, you lose. If you don’t roll a 6, and your friend does not toss a H,
the game repeats. Find the probability you Win.

1.31. You are diagnosed with an uncommon disease. You know that there only is a 4% chance
of having the disease. Let D Dyou have the disease, and T Dthe test says you have it.
It is known that the test is imperfect: P .T jD/ D 0:9 and P .T c jD c / D 0:85:

(a) Given that you test positive, what is the probability that you really have the disease?
(b) You obtain a second and third opinion: two more (conditionally) independent
repetitions of the test. You test positive again on both tests. Assuming conditional
independence, what is the probability that you really have the disease?

1.32. Two dice are rolled. What is the probability that at least one is a six? If the two faces
are different, what is the probability that at least one is a six?

1.33. 15% of a group are heavy smokers, 30% are light smokers, 55% are nonsmokers. In a 5-
year study it was determined that the death rates of heavy and light smokers were 5 and
3 times that of nonsmokers, respectively. What is the probability a randomly selected
person was a nonsmoker, given that he died?

1.34. A, B, and C are mutually independent and P .A/ D 0:5; P .B/ D 0:8; P .C / D 0:9: Find
the probabilities (i) all three occur, (ii) exactly 2 of the 3 occur, or (iii) none occurs.

1.35. A box has 8 red and 7 blue balls. A second box has an unknown number of red and 9
blue balls. If we draw a ball from each box at random we know the probability of getting
2 balls of the same color is 151/300. How many red balls are in the second box?

1.36. Show that:


(a) P(A|A ∪ B) ≥ P(A|B). Hint: A = (A ∩ B) ∪ (A ∩ B^c) and A ∪ B = B ∪ (A ∩ B^c).
(b) If P(A|B) = 1, then P(B^c|A^c) = 1.
(c) P(A|B) ≥ P(A) ⟹ P(B|A) ≥ P(B).
1.37. Coin 1 has H with probability 0.4; Coin 2 has H with probability 0.7. One of these
coins is chosen at random and flipped 10 times. Find
(a) P .coin lands H on exactly 7 of the 10 flips/ and
(b) given the first of these 10 flips is H, find the conditional probability that exactly 7
of the 10 flips is H.
1.38. Show the extended version of the Law of Total Conditional Probability:

      P(A|B) = Σ_i P(A|E_i ∩ B) P(E_i|B),   where S = ∪_i E_i and E_i ∩ E_j = ∅ for i ≠ j.

1.39. There are two universities. The breakdown of males and females majoring in Math at
      each university is given in the tables.

      Univ 1    Math Major   Other        Univ 2    Math Major   Other
      Males            200     800        Males             30      70
      Females          150     850        Females         1000    3000

      Show that this is an example of Simpson's paradox.


1.40. The table gives the result of a drug trial:

                M Recover   M Die   F Recover   F Die   O Recover   O Die
      Drug             15      40          90      50         105      90
      No Drug          20      40          20      10          40      50

      Here M = male, F = female, and O = overall. Show that this is an example of Simpson's
      paradox.

CHAPTER 2

Random Variables
In this chapter we study the main properties of functions whose domain is the sample space of
an experiment with random outcomes. Such functions are called random vari-
ables.

2.1 DISTRIBUTIONS
The distribution of a random variable is a specification of the probability that the random variable
takes on any set of values. What is a random variable? It is just a function defined on the sample
space S of an experiment.

Definition 2.1 A random variable (rv) is a function X: S → R such that E = {s ∈ S | X(s) ≤
a} ∈ F, the set of all possible events, for every real number a. In other words, we want every set
of the type {s ∈ S | X(s) ≤ a} to be an event.
As a function, a random variable has a range R(X) = {y ∈ R | X(s) = y for some s ∈ S}. If
R(X) is a finite or countable set, we say X is discrete. If R(X) contains an interval, then we say
X is not discrete, but either continuous or mixed.

Definition 2.2 If X is a discrete random variable with range R(X) = {x1, x2, …}, the prob-
ability mass function (pmf) of X is p(xi) = P(X = xi), i = 1, 2, …. We write¹ {X = xi} for
the event {X = xi} = {s ∈ S | X(s) = xi}.

Remark 2.3 Any function p(xi) which satisfies (i) 0 ≤ p(xi) ≤ 1 for all i, and (ii) Σ_i p(xi) = 1,
is called a pmf. The pmf of a rv is also called its distribution.

The next two particular discrete rvs are fundamental.

Distribution 2.4 A rv X which takes on only two values, a, b, with P(X = a) = p, P(X = b) =
1 − p, is said to have a Bernoulli(a, b, p) distribution, or to be a Bernoulli(a, b, p) rv, and
we write X ~ Bernoulli(a, b, p). In particular, if we have an experiment with two outcomes,
success or failure, we may set a = 1, b = 0 to represent these, and p is the probability of success.

¹ In general, write {X ≤ a} = {s ∈ S | X(s) ≤ a} and similarly for {X > a}.

An experiment like this is called a Bernoulli(p) trial. The pmf of a Bernoulli(a, b, p) rv is

P(X = x) = p(x) = p if x = a, and 1 − p if x = b.
A rv X which counts the number of successes in an independent set of n Bernoulli trials is
called a Binomial(n, p) rv, written X ~ Binom(n, p). The range of X is R(X) = {0, 1, 2, …, n},
and the pmf of X is

P(X = x) = p(x) = (n choose x) p^x (1 − p)^{n−x},   x = 0, 1, 2, …, n.

Remark 2.5 Here's where this comes from. If we have a particular sequence of n Bernoulli
trials with x successes, say 10011101…1, then x 1's must be in this sequence and n − x 0's must
also be in there. By independence of the trials, the probability of any particular sequence of
x 1's and n − x 0's is p^x (1 − p)^{n−x}. How many sequences with x 1's out of n are there? That
number is (n choose x) = n!/(x!(n − x)!).
It should be clear that a Binomial(n, p) rv X is a sum of n independent Bernoulli(p)
rvs, X = X1 + X2 + ··· + Xn. Independent rvs will be discussed later.

Example 2.6 A bet on red for a standard roulette wheel has 18/38 chances of winning. Sup-
pose a gambler will bet $5 on red each time for 100 plays. Let X be the total amount won
or lost as a result of these 100 plays. X will be a discrete random variable with range R(X) =
{0, ±10, ±20, …, ±500}. In fact, if M denotes the number of games won (which is also a random
variable with values from 0 to 100), then our net amount won or lost is X = 10M − 500. The
random variable M is an example of a Binomial(100, 18/38) rv.
The chance you win exactly 50 games is P(M = 50) = (100 choose 50) (18/38)^50 (20/38)^50 = 0.0693,
so the chance you break even, P(X = 0), is also 0.0693.
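The break-even probability can be checked with the Binomial pmf; a stdlib-only sketch:

```python
import math

def binom_pmf(n, p, x):
    # P(X = x) for X ~ Binom(n, p)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

p_win = 18 / 38
p_break_even = binom_pmf(100, p_win, 50)   # P(M = 50), i.e., P(X = 0)
print(round(p_break_even, 4))   # 0.0693
```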

Now we define continuous random variables.

Definition 2.7 A random variable X is continuous if there is a function f: R → R associated
with X such that f(x) ≥ 0 for all x, and ∫_{−∞}^{∞} f(x) dx = 1. The function f is called a prob-
ability density function (pdf) of X. It is important to note that a pdf does not have to satisfy
f(x) ≤ 1 in general.

Remark 2.8 Later it will turn out to be very useful to also use the notation f(x) = P(X = x)
for a pdf, but we have to be careful with this because, as we will see, the probability that a continuous
rv takes on any particular value is 0. This notation is purely to simplify statements and is intuitive
as long as one keeps this in mind.

Figure 2.1: Normal Distribution with μ = 3, σ = 1.5.

The next example defines the most important pdf in statistics.

Distribution 2.9 A rv X is said to have a Normal distribution with parameters (μ, σ), σ > 0,
if its pdf is

f(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²},   −∞ < x < ∞.

We write X ~ N(μ, σ), where, in general, ~ is to be read as "is distributed as." If μ = 0, σ = 1,
the rv Z ~ N(0, 1) is said to be a standard normal rv.

Figure 2.1 is a graph of the normal pdf with μ = 3, σ = 1.5.

Remark 2.10 The line of symmetry of a N(μ, σ) pdf is always at x = μ, and it provides the
point of maximum of the pdf. One can check this using the second derivative test and f′(μ) =
0, f″(μ) < 0. It is also a calculus exercise to check that x = μ + σ and x = μ − σ both provide
points of inflection (where concavity changes) of the pdf.

Remark 2.11 It is not an easy task to check that f(x) really is a density function. It is obviously
always nonnegative, but why is the integral equal to one? That fact uses the following formula,
which is verified in calculus:

∫_{−∞}^{∞} e^{−x²} dx = √π.

Using this formula and a simple change of variables, one can verify that f is indeed a pdf.
How do we use pdfs to compute probabilities? Let's start with finding certain types of
probabilities.

Definition 2.12 The cumulative distribution function (cdf) of a random variable X is FX(x) =
P(X ≤ x). That is,

FX(x) = Σ_{xi ≤ x} p(xi)          if X is discrete with pmf p;
FX(x) = ∫_{−∞}^x f(y) dy          if X is continuous with pdf f.

Every cdf has the properties

(a) lim_{x→−∞} FX(x) = 0 and lim_{x→+∞} FX(x) = 1.

(b) x < y ⟹ FX(x) ≤ FX(y) (nondecreasing).

(c) lim_{y→x+} FX(y) = FX(x) for all x ∈ R. This says a cdf is continuous at every point from
the right.

Using the cdf FX we have, for a < b, P(a < X ≤ b) = FX(b) − FX(a).


If X is continuous, P(a < X ≤ b) = FX(b) − FX(a) = ∫_a^b f(x) dx.
Proof. P(a < X ≤ b) = P({X ≤ b} ∩ {X ≤ a}^c) = P(X ≤ b) − P(X ≤ a) = FX(b) −
FX(a). We have used P(B ∩ A^c) = P(B) − P(A ∩ B) with A = {X ≤ a} ⊂ B = {X ≤ b}. □

If X is continuous with density f(x), P(a < X ≤ b) = ∫_a^b f(x) dx represents the area
under the density curve between a and b.
If X is discrete, a single point can have positive probability, P(X = x) > 0. If X is
continuous, P(X = x) = 0 for any x, since

P(X = x) ≤ P(x − ε < X ≤ x) = FX(x) − FX(x − ε) = ∫_{x−ε}^x f(y) dy → 0 as ε → 0.

Therefore, for a continuous rv, P(a < X ≤ b) = P(a ≤ X ≤ b) = P(a ≤ X < b) are all the
same. For a discrete rv,

P(a < X ≤ b) = P(a < X < b) + P(X = b).

We have to take endpoints into account.

Remark 2.13 If $X$ is continuous with pdf $f$ and cdf $F_X$, we know that $F_X(x) = \int_{-\infty}^{x} f(y)\,dy$. Therefore, we can find the pdf if we know the cdf using the Fundamental Theorem of Calculus: $F_X'(x) = f(x)$.
Figure 2.2: Cdf of a discrete rv, $F_X(x)$.

Example 2.14 Suppose $X$ is a random variable with values $1, 2, 3$ with probabilities $1/6, 1/3, 1/2$, respectively. In Figure 2.2 the jumps are at $x = 1, 2, 3$. The size of the jump is $P(X = x)$, $x = 1, 2, 3$, and at each jump the left endpoint is not included while the right endpoint is included, because the cdf is continuous from the right. Then we may calculate $P(X < 2) = P(X = 1) = 1/6$, but $P(X \le 2) = P(X = 2) + P(X = 1) = 1/2$.
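The jump behavior in Example 2.14 can be sketched in code (a Python illustration, not from the book; exact fractions are used so the jump sizes come out exactly):

```python
from fractions import Fraction

# Example 2.14: X takes values 1, 2, 3 with probabilities 1/6, 1/3, 1/2.
pmf = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}

def cdf(x):
    """F_X(x) = P(X <= x): sum the pmf over all values <= x."""
    return sum(p for v, p in pmf.items() if v <= x)

p_less_2 = cdf(2) - pmf[2]   # P(X < 2) = P(X = 1) = 1/6
p_le_2 = cdf(2)              # P(X <= 2) = 1/6 + 1/3 = 1/2
print(p_less_2, p_le_2)      # 1/6 1/2
```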

2.2 IMPORTANT DISCRETE DISTRIBUTIONS


We begin with the pmfs of some of the most important discrete rvs we will use in this book.
Distribution 2.15 Binomial(n, p). $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$. If $n = 1$ this is Bernoulli(p). A Binomial rv counts the number of successes in $n$ independent Bernoulli trials.²

Distribution 2.16 Discrete Uniform. $P(X = x) = \frac{1}{n}$, $x = 1, 2, \ldots, n$; $F_X(x) = \frac{x}{n}$, $1 \le x \le n$. A discrete uniform rv picks one of $n$ points at random.

Distribution 2.17 Poisson(λ). $P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$. The parameter $\lambda > 0$ is given. A Poisson(λ) rv counts the number of events that occur at the rate $\lambda$.³

Distribution 2.18 Geometric(p). $P(X = x) = (1-p)^{x-1} p$, $x = 1, 2, \ldots$. $X$ is the number of independent Bernoulli trials until we get the first success.⁴
² A Binomial pmf can be calculated on a TI-83/4/5 using binompdf(n, p, k) = P(X = k) and binomcdf(n, p, k) = P(X ≤ k).
³ A Poisson pmf can be calculated using poissonpdf(λ, k) = P(X = k) and poissoncdf(λ, k) = P(X ≤ k).
⁴ A Geometric pmf can be calculated using geometpdf(p, k) = P(X = k) and geometcdf(p, k) = P(X ≤ k).
Distribution 2.19 NegBinomial(r, p). $P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}$, $x = r, r+1, r+2, \ldots$. $X$ represents the number of Bernoulli trials until we get $r$ successes.

Distribution 2.20 Hypergeometric(N, n, k). Consider a population of total size $N$ consisting of two types of objects, say Red and Black. Suppose there are $k$ Red objects in the population and $N - k$ Black objects. We choose $n$ objects from the population at random without replacement. $X$ represents the number of Red objects we obtain in this process. We have
$$P(X = x) = \frac{(\text{choose } x \text{ out of } k \text{ Reds}) \times (\text{choose } n - x \text{ out of } N - k \text{ Blacks})}{\text{choose } n \text{ out of } N} = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}.$$

Distribution 2.21 Multinomial(n, p₁, …, p_k). Suppose there is an experiment with $n$ independent trials with $k$ possible outcomes on each trial, labeled $A_1, A_2, \ldots, A_k$, with the probability of outcome $i$ given by $p_i = P(A_i)$. Let $X_i$ be a count of the number of occurrences of $A_i$. Then
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \binom{n}{x_1, x_2, \ldots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k},$$
where $x_1 + x_2 + \cdots + x_k = n$ and $p_1 + p_2 + \cdots + p_k = 1$. The multinomial coefficient is given by
$$\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\, x_2! \cdots x_k!}.$$
It comes from choosing $x_1$ out of $n$, then $x_2$ out of $n - x_1$, then $x_3$ out of $n - x_1 - x_2$, etc.:
$$\binom{n}{x_1}\binom{n - x_1}{x_2} \cdots \binom{n - x_1 - \cdots - x_{k-1}}{x_k} = \binom{n}{x_1, x_2, \ldots, x_k}.$$
This generalizes the binomial distribution to the case when there is more than just a success or failure on each trial.

The cdf $F_X(x)$ of each of these can be written down once we have the pmf. For example, for a Binom(n, p) rv we have
$$F_X(x) = \sum_{k=0}^{x} \binom{n}{k} p^k (1-p)^{n-k} = \text{binomcdf}(n, p, x).$$
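The calculator commands in the footnotes have direct analogues in code. As a sketch (not from the book, which uses TI calculators; assumes Python 3.8+ for `math.comb`), the pmf formulas above translate to:

```python
import math

# Python equivalents of the TI commands binompdf/binomcdf, poissonpdf,
# and geometpdf, built from the pmf formulas in Distributions 2.15-2.18.

def binompdf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, x):
    return sum(binompdf(n, p, k) for k in range(x + 1))

def poissonpdf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def geometpdf(p, k):
    return (1 - p)**(k - 1) * p

# a cdf evaluated at the largest possible value is 1
print(round(binomcdf(10, 0.5, 10), 6))
```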

Example 2.22 For the Negative Binomial, $X$ is the number of trials until we get $r$ successes. We must have at least $r$ trials to get $r$ successes, and we get $r$ successes with probability $p^r$ and $x - r$ failures with probability $(1-p)^{x-r}$. Since we stop counting when we get the $r$th success, the last trial must be a success. Therefore, in the preceding $x - 1$ trials we spread $r - 1$ successes, and there are $\binom{x-1}{r-1}$ ways to do that. That's why $P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}$, $x \ge r$. Here is an example where the Negative Binomial arises.

Best-of-seven series. The baseball and NBA finals determine a winner by the two teams playing up to seven games, with the first team to win four games the champion. Suppose team A wins each game with probability $p$ and loses to team B with probability $1 - p$.

(a) If $p = 0.52$, what is the probability A wins the series? For A to win the series, A can win 4 straight, or in 5 games, or in 6 games, or in 7 games. This is negative binomial with $r = 4$, summed over $x = 4, 5, 6, 7$, so if $X$ is the number of games until 4 wins for A,
$$P(\text{A wins series}) = P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7)$$
$$= \binom{3}{3} 0.52^4 (0.48)^0 + \binom{4}{3} 0.52^4 (0.48)^1 + \binom{5}{3} 0.52^4 (0.48)^2 + \binom{6}{3} 0.52^4 (0.48)^3 = 0.54368.$$

If $p = 0.55$, the probability A wins the series goes up to 0.60828, and if $p = 0.6$ the probability A wins is 0.7102.
(b) If $p = 0.52$ and A wins the first game, what is the probability A wins the series?

This is asking for $P(\text{A wins series} \mid \text{A wins game 1})$. Let $X_1$ be the number of games (out of the remaining 6) until A wins 3. Then
$$P(\text{A wins series} \mid \text{A wins game 1}) = \frac{P(\text{A wins series} \cap \text{A wins game 1})}{P(\text{A wins game 1})}$$
$$= \frac{p\,P(X_1 = 3) + p\,P(X_1 = 4) + p\,P(X_1 = 5) + p\,P(X_1 = 6)}{p}$$
$$= P(X_1 = 3) + P(X_1 = 4) + P(X_1 = 5) + P(X_1 = 6)$$
$$= \binom{2}{2} 0.52^3 (0.48)^0 + \binom{3}{2} 0.52^3 (0.48)^1 + \binom{4}{2} 0.52^3 (0.48)^2 + \binom{5}{2} 0.52^3 (0.48)^3 = 0.6929.$$

An easy way to get this without conditional probability is to realize that once game one is over, A has to be the first to 3 wins in at most 6 remaining games.
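Both series probabilities can be checked with a short script (a Python sketch, not from the book) built on the negative binomial pmf:

```python
import math

# P(X = x) = C(x-1, r-1) p^r (1-p)^(x-r) for the Negative Binomial.
def negbinompdf(r, p, x):
    return math.comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

p = 0.52
# (a) A must reach 4 wins within 4..7 games.
win_series = sum(negbinompdf(4, p, x) for x in range(4, 8))
# (b) after A takes game 1, A needs to be first to 3 wins in at most 6 games.
win_given_game1 = sum(negbinompdf(3, p, x) for x in range(3, 7))
print(round(win_series, 5), round(win_given_game1, 4))
```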

Example 2.23 Multinomial distributions arise whenever one of two or more outcomes can occur on each trial. Here's a polling example. Suppose 25 registered voters are chosen at random from a population in which we know that 55% are Democrats, 40% are Republicans, and 5% are Independents. In our sample of 25, what are the chances we get 10 Democrats, 10 Republicans, and 5 Independents?

This is multinomial with $p_1 = 0.55$, $p_2 = 0.4$, $p_3 = 0.05$. Then
$$P(D = 10, R = 10, I = 5) = \binom{25}{10, 10, 5}\, 0.55^{10}\, 0.4^{10}\, 0.05^{5} = 0.000814,$$
which is really small because we are asking for exactly the numbers $(10, 10, 5)$. It is much more tedious to calculate, but we can also find things like
$$P(D \le 15, R \le 12, I \le 20) = 0.6038.$$
Notice that in the cumulative probability we don't require that $15 + 12 + 20$ equal the number of trials.
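The exact point probability is easy to reproduce (a Python sketch, not from the book):

```python
import math

# Multinomial pmf: n! / (x1! ... xk!) * p1^x1 ... pk^xk.
def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob

exact = multinomial_pmf((10, 10, 5), (0.55, 0.40, 0.05))
print(round(exact, 6))   # about 0.000814
```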
Example 2.24 Consider a random variable $X$ with pdf
$$f_X(x) = \begin{cases} 60x^2(1-x)^3, & 0 \le x \le 1,\\ 0, & \text{otherwise.} \end{cases}$$
Suppose 20 independent samples are drawn from $X$. An outcome is the sample value falling into the range $\left[0, \frac{1}{5}\right]$ when $i = 1$, or $\left(\frac{i-1}{5}, \frac{i}{5}\right]$, $i = 2, 3, 4, 5$. What is the probability that 3 observations fall into the first range, 9 fall into the second range, 4 fall into each of the third and fourth ranges, and that there are no observations in the fifth range? To answer this question, let $p_i$ denote the probability of a sample value falling into range $i$. These probabilities are computed directly from the pdf. For example,
$$p_1 = \int_0^{0.2} 60x^2(1-x)^3\,dx = 0.098.$$
Complete results are displayed in the table.

Range:        [0, 0.2]   (0.2, 0.4]   (0.4, 0.6]   (0.6, 0.8]   (0.8, 1.0]
Probability:   0.098       0.356        0.365        0.162        0.019

If $X_i$ is the number of samples that fall into range $i$, we have
$$P(X_1 = 3, X_2 = 9, X_3 = 4, X_4 = 4, X_5 = 0) = \binom{20}{3, 9, 4, 4, 0} (0.098)^3 (0.356)^9 (0.365)^4 (0.162)^4 (0.019)^0 = 0.00205.$$
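The bin probabilities have a closed form: the antiderivative of $60x^2(1-x)^3$ is $F(x) = 20x^3 - 45x^4 + 36x^5 - 10x^6$ (a short calculus exercise, not stated in the book), so each $p_i = F(i/5) - F((i-1)/5)$. A Python sketch (not from the book); note the exact values differ from the table and the final answer in the last digit or two because the book's table is rounded:

```python
import math

# Exact antiderivative of the pdf 60 x^2 (1-x)^3 on [0, 1]; F(1) = 1.
def F(x):
    return 20 * x**3 - 45 * x**4 + 36 * x**5 - 10 * x**6

probs = [F((i + 1) / 5) - F(i / 5) for i in range(5)]   # bin probabilities
counts = (3, 9, 4, 4, 0)

coef = math.factorial(20)
for c in counts:
    coef //= math.factorial(c)
answer = coef * math.prod(p**c for p, c in zip(probs, counts))
print([round(p, 3) for p in probs], round(answer, 5))
```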

Example 2.25 Suppose we have 10 patients, 7 of whom have a genetic marker for lung cancer
and 3 of whom do not. We will choose 6 at random (without replacing them as we make our
selection). What are the chances we get exactly 4 patients with the marker and 2 without?
Figure 2.3: $P(X = k)$ for a hypergeometric rv: the plot looks normal if the number of draws is large enough.

This is Hypergeometric(10, 6, 7). View this as drawing 6 people at random from a group of 10 without replacement, with the probability of success (= genetic marker) changing from draw to draw. The trials are not Bernoulli. With $X$ = number with the genetic marker, we have
$$P(X = 4) = \frac{\binom{7}{4}\binom{3}{2}}{\binom{10}{6}} = \frac{1}{2}.$$
If we incorrectly assumed $X \sim \text{Binom}(6, 0.7)$ we would get $P(X = 4) = \text{binompdf}(6, 0.7, 4) = 0.324$.

As a further example, suppose we have a group of 100 with 40 patients possessing the genetic marker (= success). We draw 50 patients at random without replacement and ask for $P(X = k)$, $k = 10, 11, \ldots, 50$. Figure 2.3 shows the hypergeometric distribution.

The fact that the figure for the hypergeometric distribution looks like a normal curve is not a coincidence, as we will see, when the population is large.
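The hypergeometric pmf and the (incorrect) binomial shortcut compare as follows (a Python sketch, not from the book):

```python
import math

# Hypergeometric pmf: P(X = x) when drawing n from N containing k "Reds".
def hypergeompdf(N, n, k, x):
    return math.comb(k, x) * math.comb(N - k, n - x) / math.comb(N, n)

exact = hypergeompdf(10, 6, 7, 4)           # exactly 1/2
wrong = math.comb(6, 4) * 0.7**4 * 0.3**2   # binompdf(6, 0.7, 4)
print(exact, round(wrong, 3))               # 0.5 0.324
```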
28 2. RANDOM VARIABLES
2.3 IMPORTANT CONTINUOUS DISTRIBUTIONS
In this section we will describe the main continuous random variables used in probability and
statistics.
Distribution 2.26 Uniform(a, b). $X \sim \text{Unif}(a, b)$ models choosing a random number from $a$ to $b$. The pdf is
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b,\\ 0, & \text{otherwise,} \end{cases} \qquad \text{and the cdf is} \qquad F_X(x) = \begin{cases} 0, & x < a,\\ \dfrac{x-a}{b-a}, & a \le x < b,\\ 1, & x \ge b. \end{cases}$$

Next is the normal distribution, which we have already discussed, but we record it here again for convenience.

Distribution 2.27 Normal(μ, σ). $X \sim N(\mu, \sigma)$ has density
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty.$$
It is not possible to get an explicit expression for the cdf, so we simply write
$$F_X(x) = N(x;\, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\,dy.$$
We shall also write $P(a < X < b) = \text{normalcdf}(a, b, \mu, \sigma)$.⁵
Distribution 2.28 Exponential(λ). $X \sim \text{Exp}(\lambda)$, $\lambda > 0$, has pdf
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0. \end{cases}$$
The cdf is
$$F_X(x) = \int_0^x \lambda e^{-\lambda y}\,dy = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0. \end{cases}$$
An exponential random variable represents processes that do not remember. For example, if $X$ represents the time between arrivals of customers to a store, a reasonable model is Exponential(λ), where $\lambda$ represents the average rate at which customers arrive.

At the end of this chapter we will introduce the remaining important distributions for statistics, including the $\chi^2$, $t$, and $F$-distributions. They are built on combinations of rvs. Now we look at an important transformation of a rv in the next example.

⁵ normalcdf(a, b, μ, σ) is a command from a TI-8x calculator which gives the area under the normal density with parameters μ, σ from a to b.

Example 2.29 Change of scale and shift. If we have a random variable $X$ with pdf $f$ and cdf $F_X$, we may calculate the pdf and cdf of the random variable $Y = \alpha X + \beta$, where $\alpha \ne 0$ and $\beta$ are constants. To do so, start with the cdf:
$$F_Y(y) = P(Y \le y) = \begin{cases} P\!\left(X \le \frac{y-\beta}{\alpha}\right), & \alpha > 0,\\[4pt] P\!\left(X \ge \frac{y-\beta}{\alpha}\right), & \alpha < 0, \end{cases} \;=\; \begin{cases} F_X\!\left(\frac{y-\beta}{\alpha}\right), & \alpha > 0,\\[4pt] 1 - F_X\!\left(\frac{y-\beta}{\alpha}\right), & \alpha < 0. \end{cases}$$

Then we find the pdf by taking the derivative:
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \begin{cases} f_X\!\left(\frac{y-\beta}{\alpha}\right) \frac{1}{\alpha}, & \alpha > 0,\\[4pt] -f_X\!\left(\frac{y-\beta}{\alpha}\right) \frac{1}{\alpha}, & \alpha < 0. \end{cases}$$

In particular, if $X \sim N(\mu, \sigma)$, we have the pdf for $Y = \alpha X + \beta$ (assuming $\alpha > 0$),
$$f_Y(y) = \frac{1}{\alpha\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \alpha\mu - \beta}{\alpha\sigma}\right)^2},$$
which we recognize as the pdf of a $N(\alpha\mu + \beta, \alpha\sigma)$ random variable. Thus, $Y = \alpha X + \beta \sim N(\alpha\mu + \beta, \alpha\sigma)$.

If we take $\alpha = \frac{1}{\sigma}$, $\beta = -\frac{\mu}{\sigma}$, $Y = \frac{1}{\sigma} X - \frac{\mu}{\sigma}$, then $Y \sim N(0, 1)$. We have shown that given any $X \sim N(\mu, \sigma)$, if we set
$$Y = \frac{X - \mu}{\sigma} \implies Y \sim N(0, 1).$$
The rv $X \sim N(\mu, \sigma)$ has been converted to the standard normal rv $Y \sim N(0, 1)$. Starting with $X \sim N(\mu, \sigma)$ and converting it to $Y \sim N(0, 1)$ is called standardizing $X$.
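Standardizing is what lets one table (or one function) serve every normal distribution. A Python sketch (not from the book; it uses `math.erf`, which gives the standard normal cdf as $\Phi(z) = \frac{1}{2}(1 + \operatorname{erf}(z/\sqrt{2}))$):

```python
import math

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normalcdf(a, b, mu, sigma):
    # standardize both endpoints: P(a < X < b) = Phi((b-mu)/sigma) - Phi((a-mu)/sigma)
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# P(1.5 < X < 4.5) for X ~ N(3, 1.5) equals P(-1 < Z < 1) for Z ~ N(0, 1)
print(round(normalcdf(1.5, 4.5, 3, 1.5), 4), round(normalcdf(-1, 1, 0, 1), 4))
```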

The reason that a normal distribution is so important is contained in the following special case of the Central Limit Theorem.

Theorem 2.30 Central Limit Theorem for the Binomial. Let $S_n \sim \text{Binom}(n, p)$. Then
$$\lim_{n \to \infty} P\!\left(\frac{S_n - np}{\sqrt{np(1-p)}} \le x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\,dz = \text{normalcdf}(-\infty, x, 0, 1), \quad \text{for all } x \in \mathbb{R}.$$
In short, $\frac{S_n - np}{\sqrt{np(1-p)}} \approx N(0, 1)$ for large $n$. Alternatively, $S_n \approx N\big(np, \sqrt{np(1-p)}\big)$.

This says the number of successes in $n$ Bernoulli trials is approximately normal with parameters $\mu = np$ and $\sigma = \sqrt{np(1-p)}$.
Remark 2.31
(1) By convention we may apply the theorem if both $np \ge 5$ and $n(1-p) \ge 5$. This choice for $n$ is related to $p$, and these two conditions exclude small sample sizes and extreme values of $p$, i.e., $p \approx 0$ or $p \approx 1$.

(2) Since $S_n$ is integer valued and normal random variables are not, we may use the continuity correction to get a better approximation. What that means is that for any integer $x$ one should calculate $P(S_n \le x)$ using $P(S_n \le x + 0.5)$ and for $P(S_n < x)$ use $P(S_n \le x - 0.5)$. That is,
$$P(S_n \le x) \approx \text{normalcdf}\big(-\infty,\, x + 0.5,\, np,\, \sqrt{np(1-p)}\big),$$
$$P(S_n < x) \approx \text{normalcdf}\big(-\infty,\, x - 0.5,\, np,\, \sqrt{np(1-p)}\big),$$
$$P(S_n \ge x) \approx \text{normalcdf}\big(x - 0.5,\, \infty,\, np,\, \sqrt{np(1-p)}\big).$$
We may approximate $P(S_n = x) \approx P(x - 0.5 \le S_n \le x + 0.5) \approx \text{normalcdf}\big(x - 0.5,\, x + 0.5,\, np,\, \sqrt{np(1-p)}\big)$.

Example 2.32 Suppose you are going to play roulette 25 times, betting on red each time. What is the probability you win at least 14 games?

Remember that the probability of winning is $p = 18/38$. Let $X$ be the number of games won. Then $X \sim \text{Binom}(25, 18/38)$ and what we want to find is $P(X \ge 14)$.

We may calculate this in two ways. First, using the binomial distribution,
$$P(X \ge 14) = \sum_{x=14}^{25} \binom{25}{x} (18/38)^x (20/38)^{25-x} = 1 - \text{binomcdf}(25, 18/38, 13) = 0.2531.$$
This is the exact answer. Second, we may use the Central Limit Theorem, which says
$$Z = \frac{S_{25} - 25(18/38)}{\sqrt{25(18/38)(20/38)}} \approx N(0, 1).$$
Consequently,
$$P(S_{25} \ge 14) = P\!\left(\frac{S_{25} - 25(18/38)}{\sqrt{25(18/38)(20/38)}} \ge \frac{14 - 25(18/38)}{\sqrt{25(18/38)(20/38)}}\right) \approx P(Z \ge 0.8644) = 0.1937.$$
This is not a great approximation. We can make it better by using the continuity correction. We have
$$P(S_{25} \ge 14) \approx \text{normalcdf}\big(13.5,\, \infty,\, 25(18/38),\, \sqrt{25(18/38)(20/38)}\big) = 0.2533,$$
which is considerably better.
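All three numbers can be reproduced in a few lines (a Python sketch, not from the book):

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 25, 18 / 38
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# exact binomial tail P(X >= 14)
exact = sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(14, n + 1))
plain = 1 - phi((14 - mu) / sigma)        # CLT, no continuity correction
corrected = 1 - phi((13.5 - mu) / sigma)  # CLT with continuity correction
print(round(exact, 4), round(plain, 4), round(corrected, 4))
```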

Figure 2.4 shows why the continuity correction gives a better estimate for a binomial.

Figure 2.4: Normal approximation to the binomial.

2.4 EXPECTATION, VARIANCE, MEDIANS, PERCENTILES

For any given random variable we are interested in basic properties like its mean value, median, the spread around the mean, etc. These are measures of the central location of the rv and the spreads around these locations. These concepts are discussed here.

Definition 2.33 The expected value of a random variable $X$ is
$$E[X] = \begin{cases} \sum_x x\, P(X = x), & \text{if } X \text{ is discrete},\\[4pt] \int_{-\infty}^{\infty} x f(x)\,dx, & \text{if } X \text{ is continuous with pdf } f. \end{cases}$$
If $g : \mathbb{R} \to \mathbb{R}$ is a given function, the expected value of the rv $g(X)$ is⁶
$$E[g(X)] = \begin{cases} \sum_x g(x)\, P(X = x), & \text{if } X \text{ is discrete},\\[4pt] \int_{-\infty}^{\infty} g(x) f(x)\,dx, & \text{if } X \text{ is continuous with pdf } f. \end{cases}$$

With this definition, you can see why it is frequently useful to write $E(g(X)) = \int g(x) P(X = x)\,dx$ even when $X$ is a continuous rv. This abuses notation a lot, and you have to keep in mind that $f(x) \ne P(X = x)$, which is zero when $X$ is continuous.
From calculus we know that if we have a one-dimensional object with density $f(x)$ at each point, then $\int x f(x)\,dx = E(X)$ gives the center of gravity of the object. If $X$ is discrete, the expected value is an average of the values of $X$, weighted by the probability it takes on each value. For example, if $X$ has values 1, 2, 3 with probabilities $1/8, 3/8, 1/2$, respectively, then
$$E[X] = 1 \cdot 1/8 + 2 \cdot 3/8 + 3 \cdot 1/2 = 19/8.$$
On the other hand, the straight average of the 3 numbers is 2. The straight average corresponds to each value having equal probability.

⁶ We frequently write $E[g(X)] = E(g(X)) = Eg(X)$ and drop the braces or parentheses.

Now we have a definition of the expected value of any function of $X$. In particular,
$$E[X^2] = \int_{-\infty}^{\infty} x^2 f(x)\,dx.$$

We need this if we want to see how the random variable spreads its values around the mean.

Definition 2.34 The variance of a rv $X$ is $\text{Var}(X) = E(X - E[X])^2$. Written out, the first step is to find the constant $\mu = E[X]$ and then
$$\text{Var}(X) = \begin{cases} \sum_x (x - \mu)^2 P(X = x), & \text{if } X \text{ is discrete},\\[4pt] \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
The standard deviation, abbreviated SD, of $X$ is $SD(X) = \sqrt{\text{Var}(X)}$.
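The two definitions above can be sketched numerically for the discrete rv used earlier (values 1, 2, 3 with probabilities 1/8, 3/8, 1/2); this is a Python illustration, not from the book:

```python
import math

pmf = {1: 1 / 8, 2: 3 / 8, 3: 1 / 2}

mean = sum(x * p for x, p in pmf.items())                 # E[X] = 19/8 = 2.375
second_moment = sum(x * x * p for x, p in pmf.items())    # E[X^2]
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)
print(mean, round(variance, 6), round(sd, 6))
```

The quantity `second_moment - mean**2` matches `variance`, which is the alternate formula proved in Proposition 2.38 below.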

Another measure of the spread of a distribution is given by the median and the percentiles. Here's the definition.

Definition 2.35 The median $m = \text{med}(X)$ of a random variable $X$ is defined to be the real number such that $P(X \le m) = P(X \ge m) = \frac{1}{2}$. The median is also known as the 50th percentile.

Given a real number $0 < q < 1$, the $100q$th percentile of $X$ is the number $x_q$ such that $P(X \le x_q) = q$.

The interquartile range of a rv is $IQR = Q_3 - Q_1$, i.e., the 75th percentile minus the 25th percentile. $Q_1$ is the first quartile, the 25th percentile, and $Q_3$ is the third quartile, the 75th percentile. The median is also known as $Q_2$, the second quartile.

In other words, $100q\%$ of the values of $X$ are below $x_q$. Percentiles apply to any random variable and give an idea of the shape of the density. Note that percentiles do not have to be unique, i.e., there may be several $x_q$'s resulting in the same $q$.
Example 2.36 If $Z \sim N(0, 1)$ we may calculate $E[Z] = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$ using substitution ($z = x^2/2$) and obtain $E[Z] = 0$. Then we calculate $E[Z^2] = \int_{-\infty}^{\infty} x^2\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$ using integration by parts. We get $E[Z^2] = 1$ and then $\text{Var}[Z] = E[Z^2] - (E[Z])^2 = 1$. The parameters $\mu = 0$ and $\sigma = 1$ represent the mean and SD of $Z$. In general, if $X \sim N(\mu, \sigma)$ we write $X = \sigma Z + \mu$ with $Z \sim N(0, 1)$, and we see that $E[X] = \sigma \cdot 0 + \mu = \mu$ and $\text{Var}[X] = \sigma^2 \text{Var}[Z] = \sigma^2$, so that $SD(X) = \sigma$.

Example 2.37 Suppose we know that LSAT scores follow a normal distribution with mean 155 and SD 13. You take the test and score 162. What percent of people taking the test did worse than you?

This is asking for $P(X \le 162)$ knowing $X \sim N(155, 13)$. That's easy since $P(X \le 162) = \text{normalcdf}(-\infty, 162, 155, 13) = 0.704$. In other words, 162 is the 70.4th percentile of the scores.

Suppose instead someone told you that her score was in the 82nd percentile and you want to know her actual score. To find that, we are looking to solve $P(X \le x_{.82}) = 0.82$. To find this using technology⁷ we have $x_{.82} = \text{invNorm}(0.82, 155, 13) = 166.89$, so she scored about 167.
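Both calculator calls can be sketched in code (a Python illustration, not from the book): the cdf comes from `math.erf`, and the inverse cdf can be found by bisection since the cdf is increasing.

```python
import math

def normalcdf_upto(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def invnorm(q, mu, sigma):
    """x with P(X <= x) = q, by bisection on the standard normal cdf."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < q:
            lo = mid
        else:
            hi = mid
    return mu + sigma * (lo + hi) / 2

pct = normalcdf_upto(162, 155, 13)   # about 0.704
score = invnorm(0.82, 155, 13)       # about 166.9
print(round(pct, 3), round(score, 2))
```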

Now here’s a proposition which says that the mean is the best estimate of a rv X in the
mean square sense, and the median is the best estimate in the mean absolute deviation sense.

Proposition 2.38
(1) We have the alternate formula Var.X/ D EŒX 2  .EŒX /2 :

(2) The mean of X; EŒX D  is the unique constant a which minimizes EŒX a2 : Then
mina EŒX a2 D EŒX 2 D Var.X/:

(3) A median med.X/ is a constant which provides a minimum for EjX aj: In other words,
mina EjX aj D EjX med.X/j.

The second statement says that the variance is the minimum of the mean squared distance
of the rv X to its mean. The third statement says that a median (which may not be unique)
satisfies a similar property for the absolute value of the distance.
Proof. (1) Set $\mu = E[X]$:
$$\text{Var}(X) = E(X - E[X])^2 = E\!\left[X^2 - 2\mu X + \mu^2\right] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.$$

(2) We will assume $X$ is a continuous rv with pdf $f$. Then, with $G(a) = \int_{-\infty}^{\infty} (x - a)^2 f(x)\,dx$,
$$G'(a) = \int_{-\infty}^{\infty} \frac{d}{da}(x - a)^2 f(x)\,dx = -2\int_{-\infty}^{\infty} (x - a) f(x)\,dx = 0$$
implies $\int x f(x)\,dx = \int a f(x)\,dx = a$. This assumes we can interchange the derivative and the integral. Furthermore, $G''(a) = 2\int f(x)\,dx = 2 > 0$. Consequently, $a = E[X]$ provides a minimum for $G$. It is unique since $G$ is strictly concave up.

(3) This is a little trickier since we can't take derivatives at first. We get rid of the absolute value signs first:
$$E|X - a| = \int_{-\infty}^{\infty} |x - a| f(x)\,dx = \int_{-\infty}^{a} -(x - a) f(x)\,dx + \int_{a}^{\infty} (x - a) f(x)\,dx \equiv H(a).$$
Now we take derivatives:
$$H'(a) = \int_{-\infty}^{a} f(x)\,dx - \int_{a}^{\infty} f(x)\,dx = 0 \implies \int_{-\infty}^{a} f(x)\,dx = \int_{a}^{\infty} f(x)\,dx \equiv \alpha.$$
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1 = \int_{-\infty}^{a} + \int_{a}^{\infty}$, we get $\int_{-\infty}^{a} f(x)\,dx = 1 - \alpha$. But then $1 - \alpha = \alpha \implies \alpha = \frac{1}{2}$. We conclude that
$$P(X \le a) = \int_{-\infty}^{a} f(x)\,dx = \int_{a}^{\infty} f(x)\,dx = P(X \ge a) = \frac{1}{2},$$
and this says that $a$ is a median of $X$. Furthermore, $H''(a) = 2f(a) \ge 0$, so $a = \text{med}(X)$ does provide a minimum (but note that $H$ is not necessarily strictly concave up). □

⁷ On a TI-84 the command is invNorm(q, mean, SD).

2.4.1 MOMENT-GENERATING FUNCTIONS

In the beginning of this section we defined the expected value of a function $g$ of a rv $X$ as $Eg(X) = \int g(x) f(x)\,dx$, where $f$ is the pdf of $X$. We now consider a special and very useful function of $X$. This will give us a method of calculating means and variances, usually in a much simpler way than doing it directly.

Definition 2.39 The moment-generating function (mgf) of a rv $X$ is $M(t) = E[e^{tX}]$. Explicitly, we define
$$M(t) = \begin{cases} \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous},\\[4pt] \sum_x e^{tx} P(X = x), & \text{if } X \text{ is discrete.} \end{cases}$$
We assume the integral or sum exists for all $t \in (-\delta, \delta)$ for some $\delta > 0$.

One reason the mgf is so useful is the following theorem. It says that if we know the mgf, we can find moments, i.e., $E(X^n)$, $n = 1, 2, \ldots$, by taking derivatives.

Theorem 2.40 If $X$ has the mgf $M(t)$, then $E[X^n] = \frac{d^n}{dt^n} M(t)\big|_{t=0}$.

Proof. The proof is easy if we assume that we can switch integrals and derivatives:
$$\frac{d^n}{dt^n} M(t) = \int_{-\infty}^{\infty} \frac{d^n}{dt^n} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} x^n e^{tx} f(x)\,dx.$$
Plug $t = 0$ into the last integral to see $\int_{-\infty}^{\infty} x^n e^{tx}\big|_{t=0} f(x)\,dx = \int_{-\infty}^{\infty} x^n f(x)\,dx = EX^n$. □

Example 2.41 Let’s use the mgf to find the mean and variance of X  Binom.n; p/:
n n
!
X X n
M.t / D e tx P .X D x/ D e tx p x .1 p/n x
xD0 xD0
x
n
!
X n
D .pe t /x .1 p/n x D .pe t C .1 p//n :
xD0
x

We used the Binomial Theorem from algebra8 in the last line. Now that we know the mgf we
can find any moment by taking derivatives. Here are the first two:

M 0 .t/ D npet .pe t C .1 p//n 1


H) M 0 .0/ D EX D np;

and
n 2 n 1
M 00 .t/ D n.n 1/p 2 e 2t pe t C .1 p/ C npet pe t C .1 p/
H) EX 2 D M 00 .0/ D n.n 1/p 2 C np:

The variance is then Var.X/ D EX 2 .EX/2 D n.n 1/p 2 C np n2 p 2 D np.1 p/:
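Theorem 2.40 can also be checked numerically: differentiating the binomial mgf at $t = 0$ with finite differences recovers $np$ and $np(1-p)$. A Python sketch (not from the book; $n = 12$, $p = 0.3$ are arbitrary illustrative values):

```python
import math

n, p = 12, 0.3

def M(t):
    # binomial mgf from Example 2.41
    return (p * math.exp(t) + 1 - p) ** n

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)               # central difference for M'(0)  ~ EX
m2 = (M(h) - 2 * M(0) + M(-h)) / (h * h)    # central difference for M''(0) ~ EX^2
print(round(m1, 4), round(m2 - m1**2, 4))   # np and np(1-p)
```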

2.4.2 MEAN AND VARIANCE OF SOME IMPORTANT DISTRIBUTIONS

Now we use the mgf to calculate the mean and variances of some of the important continuous distributions.

(a) $X \sim \text{Unif}[a, b]$, $f(x) = \frac{1}{b-a}$, $a < x < b$. The mgf is
$$M(t) = \int_a^b e^{tx} \frac{1}{b-a}\,dx = \frac{1}{b-a}\, \frac{1}{t}\, e^{tx}\Big|_a^b = \frac{e^{tb} - e^{ta}}{t(b-a)}.$$
Then
$$M'(t) = \frac{e^{at}(at - 1) + e^{bt}(1 - bt)}{(a-b)t^2} \quad \text{and} \quad \lim_{t \to 0} M'(t) = \frac{a+b}{2}.$$

⁸ $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$.

We conclude $EX = \frac{a+b}{2}$. While we could find $M''(0) = EX^2$, it is actually easier to find this directly:
$$EX^2 = \int_a^b x^2 \frac{1}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} \implies \text{Var}(X) = EX^2 - (EX)^2 = \frac{(b-a)^2}{12}.$$
(b) $X \sim \text{Exp}(\lambda)$, $f(x) = \lambda e^{-\lambda x}$, $x > 0$. We get
$$M(t) = E[e^{tX}] = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t}, \quad \text{if } t < \lambda.$$
$$M'(t) = \frac{\lambda}{(\lambda - t)^2},\ M'(0) = \frac{1}{\lambda} = E[X], \quad \text{and} \quad M''(t) = \frac{2\lambda}{(\lambda - t)^3},\ M''(0) = \frac{2}{\lambda^2} = E[X^2],$$
$$\text{Var}(X) = EX^2 - (EX)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
(c) $X \sim N(0, 1)$, $f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$, $-\infty < x < \infty$. The mgf for the standard normal distribution is
$$M(t) = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx - \frac{x^2}{2}}\,dx = e^{t^2/2} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - t)^2}\,dx = e^{t^2/2},$$
since $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x-t)^2}\,dx = 1$. Now we can find the moments fairly simply:
$$M'(t) = e^{\frac{t^2}{2}}\, t \implies M'(0) = EX = 0 \quad \text{and} \quad M''(t) = e^{\frac{t^2}{2}}\, t^2 + e^{\frac{t^2}{2}} \implies M''(0) = EX^2 = 1,$$
$$\text{Var}(X) = EX^2 - (EX)^2 = 1.$$
(d) $X \sim N(\mu, \sigma)$. All we have to do is convert $X$ to standard normal. Let $Z = \frac{X - \mu}{\sigma}$. We know $Z \sim N(0, 1)$ and we may use the previous part to write $M_Z(t) = e^{t^2/2}$. How do we get the mgf for $X$ from that? Well, we know $X = \sigma Z + \mu$ and so
$$M_X(t) = E e^{tX} = E e^{(\sigma Z + \mu)t} = e^{\mu t}\, E e^{(t\sigma)Z} = e^{\mu t}\, e^{\frac{1}{2}(\sigma t)^2} = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$
Then $M'(t) = e^{\frac{\sigma^2 t^2}{2} + \mu t} \left(\mu + \sigma^2 t\right)$, so that $M'(0) = EX = \mu$. Next,
$$M''(t) = e^{\frac{\sigma^2 t^2}{2} + \mu t} \left[\left(\mu + \sigma^2 t\right)^2 + \sigma^2\right] \implies M''(0) = \sigma^2 + \mu^2.$$
This gives us $\text{Var}(X) = EX^2 - (EX)^2 = (\sigma^2 + \mu^2) - \mu^2 = \sigma^2$.


Later we will need the following results. The first part says that if two rvs have the same mgf, then they have the same distribution. The second part says that if the mgfs of a sequence of rvs converge to an mgf, then the cdfs must also converge to the cdf of the limit rv.

Theorem 2.42
(1) If $X$ and $Y$ are two rvs such that $M_X(t) = M_Y(t)$ (for all $t$ close to 0), then $X$ and $Y$ have the same cdfs.

(2) If $X_k$, $k = 1, 2, \ldots$, is a sequence of rvs with mgfs $M_k(t)$ and cdfs $F_k(x)$, $k = 1, 2, \ldots$, respectively, and if $\lim_{k \to \infty} M_k(t) = M_X(t)$ where $M_X(t)$ is an mgf, then there is a unique cdf $F_X$ and $\lim_{k \to \infty} F_k(x) = F_X(x)$ at each point $x$ of continuity of $F_X$.

Example 2.43 An rv $X$ has density $f(x) = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{x}} e^{-\frac{x}{2}}$, $x > 0$, and $f(x) = 0$ if $x < 0$. First, we find the mgf of $X$:
$$M_X(t) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{tx} \frac{1}{\sqrt{x}} e^{-\frac{x}{2}}\,dx$$
$$= \frac{2}{\sqrt{2\pi}} \int_0^{\infty} e^{-u^2\left(\frac{1}{2} - t\right)}\,du \quad \text{setting } u = \sqrt{x}$$
$$= \frac{2}{\sqrt{2\pi}}\, \frac{1}{\sqrt{1 - 2t}} \int_0^{\infty} e^{-z^2/2}\,dz \quad \text{setting } z = u\sqrt{1 - 2t},\ t < \tfrac{1}{2}$$
$$= \frac{1}{\sqrt{1 - 2t}}, \quad \text{for } t < \frac{1}{2}, \quad \text{since } \frac{2}{\sqrt{2\pi}} \int_0^{\infty} e^{-z^2/2}\,dz = 1.$$
Then,
$$EX = M_X'(0) = 1 \quad \text{and} \quad EX^2 = M_X''(0) = 3 \implies \text{Var}(X) = 3 - 1^2 = 2.$$

Where does this density come from? To answer this, let $Z \sim N(0, 1)$ and let's find the pdf of $Y = Z^2$:
$$F_Y(y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{y}}^{\sqrt{y}} e^{-x^2/2}\,dx,$$
$$f(y) = F_Y'(y) = \frac{1}{\sqrt{2\pi}} \left( e^{-y/2} \frac{1}{2\sqrt{y}} + e^{-y/2} \frac{1}{2\sqrt{y}} \right) = \frac{1}{\sqrt{2\pi}}\, \frac{1}{\sqrt{y}}\, e^{-y/2}, \quad y > 0.$$
If $y \le 0$, $f(y) = 0$. This shows that the density we started with is the density for $Z^2$. Now we calculate the mgf for $Z^2$:
$$M_{Z^2}(t) = E[e^{tZ^2}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tz^2} e^{-z^2/2}\,dz, \quad \text{since } Z \sim N(0, 1),$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2\left(\frac{1}{2} - t\right)}\,dz = M_X(t),$$
since if we compare the last integral with the second integral in the computation of $M_X$ we see that they are the same. This means that $M_X(t) = M_{Z^2}(t)$, and part (1) of Theorem 2.42 says that $X$ and $Z^2$ must have the same distribution.

Definition 2.44 The rv $X = Z^2$ with density given by $f(x) = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{x}} e^{-\frac{x}{2}}$, $x > 0$, and $f(x) = 0$ if $x < 0$, is called $\chi^2$ with 1 degree of freedom, written as $X \sim \chi^2(1)$.

Remark 2.45 We record here the mean and variance of some important discrete distributions.

(a) $X \sim \text{Binom}(n, p)$: $EX = np$, $\text{Var}(X) = np(1-p)$.

(b) $X \sim \text{Geom}(p)$: $EX = \dfrac{1}{p}$, $\text{Var}(X) = \dfrac{1-p}{p^2}$, $M_X(t) = \dfrac{p e^t}{1 - (1-p)e^t}$.

(c) $X \sim \text{HyperGeom}(N, r, n)$: $EX = np$, $p = r/N$, $\text{Var}(X) = np(1-p)\dfrac{N-n}{N-1}$.

(d) $X \sim \text{NegBinom}(r, p)$: $EX = r\dfrac{1}{p}$, $\text{Var}(X) = r\dfrac{1-p}{p^2}$, $M_X(t) = \left(\dfrac{p e^t}{1 - (1-p)e^t}\right)^{r}$.

(e) $X \sim \text{Poisson}(\lambda)$: $EX = \lambda$, $\text{Var}(X) = \lambda$, $M_X(t) = e^{\lambda(e^t - 1)}$.

2.5 JOINT DISTRIBUTIONS

In probability and statistics we are often confronted with a problem involving more than one random variable, and the variables may or may not depend on each other. We have to study jointly distributed random variables if we want to calculate things like $P(X + Y \le w)$.

Definition 2.46
(1) If $X$ and $Y$ are two random variables, the joint cdf is $F_{X,Y}(x, y) = P(\{X \le x\} \cap \{Y \le y\})$. In general, we write this as $F_{X,Y}(x, y) = P(X \le x, Y \le y)$.

(2) If $X$ and $Y$ are discrete, the joint pmf of $(X, Y)$ is $p(x, y) = P(X = x, Y = y)$.
(3) A joint density function is a function $f_{X,Y}(x, y) \ge 0$ with $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$. The pair of rvs $(X, Y)$ is continuous if there is a joint density function, and then
$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\,dv\,du.$$

(4) If we know $F_{X,Y}(x, y)$, then the joint density is $f_{X,Y}(x, y) = \dfrac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$.

Knowing the joint distribution of $X$ and $Y$ means we have full knowledge of $X$ and $Y$ individually. For example, if we know $F_{X,Y}(x, y)$, then
$$F_X(x) = F_{X,Y}(x, \infty) = \lim_{y \to \infty} F_{X,Y}(x, y), \qquad F_Y(y) = F_{X,Y}(\infty, y).$$
The resulting $F_X$ and $F_Y$ are called the marginal cumulative distribution functions. The marginal densities, when there is a joint density, are given by
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.$$

Example 2.47 The function
$$f(x, y) = \begin{cases} 8xy, & 0 \le x < y \le 1,\\ 0, & \text{otherwise,} \end{cases}$$
is given. First we verify it is a joint density. Since $f \ge 0$, all we need to check is that the double integral is one:
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_0^1 \int_0^y 8xy\,dx\,dy = \int_0^1 8y \left(\frac{1}{2} y^2\right) dy = 1.$$
To find the marginal densities we have
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \begin{cases} \int_x^1 8xy\,dy = 4x(1 - x^2), & 0 \le x \le 1,\\ 0, & \text{otherwise,} \end{cases}$$
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \begin{cases} \int_0^y 8xy\,dx = 4y^3, & 0 \le y \le 1,\\ 0, & \text{otherwise.} \end{cases}$$
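Each marginal should itself integrate to 1, which is easy to verify numerically (a Python sketch, not from the book):

```python
# Marginals of f(x, y) = 8xy on 0 <= x < y <= 1, from Example 2.47.
def fX(x):  # obtained by integrating 8xy over y in (x, 1)
    return 4 * x * (1 - x * x)

def fY(y):  # obtained by integrating 8xy over x in (0, y)
    return 4 * y**3

def midpoint_integral(f, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(round(midpoint_integral(fX, 0, 1), 6), round(midpoint_integral(fY, 0, 1), 6))
```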

If $X$ and $Y$ are discrete rvs, the joint pmf is $p(x, y) = P(X = x, Y = y)$. The marginals are then given by $p_X(x) = P(X = x) = \sum_y p(x, y)$ and $p_Y(y) = P(Y = y) = \sum_x p(x, y)$.

In general, for any set $C \subset \mathbb{R} \times \mathbb{R}$, the event that the pair $(X, Y)$ lands in $C$ has probability defined by
$$P((X, Y) \in C) = \begin{cases} \displaystyle\iint_C f_{X,Y}(x, y)\,dx\,dy, & \text{if } X, Y \text{ are continuous},\\[6pt] \displaystyle\sum_{(x,y) \in C} p_{X,Y}(x, y), & \text{if } X, Y \text{ are discrete.} \end{cases} \qquad (2.1)$$

We also have expected values of functions of rvs.

Definition 2.48 If $(X, Y)$ have joint density $f_{X,Y}(x, y)$, the expected value of a function of the rvs is
$$E[g(X, Y)] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\,dx\,dy, & \text{if } X, Y \text{ are continuous},\\[6pt] \displaystyle\sum_{x,y} g(x, y)\, P(X = x, Y = y), & \text{if } X, Y \text{ are discrete.} \end{cases}$$

Example 2.49 We calculate $E(X + Y)$ assuming we have the joint density $f_{X,Y}(x, y)$ of $(X, Y)$. By definition,
$$E(X + Y) = \iint (x + y) f_{X,Y}(x, y)\,dx\,dy = \iint x f_{X,Y}(x, y)\,dx\,dy + \iint y f_{X,Y}(x, y)\,dx\,dy$$
$$= \int x \left( \int f_{X,Y}(x, y)\,dy \right) dx + \int y \left( \int f_{X,Y}(x, y)\,dx \right) dy = \int x f_X(x)\,dx + \int y f_Y(y)\,dy = E(X) + E(Y).$$
Notice that the first $E$ uses the joint density $f_{X,Y}$, while the second and third $E$'s use $f_X$ and $f_Y$, respectively.

Example 2.50 Suppose $(X, Y)$ have joint density $f(x, y) = 1$ for $0 \le x, y \le 1$, and $f(x, y) = 0$ otherwise. This models picking a random point $(x, y)$ in the unit square. If we want to calculate $P(X < Y)$, this uses the density:
$$P(X < Y) = \iint_{0 \le x < y \le 1} f(x, y)\,dx\,dy = \int_0^1 \int_0^y 1\,dx\,dy = \frac{y^2}{2}\bigg|_0^1 = \frac{1}{2}.$$
Similarly, we may calculate
$$P\!\left(X^2 + Y^2 \le \frac{1}{4}\right) = \iint_{0 \le x^2 + y^2 \le \frac{1}{4}} 1\,dx\,dy = \text{area of the quarter circle in the square} = \frac{\pi}{16}.$$
Also,
$$E\!\left(X^2 + Y^2\right) = \int_0^1 \int_0^1 \left(x^2 + y^2\right) f(x, y)\,dx\,dy = \frac{2}{3}.$$

Remark 2.51 In general, if we are given a set $D \subset \mathbb{R}^2$, the density
$$f(x, y) = \begin{cases} \dfrac{1}{\text{area of } D}, & (x, y) \in D,\\[4pt] 0, & \text{otherwise,} \end{cases}$$
is called a uniform density on $D$, and we write $(X, Y) \sim \text{Unif}(D)$.

Example 2.52 For the rvs $(X, Y)$ with joint density $f(x, y) = 8xy$ for $0 \le x < y \le 1$ (and 0 otherwise), we'll find $E(X + Y)$ and $E(XY)$:
$$E(X + Y) = \int_0^1 \int_0^y 8xy(x + y)\,dx\,dy = \frac{4}{3} \quad \text{and} \quad E(XY) = \int_0^1 \int_x^1 8xy(xy)\,dy\,dx = \frac{4}{9}.$$
Note that
$$E(X) = \int_0^1 4x(1 - x^2)\, x\,dx = \frac{8}{15} \quad \text{and} \quad E(Y) = \int_0^1 4y^3\, y\,dy = \frac{4}{5},$$
so that $E(X + Y) = E(X) + E(Y)$ but $E(XY) \ne E(X) \cdot E(Y)$.

You see that $E(XY) \ne E(X) \cdot E(Y)$ in general, but there are important cases when equality does hold. For that we need the notion of independent random variables.

2.5.1 INDEPENDENT RANDOM VARIABLES

Independence of random variables, which has the intuitive meaning that one random variable doesn't affect the other, is a central idea in probability.

Definition 2.53 Random variables $X, Y$ are independent if
$$P(X \le x, Y \le y) = P(X \le x)\, P(Y \le y), \quad \forall\, x \in \mathbb{R},\ y \in \mathbb{R}.$$

If $(X, Y)$ has a joint density $f_{X,Y}$, $X$ has density $f_X$, and $Y$ has density $f_Y$, then independence means that the joint density factors into the product of the individual densities:
$$f_{X,Y}(x, y) = f_X(x) f_Y(y).$$
One of the main consequences of independence is the following fact. It says the expected value of a product of rvs is the product of the expected value of each rv.

Proposition 2.54 If $X, Y$ are independent, then
$$E[XY] = E[X] \cdot E[Y].$$
In fact, for any functions $g, h$, we have $E[g(X)h(Y)] = E[g(X)] \cdot E[h(Y)]$.

Proof. By definition,
$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_X(x) f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} x f_X(x)\,dx \cdot \int_{-\infty}^{\infty} y f_Y(y)\,dy = E[X] \cdot E[Y].$$
The proof of the second statement is almost identical. □

Independence also allows us to find an explicit expression for the cumulative distribution of the sum of two random variables.

Proposition 2.55 If $X, Y$ are independent continuous rvs, then
$$F_{X+Y}(w) = P(X + Y \le w) = \int_{-\infty}^{\infty} P(X \le w - y) f_Y(y)\,dy.$$
This is really another application of the Law of Total Probability. To see this,
$$P(X + Y \le w) = \int P(X + Y \le w,\, Y = y)\,dy = \int P(X \le w - y)\, P(Y = y)\,dy = \int F_X(w - y) f_Y(y)\,dy.$$
The first equality uses the Law of Total Probability and the second equality uses independence. (This is an informal computation: for a continuous rv, "$P(Y = y)$" is to be read as the density $f_Y(y)$, in the abuse-of-notation sense discussed after Definition 2.33.)

Example 2.56 Suppose X and Y are independent Exp(λ) rvs. Then, for w ≥ 0,

P(X + Y ≤ w) = ∫₀^∞ F_X(w − y) f_Y(y) dy = ∫₀^w (1 − e^{−λ(w−y)}) λ e^{−λy} dy = 1 − (λw + 1) e^{−λw} = F_{X+Y}(w).

If w < 0, F_{X+Y}(w) = 0. To find the density, we take the derivative with respect to w to get

f_{X+Y}(w) = λ² w e^{−λw}, w ≥ 0.

It turns out that this is the pdf of a so-called Gamma(λ, 2) rv.
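A simulation sketch of this example (the rate λ = 2, the point w = 1, the sample size, and the seed are arbitrary choices): the empirical fraction of sums X + Y ≤ w should match F_{X+Y}(w) = 1 − (λw + 1)e^{−λw}.

```python
import math
import random

random.seed(2)
lam, w, n = 2.0, 1.0, 200_000
# X, Y independent Exp(lam); count how often X + Y <= w
hits = sum(random.expovariate(lam) + random.expovariate(lam) <= w
           for _ in range(n))
empirical = hits / n
exact = 1 - (lam * w + 1) * math.exp(-lam * w)
print(empirical, exact)  # both near 0.594
```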

2.5.2 COVARIANCE AND CORRELATION


A very important quantity measuring the linear relationship between two rvs is the following.

Definition 2.57 Given two rvs X, Y, the covariance of X, Y is defined by

Cov(X, Y) = E[XY] − E[X]E[Y] = E[(X − EX)(Y − EY)].

The correlation coefficient is defined by

ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y), where σ_X² = Var(X), σ_Y² = Var(Y).

X and Y are said to be uncorrelated if Cov(X, Y) = 0 or, equivalently, ρ(X, Y) = 0.

It looks like covariance measures how independent X and Y are. It is certainly true that if X, Y are independent, then ρ(X, Y) = 0, but the reverse is false.

Example 2.58 Suppose X, Y have the joint pmf P(X = 1, Y = 0) = P(X = 0, Y = 1) = P(X = 0, Y = −1) = P(X = −1, Y = 0) = 1/4. For all other cases P(X = i, Y = j) = 0. As defined earlier, given the joint pmf P(X = x, Y = y),

P(X = x) = Σ_y P(X = x, Y = y)  and  P(Y = y) = Σ_x P(X = x, Y = y)

are the marginals of X, Y. In this example P(X = −1) = 1/4, P(X = 0) = 1/2, and P(X = 1) = 1/4. Similarly, P(Y = −1) = 1/4, P(Y = 0) = 1/2, P(Y = 1) = 1/4. We can arrange all this in a two-way table:
              Y
  X          −1     0      1    P(X = x)
 −1           0    1/4     0      1/4
  0          1/4    0     1/4     1/2
  1           0    1/4     0      1/4
 P(Y = y)    1/4   1/2    1/4      1

The sum of each row is P(X = x), while the sum of each column is P(Y = y). Each element of the table is P(X = x, Y = y). If X, Y are independent, the (x, y) element of the table must be the product of the marginals, i.e., P(X = x, Y = y) = P(X = x)P(Y = y). You can see from the table that this is not true, so X, Y are not independent. On the other hand,

E[XY] = Σ_{x,y} x y P(X = x, Y = y) = 0,

E[X] = (−1)·(1/4) + (+1)·(1/4) = 0, and E[Y] = (−1)·(1/4) + (+1)·(1/4) = 0,

which means Cov(X, Y) = 0 and so X, Y are uncorrelated.
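The numbers in this example can be checked by direct enumeration (a sketch, not from the text; the dictionary below just encodes the four probability-1/4 points):

```python
# joint pmf: the four points (x, y) each carrying probability 1/4
pmf = {(1, 0): 0.25, (0, 1): 0.25, (0, -1): 0.25, (-1, 0): 0.25}

e_x = sum(x * p for (x, y), p in pmf.items())
e_y = sum(y * p for (x, y), p in pmf.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov = e_xy - e_x * e_y
print(cov)  # 0.0, so X, Y are uncorrelated

# ...but not independent: P(X=1, Y=0) differs from P(X=1) P(Y=0)
p_x1 = sum(p for (x, y), p in pmf.items() if x == 1)  # marginal P(X=1) = 1/4
p_y0 = sum(p for (x, y), p in pmf.items() if y == 0)  # marginal P(Y=0) = 1/2
print(pmf[(1, 0)], p_x1 * p_y0)  # 0.25 vs 0.125
```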

Here’s one of the more important implications of independence.

Theorem 2.59 If X, Y are rvs, then Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y). If X, Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y).

Proof. This is a calculation.

Var(X + Y) = E(X + Y)² − (EX + EY)²
           = E(X² + 2XY + Y²) − (EX)² − 2EX·EY − (EY)²
           = Var(X) + 2Cov(X, Y) + Var(Y).

If X, Y are uncorrelated, Cov(X, Y) = 0. □

Remark 2.60 This can be extended to n rvs X₁, …, Xₙ. If they are uncorrelated (which is true if they are independent), then Var(X₁ + ⋯ + Xₙ) = Var(X₁) + ⋯ + Var(Xₙ).

2.5.3 THE GENERAL CENTRAL LIMIT THEOREM

For statistics, one of the major applications of independence is the following fact:

Xᵢ ∼ N(μᵢ, σᵢ), i = 1, 2, …, n, and independent ⟹ Σᵢ₌₁ⁿ Xᵢ ∼ N( Σᵢ₌₁ⁿ μᵢ, √(Σᵢ₌₁ⁿ σᵢ²) ).
We can see this using mgfs and the following proposition.

Proposition 2.61 Let X₁, X₂, …, Xₙ be independent rvs with mgfs M_{Xᵢ}(t), i = 1, 2, …, n. Let Sₙ = X₁ + ⋯ + Xₙ. Then M_{Sₙ}(t) = M_{X₁}(t) · M_{X₂}(t) ⋯ M_{Xₙ}(t).

This is directly from the definition and the independence. In fact,

M_{Sₙ}(t) = E e^{t(X₁+⋯+Xₙ)} = E e^{tX₁} ⋯ E e^{tXₙ}.

Therefore, if Xᵢ ∼ N(μᵢ, σᵢ), i = 1, 2, …, n, and they are independent, we have

M_{Sₙ}(t) = ∏ᵢ₌₁ⁿ exp(tμᵢ + σᵢ²t²/2) = exp( t Σᵢ μᵢ + (t²/2) Σᵢ σᵢ² ).

Since mgfs determine a distribution uniquely according to Theorem 2.42, we see that Sₙ ∼ N( Σᵢ μᵢ, √(Σᵢ σᵢ²) ).

Example 2.62 The sum of independent Geom(p) random variables is Negative Binomial. In particular, suppose X is the number of Bernoulli trials until we get r successes with probability p of success on each trial. Then X = X₁ + X₂ + ⋯ + X_r, where Xᵢ ∼ Geom(p), i = 1, 2, …, r, is the number of trials until the first success. This is true since once we have a success we simply start counting anew from the last success until we get another success. Now, we have by independence,

E[X] = Σᵢ₌₁ʳ E[Xᵢ] = r/p,  and  Var[X] = Σᵢ₌₁ʳ Var[Xᵢ] = r(1 − p)/p².

In addition, using the mgf of Geom(p), namely, M_{Xᵢ}(t) = e^t p / (1 − e^t(1 − p)), t < −ln(1 − p), we have

M_X(t) = ∏ᵢ₌₁ʳ M_{Xᵢ}(t) = e^{rt} p^r / (1 − e^t(1 − p))^r, t < −ln(1 − p),

and this must be the mgf of a Negative Binomial rv.
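A simulation sketch of this example (r = 4, p = 0.3, the sample size, and the seed are arbitrary choices): summing r independent Geom(p) draws and comparing the sample mean and variance with r/p and r(1 − p)/p².

```python
import random

random.seed(3)
r, p, n = 4, 0.3, 100_000

def geom(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

sums = [sum(geom(p) for _ in range(r)) for _ in range(n)]
mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n
print(mean, r / p)                # both near 13.33
print(var, r * (1 - p) / p ** 2)  # both near 31.11
```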

We have seen that the sum of independent normal rvs is exactly normal. The Central Limit Theorem says that even if the Xᵢ's are not normal, the sum is approximately normal if the number of rvs is large. We have already seen the special case of this for Binomials, but it is true in much more generality. The full proof is covered in more advanced courses.

Theorem 2.63 Central Limit Theorem. Let X₁, X₂, … be a sequence of independent rvs all having the same distribution, with EX₁ = μ and Var(X₁) = σ². Then, for any a, b ∈ ℝ,

lim_{n→∞} P( a ≤ (X₁ + ⋯ + Xₙ − nμ)/(σ√n) ≤ b ) = P(a ≤ Z ≤ b),

where Z ∼ N(0, 1).

In other words, for large n (generally n ≥ 30),

Sₙ = X₁ + ⋯ + Xₙ ≈ N(nμ, σ√n)

and, dividing by n, since E(Sₙ/n) = nμ/n = μ and Var(Sₙ/n) = (1/n²)·σ²n = σ²/n,

X̄ = Sₙ/n ≈ N(μ, σ/√n).

This is true no matter what the distributions of the individual Xᵢ's are, as long as they all have the same finite mean and variance.

Sketch of Proof of the CLT (Optional): We may assume μ = 0 (why?) and we also may assume a = −∞. Set Zₙ = Sₙ/(σ√n). Then, if M(t) = M_{Xᵢ}(t) is the common mgf of the rvs Xᵢ,

M_{Zₙ}(t) = [ M(t/(σ√n)) ]ⁿ = exp( n ln M(t/(σ√n)) ).

If we can show that

lim_{n→∞} M_{Zₙ}(t) = e^{t²/2},

then by Theorem 2.42 we can conclude that the cdf of Zₙ will converge to the cdf of the random variable that has mgf e^{t²/2}. But that random variable is Z ∼ N(0, 1). That will complete the proof. Therefore, all we need to do is to show that

lim_{n→∞} n ln M(t/(σ√n)) = t²/2.

To see this, change variables to x = t/(σ√n), so that

lim_{n→∞} n ln M(t/(σ√n)) = lim_{x→0} (t²/σ²) · ln M(x)/x².

Since ln M(0) = 0, we may use L'Hopital's rule to evaluate the limit. We get

lim_{x→0} (t²/σ²) · ln M(x)/x² = (t²/σ²) lim_{x→0} ( M′(x)/M(x) ) / (2x)
  = (t²/(2σ²)) lim_{x→0} M″(x) / ( x M′(x) + M(x) )   (using L'Hopital again)
  = (t²/(2σ²)) · M″(0)/(0·M′(0) + M(0)) = (t²/(2σ²)) · σ²/(0·0 + 1) = t²/2,

since M(0) = 1, M′(0) = EX = 0, M″(0) = EX² = σ². This completes the proof. □

Example 2.64 Suppose an elevator is designed to hold 2000 pounds. The mean weight of a person getting on the elevator is 175 pounds with standard deviation 15 pounds. How many people can board the elevator so that the chance it is overloaded is at most 1%?

Let W = X₁ + ⋯ + Xₙ be the total weight of n people who board the elevator. We don't know the distribution of the weights of individual people (which is probably not normal), but we do know EX = 175 and Var(X) = 15². By the central limit theorem, W ≈ N(175n, 15√n), and we want to find n so that

P(W > 2000) ≤ 0.01.

If we standardize W we get

0.01 = P(W > 2000) = P( (W − 175n)/(15√n) > (2000 − 175n)/(15√n) ) = P( Z > (2000 − 175n)/(15√n) ).

Using a calculator, we get P(Z > z) = 0.01 ⟹ z = invNorm(0.99) = 2.326. Therefore, it must be true that

(2000 − 175n)/(15√n) ≥ 2.326 ⟹ n ≤ 10.77 ⟹ n ≤ 10.

The maximum number of people that can board the elevator and meet the criterion is 10. Without knowing the distribution of the weight of people, there is no other way to do this problem.
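The search for the largest safe n can also be done numerically (a sketch; the normal cdf is computed from the standard library's complementary error function). Solving (2000 − 175n)/(15√n) ≥ 2.326 exactly gives n ≤ 10.77, so the search stops at n = 10.

```python
import math

def normal_sf(x, mu, sigma):
    """P(X > x) for X ~ N(mu, sigma), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

n = 1
while normal_sf(2000, 175 * n, 15 * math.sqrt(n)) <= 0.01:
    n += 1
print(n - 1)  # 10 people; with 11, P(W > 2000) is about 0.066
```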

2.5.4 CHEBYCHEV'S INEQUALITY AND THE WEAK LAW OF LARGE NUMBERS

Suppose we have a rv X which has an arbitrary distribution but a finite mean μ = EX and variance σ² = Var(X). Chebychev's inequality gives an upper bound on the chances X differs from its mean without knowing anything about the distribution of X at all. Here's the inequality:

P(|X − μ| ≥ c) ≤ σ²/c², for any constant c > 0.  (2.2)

The larger c is, the smaller the probability can be. The argument for Chebychev is simple. Assume X has pdf f. Then

σ² = E|X − μ|² = ∫_{|x−μ|≥c} |x − μ|² f(x) dx + ∫_{|x−μ|<c} |x − μ|² f(x) dx
   ≥ ∫_{|x−μ|≥c} |x − μ|² f(x) dx ≥ c² ∫_{|x−μ|≥c} f(x) dx = c² P(|X − μ| ≥ c).
Chebychev is used to give us the Weak Law of Large Numbers, which tells us that the mean of a random sample should converge to the population mean as the sample size goes to infinity.

Theorem 2.65 Weak Law of Large Numbers. Let X₁, …, Xₙ be a random sample, i.e., independent and all having the same distribution as the rv X, which has finite mean EX = μ and finite variance σ² = Var(X). Then, for any constant c > 0, with X̄ = (X₁ + ⋯ + Xₙ)/n,

lim_{n→∞} P(|X̄ − μ| ≥ c) = 0.

Proof. We know E X̄ = μ and Var(X̄) = σ²/n. By Chebychev's inequality,

P(|X̄ − μ| ≥ c) ≤ Var(X̄)/c² = σ²/(nc²) → 0 as n → ∞. □

The Strong Law of Large Numbers, which is beyond the scope of this book, says X̄ₙ → μ in a much stronger way than the Weak Law says, so we are comfortable that the sample means do converge to the population mean.
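A simulation sketch of the Weak Law (the Unif[0,1] population, c = 0.05, the number of trials, and the seed are arbitrary choices): for Unif[0,1] we have μ = 1/2 and σ² = 1/12, and the Chebychev bound σ²/(nc²) on P(|X̄ − μ| ≥ c) shrinks as n grows, as does the empirical frequency.

```python
import random

random.seed(4)
c, trials = 0.05, 2000
for n in (10, 100, 1000):
    # fraction of samples whose mean misses mu = 0.5 by at least c
    freq = sum(
        abs(sum(random.random() for _ in range(n)) / n - 0.5) >= c
        for _ in range(trials)
    ) / trials
    bound = (1 / 12) / (n * c * c)
    print(n, freq, bound)  # empirical frequency vs. Chebychev bound
```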

2.6 χ²(k), STUDENT'S t- AND F-DISTRIBUTIONS

In this section we will record three of the most important distributions for statistics.

2.6.1 χ²(k) DISTRIBUTION

This is known as the χ² distribution with k degrees of freedom. We already encountered the χ²(1) = Z² distribution, where we showed it is the same as a standard normal squared. There is a similar characterization for χ²(k). In fact, let Z₁, …, Z_k be k independent N(0, 1) rvs. Define

Y = Z₁² + Z₂² + ⋯ + Z_k².

Then Y ∼ χ²(k). That is, a χ²(k) rv is the sum of the squares of k independent standard normal rvs. In fact, if we look at the mgf of Y, we have, using Example 2.43 and independence,

M_Y(t) = ∏ᵢ₌₁ᵏ M_{Zᵢ²}(t) = ∏ᵢ₌₁ᵏ 1/√(1 − 2t) = ( 1/(1 − 2t) )^{k/2}, t < 1/2,

which is the mgf of a χ²(k) rv, as may be derived directly from the density. From the mgf
it is easy to see that EY = k and Var(Y) = 2k. The main properties of Y are the following.
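A simulation sketch of this characterization (k = 6, the sample size, and the seed are arbitrary choices): sums of k squared N(0,1) draws should have sample mean near k and sample variance near 2k.

```python
import random

random.seed(5)
k, n = 6, 100_000
# each y is a chi-square(k) draw built from k squared standard normals
ys = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
print(mean, var)  # near k = 6 and 2k = 12
```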
Remark 2.66 If X ∼ χ²(n), Y ∼ χ²(m), and X, Y are independent, then X + Y ∼ χ²(n + m). To see why,

M_{X+Y}(t) = E e^{t(X+Y)} = M_X(t) M_Y(t) = ( 1/(1 − 2t) )^{n/2} ( 1/(1 − 2t) )^{m/2} = ( 1/(1 − 2t) )^{(n+m)/2}

for t < 1/2. Since distributions are uniquely determined by their mgf, and the mgf of X + Y is the mgf of a χ²(n + m) rv, we know that X + Y ∼ χ²(n + m).

Remark 2.67 The χ²(n) distribution is not symmetric. Therefore, if we want to find a, b so that P(a < χ²(n) < b) = 1 − α for some given 0 < α < 1, we set it up so the area to the right of b is α/2 and the area to the left of a is also α/2. Using a TI-8x calculator, the command is a = invchi(n, 1 − α/2) and b = invchi(n, α/2), where the second parameter is the area desired to the right of a or b. The program to get this is based on Newton's method for solving χ²cdf(0, x, n) = 1 − α for x.

(1) Input "RT TAIL," A
(2) Input "D of F," N
(3) N → X
(4) For (J, 1, 9)
(5) X − (χ²cdf(0,X,N) + A − 1)/χ²pdf(X,N) → X
(6) End
(7) Disp X
(8) Stop
2.6.2 STUDENT'S t-DISTRIBUTION

This is a combination of two independent rvs, a N(0, 1) rv and a χ²(k) rv, which arises naturally in statistics:

T(k) = Z / √(χ²(k)/k),  Z ∼ N(0, 1).  (2.3)

We say that T has a Student's t-distribution with k degrees of freedom.

Remark 2.68 We will come to this later, but this will come from looking at the sample mean divided by the sample standard deviation:

T = (X̄ − μ)/(S/√n),  X̄ = (X₁ + ⋯ + Xₙ)/n,  S = √( (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² ).

This rv will have a t-distribution with n − 1 degrees of freedom. Here are the main properties of the t-distribution:

ET = 0;  Var(T) = k/(k − 2), k > 2.

2.6.3 F-DISTRIBUTION

This is also a distribution arising in hypothesis testing in statistics, as the quotient of two independent χ² rvs. In particular,

F = ( χ²(k₁)/k₁ ) / ( χ²(k₂)/k₂ ).

We say F ∼ F(k₁, k₂) has an F-distribution with k₁ and k₂ degrees of freedom. The mean and variance are given by

EX = k₂/(k₂ − 2), k₂ > 2, and Var(X) = 2 ( k₂/(k₂ − 2) )² (k₁ + k₂ − 2)/(k₁(k₂ − 4)), k₂ > 4.

The connection with the t-distribution is

F(1, k) = ( χ₁²(1)/1 ) / ( χ₂²(k)/k ) = ( Z / √(χ²(k)/k) )²,  Z ∼ N(0, 1),

which means F(1, k) = T(k)².

2.7 PROBLEMS

2.1. We roll two dice and X is the difference between the larger and the smaller of the two numbers. Find R(X), the pmf, and the cdf of X, and then find P(0 < X ≤ 3) and P(1 ≤ X < 3). Hint: Use the sample space.
2.2. Suppose that the distribution function of X is given by

F(b) = 0 for b < 0;  b/4 for 0 ≤ b < 1;  1/2 + (b − 1)/4 for 1 ≤ b < 2;  11/12 for 2 ≤ b < 3;  1 for b ≥ 3.

(a) Find P(X = i), i = 1, 2, 3.
(b) Find P(1/2 < X < 3/2).

2.3. If X has cdf F_X(x), what is the cdf

(a) of e^X?
(b) of the random variable aX + b, where a and b are nonzero constants?

2.4. Determine c so that f(x) = P(X = x) is a pmf.

(a) f(x) = x/c, x = 1, 2, 3, …, n.
(b) f(x) = c/((x + 2)(x + 3)), x = 0, 1, 2, 3, …. Hint: Use partial fractions.
2.5. Let X be a continuous random variable with pdf f(x) = 3/4 for 0 ≤ x ≤ 1, f(x) = 1/4 for 2 ≤ x ≤ 3, and f(x) = 0 otherwise.

(a) Draw the graph of f.
(b) Determine the cdf F of X, and draw its graph.

2.6. Let f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞.

(a) Show that f is a pdf.
(b) Show that x = μ is a max point of f and x = μ ± σ are inflection points of f. That is, show that f″(μ ± σ) = 0 and f″(x) < 0 for μ − σ < x < μ + σ, and f″(x) > 0 if −∞ < x < μ − σ or μ + σ < x < ∞.

2.7. The pdf f of a continuous random variable X is given by f(x) = cx + 3 for −3 ≤ x ≤ −2, f(x) = 3 − cx for 2 ≤ x ≤ 3, and f(x) = 0 otherwise.

(a) Compute c.
(b) Compute the cdf of X.
2.8. The score X of a student on a certain exam is represented by a number between 0 and 1. Suppose that the student passes the exam if this number is at least 0.55. Suppose the pdf of X is given by f(x) = 4x for 0 ≤ x ≤ 1/2, f(x) = 4 − 4x for 1/2 ≤ x ≤ 1, and f(x) = 0 otherwise.

(a) What is the probability that the student fails the exam?
(b) What is the score that he will obtain with a 50% chance, in other words, what is the 50th percentile of the score distribution?
(c) What is the 75th percentile score, i.e., find x₇₅ such that P(X ≤ x₇₅) = 0.75?
2.9. Let X be a normal random variable with mean 12 and variance 4. Find the value of c such that P(X > c) = 0.10. This c would be the 90th percentile of X.
2.10. (a) If X ∼ Binom(n, p), show that P(k ≤ X ≤ j) = P(X ≤ j) − P(X ≤ k − 1). (b) If you toss a fair coin 100 times, what is the probability you get from 52–60 Heads inclusive?
2.11. Suppose 75% of the age group 10–14 years regularly utilize seat belts. Find the proba-
bility that in a random stop of 100 automobiles containing 10–14 year olds, 70 or fewer
are found to be wearing a seat belt. Find the solution using the binomial distribution as
well as the normal approximation to the binomial distribution.
2.12. If the probability that an individual suffers a bad reaction from injection of a given serum is 0.001, determine the probability that out of 2000 individuals (a) exactly 3 and (b) more than 2 individuals will suffer a bad reaction. Calculate this using the binomial, Poisson (with λ = np), and normal distributions. Which is the better approximation?
2.13. So far in the season, a certain baseball player for the Chicago Cubs has the following probabilities, displayed in the table below, for the outcome of an at-bat. The possible outcomes are an out, a walk, a single, a double, a triple, or a home run.

Outcome of an at-bat   Outcome number   Probability
Out                    1                0.662
Walk                   2                0.052
Single                 3                0.213
Double                 4                0.018
Triple                 5                0.009
Home run               6                0.046
Find the probability that the player strikes out, walks twice, and doubles twice in the
same game.
2.14. For a hypergeometric random variable, determine P(X = k + 1)/P(X = k).
2.15. Let X ∼ Binom(n, p) and Y ∼ Poisson(λ = np). Compute

(a) P(X = 2) and P(Y = 2) for n = 8, p = 0.1.
(b) P(X = 9) and P(Y = 9) for n = 10, p = 0.95.
2.16. If you buy a lotto ticket in 50 games, in each of which your chances of winning is 1/100,
what is the probability you will win

(a) at least once? (b) exactly once? (c) at least twice?

2.17. Suppose X = (X₁, …, X_k) is multinomial with parameters n, k and p₁, …, p_k. Show that

P(Xᵢ = x) = Σ_{xⱼ, j≠i} P(X₁ = x₁, …, Xᵢ = x, …, X_k = x_k) = (n choose x) pᵢˣ (1 − pᵢ)^{n−x},

so that each Xᵢ ∼ Binom(n, pᵢ). Hint: Define success as in category i, failure as not in category i.
2.18. If X is Geometric(p), show that P(X ≥ n + k | X ≥ n) = P(X ≥ k). This is the memoryless property, since it implies that a geometric rv does not recall that n trials have already passed.
2.19. Let X be a random variable that takes values in [0, 1] and has cdf given by F_X(x) = x² for 0 ≤ x ≤ 1, with F_X(x) = 0 for x < 0 and F_X(x) = 1 for x > 1. Compute P(1/2 < X ≤ 3/4) and find the pdf of X.
2.20. Suppose we choose arbitrarily a point from the square with corners at (2,1), (3,1), (2,2),
and (3,2). The random variable A is the area of the triangle with its corners at (2,1),
(3,1) and the chosen point.
(a) What is the largest area that can occur, and what is the set of points for which A ≤ 1/4?
(b) Determine the distribution function F of A.
(c) Determine the pdf f of A.
2.21. Show that if Z is a standard normal random variable, then, for x > 0,

(a) P(Z > x) = P(Z < −x);
(b) P(|Z| > x) = 2P(Z > x); and
(c) P(|Z| < x) = 2P(Z < x) − 1.

2.22. Jensen, arriving at a bus stop, just misses the bus. Suppose that he decides to walk if the (next) bus takes longer than 5 minutes to arrive. Suppose also that the time in minutes between the arrivals of buses at the bus stop is a continuous random variable with a Unif(4, 6) distribution. Let X be the time that Jensen will wait.

(a) What is the probability that X is less than 4 1/2 (minutes)?


(b) What is the probability that X equals 5 (minutes)?
(c) Is X a discrete random variable or a continuous random variable?

2.23. Let X have an Exp(0.2) distribution. Compute P(X > 5).


2.24. Let X ∼ Exp(λ), λ > 0. Find a value m such that P(X ≤ m) = 0.5.

2.25. Let Z ∼ N(0, 1). Find a number z such that P(Z ≤ z) = 0.9. Also find z so that P(−z ≤ Z ≤ z) = 0.9.

2.26. The time (in hours) required to repair a machine is an exponentially distributed random variable with parameter λ = 1.2.

(a) What is the probability that a repair time exceeds 2 hours?


(b) What is the conditional probability that a repair takes at least 10 hours, given that
its duration exceeds 9 hours?

2.27. The number of years a radio functions is exponentially distributed with parameter λ = 1/18. If Jones buys a used radio, what is the probability that it will work for an additional 8 years?
2.28. A patient has insurance that pays $1,500 per day up to 3 days and $1,000 per day after 3 days. For typical illnesses the number of days in the hospital, X, has the pmf p(k) = (7 − k)/21, k = 1, 2, …, 7. Find the expected amount the insurance company will pay.

2.29. Let X ∼ N(μ, σ). Use substitution and integration by parts to verify E(X) = μ and SD(X) = σ. That is, verify Example 2.36.

2.30. An investor has the option of investing in one of two stocks. If he buys Stock A he can net $500 with probability 1/2, and lose $100 with probability 1/2. If he buys Stock B, he can net $1,500 with probability 1/4 and lose $200 with probability 3/4.

(a) Find the mean and SD for each investment. Which stock should he buy based on the coefficient of variation defined by SD/μ?
(b) What is the interpretation of the coefficient of variation?
(c) The value of x dollars is worth g(x) = √(x + 200) to the investor. This is called a utility function. What is the expected utility to the investor for each stock?

2.31. Suppose an rv has density f(x) = 2x for 0 < x < 1 and 0 otherwise. Find

(a) P(X < 1/2), P(1/4 < X ≤ 1/2), P(X < 3/4 | X > 1/2) and
(b) E[X], SD[X], E[e^{tX}].

2.32. An rv has pdf f(x) = 5e^{−5x}, 0 ≤ x < ∞, and 0 otherwise. Find E[X], Var[X], med[X].
2.33. Find the mgf of X ∼ Geometric(p). (Hint: Σ_{k=1}^∞ aᵏ = a/(1 − a), |a| < 1.) Use it to find the mean and variance of X.

2.34. Find the mgf of X ∼ Poisson(λ). (Hint: Σ_{k=0}^∞ aᵏ/k! = eᵃ.) Use it to find the mean and variance of X.
2.35. Suppose X has the mgf M_X(t) = 0.09e^{−2t} + 0.24e^{−t} + 0.24e^t + 0.09e^{2t} + 0.34.

(a) Find P(X ≤ 0).
(b) Find E[X].

2.36. Let X ∼ Unif[0, 1]. Find E[4X⁵ + 5X⁴ + 4X³ − 8X² + 7X + 1].


2.37. The mean deviation of a discrete rv is defined by MD(X) = Σ_{x : P(X=x)>0} |x − E[X]| P(X = x). Find the mean deviation of the rv X which is the sum of the faces of two fair dice which are rolled.

2.38. The mean deviation of a continuous rv X with pdf f_X(x) is defined by MD(X) = ∫_{−∞}^{∞} |x − E[X]| f_X(x) dx. Find the mean deviation for X ∼ Exp(λ) and for X ∼ Unif[a, b].
2.39. An exam is graded on a scale 0–100. A student must score at least 60 to pass. Student scores are modeled by the density f(x) = 0.0004x for 0 ≤ x ≤ 50, f(x) = 0.04 − 0.0004x for 50 ≤ x ≤ 100, and f(x) = 0 otherwise.

(a) Find the probability a student passes.
(b) What exam score is the 85th percentile?

2.40. Find EX and Var(X) for the rv X with the following densities.

(a) P(X = k) = 1/n, k = 1, 2, …, n.
(b) f_X(x) = r x^{r−1}, 0 < x < 1, r > 0.
2.41. Let N be the number of Bernoulli trials (meaning independent and only one of two outcomes possible) to get r successes. N is a negative binomial rv with parameters r, p, NegBinom(r, p), and we know P(N = k) = (k−1 choose r−1) p^r (1 − p)^{k−r}, k = r, r + 1, r + 2, …. If you think of the process restarting after each success is obtained, it is reasonable to write N = Y₁ + ⋯ + Y_r, where the Yᵢ's are independent geometric rvs. Use this to find the mgf of N and then find EN, Var(N).
2.42. Let X be Hypergeometric(N, n, k). Let p = k/N. It can be shown that EX = np and Var(X) = np(1 − p)·(N − n)/(N − 1). This looks like the same mean and variance of a Binomial(n, p) with p = k/N except for the extra term (N − n)/(N − 1). This term is known as the correction factor. We know that Binomial(n, p) can be approximated by Normal(np, √(np(1 − p))). What is the approximate distribution of Hypergeometric(N, n, k)?
2.43. We throw a coin until a head turns up for the second time, where p is the probability that a throw results in a head and we assume that the outcome of each throw is independent of the previous outcomes. Let X be the number of times we have to throw the coin.

(a) Determine P(X = 2), P(X = 3), and P(X = 4).
(b) Show that P(X = n) = (n − 1)p²(1 − p)^{n−2} for n ≥ 2.
(c) Find EX.
2.44. Suppose P(X = 0) = 1 − P(X = 1) and E(X) = 3Var(X). Find P(X = 0).

2.45. If EX = 1 and Var(X) = 5, find E[(2 + X)²] and Var(4 + 3X).
2.46. Worldwide major mainframe sales average 3.5 per month and follow a Poisson distribution. Find

(a) the probability of at least 2 sales in the next month,
(b) the probability of at most one sale in the next month, and
(c) the variance of the monthly number of sales.
2.47. A batch of 100 items has 6 defective and 94 good. If X is the number of defectives in a randomly drawn sample of 10, find P(X = 0), P(X > 2), EX, Var(X).
2.48. An insurance company sells a policy with a 1 unit deductible. Let X, the amount of the loss, have pmf f(x) = 0.9 for x = 0 and f(x) = c/x for x = 1, 2, 3, 4, 5, 6. Find c and the expected total amount the insurance company has to pay out.

2.49. Find EX and E[X(X − 1)] if X has pmf f(x) = (4 choose x)(1/2)⁴, x = 0, 1, 2, 3, 4.
2.50. Choose a so as to minimize E[|X − a|]

(a) when X is uniformly distributed over (0, A) and
(b) when X is now exponential with rate λ.
2.51. Suppose that X is a normal random variable with mean 5. If P(X > 9) = 0.2, what is Var(X)?
2.52. We have the rvs X, Y with the joint density f(x, y) = x + y for 0 ≤ x, y ≤ 1, and f(x, y) = 0 otherwise. Find the marginal densities f_X(x) and f_Y(y). Are X and Y independent? Calculate E(X + Y).
2.53. Suppose a population contains 16% ex-cons. If 50 people are selected at random use the
CLT to estimate P .X < 5/ where X is the number of ex-cons in your sample.
2.54. Show that (a) Cov(X, Y) = Cov(Y, X), and (b) Cov(aX + b, Y) = aCov(X, Y).

2.55. Let c = kσ in Chebychev's inequality. Show that P(|X − μ| < kσ) ≥ 1 − 1/k². Estimate the probability that the values of a rv fall within two standard deviations of the mean.

2.56. Consider the discrete rv with pmf P(X = ±2) = 1/8, P(X = 0) = 3/4. Find EX, Var(X), and use Chebychev to bound P(|X − μ| ≥ 2) and compare it with the exact answer.

2.57. Let X ∼ Poisson(Λ) where Λ ∼ Unif[0, 3]. Find P(X = 1) by using the Law of Total Probability: P(X = 1) = ∫₀³ P(X = 1 | Λ = λ) f_Λ(λ) dλ.
CHAPTER 3

Distributions of Sample Mean and Sample SD
This chapter begins the probabilistic study of statistics. When we take a random sample from
the population we wind up with new random variables like the Sample Mean X and Sample
Standard Deviation S and various functions involving these quantities. In statistics we need to
know the probabilistic distribution of these quantities in order to be able to quantify the errors
we make in approximating parameters like the population mean using the random sample.

3.1 POPULATION DISTRIBUTION KNOWN


A random variable representing the item of interest is labeled X and is called the population
random variable. This could be something like the income level of an individual or the voter
preference or the efficacy of a drug, etc. A random sample from X is a collection of indepen-
dent random variables X1 ; X2 ; : : : ; Xn ; which have the same distribution as X . The random
variables in the sample have the same pdf and cdf as the population random variable X: Because
the random variables X1 ; X2 ; : : : ; Xn ; are independent and with the same distribution, this is a
model of sampling with replacement.
A population box is a representation of each individual in the population using a ticket.
On the ticket is written the particular object of interest for study, such as weight, income, IQ,
years of schooling, etc.

#1 #2 # 3 # 4 # 5 …

The box contains one ticket for each individual in the population and the number on the ticket
is the item of interest to the experimenter. Think of a random sample as choosing tickets from
the population box with replacement (so the box remains the same on each draw). If we don’t
replace the tickets, the box changes after each draw and the random variables would no longer be
independent or have the same distribution as X . Each time we take a sample from the popula-
tion we are getting values X1 D x1 ; X2 D x2 ; : : : ; Xn D xn and these values .x1 ; x2 ; : : : ; xn / are
specific observed values of the random variables .X1 ; : : : ; Xn /. In general, lowercase variables
will be observed variables, while uppercase represents random variables before observation.
Once we have a random sample we want to summarize the values of the random variables
to obtain some information.

Definition 3.1 Given any collection of random variables X₁, X₂, …, Xₙ, the sample mean is X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n. The sample variance is S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)², and the sample standard deviation is S. The sample median, assuming the random variables are sorted, is

X̃ = X_{(n+1)/2} if n is odd, and X̃ = (X_{n/2} + X_{n/2+1})/2 if n is even.

Any function g(X₁, …, Xₙ) of the random sample is said to be a statistic. For example, X̄, S, and X̃ are examples of statistics.

Remark 3.2 It is important to keep in mind that X̄, S, and X̃ are random variables, not numbers. Once the experiment is performed and we have observations X₁ = x₁, …, Xₙ = xₙ, then the observed statistics

x̄ = (x₁ + x₂ + ⋯ + xₙ)/n,  s² = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)²,  x̃ = x_{(n+1)/2} (n odd) or x̃ = (x_{n/2} + x_{n/2+1})/2 (n even)

result in real numbers, and these are not random variables.

Example 3.3 Consider the population box [0 1 2 3 4]. This is another way of saying the population rv is discrete with R(X) = {0, 1, 2, 3, 4} and P(X = k) = 1/5, k = 0, 1, 2, 3, 4. We will choose random samples of size 2 from this population, without replacement. The population mean of the numbers on the tickets is μ = E(X) = (0 + 1 + 2 + 3 + 4)/5 = 2, with population variance σ² = E(X − 2)² = (1/5) Σᵢ₌₁⁵ (xᵢ − 2)² = 2.

How many samples of size 2 are there? Since we don't care about the order, we are asking how many combinations of the 5 numbers taken 2 at a time there are, and that is (5 choose 2) = 10. Here are the 10 possible samples of size 2:

(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).

We take the sample mean X̄ = (X₁ + X₂)/2, where Xᵢ is the number on ticket i = 1, 2. The possible values of X̄ are 1/2, 1, 3/2, 2, 5/2, 3, 7/2. The distribution of X̄ is

x̄           1/2    1    3/2    2    5/2    3    7/2
P(X̄ = x̄)    0.1   0.1   0.2   0.2   0.2   0.1   0.1

For example, P(X̄ = 3/2) = 0.2 because there are 10 samples of size 2, of which exactly 2 have a sample mean of 3/2. Now that we have the distribution of X̄, we may compute

E[X̄] = Σᵢ₌₁⁷ x̄ᵢ P(X̄ = x̄ᵢ) = 0.1(1/2) + 0.1(1) + 0.2(3/2) + 0.2(2) + 0.2(5/2) + 0.1(3) + 0.1(7/2) = 2,

the same as the population mean μ. This is not a coincidence. Next, the variance of the sample mean is

Var(X̄) = Σᵢ₌₁⁷ (x̄ᵢ − 2)² P(X̄ = x̄ᵢ) = 0.75.

The population variance is 2 but the variance of the sample mean is 0.75. The variance of the sample mean is always lower than the variance of the individuals in the population, as we will show in the next theorem. Averages have lower variation.
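This example can be verified by enumerating all ten samples (a sketch using the standard library):

```python
from itertools import combinations

# all equally likely size-2 samples, drawn without replacement from the box
samples = list(combinations([0, 1, 2, 3, 4], 2))
means = [(a + b) / 2 for a, b in samples]

e_xbar = sum(means) / len(means)                               # E[Xbar]
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)  # Var(Xbar)
print(len(samples), e_xbar, var_xbar)  # 10 samples, E = 2.0, Var = 0.75
```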

The theorem will show us how to calculate E(X̄) and SD(X̄) directly from the population mean, SD, and sample size.

Theorem 3.4 If X₁, …, Xₙ is a random sample from a population X with mean μ = EX and variance σ² = Var(X), then

E[X̄] = μ,  Var(X̄) = σ²/n.

If the population size, N, is finite, and the sampling is done without replacement, then

E[X̄] = μ,  Var(X̄) = (σ²/n) · (N − n)/(N − 1).

In our example, we do have a finite population N = 5, so the variance of X̄ calculated using the theorem is Var(X̄) = (σ²/n)·(N − n)/(N − 1) = (2/2)·(5 − 2)/(5 − 1) = 3/4.
To show the theorem, we have

E[X̄] = (1/n) E[X₁ + ⋯ + Xₙ] = (1/n) nμ = μ.

Also, we know

Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y).

If X, Y are uncorrelated, which is certainly true if they are independent, then Var(X + Y) = Var(X) + Var(Y). Therefore,

Var(X̄) = (1/n²)[Var(X₁) + ⋯ + Var(Xₙ)] = nσ²/n² = σ²/n.

The rest of the proof, when sampling is done without replacement, is based on the Hypergeometric distribution and is omitted. □

Definition 3.5 The Standard Error (SE) of the sample mean is SE = SD(X̄) = √(E(X̄ − E(X̄))²). When we are sampling from an infinite population, or from a finite population but with replacement, then SE(X̄) = σ/√n. When we are sampling from a finite population without replacement, then

SE(X̄) = √((N − n)/(N − 1)) · σ/√n.

The term √((N − n)/(N − 1)), when we have a finite population and sampling is without replacement, is called the finite population correction factor for the standard error of the sample mean.

Remark 3.6 Theorem 3.4 shows that E X̄ = μ. Whenever we have an estimator (in this case X̄) of a parameter inherent to the population rv (in this case μ), and we know E X̄ = μ, we say that the statistic X̄ is an unbiased estimator of μ.

Sample means are calculated using X̄ = (X₁ + ⋯ + Xₙ)/n. Sometimes we are interested in the distribution of the sum Sₙ = X₁ + ⋯ + Xₙ. Here is the result.

Theorem 3.7 If X₁, …, Xₙ is a random sample from a population X with mean μ = EX and variance σ² = Var(X), then

E[ Σᵢ₌₁ⁿ Xᵢ ] = nμ,  Var[ Σᵢ₌₁ⁿ Xᵢ ] = nσ².

Thus, SE(Sₙ) = σ√n. If we have a finite population and we sample without replacement, E[Sₙ] = nμ but SE(Sₙ) = σ√n · √((N − n)/(N − 1)).

Since Sₙ/n = X̄, we have E[Sₙ] = nE[X̄] = nμ, and Var[Sₙ] = Var[nX̄] = n²Var[X̄] = n²·σ²/n = nσ². The point is that we just multiply the result for X̄ by n.
3.1.1 THE POPULATION X ∼ N(μ, σ)

In this section we assume the population rv X actually has a Normal distribution with unknown mean μ but known standard deviation σ. The goal is to estimate the unknown parameter μ by using the sample mean X̄.

Theorem 3.8 If X ∼ N(μ, σ), and X₁, …, Xₙ is a random sample from X, then X̄ ∼ N(μ, σ/√n).

Proof. In Proposition 2.61 we saw that Sₙ = X₁ + ⋯ + Xₙ ∼ N(nμ, σ√n). Therefore, X̄ = Sₙ/n ∼ N(μ, σ/√n). □

Now that we know X̄ ∼ N(μ, σ/√n), we may standardize X̄ to see that Z = (X̄ − μ)/(σ/√n) is N(0, 1). We can answer questions about the chances the sample mean lies in a particular interval, P(a ≤ X̄ ≤ b). Here's an example.
Example 3.9 Suppose IQs of a population are normally distributed N(100, 10). We know X̄ ∼ N(100, 10/√n). If we take a random sample of n = 25 people, we find

(a) P(95 < X̄ < 105) = normalcdf(95, 105, 100, 10/√25) = 0.9875.

(b) P(X̄ > 120) = normalcdf(120, ∞, 100, 10/√25) = 7.7 × 10⁻²⁴ ≈ 0.

(c) P(X̄ < 98) = normalcdf(−∞, 98, 100, 10/√25) = 0.1586.

You can see that the sample mean X̄ has much less variability than X itself since, for example, P(95 < X < 105) = normalcdf(95, 105, 100, 10) = 0.3829. This says about 38% of individuals have IQs between 95 and 105, but 98.75% of samples of size 25 will result in a sample mean between 95 and 105. Individuals may vary a lot, but averages don't.
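The same probabilities can be checked with any statistics package in place of the TI normalcdf command. A sketch using Python, assuming scipy is available (norm.cdf plays the role of normalcdf):

```python
from math import sqrt

from scipy.stats import norm

# Population: IQ ~ N(100, 10); the mean of a sample of n = 25 has SD 10/sqrt(25) = 2.
mu, sigma, n = 100, 10, 25
se = sigma / sqrt(n)

# (a) P(95 < Xbar < 105)
p_a = norm.cdf(105, mu, se) - norm.cdf(95, mu, se)

# (c) P(Xbar < 98)
p_c = norm.cdf(98, mu, se)

# Compare with the variability of a single individual: P(95 < X < 105)
p_indiv = norm.cdf(105, mu, sigma) - norm.cdf(95, mu, sigma)

print(p_a, p_c, p_indiv)
```

The output reproduces the example's 0.9875, 0.1586, and 0.3829 (up to rounding), confirming how much tighter the sample mean is than an individual observation.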

Many underlying populations do not follow a normal distribution. For instance, the distributions of annual incomes, or of times between arrivals of customers, are not normal. So we now ask what happens to the sample mean when the underlying population does not follow a normal distribution.

3.1.2 THE POPULATION X IS NOT NORMAL BUT HAS KNOWN MEAN AND VARIANCE
Here's where the Central Limit Theorem 2.63 plays a big role. Even though X, having mean E(X) = μ and SD(X) = σ, is not normal, Theorem 2.63 says that X̄ will be approximately N(μ, σ/√n) as long as the sample size n is large enough. Specifically, the CLT says that for Z ∼ N(0, 1),

    lim_{n→∞} P(a ≤ (X̄ − μ)/(σ/√n) ≤ b) = P(a ≤ Z ≤ b).    (3.1)

We will use this in two forms, one for the sample mean X̄ and the other for the sum Sₙ = X₁ + ⋯ + Xₙ:

(a) X̄ ≈ N(μ, σ/√n).

(b) Sₙ = X₁ + X₂ + ⋯ + Xₙ ≈ N(nμ, σ√n).

How large does n need to be to make the approximation decent? An exact answer depends on the underlying population distribution, but a good rule of thumb is n ≥ 30.
Example 3.10 Suppose we take a random sample of size n = 10 from a population X ∼ Unif[0, 10]. We want to find P(X̄ ≤ 4). Even though n < 30, let's use the normal approximation. We know X̄ ≈ N(μ, σ/√n), and in this case μ = E(X) = 5 and Var(X) = σ² = 10²/12. Therefore, X̄ ≈ N(5, 10/√(12 · 10)) = N(5, 0.9128). Using this we get P(X̄ ≤ 4) ≈ normalcdf(0, 4, 5, 0.9128) = 0.1366.

Analytically it is not easy to find P(X̄ ≤ 4) = P(X₁ + ⋯ + X₁₀ ≤ 40) directly using the uniform distribution because the sum of 10 independent uniform rvs is not uniform. Nevertheless, we may use technology to find the exact value P(X₁ + ⋯ + X₁₀ ≤ 40) = 0.138902. Even with n = 10 we get a really good approximation, 0.1366. In Figure 3.1 we show what happens to the pdf of the sum of n Uniform[0, 10] distributions if we take n = 1, 2, 4, 10. You can see that with n = 10 the pdf looks a lot like a normal density; even with n = 4 it looks normal (see Figure 3.1).
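A quick Monte Carlo check makes the same point without any distribution theory; a sketch (not from the text), assuming numpy and scipy are available:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Sample means of n = 10 draws from Uniform[0, 10], repeated many times.
n, reps = 10, 200_000
xbar = rng.uniform(0, 10, size=(reps, n)).mean(axis=1)

p_sim = np.mean(xbar <= 4)                     # Monte Carlo estimate of P(Xbar <= 4)
p_clt = norm.cdf(4, 5, 10 / np.sqrt(12 * n))   # CLT approximation, about 0.1366

print(p_sim, p_clt)
```

The simulated value lands near the exact 0.138902 quoted above, and the CLT value 0.1366 is already close even at n = 10.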

3.1.3 THE POPULATION IS BERNOULLI, p KNOWN
There are many problems in statistics in which each individual in the population is either a success (represented by a 1) or a failure (represented by a 0). The box model is

    [ 0 1 1 0 … ]   p = proportion of 1s in the box.

Suppose we know the probability of success is p, 0 < p < 1. This also means the percentage of 1s in the population will be 100p%. The population rv is X ∼ Bernoulli(p) and we take a random sample X₁, …, Xₙ from X. Each Xᵢ is either 0 or 1, Sₙ = X₁ + ⋯ + Xₙ is the total number of 1s in our sample, and X̄ = Sₙ/n is the fraction of 1s in the sample. It is also true that Sₙ is exactly Binom(n, p). By Theorem 2.63 we also know

    Sₙ ≈ N(np, √(np(1 − p)))   and   X̄ ≈ N(p, √(p(1 − p)/n)).
Figure 3.1: Sum of n = 1, 2, 4, 10 Uniform[0, 10] distributions.

In general, we will use this approximation as long as np ≥ 5 and n(1 − p) ≥ 5.

Example 3.11 Suppose a gambler bets $1 on Red in a game of roulette. The chance he wins is p = 18/38 and the bet pays even money (which means if red comes up he is paid his $1 plus an additional $1). He plans to play 50 times.

First, how many games do we expect to win? That is given by E(S₅₀) = 50 · (18/38) = 23.68, and we will expect to lose 50 − 23.68 = 26.32 games. Since the bet pays even money and we are betting $1 on each game, we expect to lose $26.32.

How far off this number do we expect to be, i.e., what is the SE for the number of games won and lost? The SE is SD(S₅₀) = √(50 · (18/38)(1 − 18/38)) = 3.53, so we expect to win 23.68 games, give or take about 3.53 games. Put another way, we expect to lose $26.32, give or take $3.53.

(a) What is the chance he wins 40% of the games, i.e., 20 games?

We are looking for P(X̄ = 0.4), which is equivalent to P(S₅₀ = 20). We'll do this in two ways, first to get the exact result and then using the normal approximation.

First, since S₅₀ ∼ Binom(50, 18/38), with mean 23.68 and SD 3.53, we get the exact value

    P(S₅₀ = 20) = (50 choose 20) (18/38)²⁰ (1 − 18/38)³⁰ = binompdf(50, 18/38, 20) = 0.066.

Using the normal approximation we have to use the continuity correction, because the probability that a continuous rv takes any single value is always zero. We use X̄ ≈ N(18/38, √((18/38)(1 − 18/38)/50)) = N(0.473, 0.0706):

    P(X̄ = 0.4) ≈ P(0.39 ≤ X̄ ≤ 0.41) = normalcdf(0.39, 0.41, 0.473, 0.0706) = 0.06623.

Note that the continuity correction uses 19.5 ≤ S₅₀ ≤ 20.5 ⟹ 0.39 ≤ X̄ ≤ 0.41. Also, np = 50 · 18/38 > 5 and n(1 − p) = 50 · 20/38 > 5, so a normal approximation may be used.

(b) What is the chance he wins no more than 40% of the games?

Now we want P(X̄ ≤ 0.4), or P(S₅₀ ≤ 20). We have the exact answer

    P(S₅₀ ≤ 20) = Σ_{k=0}^{20} (50 choose k) (18/38)ᵏ (1 − 18/38)⁵⁰⁻ᵏ = binomcdf(50, 18/38, 20) = 0.1837.

The approximate answer is either given by P(S₅₀ ≤ 20) ≈ normalcdf(0, 20.5, 23.68, 3.53) = 0.18383, or P(X̄ ≤ 0.4) = normalcdf(0, 0.4, 0.473, 0.0706) = 0.1505. If we use the continuity correction, it will be P(X̄ ≤ 0.4) = normalcdf(0, 0.41, 0.473, 0.0706) = 0.18610.

(c) What is the chance the gambler comes out ahead?

This is asking for P(S₅₀ ≥ 25), or P(X̄ ≥ 0.5). Using the same procedure as before we get the exact answer

    P(S₅₀ ≥ 25) = 1 − P(S₅₀ ≤ 24) = 1 − binomcdf(50, 18/38, 24) = 0.4078.

Then, since 24.5/50 = 0.49, with the continuity correction we get

    P(X̄ ≥ 0.49) ≈ normalcdf(0.49, ∞, 0.473, 0.0706) = 0.40485.

Without the continuity correction, P(X̄ ≥ 0.5) = 0.35106.
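The exact binomial answers and the continuity-corrected normal approximations above can be reproduced in a few lines; a sketch assuming scipy is available (binom.pmf/binom.cdf mirror binompdf/binomcdf):

```python
from scipy.stats import binom, norm

n, p = 50, 18 / 38
mu, sd = n * p, (n * p * (1 - p)) ** 0.5   # 23.68 and 3.53

# (a) exact P(S50 = 20) vs. normal approximation with continuity correction
exact_a = binom.pmf(20, n, p)
approx_a = norm.cdf(20.5, mu, sd) - norm.cdf(19.5, mu, sd)

# (b) exact and approximate P(S50 <= 20)
exact_b = binom.cdf(20, n, p)
approx_b = norm.cdf(20.5, mu, sd)

# (c) exact P(S50 >= 25): the chance the gambler comes out ahead
exact_c = 1 - binom.cdf(24, n, p)

print(exact_a, approx_a, exact_b, approx_b, exact_c)
```

Working on the S₅₀ scale with cutoffs 19.5 and 20.5 gives the same continuity correction as the 0.39–0.41 cutoffs on the X̄ scale.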

The result for Bernoulli populations, i.e., when X is either 0 or 1, extends to a more general case when we assume the population takes on any two values. Here is the result.

Theorem 3.12 Suppose the population is

    X = { a, with probability p,
        { b, with probability 1 − p,

and we have a random sample X₁, …, Xₙ from the population. Then

(a) E(X) = ap + b(1 − p) and Var(X) = (a − b)² p(1 − p).

(b) E(Sₙ) = E(X₁ + ⋯ + Xₙ) = n(ap + b(1 − p)) and Var(Sₙ) = n(a − b)² p(1 − p).

(c) E(X̄) = ap + b(1 − p) and Var(X̄) = (1/n)(a − b)² p(1 − p).

(d) Sₙ ≈ N(n(ap + b(1 − p)), |a − b|√(np(1 − p))) and X̄ ≈ N(ap + b(1 − p), |a − b|√(p(1 − p)/n)).
Example 3.13 Going back to Example 3.11, we can think of each play as having outcome the amount won or lost. That means the population rv is

    X = { +1, with probability 18/38,
        { −1, with probability 20/38.

This results in E(X) = −0.0526, SD(X) = 0.9986. Therefore, the expected winnings in 50 plays are E(S₅₀) = −2.63 with SE SD(S₅₀) = 7.06. In 50 plays we expect to lose 2.63 dollars, give or take 7.06. Now we ask what are the chances of losing more than $4. This is P(S₅₀ < −4) ≈ normalcdf(−∞, −4, −2.63, 7.06) = 0.423. Using the continuity correction, P(S₅₀ < −4) ≈ normalcdf(−∞, −4.5, −2.63, 7.06) = 0.395.

Example 3.14 A multiple choice exam has 20 questions with 4 possible answers for each question. A student will guess the answer for each question by choosing an answer at random. To penalize guessing, the test is scored as +2 for each correct answer and −1 for each incorrect answer.

(a) What is the student's expected score and the SE for the score?

Answering a question is just like choosing at random 1 out of 4 possible tickets from the box [ −1 −1 −1 2 ]. Since there are 20 questions, we will do this 20 times and total up the numbers on the tickets. The population mean is (2)(1/4) + (−1)(3/4) = −1/4 and the population SD is (2 − (−1))√((1/4)(3/4)) = 1.299.

The expected score is E(S₂₀) = 20(−1/4) = −5 with SE = SD(S₂₀) = √20 · 1.299 = 5.809. The best estimate for the student's score is −5, give or take about 6.

(b) Find the approximate chance the student scores a 5 or greater.

Since S₂₀ ≈ N(−5, 5.809), we have P(S₂₀ ≥ 4.5) = 0.0509.
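This computation fits Theorem 3.12 with a = 2, b = −1, p = 1/4; a sketch (the exact-binomial comparison at the end is an addition, not from the text):

```python
from math import sqrt

from scipy.stats import binom, norm

# Per-question score: +2 with prob 1/4, -1 with prob 3/4 (Theorem 3.12).
a, b, p, n = 2, -1, 0.25, 20
mu = a * p + b * (1 - p)               # -0.25 per question
sd = abs(a - b) * sqrt(p * (1 - p))    # 1.299 per question

mean_score = n * mu                    # expected score: -5
se_score = sqrt(n) * sd                # SE of the score: 5.809

# Normal approximation to P(score >= 5), with the text's 4.5 cutoff.
p_norm = norm.sf(4.5, mean_score, se_score)

# Exact check: score = 3C - 20, where C ~ Binom(20, 1/4) counts correct
# answers, so score >= 5 is the same event as C >= 9.
p_exact = binom.sf(8, n, p)

print(mean_score, se_score, p_norm, p_exact)
```

The normal value reproduces the 0.0509 above; the exact binomial value is of the same order, which is all the approximation promises at n = 20.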

3.2 POPULATION VARIANCE UNKNOWN: SAMPLING DISTRIBUTION OF THE SAMPLE VARIANCE
Up until now we have assumed that the population mean and variance were known. When μ is unknown we are going to use X̄ to approximate it. In this section we show what happens when σ is unknown and we use the sample SD to approximate it.

We have a random sample X₁, …, Xₙ from a population X assumed to have E(X) = μ and Var(X) = σ². Recall that the sample variance is given by

    S² = (1/(n − 1)) Σ_{i=1}^n (Xᵢ − X̄)².
If we use S² to approximate σ², can we expect it to be a good approximation? And why do we divide by n − 1 instead of just n? The next theorem answers both of these questions.

Theorem 3.15 S² is an unbiased estimator of σ². That is, E(S²) = σ².

Proof. If X̄ were replaced by μ, we could make this easy calculation:

    E((1/(n − 1)) Σ_{i=1}^n (Xᵢ − μ)²) = (1/(n − 1)) Σ_{i=1}^n E(Xᵢ − μ)² = (n/(n − 1)) σ².

But we have the factor n/(n − 1), which shouldn't appear, so it must come from the substitution of X̄ for μ. Here is what we have to do:

    E((1/(n − 1)) Σ_{i=1}^n (Xᵢ − X̄)²)
      = E((1/(n − 1)) Σ_{i=1}^n (Xᵢ − μ + μ − X̄)²)
      = (1/(n − 1)) E(Σ_{i=1}^n [(Xᵢ − μ)² + 2(Xᵢ − μ)(μ − X̄) + (μ − X̄)²])
      = (1/(n − 1)) [nσ² + nE(μ − X̄)² + 2E((μ − X̄) Σ_{i=1}^n (Xᵢ − μ))]
      = (1/(n − 1)) [nσ² + n(σ²/n) − 2nE(X̄ − μ)²]    (since Σ_{i=1}^n (Xᵢ − μ) = n(X̄ − μ))
      = (1/(n − 1)) [nσ² + σ² − 2σ²] = ((n + 1 − 2)/(n − 1)) σ² = σ².

That's why we divide by n − 1 instead of just n: if we divided by n, S² would be a biased estimator of σ². □
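The unbiasedness of S², and the bias introduced by dividing by n instead, are easy to see in a simulation; a sketch (not from the text), assuming numpy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population N(0, sigma) with sigma^2 = 4; many samples of size n = 5.
sigma2, n, reps = 4.0, 5, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)     # divide by n

print(s2_unbiased.mean())  # close to sigma^2 = 4
print(s2_biased.mean())    # close to (n-1)/n * sigma^2 = 3.2
```

Averaging over many samples, the n − 1 version centers on σ² while the n version systematically underestimates it by the factor (n − 1)/n.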

Now we need to determine not just E(S²) but the distribution of S².

Theorem 3.16 If we have a random sample from a normal population X ∼ N(μ, σ), then ((n − 1)/σ²) S² ∼ χ²(n − 1). In particular, E(S²) = (σ²/(n − 1)) E(χ²(n − 1)) = σ², and

    Var(S²) = Var((σ²/(n − 1)) χ²(n − 1)) = (σ⁴/(n − 1)²) Var(χ²(n − 1)) = (σ⁴/(n − 1)²) · 2(n − 1) = 2σ⁴/(n − 1).
We will skip the proof of this theorem but just note that if we replaced X̄ by μ, we would have

    W² = (1/(n − 1)) Σ_{i=1}^n (Xᵢ − μ)² = (σ²/(n − 1)) Σ_{i=1}^n ((Xᵢ − μ)/σ)² = (σ²/(n − 1)) Σ_{i=1}^n Zᵢ²,

where Zᵢ ∼ N(0, 1), i = 1, 2, …, n, is a set of n independent standard normal rvs. But we know that in this case Σ_{i=1}^n Zᵢ² ∼ χ²(n). Consequently, ((n − 1)/σ²) W² ∼ χ²(n). That looks really close to what the theorem claims, and we would be done except for the fact that W ≠ S and χ²(n) ≠ χ²(n − 1). Replacing μ by X̄ accounts for the difference. We omit the details.

Example 3.17 Suppose we have a population X ∼ N(μ, 10) and we choose a random sample X₁, …, X₂₅ from X. Letting S² denote the sample variance, we want to find the following.

(a) P(S² > 50). Using the theorem we have (n − 1)S²/σ² ∼ χ²(24), so that P(S² > 50) = P(24S²/100 > 24(50)/100) = P(χ²(24) > 12) = χ²cdf(12, ∞, 24) = 0.9799.

(b) P(75 < S² < 125) = P(0.24(75) < χ²(24) < 0.24(125)) = P(18 < χ²(24) < 30) = 0.61825.

(c) E(S²) = σ² = 100, Var(S²) = 2σ⁴/(n − 1) = 10,000/12.
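These χ² probabilities can be computed directly; a sketch assuming scipy is available (chi2.cdf/chi2.sf play the role of the χ²cdf command):

```python
from scipy.stats import chi2

n, sigma = 25, 10
df = n - 1   # (n-1) S^2 / sigma^2 ~ chi-square(24)

# (a) P(S^2 > 50) = P(chi2(24) > 24*50/100)
p_a = chi2.sf(24 * 50 / sigma**2, df)

# (b) P(75 < S^2 < 125) = P(18 < chi2(24) < 30)
p_b = chi2.cdf(24 * 125 / sigma**2, df) - chi2.cdf(24 * 75 / sigma**2, df)

# (c) mean and variance of S^2
mean_s2 = sigma**2             # 100
var_s2 = 2 * sigma**4 / df     # 10000/12

print(p_a, p_b, mean_s2, var_s2)
```

The key step is always the same rescaling: convert the event about S² into an event about (n − 1)S²/σ², whose distribution is known.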

Now we know that if σ were known and we had a random sample from a normal population X ∼ N(μ, σ), then (X̄ − μ)/(σ/√n) ∼ N(0, 1). If σ is unknown, this is no longer true. What we want is to replace σ by S, which is determined from the sample and not the population. That makes the denominator also depend on the random sample. Consider the rv

    T = (X̄ − μ)/(S/√n) = [(X̄ − μ)/(σ/√n)] / √(S²/σ²) = Z / √(χ²(n − 1)/(n − 1)),   Z ∼ N(0, 1),

since ((n − 1)/σ²)S² ∼ χ²(n − 1) gives S²/σ² = χ²(n − 1)/(n − 1). From (2.3) we see that T ∼ T(n − 1) has a Student's t-distribution with n − 1 degrees of freedom. We summarize this result.

Theorem 3.18 (1) If X ∼ N(μ, σ) and X₁, …, Xₙ is a random sample from X, then T = (X̄ − μ)/(S/√n) has a Student's t-distribution with n − 1 degrees of freedom.

(2) If X has E(X) = μ, Var(X) = σ², and X₁, …, Xₙ is a random sample from X, which may not be normal, then T = (X̄ − μ)/(S/√n) has an approximate Student's t-distribution with n − 1 degrees of freedom.
Example 3.19 The lengths of stay at a medical center after heart surgery have mean μ = 6.2 days. A random sample of n = 20 such patients resulted in a sample mean of x̄ = 7.8 with a sample SD of s = 1.2. The probability that we have a sample mean of 7.8 or greater is

    P(X̄ ≥ 7.8) = P((X̄ − 6.2)/(1.2/√20) ≥ (7.8 − 6.2)/(1.2/√20)) = P(T(19) ≥ 5.963) = tcdf(5.963, ∞, 19) = 4.86 × 10⁻⁶ ≈ 0.

This is very unlikely to happen. Maybe μ is incorrect.
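The t statistic and its tail probability take two lines; a sketch assuming scipy is available (t.sf plays the role of tcdf with an infinite upper limit):

```python
from math import sqrt

from scipy.stats import t

mu0, n, xbar, s = 6.2, 20, 7.8, 1.2
t_stat = (xbar - mu0) / (s / sqrt(n))   # 5.963
p_value = t.sf(t_stat, n - 1)           # upper-tail probability for T(19)

print(t_stat, p_value)
```

A tail probability this small is exactly the kind of evidence that, in Chapter 5's language, would lead us to doubt the stated μ.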

Bernoulli Population, p Unknown
If the population is Bernoulli but we don't know the population probability of success p, then we estimate p from the sample proportion given by X̄ = (X₁ + ⋯ + Xₙ)/n = P̄. We know that

    E(P̄) = p   and   SD(P̄) = √(p(1 − p)/n).

Everything would be fine except for the fact that we don't know p. We have no choice but to use the information available to us and replace p by the observed sample proportion P̄. The SE is then approximated by SD(P̄) ≈ √(P̄(1 − P̄)/n), and

    Z = (P̄ − p)/√(P̄(1 − P̄)/n) ≈ N(0, 1)   or   P̄ ≈ N(p, √(P̄(1 − P̄)/n)).

This should only be used if np ≥ 5 and n(1 − p) ≥ 5.

3.2.1 SAMPLING DISTRIBUTION OF DIFFERENCES OF TWO SAMPLES
In many problems we want to compare the results of two independent random samples from two populations. For those problems we will need the sampling distributions of the differences in the sample means. The following theorem summarizes the results. Since the sample sizes may be different, we use X̄ₙ to denote the sample mean when the sample size is n.

Theorem 3.20 Let X and Y be two random variables, and let X₁, X₂, …, Xₙ be a random sample from X and Y₁, Y₂, …, Yₘ be an independent random sample from Y. Let μ_X = E(X), μ_Y = E(Y), σ_X = SD(X), and σ_Y = SD(Y).

(a) E(X̄ₙ) = μ_X, SD(X̄ₙ) = σ_X/√n and E(Ȳₘ) = μ_Y, SD(Ȳₘ) = σ_Y/√m.

(b) E(X̄ₙ − Ȳₘ) = μ_X − μ_Y and SD(X̄ₙ − Ȳₘ) = √(σ_X²/n + σ_Y²/m).

(c) If X ∼ N(μ_X, σ_X), Y ∼ N(μ_Y, σ_Y), then X̄ₙ − Ȳₘ ∼ N(μ_X − μ_Y, SD(X̄ₙ − Ȳₘ)).

(d) For large enough n, m, X̄ₙ − Ȳₘ ≈ N(μ_X − μ_Y, SD(X̄ₙ − Ȳₘ)).

(e) If X ∼ Bernoulli(p_X) and Y ∼ Bernoulli(p_Y), then, if np_X ≥ 5, n(1 − p_X) ≥ 5 and mp_Y ≥ 5, m(1 − p_Y) ≥ 5,

    X̄ₙ − Ȳₘ ≈ N(p_X − p_Y, √(p_X(1 − p_X)/n + p_Y(1 − p_Y)/m)).

(f) Under the same conditions as in (e),

    Sₙ − Sₘ ≈ N(np_X − mp_Y, √(np_X(1 − p_X) + mp_Y(1 − p_Y))).

(g) If the sampling is done without replacement from finite populations, the correction factors are used to adjust the SDs.

(h) If the SDs σ_X and σ_Y are unknown, replace the normal distributions with the t-distributions in a similar way as in the one-sample cases.

The only point we need to verify is the formula for the SD of the difference. This follows from independence:

    Var(X̄ₙ − Ȳₘ) = Var(X̄ₙ) + Var(Ȳₘ).
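Part (b) of the theorem is straightforward to apply numerically. A sketch with made-up values (σ_X = 15, n = 40, σ_Y = 12, m = 50, and a mean difference of 3 are illustrative, not from the text):

```python
from math import sqrt

from scipy.stats import norm

# Hypothetical populations: X with sigma_X = 15 (n = 40 sampled),
# Y with sigma_Y = 12 (m = 50 sampled), and mu_X - mu_Y = 3.
sigma_x, n = 15.0, 40
sigma_y, m = 12.0, 50
mean_diff = 3.0

# Theorem 3.20(b): SD of the difference of the sample means.
se_diff = sqrt(sigma_x**2 / n + sigma_y**2 / m)

# Theorem 3.20(d): P(Xbar_n - Ybar_m > 0) under the normal approximation.
p_positive = norm.sf(0, mean_diff, se_diff)

print(se_diff, p_positive)
```

Note that the variances, not the SDs, add; forgetting this and adding σ_X/√n + σ_Y/√m is a common mistake.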

3.3 PROBLEMS
3.1. Consider the population box [ 0 1 2 3 4 ]. We will choose random samples of size 2 from this population, with replacement.
(a) Find all samples of size 2. Find the distribution of X̄.
(b) Find E(X̄) and SD(X̄) using Theorem 3.4 and directly from the first part.

3.2. Consider the population box [ 7 1 2 3 4 ]. We will choose random samples of size 2 from this population, with replacement.
(a) Find all samples of size 2. Find the distribution of X̄.
(b) Find E(X̄) and SD(X̄) using Theorem 3.4 and directly from the first part.
(c) Repeat the first two parts if the samples are drawn without replacement.

3.3. Suppose an investor has 3 types of investments: 20% is at $40, 35% is at $55, and 45% is at $95. If the investor randomly selects 2 of these investments for sale, what is the distribution of the average sale price, i.e., P(X̄ = k)? Then find E(X̄) and SD(X̄).
3.4. The distribution of X̄ is given by

    k          4    5    6    7    8
    P(X̄ = k)   1/9  2/9  3/9  2/9  1/9

(a) Find E(X̄) and SD(X̄).
(b) What is the population mean? What is the population σ if the sample size was 9 and sampling was with replacement?

3.5. A manufacturer has six different devices; device i = 1, …, 4 has i defects and devices 5 and 6 have 0 defects. The inspector chooses at random 2 different devices to inspect.
(a) What is the expected total number of defects of the two sampled devices? What is the SE for the total?
(b) What is P(X̄ = 3)?

3.6. The IQs of 1000 students have an average of 105.5 with an SD of 11. IQs approximately follow a normal distribution. Suppose 150 random samples of size 25 are taken from this population. Find
(a) the mean and SD of S₂₅ = X₁ + ⋯ + X₂₅;
(b) the mean and SD of X̄;
(c) the expected number of sample means that are between 98 and 104 inclusive; and
(d) the expected number of sample means < 97.5.

3.7. Let X₁, X₂, …, X₅₀ be a random sample (so independent and identically distributed) with μ = 1/4 and σ = 1/3. Use the CLT to estimate P(X₁ + ⋯ + X₅₀ < 10).
3.8. The mean life of a cell phone is 5 years with an SD of 1 year. Assume the lifetime follows a normal distribution. Find
(a) P(4.4 < X̄ < 5.2) when we take a sample of 9 phones;
(b) the 85th percentile of sample means with samples of size 9; and
(c) the probability a cell phone lasts at least 7 years.

3.9. In the 1965 case of Swain v. Alabama, an African-American man appealed to the U.S. Supreme Court his conviction on a charge of rape on the basis of the fact that there were no African-Americans on his jury. At that time in Alabama, only men over the age of 21 were eligible to serve on a jury. Jurors were selected from a panel of 100 ostensibly randomly chosen men from the county. Census data at that time showed that 16% of the members eligible to be selected for the panel were African-Americans. Of the 100 panel members for the Swain jury, 8 were African-American but were not chosen for the jury because of challenges by the attorneys. Use the Central Limit Theorem to approximate the chance that 8 or fewer African-Americans would be on a panel selected at random from the population.

3.10. A random sample X₁, …, X₁₅₀ is drawn from a population with mean μ = 25 and σ = 10. The population distribution is unknown. Let A be the sample mean of the first 50 and B be the sample mean of the remaining 100.
(a) What are the approximate distributions of A, B from the CLT?
(b) Use the CLT to find P(19 ≤ A ≤ 26) and P(19 ≤ B ≤ 26).
3.11. An elevator in a hotel can carry a maximum weight of 4000 pounds. The weights of the customers at the hotel are normally distributed with mean 165 pounds and SD = 12 pounds. How many passengers can get on the elevator at one time so that there is at most a 1% chance it is overloaded? If the weights are not normally distributed, will your answer be approximately correct? Explain.

3.12. A random sample of 25 circuit boards is chosen and their mean life X̄ is calculated. The true distribution of the length of life is Exponential with a mean of 5 years. Use the CLT to approximate the probability P(|X̄ − 5| ≤ 0.5).

3.13. You will flip a fair coin a certain number of times.
(a) Use both the exact Binomial(n, p) distribution and the normal approximation to find the probabilities P(X ≤ n/2), P(X = n/2) and compare the results. Use n = 10, 20, 30, 40 and p = 0.5. Here X is the number of heads.
(b) Find P(X = 2) when n = 4 using the Binomial and the Normal approximation. Note that np < 5, n(1 − p) < 5.

3.14. In a multiple choice exam there are 25 questions, each with 4 possible answers, only one of which is correct. If a student takes the exam and randomly guesses the answer for each problem, what is the exact and approximate chance the student will get at least 10 correct?

3.15. Five hundred people will each toss a coin which is suspected to be loaded. They will each toss the coin 120 times.
(a) If it is a fair coin, how many people will we expect to get between 40 and 60% heads?
(b) If in fact 453 people get between 40 and 60% heads, and you declare the coin to be fair, what is the probability you are making a mistake?
3.16. A candidate in an election received 44% of the popular vote. If a poll of a random sample of size 250 is taken, find the approximate probability a majority of the sample would be for the candidate.

3.17. Another possible bet in roulette is to bet on a group of 4 numbers; so if any one of the numbers comes up, the gambler wins. The bet pays $8 for every $1 bet. Suppose a gambler will play 25 times, betting $1 on a group each time.
(a) What is the population box for this game?
(b) Use Theorem 3.12 to find the population mean and SD.
(c) Find the probability of winning 4 or more games.
(d) Find the approximate probability of coming out ahead.

3.18. Prove Theorem 3.12.

3.19. Let Tₙ denote a Student's t rv with n degrees of freedom. Find
(a) P(T₇ < 2.35);
(b) P(T₂₂ > 1.33);
(c) P(−1.45 < T₁₂ < 2.2);
(d) the number c so that P(T₁₂ > c) = 0.95.

3.20. Let Y ∼ χ²(8).
(a) Find E(Y) and Var(Y).
(b) Suppose P(Y > a) = 0.05, P(Y < b) = 0.1, P(Y < c) = 0.9, P(Y > d) = 0.95. Find a, b, c, and d.
(c) Suppose P(a < Y < b) = 0.8. Find a, b so that the two tails have the same area.
(d) Suppose P(χ²(1) < a) = 0.23 and P(Z² < b) = 0.23. Find a, b and determine if a = b².

3.21. A random sample of 400 people was taken from the population of factory workers; 213 worked for factories with 100 or more employees. Find the probability of getting a sample proportion of 0.5325 or more, when the true proportion is known to be 0.51.

3.22. Ten measurements of the diameter of a ball resulted in a sample mean x̄ = 4.38 cm with sample SD = 0.08. Given that the mean diameter should be μ = 4.30, find the probability P(|X̄ − 4.30| > 0.08).

3.23. The heights of a population of 300 male students at a college are normally distributed with mean 68 inches. Suppose 80 male students are chosen at random and their sample average is 66 inches with a sample SD of 3 inches. What are the chances of drawing such a sample with x̄ ≤ 66 if the sample is drawn with or without replacement?

3.24. A real estate office takes a random sample of 35 rental units and finds a sample average rent paid of $1,200 with a sample SD of $325. The rents do not follow the normal curve. Find the probability the sample average would be $1,200 or more if the actual mean population rent is $1,250.

3.25. A random sample of 1000 people is taken to estimate the percentage of Republicans in the population; 467 people in the sample claim to be Republicans. Find the chance that the percentage of Republicans in a sample of size 1000 will be in the range from 45% to 48%.
3.26. Three organizations take a random sample of size 40 from a Bernoulli population with unknown p. The number of 1s in the first sample is 8, in the second sample 10, and in the third sample 13. Find the estimated population proportion p for each sample along with the SE for each sample.

3.27. Given a random sample X₁, …, Xₙ from a population with E(X) = μ, Var(X) = σ², find the mean and variance of the estimator μ̂ = (X₁ + Xₙ)/2.

3.28. A random sample of 20 measurements will be taken of the pressure in a chamber. It is assumed that the measurements will be normally distributed with mean 0 and variance σ² = 0.004. What are the chances the sample mean will be within 0.01 of the true mean?

3.29. A measurement process is normally distributed with known variance σ² = 0.054 but unknown mean. What sample size is required in order to be sure the sample mean is within 0.01 of the true mean with probability at least 0.95?

3.30. The mean time to failure of a cell phone's battery is μ = 2.78 hours, where failure means a low battery indicator will flash.
(a) Given that 100 measurements are made with a sample SD of s = 0.26, what is the probability the sample mean will be within 0.05 of the true mean?
(b) How many measurements need to be taken to ensure P(|X̄ − μ| < 0.05) ≥ 0.98? Assume s = 0.26 is a good approximation to σ.

3.31. Suppose we have 49 data values with sample mean x̄ = 6.25 and sample SD 6. Find the probability of obtaining a sample mean of 6.25 or greater if the true population mean is μ = 4.

3.32. Two manufacturers each produce a semiconductor component, with a mean lifetime of μ_X = 1400 hours and σ_X = 200 for company X, and μ_Y = 1200, σ_Y = 100 for company Y. Suppose 125 components from each company are randomly selected and tested.
(a) Find the probability that X's components will have a mean lifetime at least 160 hours longer than Y's.
(b) Find the probability that X's components will have a mean lifetime at least 250 hours longer than Y's.
3.33. Two drug companies, A and B, have competing drugs for a disease. Each company's drug has a 50% chance of a cure. They will each choose 50 patients at random and administer their drug. Find the approximate probability company A will achieve 5 or more cures than company B.

3.34. The population mean score of students on an exam is 72 with an SD of 6. Suppose two groups of students are independently chosen at random, with 26 in one group and 32 in the other.
(a) Find the probability the sample means will differ by more than 3.
(b) Find the probability the sample means will differ by at least 2 but no more than 4 points.

3.35. Post-election results showed the winning candidate with 53% of the vote. Find the probability that two independent random samples with 200 voters each would indicate a difference of more than 12% in their two voting proportions.
3.36. Let {Z₁, …, Z₁₆} be a random sample from N(0, 1) and {X₁, …, X₆₄} an independent random sample from X ∼ N(μ, 1).
(a) Find P(Z₁² > 2).
(b) Find P(Σ_{i=1}^{16} Zᵢ > 2).
(c) Find P(Σ_{i=1}^{16} Zᵢ² > 16).
(d) Find a value of c such that P((1/15) Σ_{i=1}^{16} (Zᵢ − Z̄)² > c) = 0.05.
(e) Find the distribution of Y = Σ_{i=1}^{16} Zᵢ² + Σ_{i=1}^{64} (Xᵢ − μ)².
CHAPTER 4

Confidence and Prediction Intervals
This chapter introduces the concept of a statistical interval. The principal types of statistical
intervals are confidence intervals and prediction intervals. Instead of using a single number
(a point estimate) to estimate quantities like the mean, a statistical interval includes the point
estimate and an interval around it (an interval estimate) quantifying the errors involved in the
estimate.

4.1 CONFIDENCE INTERVALS FOR A SINGLE SAMPLE

We will use ϑ to denote a parameter of interest in the pmf or pdf which we are trying to estimate. Normally this is the population mean μ or SD σ, or the population binomial proportion p. The following is the precise definition of what it means to be a confidence interval for the parameter ϑ. We give the definition for a continuous random variable X with pdf f_X(x; ϑ). The definition if X is discrete is similar.

Definition 4.1 Let X₁, X₂, …, Xₙ be a random sample from a random variable X with pdf f_X(x; ϑ). Given 0 < α < 1, a 100(1 − α)% confidence interval for ϑ is an open interval with random endpoints of the form l(X₁, X₂, …, Xₙ) and u(X₁, X₂, …, Xₙ) such that

    P(l(X₁, X₂, …, Xₙ) < ϑ < u(X₁, X₂, …, Xₙ)) = 1 − α.

The percentage 100(1 − α)% is called the confidence level of the interval.

In the remainder of the book we will use CI to denote Confidence Interval.


Remark 4.2 The value of α sets the confidence level, and α represents the probability that the interval (l, u) does not contain the parameter ϑ. The probability that the value of the unknown parameter ϑ lies in the interval can be adjusted depending on our choice of α. For example, if we would like the probability to be 0.99 (a confidence level of 99%), then we would choose α = 0.01. If it is acceptable that the probability be only 0.90 (a confidence level of 90%), then we can instead choose α = 0.10. We would expect that the smaller the value of α, meaning a higher confidence level, the wider the interval, and vice versa. That is, the more confidence we want, the less we have to be confident of. No matter what the level of confidence (except for α = 0), the random interval (l, u) may not contain the true value of ϑ.
Remark 4.3 The interval in the definition has random endpoints which depend on the random sample. How do we obtain an actual interval of real numbers? If X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ is the set of actual observed sample values,¹ then an interval on the real line is obtained by evaluating the functions l and u at the sample values. That is, the observed sample endpoints of the interval are l = l(x₁, x₂, …, xₙ) and u = u(x₁, x₂, …, xₙ). Then the observed 100(1 − α)% confidence interval is simply (l, u). (Although (l(x₁, …, xₙ), u(x₁, …, xₙ)) does not have random endpoints, we will still refer to it as a confidence interval. It is important to keep in mind, however, that P(l(X₁, …, Xₙ) < ϑ < u(X₁, …, Xₙ)) has meaning, but P(l(x₁, …, xₙ) < ϑ < u(x₁, …, xₙ)) does not, since there are no random variables present in the latter expression.)

The true value of ϑ is either in (l(x₁, …, xₙ), u(x₁, …, xₙ)) = (l, u) or it is not. If (l, u) does contain ϑ, let us consider that a success. A failure occurs when ϑ is not in the interval. We may consider this a Bernoulli rv with 1 − α as the probability of success. If we construct N 100(1 − α)% confidence intervals by taking N observations of the random sample, the rv S which counts the number of successes is S ∼ Binom(N, 1 − α). Then we expect E(S) = N(1 − α) of these constructed intervals to actually be successes, i.e., contain ϑ. For instance, if α = 0.05, and we perform the experiment 100 times, we expect 95 of the intervals to contain the true value of ϑ and 5 not to contain ϑ. Of course, when we obtain data and construct the confidence interval, we will not know whether this particular interval contains ϑ or not, but we have 100(1 − α)% confidence that it does.
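This "expected number of successes" interpretation is easy to see by simulation. A sketch (the population values μ = 10, σ = 3, n = 25 are illustrative, not from the text) that builds many 95% intervals for a known μ and counts how many contain it:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

mu, sigma, n = 10.0, 3.0, 25   # illustrative population with known sigma
alpha = 0.05
z = norm.ppf(1 - alpha / 2)    # critical value, about 1.96
N = 2000                       # number of confidence intervals constructed

samples = rng.normal(mu, sigma, size=(N, n))
xbar = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)

# Fraction of the N intervals (xbar - hw, xbar + hw) that contain the true mu.
covered = np.mean((xbar - half_width < mu) & (mu < xbar + half_width))
print(covered)
```

The observed coverage fraction hovers near 1 − α = 0.95: roughly 95% of the constructed intervals succeed, just as the Binomial argument predicts.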

4.1.1 CONTROLLING THE ERROR OF AN ESTIMATE USING CONFIDENCE INTERVALS
An important feature of confidence intervals is that they allow us to control the error of the estimates produced by estimators. Specifically, suppose that ϑₑ = ϑ̂(x₁, …, xₙ) is an estimate obtained by substituting observed sample values into an estimator ϑ̂(X₁, …, Xₙ) for the parameter ϑ.

Definition 4.4 The error ε of the estimate ϑₑ is the quantity ε = |ϑₑ − ϑ|.

The error of the estimate is therefore the amount by which the estimate deviates from the true value of ϑ.

1 Capital values, X , are random variables, while small script, x , is the observed value of the rv, X .
4.1.2 PIVOTAL QUANTITIES
How do we construct confidence intervals for parameters in probability distributions? Normally we need what are called pivotal quantities, or pivots for short.

Definition 4.5 Let X₁, X₂, …, Xₙ be a random sample from a random variable X with pdf f_X(x; ϑ). A pivotal quantity for ϑ is a random variable h(X₁, X₂, …, Xₙ; ϑ) whose distribution does not depend on ϑ, i.e., it is the same for every value of ϑ.

The function h is a random variable constructed from the random sample and the constant ϑ. Here's an example of a pivotal quantity.

Example 4.6 Take a sample X of size one from a normal distribution with unknown μ but known σ. The rv

    Z = (X − μ)/σ ∼ N(0, 1)

is a standard normal random variable whose distribution doesn't depend on the value of μ. Therefore, Z = (X − μ)/σ is a pivotal quantity for μ. These quantities will also be known as test statistics in a later context.

Pivotal Quantities for the Normal Parameters
Let X₁, X₂, …, Xₙ be a random sample from a normal random variable X ∼ N(μ, σ). The following distributions leading to pivotal quantities were introduced in Chapter 3.

σ² = σ₀² is known
Since E(X̄) = μ and SD(X̄) = σ₀/√n, a pivotal quantity for μ is

    (X̄ − μ)/(σ₀/√n) ∼ N(0, 1).

σ² is unknown
In this case, σ has to be estimated from the sample. Therefore, a pivotal quantity for μ is

    (X̄ − μ)/(S_X/√n) ∼ t(n − 1).

We now turn to finding a pivotal quantity for the variance. We again consider two cases.
4. CONFIDENCE AND PREDICTION INTERVALS
$\mu = \mu_0$ is known
In this case,
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu_0}{\sigma}\right)^2 \sim \chi^2(n).$$
Recall that this follows from the fact that each term $(X_i - \mu_0)/\sigma \sim N(0,1)$, and the sum of $n$ squares of independent standard normals is $\chi^2(n)$.

$\mu$ is unknown
In this case, $\mu$ has to be estimated from the sample, and the pivot becomes
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2 \sim \chi^2(n-1).$$
There is a loss of a degree of freedom due to estimating $\mu$. We summarize our results in Table 4.1.

Table 4.1: Pivotal quantities for $N(\mu, \sigma)$

Parameter: $\mu$. Conditions: $\sigma = \sigma_0$ known. Pivotal quantity: $\dfrac{\bar X - \mu}{\sigma_0/\sqrt{n}}$. Distribution: $N(0,1)$.
Parameter: $\mu$. Conditions: $\sigma$ unknown. Pivotal quantity: $\dfrac{\bar X - \mu}{S/\sqrt{n}}$. Distribution: $t(n-1)$.
Parameter: $\sigma^2$. Conditions: $\mu = \mu_0$ known. Pivotal quantity: $\dfrac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu_0)^2$. Distribution: $\chi^2(n)$.
Parameter: $\sigma^2$. Conditions: $\mu$ unknown. Pivotal quantity: $\dfrac{1}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2$. Distribution: $\chi^2(n-1)$.

Remark 4.7 A TI program to obtain critical values of the $\chi^2$ distribution is given in Remark 2.67.

4.1.3 CONFIDENCE INTERVALS FOR THE MEAN AND VARIANCE OF A NORMAL DISTRIBUTION
Now that we have pivotal quantities for the parameters of interest, we can proceed to construct confidence intervals for the normal random variable.
A Confidence Interval for the Mean, $\sigma$ Known
We first construct a $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal random variable $X \sim N(\mu, \sigma)$ when $\sigma = \sigma_0$ is known. Let $X_1, X_2, \dots, X_n$ be a random sample. We know that
$$Z = \frac{\bar X - \mu}{\sigma_0/\sqrt{n}} \sim N(0,1).$$
Therefore, we want
$$P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = P\left(-z_{\alpha/2} < \frac{\bar X - \mu}{\sigma_0/\sqrt{n}} < z_{\alpha/2}\right) = 1-\alpha,$$
where $z_{\alpha/2} = \mathrm{invNorm}(1-\alpha/2)$ is the $\alpha/2$ critical value of $Z$. Rearranging, we get
$$P\left(\bar X - z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}} < \mu < \bar X + z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}}\right) = 1-\alpha.$$
The confidence interval with random endpoints is therefore
$$(l(X_1,\dots,X_n),\ u(X_1,\dots,X_n)) = \left(\bar X - z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}},\ \bar X + z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}}\right).$$
For observed values $X_1 = x_1, \dots, X_n = x_n$, the confidence interval is
$$\left(\bar x - z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}},\ \bar x + z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}}\right).$$
We are $100(1-\alpha)\%$ confident that the true value of $\mu$ is in this interval. If $\mu$ really is in this interval, then the error is
$$\varepsilon = |\bar x - \mu| \le z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}}.$$
But $\mu$ might not be in the interval, and so we can only say that we are $100(1-\alpha)\%$ confident that the error is no more than $z_{\alpha/2}\sigma_0/\sqrt{n}$.

Remark 4.8 Looking at the error $\varepsilon$, we see that $\varepsilon$ can be decreased in several ways. For a fixed sample size $n$, the larger $\alpha$ is, i.e., the lower the confidence level, the smaller $\varepsilon$ will be. This is because a smaller confidence level implies a smaller value of $z_{\alpha/2}$. On the other hand, for a fixed $\alpha$, if we increase the sample size, $\varepsilon$ gets smaller. For a given confidence level, the only way to decrease the error is to increase the sample size. Notice that as $n \to \infty$, i.e., as the sample size becomes larger and larger, the error shrinks to zero.

Example 4.9 A sample $X_1, X_2, \dots, X_{106}$ of 106 healthy adults have their temperatures taken during a routine physical checkup, and it is found that the mean body temperature of the sample is $\bar x = 98.2^\circ$F. Previous studies suggest that $\sigma_0 = 0.62^\circ$F. Assume the body temperatures are drawn from a normal population. Set $\alpha = 0.05$ for a confidence level of 95%. Since $z_{.025} = \mathrm{invNorm}(.975) = 1.96$, a 95% confidence interval for $\mu$ is
$$\left(\bar x - z_{.025}\frac{\sigma_0}{\sqrt{106}},\ \bar x + z_{.025}\frac{\sigma_0}{\sqrt{106}}\right) = (98.08^\circ, 98.32^\circ).$$
We are 95% confident that the true mean healthy adult body temperature is between $98.08^\circ$F and $98.32^\circ$F. The traditional value of $98.6^\circ$F for the body temperature of a healthy adult is not in this interval! There is a 5% chance that the interval missed the true value, but this result provides some evidence that the true value may not be $98.6^\circ$F. Finally, we note that
$$\varepsilon = |\bar x - \mu| \le z_{.025}\frac{\sigma_0}{\sqrt{106}} = 0.12^\circ\text{F}.$$
Therefore, we are 95% confident that our estimate of $\bar x = 98.2^\circ$F as the mean healthy adult body temperature deviates from the true temperature by at most $0.12^\circ$F.
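The interval in Example 4.9 is easy to check numerically. Here is a minimal sketch using only Python's standard library; `z_interval` is a helper name of our own, and `NormalDist.inv_cdf` supplies the critical value $z_{\alpha/2}$:

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma0, n, alpha=0.05):
    """Two-sided 100(1-alpha)% CI for mu when sigma = sigma0 is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}; about 1.96 for alpha = 0.05
    margin = z * sigma0 / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example 4.9: xbar = 98.2, sigma0 = 0.62, n = 106
lo, hi = z_interval(98.2, 0.62, 106)
print(round(lo, 2), round(hi, 2))  # 98.08 98.32
```

Changing the arguments reproduces any known-$\sigma$ interval at any confidence level.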

Sample Size for Given Level of Confidence
For a fixed level of confidence, the only parameter we can control is the sample size $n$. What we want to know is how large the sample size needs to be so that we can guarantee the error is no greater than some given amount, say $d$. We would need to require that
$$\frac{z_{\alpha/2}\sigma_0}{\sqrt{n}} \le d.$$
Solving the inequality for $n$, we obtain
$$z_{\alpha/2}\sigma_0 \le d\sqrt{n} \implies \sqrt{n} \ge \frac{z_{\alpha/2}\sigma_0}{d} \implies n \ge \left(\frac{z_{\alpha/2}\sigma_0}{d}\right)^2.$$
The smallest sample size that will do the job is given by the smallest integer greater than or equal to $(z_{\alpha/2}\sigma_0/d)^2$.
Example 4.10 Going back to the previous example, suppose that we take $d = 0.05$, and we want to be 99% confident that our estimate $\bar x$ differs from the true value of the mean healthy adult body temperature by at most $0.05^\circ$F. In this case,
$$n = \left\lceil\left(\frac{z_{\alpha/2}\sigma_0}{d}\right)^2\right\rceil = \left\lceil\left(\frac{z_{0.005}\,(0.62)}{0.05}\right)^2\right\rceil = \left\lceil\left(\frac{(2.58)(0.62)}{0.05}\right)^2\right\rceil = 1{,}024.$$
We would have to take a much larger sample than the original $n = 106$ to be 99% confident that our estimate is within $0.05^\circ$F of the true value.
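The ceiling computation in Example 4.10 can be sketched in a couple of lines; as in the example, the rounded critical value $z_{0.005} = 2.58$ is used:

```python
import math

# Sample size so that a 99% CI for mu has error at most d = 0.05,
# with sigma0 = 0.62 and the rounded critical value z_{0.005} = 2.58.
z, sigma0, d = 2.58, 0.62, 0.05
n = math.ceil((z * sigma0 / d) ** 2)
print(n)  # 1024
```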

Now we turn to the more realistic problem in which the variance of the population is unknown.
A Confidence Interval for the Mean, $\sigma$ Unknown
If the variance $\sigma^2$ is unknown, we have no choice but to replace $\sigma$ with its estimate $S_X$ from the sample. A pivotal quantity for $\mu$ is given by
$$T = \frac{\bar X - \mu}{S_X/\sqrt{n}} \sim t(n-1).$$
Repeating essentially the same derivation as before, we obtain the $100(1-\alpha)\%$ confidence interval for $\mu$ as
$$\left(\bar x - t(n-1,\alpha/2)\frac{s_X}{\sqrt{n}},\ \bar x + t(n-1,\alpha/2)\frac{s_X}{\sqrt{n}}\right).$$
Just as $\bar x$ is the sample mean estimate for $\mu$, $s_X$ is the sample SD estimate for $\sigma$.
Example 4.11 During a certain two-week period during the summer, the number of drivers speeding on Lake Shore Drive in Chicago is recorded each day. The data values are listed below.

Drivers Speeding on LSD
10 15 11 9 12 7 10
6 15 12 8 12 15 9

Assume the population is normal, meaning that the number of drivers speeding on a given day is normally distributed. The variance is unknown and must be estimated. We compute $\bar x = 10.93$ and $s_X^2 = 10.07$ from the data values. Set $\alpha = 0.05$ for a confidence level of 95%. A 95% confidence interval for $\mu$, the true mean number of drivers speeding on a given day, is given by
$$\left(\bar x - t(13, 0.025)\frac{s_X}{\sqrt{14}},\ \bar x + t(13, 0.025)\frac{s_X}{\sqrt{14}}\right) = (9.10, 12.47).$$

A Confidence Interval for the Variance
We now turn to the variance. We consider two cases: the mean $\mu = \mu_0$ is known, and $\mu$ is unknown.
First consider the case when $\mu = \mu_0$ is known. From our table of pivotal quantities, we have that
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu_0)^2 \sim \chi^2(n).$$
If we set our confidence level at $100(1-\alpha)\%$, then
$$P\left(\chi^2(n, 1-\alpha/2) < \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu_0)^2 < \chi^2(n, \alpha/2)\right) = 1-\alpha.$$
Notice that the $\chi^2$ distribution is not symmetric, and so we need to consider two distinct $\chi^2$ critical values. For instance, $100\alpha/2\%$ of the area under the $\chi^2$ pdf is to the right of $\chi^2(n, \alpha/2)$, and $100\alpha/2\%$ of the area under the $\chi^2$ pdf is to the left of $\chi^2(n, 1-\alpha/2)$. Solving this inequality for $\sigma^2$, we get
$$P\left(\frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{\chi^2(n, \alpha/2)} < \sigma^2 < \frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{\chi^2(n, 1-\alpha/2)}\right) = 1-\alpha.$$
A $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is therefore given by
$$\left(\frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{\chi^2(n, \alpha/2)},\ \frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{\chi^2(n, 1-\alpha/2)}\right).$$

In a similar way, if the mean $\mu$ is unknown, then it must be estimated as $\bar X$. This time our table of pivotal quantities gives
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2 \sim \chi^2(n-1).$$
We write $\sum_{i=1}^{n}\left(X_i - \bar X\right)^2 = (n-1)S_X^2$. In this case, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is given by
$$\left(\frac{(n-1)s_X^2}{\chi^2(n-1, \alpha/2)},\ \frac{(n-1)s_X^2}{\chi^2(n-1, 1-\alpha/2)}\right).$$
We summarize all our confidence intervals for the parameters of the normal random variable in Table 4.2.
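As an illustration (our own continuation of Example 4.11, not from the text), the $\mu$-unknown variance interval for the speeding data ($n = 14$, $s_X^2 = 10.07$) can be sketched as follows; the $\chi^2$ critical values are hardcoded, since Python's standard library has no $\chi^2$ inverse cdf:

```python
# Sketch: 95% CI for sigma^2 with n = 14 and s^2 = 10.07.
# chi2(13, 0.025) = 24.7356 and chi2(13, 0.975) = 5.0088 are standard
# table values, supplied by hand here.
n, s2 = 14, 10.07
chi2_upper, chi2_lower = 24.7356, 5.0088   # chi2(n-1, alpha/2), chi2(n-1, 1-alpha/2)
lo = (n - 1) * s2 / chi2_upper
hi = (n - 1) * s2 / chi2_lower
print(round(lo, 2), round(hi, 2))  # 5.29 26.14
```

Note the interval is not centered at $s^2 = 10.07$, a consequence of the asymmetry of the $\chi^2$ distribution.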

4.1.4 CONFIDENCE INTERVALS FOR A PROPORTION
We now turn to Bernoulli($p$) populations, which consist of successes and failures; i.e., each individual in the population is either a 0 or a 1, and we are interested in estimating the proportion of 1s in the population, which is $p$.
Let $X \sim \mathrm{Binom}(n, p)$. Recall that $X$ may be represented as
$$X = X_1 + X_2 + \cdots + X_n,$$
where each $X_i$ is a Bernoulli($p$) random variable. Recall also that $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$. If $n$ is large enough, the Central Limit Theorem can be invoked to approximate the distribution of the random variable
$$\frac{X - np}{\sqrt{np(1-p)}} \sim N(0,1) \quad \text{(approximately)}.$$
The normal approximation is appropriate if $np > 5$ and $n(1-p) > 5$.
Table 4.2: Confidence intervals for the parameters of the normal distribution

Parameter: $\mu$. Conditions: $\sigma = \sigma_0$ known.
$$\left(\bar x - z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}},\ \bar x + z_{\alpha/2}\frac{\sigma_0}{\sqrt{n}}\right)$$

Parameter: $\mu$. Conditions: $\sigma$ unknown.
$$\left(\bar x - t(n-1,\alpha/2)\frac{s_X}{\sqrt{n}},\ \bar x + t(n-1,\alpha/2)\frac{s_X}{\sqrt{n}}\right)$$

Parameter: $\sigma^2$. Conditions: $\mu = \mu_0$ known.
$$\left(\frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{\chi^2(n, \alpha/2)},\ \frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{\chi^2(n, 1-\alpha/2)}\right)$$

Parameter: $\sigma^2$. Conditions: $\mu$ unknown.
$$\left(\frac{(n-1)s_X^2}{\chi^2(n-1, \alpha/2)},\ \frac{(n-1)s_X^2}{\chi^2(n-1, 1-\alpha/2)}\right)$$

Confidence Interval for $p$
Using the normal approximation, we have
$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right) \approx 1-\alpha,$$
and so
$$P\left(-z_{\alpha/2}\sqrt{np(1-p)} < X - np < z_{\alpha/2}\sqrt{np(1-p)}\right) \approx 1-\alpha.$$
Dividing through by $n$, we get
$$P\left(-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \bar X - p < z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1-\alpha.$$
Solve this inequality for $p$ to get
$$P\left(\bar X - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < p < \bar X + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1-\alpha.$$
The problem with this interval is that the endpoints contain the unknown parameter $p$. To eliminate $p$ from the endpoints, we use the fact that $p$ can be approximated by $\bar X$. If we take this approach, called the bootstrap method, we obtain
$$P\left(\bar X - z_{\alpha/2}\sqrt{\frac{\bar X(1-\bar X)}{n}} < p < \bar X + z_{\alpha/2}\sqrt{\frac{\bar X(1-\bar X)}{n}}\right) \approx 1-\alpha.$$
The $100(1-\alpha)\%$ confidence interval for $p$ using this approach is
$$\left(\bar p - z_{\alpha/2}\sqrt{\frac{\bar p(1-\bar p)}{n}},\ \bar p + z_{\alpha/2}\sqrt{\frac{\bar p(1-\bar p)}{n}}\right).$$
We have replaced the random variable $\bar X$ with the observed sample proportion $\bar p = \bar x$. The center of the confidence interval is $\bar p$, which is our approximation for $p$.

Error Bounds and Sample Size
For the population proportion, the error is $\varepsilon = |\bar p - p|$, the amount by which the estimate $\bar p$ deviates from the true value of $p$. An error bound is easily obtained since we are $100(1-\alpha)\%$ confident that
$$\varepsilon \le z_{\alpha/2}\sqrt{\frac{\bar p(1-\bar p)}{n}}.$$
Now suppose we wish to construct an interval for which we are $100(1-\alpha)\%$ confident that $\varepsilon \le d$ for some specified $d > 0$. Using the fact that $p(1-p) \le 1/4$, since the function $f(x) = x(1-x)$ achieves a maximum value of $1/4$ on the interval $[0,1]$, we obtain
$$z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le z_{\alpha/2}\sqrt{\frac{1}{4n}}.$$
Therefore
$$z_{\alpha/2}\sqrt{\frac{1}{4n}} \le d \implies n \ge \left\lceil\frac{z_{\alpha/2}^2}{4d^2}\right\rceil.$$
The value of $n$ is a sample size for which we can be $100(1-\alpha)\%$ confident that the error $\varepsilon \le d$. For example, if $\alpha = 0.05$ and $d = 0.01$, the sample size for which $\varepsilon \le 0.01$ is $n \ge \lceil 1.96^2/(4(0.01)^2)\rceil = 9{,}604$.
Remark 4.12 The estimate of the sample size $n \ge \lceil z_{\alpha/2}^2/(4d^2)\rceil$ is the conservative estimate because we have replaced the unknown $p$ with $1/2$. Another method of estimating $n$ is to run a two-stage experiment. In the first stage, an arbitrary sample of size $n \ge 30$ is taken, and the estimate $\bar p$ is computed. Then the sample size to obtain an error bound of $d$ is calculated by
$$n \ge \left\lceil\left(\frac{z_{\alpha/2}}{d}\right)^2 \bar p(1-\bar p)\right\rceil.$$

Example 4.13 A gardener is trying to grow a rare species of orchid in a greenhouse. Let $X$ denote the number of plants that survive under greenhouse conditions. Of the (random sample of) 50 plants she originally potted, only 17 survived. The random variable $X \sim \mathrm{Binom}(50, p)$ has an unknown $p$. Suppose we wish to compute a 95% confidence interval for the proportion $p$ of plants that will survive in the greenhouse. Notice that our estimate for $p$ is $\bar p = 0.34$.
For the endpoints of the confidence interval, we obtain
$$l = 0.34 - z_{0.025}\sqrt{\frac{0.34(1-0.34)}{50}} = 0.209 \quad\text{and}\quad u = 0.34 + z_{0.025}\sqrt{\frac{0.34(1-0.34)}{50}} = 0.471.$$
The 95% confidence interval is $(0.209, 0.471)$. This time $\varepsilon \le 0.471 - 0.34 = 0.131$. If we want $\varepsilon \le 0.09 = d$, then we need a sample of size
$$n \ge \left\lceil\left(\frac{z_{.025}}{.09}\right)^2 (.34)(1-.34)\right\rceil = \lceil 106.422\rceil = 107.$$
If we had not taken a sample of $n = 50$ and didn't have an estimate $\bar p = .34$, we would need a sample of size $n \ge \left\lceil\frac{1}{4}\left(\frac{z_{.025}}{.09}\right)^2\right\rceil = 119$ orchids to guarantee $\varepsilon \le 0.09$.
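All three computations in Example 4.13 can be checked with a short standard-library sketch (the variable names are ours):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)          # z_{0.025}, about 1.96
pbar, n = 17 / 50, 50                    # orchid survival proportion
margin = z * math.sqrt(pbar * (1 - pbar) / n)
lo, hi = pbar - margin, pbar + margin
print(round(lo, 3), round(hi, 3))        # 0.209 0.471

# Sample sizes for an error bound of d = 0.09:
d = 0.09
n_two_stage = math.ceil((z / d) ** 2 * pbar * (1 - pbar))  # uses pbar from stage one
n_conservative = math.ceil(z ** 2 / (4 * d ** 2))          # replaces p by 1/2
print(n_two_stage, n_conservative)       # 107 119
```

The conservative estimate is larger, the price of not having a first-stage estimate of $p$.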

4.1.5 ONE-SIDED CONFIDENCE INTERVALS
Sometimes we only desire an upper or lower bound on the unknown parameter $\theta$ of a probability distribution at a given confidence level. In this case we can generate one-sided confidence intervals. For example, only an upper confidence bound on a speed limit is needed to ensure the maximum speed is not exceeded. A lower confidence bound is needed to ensure some minimum design specification is met. The intervals described so far have all been two-sided.

Definition 4.14 Let $X_1, X_2, \dots, X_n$ be a random sample from a random variable $X$ with pdf $f_X(x;\theta)$. A $100(1-\alpha)\%$ one-sided confidence interval for $\theta$ is an interval of the form $(l(X_1, X_2, \dots, X_n), +\infty)$ or $(-\infty, u(X_1, X_2, \dots, X_n))$ such that
$$P(l(X_1, X_2, \dots, X_n) < \theta) = 1-\alpha \quad\text{or}\quad P(\theta < u(X_1, X_2, \dots, X_n)) = 1-\alpha,$$
respectively. The quantity $100(1-\alpha)\%$ is called the confidence level of the respective intervals.

One-sided confidence intervals can be constructed for all the parameters in this chapter. It will suffice to give a simple example to illustrate the process of constructing such an interval.

Example 4.15 Consider Example 4.11 involving speeders on Lake Shore Drive discussed previously. Suppose we wish to construct 95% one-sided confidence intervals for the mean number of speeders. As before, if the variance $\sigma^2$ is unknown, then a pivotal quantity for $\mu$ is $\dfrac{\bar X - \mu}{S_X/\sqrt{n}} \sim t(n-1)$. To obtain an upper one-sided confidence interval, we set
$$P\left(\frac{\bar X - \mu}{S_X/\sqrt{n}} > -t(n-1, \alpha)\right) = 1-\alpha,$$
or equivalently,
$$P\left(\mu < \bar X + t(n-1, \alpha)S_X/\sqrt{n}\right) = 1-\alpha.$$
A $100(1-\alpha)\%$ upper confidence interval for $\mu$ is $(-\infty,\ \bar X + t(n-1, \alpha)S_X/\sqrt{n})$. An upper $100(1-\alpha)\%$ one-sided observed confidence interval is given by
$$\left(-\infty,\ \bar x + t(n-1, \alpha)s_X/\sqrt{n}\right).$$
For our example, if $\alpha = 0.05$, $(-\infty,\ 10.93 + (1.77)(3.17)/\sqrt{14}) = (-\infty, 12.43)$ is a 95% upper confidence interval for $\mu$. We would say that we are 95% confident that the mean number of speeders is no more than 12.43. If someone claimed the mean number of speeders was at least 15, our upper bound would be good evidence against that.
Similarly, a lower $100(1-\alpha)\%$ one-sided observed confidence interval is given by
$$\left(\bar x - t(n-1, \alpha)s_X/\sqrt{n},\ +\infty\right).$$
For our example, we obtain $(10.93 - (1.77)(3.17)/\sqrt{14},\ +\infty) = (9.43, +\infty)$ as a 95% lower confidence interval for $\mu$. We say that we are 95% confident that the mean number of speeders is at least 9.43. By contrast, the two-sided 95% confidence interval for the mean number of speeders is $(9.10, 12.47)$, and we are 95% confident the mean number of speeders is in that interval.
One-sided intervals are not constructed simply by taking the upper and lower values of a two-sided interval because in the two-sided case we use $z_{\alpha/2}$, but we use $z_\alpha$ in the one-sided case.
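The one-sided bounds of Example 4.15 can be checked with the book's summary statistics; the one-sided critical value $t(13, 0.05) = 1.77$ is hardcoded, as in the text:

```python
import math

# Example 4.15: xbar = 10.93, s = 3.17, n = 14, t(13, 0.05) = 1.77
xbar, s, n, t = 10.93, 3.17, 14, 1.77
margin = t * s / math.sqrt(n)
upper = xbar + margin   # upper 95% one-sided confidence bound
lower = xbar - margin   # lower 95% one-sided confidence bound
print(round(upper, 2), round(lower, 2))  # 12.43 9.43
```

Note that the same margin is added for the upper bound and subtracted for the lower bound, but with $t_\alpha$ rather than the larger $t_{\alpha/2}$ of the two-sided case.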

4.2 CONFIDENCE INTERVALS FOR TWO SAMPLES
In order to compare two populations we need to find a confidence interval for the difference of the means, the ratio of the variances, or the difference of proportions.

4.2.1 DIFFERENCE OF TWO NORMAL MEANS
Let $X_1, X_2, \dots, X_m$ and $Y_1, Y_2, \dots, Y_n$ be independent random samples from two normal populations $X \sim N(\mu_X, \sigma_X)$ and $Y \sim N(\mu_Y, \sigma_Y)$, respectively. We will derive $100(1-\alpha)\%$ confidence intervals for the difference $\mu_X - \mu_Y$ under three different sets of conditions on the variances of these populations:
• $\sigma_X^2$ and $\sigma_Y^2$ both known,
• $\sigma_X^2$ and $\sigma_Y^2$ both unknown but equal, and
• $\sigma_X^2$ and $\sigma_Y^2$ both unknown and unequal.
Observe that we do not require the same sample size for the two populations. We will denote the sample averages by $\bar X_m$ and $\bar Y_n$ to emphasize the sample sizes.
Variances of Both Populations Known
We first assume that $\sigma_X^2 = \sigma_{0,X}^2$ and $\sigma_Y^2 = \sigma_{0,Y}^2$ are both known. By independence, $\mathrm{Var}(\bar X_m - \bar Y_n) = \dfrac{\sigma_{0,X}^2}{m} + \dfrac{\sigma_{0,Y}^2}{n}$, and
$$\bar X_m - \bar Y_n \sim N\left(\mu_X - \mu_Y,\ \sqrt{\frac{\sigma_{0,X}^2}{m} + \frac{\sigma_{0,Y}^2}{n}}\right).$$
Consequently, the random variable
$$Z = \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_{0,X}^2}{m} + \dfrac{\sigma_{0,Y}^2}{n}}} \sim N(0,1)$$
is a pivotal quantity. Therefore, we have
$$P\left(-z_{\alpha/2} < \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_{0,X}^2}{m} + \dfrac{\sigma_{0,Y}^2}{n}}} < z_{\alpha/2}\right) = 1-\alpha.$$
For simplicity, set $D_{n,m} = \sqrt{\dfrac{\sigma_{0,X}^2}{m} + \dfrac{\sigma_{0,Y}^2}{n}}$. Then after some algebra we get
$$P\left((\bar X_m - \bar Y_n) - z_{\alpha/2}D_{n,m} < \mu_X - \mu_Y < (\bar X_m - \bar Y_n) + z_{\alpha/2}D_{n,m}\right) = 1-\alpha.$$
The $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$ in the case when both variances are known is therefore given by
$$\left((\bar x_m - \bar y_n) - z_{\alpha/2}\sqrt{\frac{\sigma_{0,X}^2}{m} + \frac{\sigma_{0,Y}^2}{n}},\ (\bar x_m - \bar y_n) + z_{\alpha/2}\sqrt{\frac{\sigma_{0,X}^2}{m} + \frac{\sigma_{0,Y}^2}{n}}\right).$$

Variances of Both Populations Unknown but Equal
In this case, let $\sigma^2 = \sigma_X^2 = \sigma_Y^2$ be the common (unknown) variance. We need to obtain a pivotal quantity for $\mu_X - \mu_Y$. First observe that, similar to the case of known variances,
$$\frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma^2}{m} + \dfrac{\sigma^2}{n}}} = \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} = Z \sim N(0,1).$$
Since $\sigma$ is unknown, we know we have to replace it with a sample SD. The sample SDs of both samples, which may not be equal even though we are assuming the population SDs are the same, have to be taken into account. We will do that by pooling the two SDs. Define
$$S_p^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}$$
as the pooled sample variance. Observe that it is a weighted average of $S_X^2$ and $S_Y^2$. We will replace $\sigma$ by $S_p$, and we need to find the distribution of
$$\frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}.$$
We know that
$$\frac{(m-1)S_X^2}{\sigma^2} \sim \chi^2(m-1) \quad\text{and}\quad \frac{(n-1)S_Y^2}{\sigma^2} \sim \chi^2(n-1).$$
But the sum of two independent $\chi^2$ random variables with $\nu_1$ and $\nu_2$ degrees of freedom, respectively, is again a $\chi^2$ random variable with $\nu_1 + \nu_2$ degrees of freedom. Therefore,
$$\frac{(m-1)S_X^2}{\sigma^2} + \frac{(n-1)S_Y^2}{\sigma^2} \sim \chi^2(m+n-2).$$
Again by independence,
$$T = \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \bigg/ \sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{\sigma^2(m+n-2)}} \sim t(m+n-2)$$
since the numerator is distributed as $N(0,1)$ and the denominator is the square root of a $\chi^2$ random variable with $m+n-2$ degrees of freedom divided by $m+n-2$. That is, $T \sim t(m+n-2)$. Using algebra we see that
$$T = \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \bigg/ \sqrt{\frac{(m-1)S_X^2 + (n-1)S_Y^2}{\sigma^2(m+n-2)}} = \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}},$$
and so the above expression will be our pivotal quantity for $\mu_X - \mu_Y$. Similar to the derivation in the previous section, we conclude that a $100(1-\alpha)\%$ observed confidence interval for $\mu_X - \mu_Y$ in the case when both variances are equal but unknown is given by
$$\left((\bar x_m - \bar y_n) - t(m+n-2, \alpha/2)\,s_p\sqrt{\frac{1}{m} + \frac{1}{n}},\ (\bar x_m - \bar y_n) + t(m+n-2, \alpha/2)\,s_p\sqrt{\frac{1}{m} + \frac{1}{n}}\right).$$
In summary, this is the confidence interval to use when the variances are assumed unknown but equal, and the sample variance we use is the pooled variance because it takes both samples into account.
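The pooling step can be sketched as a small helper (`pooled_sd` is our own name; the toy samples are hypothetical, chosen only to exercise the formula):

```python
import math
from statistics import variance

def pooled_sd(x, y):
    """Pooled sample standard deviation s_p for two independent samples,
    weighting each sample variance by its degrees of freedom."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    return math.sqrt(sp2)

# Toy data: s_X^2 = 5/3 (m = 4) and s_Y^2 = 4 (n = 3), so
# s_p^2 = (3*(5/3) + 2*4)/5 = 13/5 = 2.6.
x = [1, 2, 3, 4]
y = [2, 4, 6]
print(round(pooled_sd(x, y) ** 2, 2))  # 2.6
```

The larger sample gets the larger weight, which is exactly the "weighted average" property noted above.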
Variances of Both Populations Unknown and Unequal
The final case occurs if both $\sigma_X^2$ and $\sigma_Y^2$ are unknown and unequal. Finding a pivotal quantity for $\mu_X - \mu_Y$ with an exact distribution is currently an unsolved problem in statistics called the Behrens–Fisher problem. Accordingly, the CI in this case is an approximation and not exact.
It turns out it can be shown that
$$T = \frac{(\bar X_m - \bar Y_n) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}}} \sim t(\nu)$$
does follow a $t$-distribution, but the degrees of freedom is given by the formula
$$\nu = \left\lfloor \frac{\left(\dfrac{r}{m} + \dfrac{1}{n}\right)^2}{\dfrac{r^2}{m^2(m-1)} + \dfrac{1}{n^2(n-1)}} \right\rfloor,$$
where $r = s_X^2/s_Y^2$ is simply the ratio of the sample variances. Observe that $\nu$ depends only on the ratio of the sample variances and the sizes of the respective samples, nothing else.
Therefore, an approximate $100(1-\alpha)\%$ observed confidence interval for $\mu_X - \mu_Y$ can now be derived as
$$\left((\bar x_m - \bar y_n) - t(\nu, \alpha/2)\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}},\ (\bar x_m - \bar y_n) + t(\nu, \alpha/2)\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}\right).$$
This is the CI to use with independent samples from two populations when there is no reason to expect that the variances of the two populations are equal.

Example 4.16 A certain species of beetle is located throughout the United States, but specific characteristics of the beetle, like carapace length, tend to vary by region. An entomologist is studying the carapace length of populations of the beetle located in the southeast and the northeast regions of the country. The data for the two samples of beetles is assumed to be normal and is given below. It is desired to compute a 95% confidence interval for the mean difference in carapace length between the two populations, the variances of which are unknown and assumed to be unequal.

Carapace Length (in millimeters)

Northeast: 10.5  9.72  10.05  9.94  8.90  9.33  9.73  11.37  10.38  10.71  10.38  10.18  9.93  9.97  10.39  10.86
Southeast: 10.51  10.93  10.12  10.03  10.54  10.70  9.59  10.72  9.60  11.21

Let $N_1, N_2, \dots, N_{16}$ represent the northeast sample and $S_1, S_2, \dots, S_{10}$ represent the southeast sample. We first compute the sample variances of the two samples as $s_N^2 = 0.355$ and $s_S^2 = 0.297$. The value of $\nu$ is given by rounding down
$$\frac{\left(\dfrac{r}{m} + \dfrac{1}{n}\right)^2}{\dfrac{r^2}{m^2(m-1)} + \dfrac{1}{n^2(n-1)}} = \frac{\left(\dfrac{1}{16}\cdot\dfrac{0.355}{0.297} + \dfrac{1}{10}\right)^2}{\dfrac{1}{16^2(16-1)}\left(\dfrac{0.355}{0.297}\right)^2 + \dfrac{1}{10^2(10-1)}} = 20.579,$$
so $\nu = 20$. We find $t(20, 0.025) = \mathrm{invT}(0.975, 20) = 2.0859$. A 95% confidence interval for $\mu_X - \mu_Y$ can now be derived as
$$\left((10.146 - 10.395) - t(20, 0.025)\sqrt{\tfrac{0.355}{16} + \tfrac{0.297}{10}},\ (10.146 - 10.395) + t(20, 0.025)\sqrt{\tfrac{0.355}{16} + \tfrac{0.297}{10}}\right) = (-0.724,\ 0.226).$$
Notice that the value 0 is in this confidence interval, and so it is possible that there is no difference in carapace length between the two populations of beetles.
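Example 4.16 can be reproduced from the raw data. A sketch follows; the critical value $t(20, 0.025) = 2.0859$ is hardcoded, since Python's standard library has no $t$ inverse cdf:

```python
import math
from statistics import mean, variance

ne = [10.5, 9.72, 10.05, 9.94, 8.90, 9.33, 9.73, 11.37,
      10.38, 10.71, 10.38, 10.18, 9.93, 9.97, 10.39, 10.86]
se = [10.51, 10.93, 10.12, 10.03, 10.54, 10.70, 9.59, 10.72, 9.60, 11.21]

m, n = len(ne), len(se)
r = variance(ne) / variance(se)          # ratio of sample variances
# Welch degrees of freedom, rounded down:
nu = math.floor((r / m + 1 / n) ** 2 /
                (r ** 2 / (m ** 2 * (m - 1)) + 1 / (n ** 2 * (n - 1))))
t = 2.0859                               # t(20, 0.025), hardcoded
margin = t * math.sqrt(variance(ne) / m + variance(se) / n)
diff = mean(ne) - mean(se)
print(nu)                                                # 20
print(round(diff - margin, 3), round(diff + margin, 3))  # -0.724 0.226
```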

Error Bounds
Error bounds in all three cases discussed above are easy to derive since the estimator $\bar X_m - \bar Y_n$ lies in the center of the confidence interval. In particular, for
$$\varepsilon = |(\bar x_m - \bar y_n) - (\mu_X - \mu_Y)|,$$
we are $100(1-\alpha)\%$ confident that
$$\varepsilon \le \begin{cases} z_{\alpha/2}\sqrt{\dfrac{\sigma_{0,X}^2}{m} + \dfrac{\sigma_{0,Y}^2}{n}}, & \text{if both variances are known,} \\[2ex] t(m+n-2, \alpha/2)\,s_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}, & \text{if both variances are unknown but equal,} \\[2ex] t(\nu, \alpha/2)\sqrt{\dfrac{s_X^2}{m} + \dfrac{s_Y^2}{n}}, & \text{if both variances are unknown and unequal.} \end{cases}$$

4.2.2 RATIO OF TWO NORMAL VARIANCES


Obtaining a confidence interval for the difference of two normal means is the more common
practice when comparing two samples. However, sometimes comparing variances is also im-
portant. For example, suppose a factory has two machines that produce a part that is used in a
product the company makes. The part must satisfy a critical engineering specification of some
type, say a dimension specification like diameter, to be usable in the product. It could be that
the means of two samples drawn from the machines are not significantly different, but the vari-
ances of the two samples are. The parts produced by the machine with the greater variability will
deviate from the specification more often than the machine with less variability. Quality con-
trol regulations might require that parts that do not conform to the engineering specification be
discarded, costing the company money in wasted material, labor, etc. In this situation, it would
be meaningful to compare the ratio of the variances of the two machines.
Obtaining a pivotal quantity for $\sigma_X^2/\sigma_Y^2$ is easy in this case. Recall that
$$\frac{(m-1)S_X^2}{\sigma_X^2} \sim \chi^2(m-1) \quad\text{and}\quad \frac{(n-1)S_Y^2}{\sigma_Y^2} \sim \chi^2(n-1).$$
Since the ratio of two $\chi^2$ random variables, each divided by its respective degrees of freedom, is distributed as an $F$ random variable, we have
$$\frac{(m-1)S_X^2}{\sigma_X^2(m-1)} \bigg/ \frac{(n-1)S_Y^2}{\sigma_Y^2(n-1)} = \frac{S_X^2}{\sigma_X^2}\cdot\frac{\sigma_Y^2}{S_Y^2} = \frac{S_X^2}{S_Y^2}\cdot\frac{\sigma_Y^2}{\sigma_X^2} \sim F(m-1, n-1).$$
It follows that
$$P\left(F(m-1, n-1, 1-\alpha/2) < \frac{S_X^2}{S_Y^2}\cdot\frac{\sigma_Y^2}{\sigma_X^2} < F(m-1, n-1, \alpha/2)\right) = 1-\alpha.$$
Rewriting so that $\sigma_X^2/\sigma_Y^2$ is in the center of the inequality gives the probability interval
$$P\left(\frac{S_X^2}{S_Y^2}\cdot\frac{1}{F(m-1, n-1, \alpha/2)} < \frac{\sigma_X^2}{\sigma_Y^2} < \frac{S_X^2}{S_Y^2}\cdot\frac{1}{F(m-1, n-1, 1-\alpha/2)}\right) = 1-\alpha.$$
A $100(1-\alpha)\%$ observed confidence interval for $\sigma_X^2/\sigma_Y^2$ can now be derived as
$$\left(\frac{s_X^2}{s_Y^2}\cdot\frac{1}{F(m-1, n-1, \alpha/2)},\ \frac{s_X^2}{s_Y^2}\cdot\frac{1}{F(m-1, n-1, 1-\alpha/2)}\right).$$
This confidence interval for $\sigma_X^2/\sigma_Y^2$ can be displayed in a variety of ways. Notice that if $X \sim F(m,n)$, then $1/X \sim F(n,m)$. It follows that $F(n, m, 1-\alpha) = 1/F(m, n, \alpha)$. (Prove this!) The confidence interval for $\sigma_X^2/\sigma_Y^2$ can also be expressed as
$$\left(\frac{s_X^2}{s_Y^2}F(n-1, m-1, 1-\alpha/2),\ \frac{s_X^2}{s_Y^2}F(n-1, m-1, \alpha/2)\right) = \left(\frac{s_X^2}{s_Y^2}\cdot\frac{1}{F(m-1, n-1, \alpha/2)},\ \frac{s_X^2}{s_Y^2}F(n-1, m-1, \alpha/2)\right) = \left(\frac{s_X^2}{s_Y^2}F(n-1, m-1, 1-\alpha/2),\ \frac{s_X^2}{s_Y^2}\cdot\frac{1}{F(m-1, n-1, 1-\alpha/2)}\right).$$

Example: Continuing Example 4.16 involving beetle populations, set $\alpha = 0.05$. At the 95% confidence level, $F(15, 9, 0.025) = 3.7694$ and $F(15, 9, 0.975) = 0.3202$. Therefore, a 95% confidence interval for $\sigma_N^2/\sigma_S^2$ is given by
$$\left(\frac{0.355}{0.297}\cdot\frac{1}{3.7694},\ \frac{0.355}{0.297}\cdot\frac{1}{0.3202}\right) = (0.3171,\ 3.7329).$$
Since the interval contains the value 1, we cannot conclude the variances are different at the 95% confidence level.
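A sketch of the variance-ratio interval, with the $F$ critical values from the example hardcoded (the standard library has no $F$ inverse cdf):

```python
# 95% CI for the ratio of variances in the beetle example.
# F(15, 9, 0.025) = 3.7694 and F(15, 9, 0.975) = 0.3202 are the
# critical values quoted in the example, supplied by hand here.
s2_ne, s2_se = 0.355, 0.297
f_upper, f_lower = 3.7694, 0.3202
ratio = s2_ne / s2_se
lo, hi = ratio / f_upper, ratio / f_lower
print(round(lo, 3), round(hi, 3))  # 0.317 3.733
```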
We summarize, in Table 4.3, all the two-sample confidence intervals obtained for the normal distribution.

4.2.3 DIFFERENCE OF TWO BINOMIAL PROPORTIONS
Let $X \sim \mathrm{Binom}(m, p_X)$ and $Y \sim \mathrm{Binom}(n, p_Y)$ be two binomial random variables with parameters $p_X$ and $m$, and $p_Y$ and $n$, respectively. From the CLT, we know that for sufficiently large sample sizes, the distributions of $\bar X_m$ and $\bar Y_n$ can be approximated as
$$\bar X_m \sim N\left(p_X,\ \sqrt{\frac{p_X(1-p_X)}{m}}\right) \quad\text{and}\quad \bar Y_n \sim N\left(p_Y,\ \sqrt{\frac{p_Y(1-p_Y)}{n}}\right).$$
By independence of the samples, the distribution of $\bar X_m - \bar Y_n$ can be approximated as
$$\bar X_m - \bar Y_n \sim N\left(p_X - p_Y,\ \sqrt{\frac{p_X(1-p_X)}{m} + \frac{p_Y(1-p_Y)}{n}}\right).$$
Table 4.3: Confidence intervals for the parameters of the normal distribution (two samples)

Parameter: $d = \mu_X - \mu_Y$. Conditions: $\sigma_X^2 = \sigma_{0,X}^2$ and $\sigma_Y^2 = \sigma_{0,Y}^2$ known.
$$(\bar x_m - \bar y_n) \pm z_{\alpha/2}\sqrt{\frac{\sigma_{0,X}^2}{m} + \frac{\sigma_{0,Y}^2}{n}}$$

Parameter: $d = \mu_X - \mu_Y$. Conditions: $\sigma_X^2, \sigma_Y^2$ unknown, $\sigma_X^2 = \sigma_Y^2$.
$$(\bar x_m - \bar y_n) \pm t(m+n-2, \alpha/2)\,s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$$

Parameter: $d = \mu_X - \mu_Y$. Conditions: $\sigma_X^2, \sigma_Y^2$ unknown, $\sigma_X^2 \ne \sigma_Y^2$.
$$(\bar x_m - \bar y_n) \pm t(\nu, \alpha/2)\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}, \qquad \nu = \left\lfloor \frac{\left(\frac{r}{m} + \frac{1}{n}\right)^2}{\frac{r^2}{m^2(m-1)} + \frac{1}{n^2(n-1)}} \right\rfloor,\quad r = \frac{s_X^2}{s_Y^2}$$

Parameter: $r = \dfrac{\sigma_X^2}{\sigma_Y^2}$. Conditions: $\sigma_X, \sigma_Y$ unknown.
$$\left(\frac{s_X^2}{s_Y^2}\cdot\frac{1}{F(m-1, n-1, \alpha/2)},\ \frac{s_X^2}{s_Y^2}\cdot\frac{1}{F(m-1, n-1, 1-\alpha/2)}\right)$$

Therefore,
$$Z = \frac{(\bar X_m - \bar Y_n) - (p_X - p_Y)}{\sqrt{\dfrac{p_X(1-p_X)}{m} + \dfrac{p_Y(1-p_Y)}{n}}} \sim N(0,1) \quad\text{(approximately)}.$$
Approximating $p_X$ by $\bar p_X = \bar x_m$ and $p_Y$ by $\bar p_Y = \bar y_n$, we obtain the $100(1-\alpha)\%$ (approximate) confidence interval for $p_X - p_Y$ as
$$\left((\bar p_X - \bar p_Y) - z_{\alpha/2}\sqrt{\frac{\bar p_X(1-\bar p_X)}{m} + \frac{\bar p_Y(1-\bar p_Y)}{n}},\ (\bar p_X - \bar p_Y) + z_{\alpha/2}\sqrt{\frac{\bar p_X(1-\bar p_X)}{m} + \frac{\bar p_Y(1-\bar p_Y)}{n}}\right).$$

The error $\varepsilon = |(\bar p_X - \bar p_Y) - (p_X - p_Y)|$ can be bounded (approximately) as
$$\varepsilon \le z_{\alpha/2}\sqrt{\frac{\bar p_X(1-\bar p_X)}{m} + \frac{\bar p_Y(1-\bar p_Y)}{n}}$$
at a $100(1-\alpha)\%$ confidence level.

Example 4.17 An experiment was conducted by a social science researcher which involved assessing the benefits of after-school enrichment programs in a certain low-income school district in Cleveland, Ohio. A total of 147 three- and four-year-old children were involved in the study. Children were randomly assigned to two groups, one group of 73 students which participated in the after-school programs, and a control group with 74 students which did not participate in these programs. The children were followed as adults, and a number of data items were collected, one of them being their annual incomes. In the control group, 23 out of the 74 children were earning more than $75,000 per year, whereas in the group participating in the after-school programs, 38 out of the 73 children were earning more than $75,000 per year. Let $p_X$ and $p_Y$ denote the proportions of children who do and do not participate in after-school programs, respectively, making more than $75,000 per year. A 95% confidence interval for $p_X - p_Y$ is given by
$$(0.520 - 0.311) \pm (1.96)\sqrt{\frac{0.520(1-0.520)}{73} + \frac{0.311(1-0.311)}{74}} \implies (0.053,\ 0.365).$$
As a result of the study, we can be 95% confident that the proportion of children who participated in after-school enrichment programs earning more than $75,000 per year exceeds the corresponding proportion for children who did not participate by between 5.3% and 36.5%.
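Recomputing Example 4.17 with unrounded proportions gives essentially the same interval (a sketch; the book rounds to 0.520 and 0.311 first, so the lower endpoint differs slightly in the third decimal):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # z_{0.025}, about 1.96
pX, m = 38 / 73, 73               # after-school participants earning > $75,000
pY, n = 23 / 74, 74               # control group
diff = pX - pY
margin = z * math.sqrt(pX * (1 - pX) / m + pY * (1 - pY) / n)
print(round(diff - margin, 3), round(diff + margin, 3))  # 0.054 0.365
```

Since the whole interval lies above 0, the data support a real difference between the two proportions.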

4.2.4 PAIRED SAMPLES
In many situations we wish to compare the means or proportions of two populations, but we cannot assume that the random samples from each population are independent. For example, if we want to compare exam scores before and after a learning module is taken, the samples are not independent because they are the exam scores of the same students before and after the learning module. Another example would be a weight-loss or drug-effectiveness program. Clearly, the random sample must involve the same people, tested before the program and after the program.
Let $X_1, X_2, \dots, X_m$ and $Y_1, Y_2, \dots, Y_m$ be random samples of the same size $m$ from two populations where $X_i$ is paired with $Y_i$. The variables $X$ and $Y$ from which the corresponding samples are drawn are dependent, which means the procedure for independent samples cannot be used. The solution is to consider the random variable for the difference between $X$ and $Y$:
$$D = X - Y \quad\text{and}\quad D_i = X_i - Y_i,\quad i = 1, 2, \dots, m.$$
Clearly, $D_1, D_2, \dots, D_m$ is a random sample from $D$, the population of differences, that has mean $\mu_D = \mu_X - \mu_Y$. We will assume that $D \sim N(\mu_D, \sigma_D)$. This is now just like a one-sample CI with an unknown variance. Since the variance $\sigma_D^2$ is unknown, it must be estimated from the sample of differences as
$$S_D^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(D_i - \bar D_m\right)^2.$$
Therefore, $\dfrac{\bar D_m - \mu_D}{S_D/\sqrt{m}} = T \sim t(m-1)$. A $100(1-\alpha)\%$ confidence interval for the difference between the two means, $\mu_D$, can now be obtained as
$$\left(\bar d_m - t(m-1, \alpha/2)\frac{s_D}{\sqrt{m}},\ \bar d_m + t(m-1, \alpha/2)\frac{s_D}{\sqrt{m}}\right).$$
As usual, $d_i$ is the observed difference of the $i$th pair, $x_i - y_i$.

Example 4.18 Ten middle-aged men with high blood pressure engage in a regimen of aerobic exercise on a newly introduced type of treadmill. Their blood pressures are recorded before starting the exercise program. After using the treadmill for 30 minutes each day for six months, their blood pressures are again recorded. During the six-month period, the ten men do not change their lifestyles in any other significant way. The two blood pressure readings in mmHg (before and after the period of aerobic exercise) are given in the table below.

Male i   BP (before) $X_i$   BP (after) $Y_i$   Difference $D_i$
1        143                 144                -1
2        171                 164                 7
3        160                 149                11
4        182                 175                 7
5        149                 142                 7
6        162                 162                 0
7        177                 173                 4
8        165                 156                 9
9        150                 148                 2
10       165                 161                 4

Take $\alpha = 0.05$. From the table, we compute $\bar d_{10} = 5$ and $s_D = 3.887$. A 95% confidence interval for $\mu_D$ is given by
$$\left(5 - t(9, 0.025)\frac{3.887}{\sqrt{10}},\ 5 + t(9, 0.025)\frac{3.887}{\sqrt{10}}\right) = (2.22,\ 7.78).$$
We are 95% confident that exercising on the treadmill can lower blood pressure roughly between 2 and 8 points.
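Example 4.18 can be reproduced directly from the paired readings; the critical value $t(9, 0.025) = 2.2622$ is hardcoded:

```python
import math
from statistics import mean, stdev

before = [143, 171, 160, 182, 149, 162, 177, 165, 150, 165]
after  = [144, 164, 149, 175, 142, 162, 173, 156, 148, 161]
d = [b - a for b, a in zip(before, after)]   # paired differences D_i

m = len(d)
t = 2.2622                                   # t(9, 0.025), hardcoded
margin = t * stdev(d) / math.sqrt(m)
print(round(mean(d) - margin, 2), round(mean(d) + margin, 2))  # 2.22 7.78
```

The key design point is that the interval is built from the single sample of differences, never from the two columns treated as independent samples.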

4.3 PREDICTION INTERVALS
Confidence intervals are used to estimate unknown parameters such as $\mu$ or $\sigma$ in a normal distribution, or $p$ in the binomial distribution. Prediction intervals are used not to estimate parameters but rather to estimate future sample values drawn from the distribution. The parameters of the population such as $\mu$ or $\sigma$ may be unknown.
Suppose $X_1, X_2, \dots, X_n$ is a random sample from a random variable $X$. Consider making another observation from $X$, namely $X_{n+1}$. Given a value of $\alpha$, we would like to derive an interval with random endpoints (as in the confidence interval case) such that the probability of the $(n+1)$st observation $X_{n+1}$ lying in the interval, based on the sample $X_1, \dots, X_n$, is $1-\alpha$.

Definition 4.19 Let $X_1, X_2, \dots, X_n$ be a random sample from a random variable $X$. A $100(1-\alpha)\%$ prediction interval for $X_{n+1}$ is an interval (having random endpoints) of the form $(l(X_1, X_2, \dots, X_n),\ u(X_1, X_2, \dots, X_n))$ such that
$$P(l(X_1, X_2, \dots, X_n) < X_{n+1} < u(X_1, X_2, \dots, X_n)) = 1-\alpha.$$
To obtain the actual prediction interval, we substitute the observed values of the sample into the functions $l(X_1, X_2, \dots, X_n)$ and $u(X_1, X_2, \dots, X_n)$ to get an interval $(l(x_1, \dots, x_n),\ u(x_1, \dots, x_n))$ of real numbers. Unlike for confidence intervals, the statement $P(l(x_1, \dots, x_n) < X_{n+1} < u(x_1, \dots, x_n))$ does have meaning since $X_{n+1}$ is a random variable. We are not capturing a number in the interval, but a random variable.
Similar to confidence intervals, constructing prediction intervals depends on identifying pivotal quantities. In a prediction interval, a pivotal quantity may depend on $X_1, \dots, X_{n+1}$.

Definition 4.20 Let $X_1, X_2, \dots, X_n$ be a random sample from a random variable $X$. A pivotal quantity for $X_{n+1}$ is a random variable $h(X_1, X_2, \dots, X_n, X_{n+1})$ whose distribution is independent of the sample values. It is assumed that the underlying population follows a normal distribution.

We will obtain pivotal quantities and construct prediction intervals depending on whether
 or  is known or unknown.
μ = μ₀ and σ² = σ₀² are Known
This case is straightforward since no parameters have to be estimated. We only want to predict the next sample value from the previous sample values. Since μ₀ and σ₀ are known, (X_{n+1} − μ₀)/σ₀ = Z ∼ N(0, 1), and therefore

P(−z_{α/2} < (X_{n+1} − μ₀)/σ₀ < z_{α/2}) = P(μ₀ − z_{α/2} σ₀ < X_{n+1} < μ₀ + z_{α/2} σ₀) = 1 − α.

Therefore, a 100.1 ˛/% prediction interval for XnC1 is simply



0 z˛=2 0 ; 0 z˛=2 0 :

In other words, we can predict a sample value from N.0 ; 0 / will be in .0 z˛=2 0 ; 0 C
z˛=2 0 / with probability 1 ˛:
μ is Unknown and σ² = σ₀² is Known
Since Var(X_{n+1} − X̄) = σ₀² + σ₀²/n, by independence, we have

(X_{n+1} − X̄)/√(σ₀² + σ₀²/n) = (X_{n+1} − X̄)/(σ₀ √(1 + 1/n)) = Z ∼ N(0, 1).

Therefore, the random variable (X_{n+1} − X̄)/(σ₀ √(1 + 1/n)) is a pivotal quantity for X_{n+1}. The 100(1 − α)% prediction interval for X_{n+1} is given by

(x̄ − z_{α/2} σ₀ √(1 + 1/n), x̄ + z_{α/2} σ₀ √(1 + 1/n)).
μ = μ₀ is Known and σ² is Unknown
In this case,

(X_{n+1} − μ₀)/S_X = T ∼ t(n − 1),

and so the 100(1 − α)% prediction interval for X_{n+1} is now given by

(μ₀ − t(n − 1, α/2) s_X, μ₀ + t(n − 1, α/2) s_X).
Both μ and σ² are Unknown
In this case,

(X_{n+1} − X̄)/√(S_X² + S_X²/n) = (X_{n+1} − X̄)/(S_X √(1 + 1/n)) = T ∼ t(n − 1).

The 100(1 − α)% prediction interval for X_{n+1} is now given by

(x̄ − t(n − 1, α/2) s_X √(1 + 1/n), x̄ + t(n − 1, α/2) s_X √(1 + 1/n)).

One of the uses of prediction intervals is in the detection of outliers, extreme values of
the random variable or a value that comes from a population whose mean is different from the
one under consideration. Given a choice of ˛ , the sample value XnC1 will be considered an
outlier if XnC1 is not in the 100.1 ˛/% prediction interval for XnC1 .
Example 4.21 An online furniture retailer has collected the times it takes an adult to assemble a certain piece of its furniture. The data in the table below represent a random sample X₁, X₂, …, X₃₆ of the number of minutes it took 36 adults to assemble a certain advertised "easy to assemble" outdoor picnic table. Assume the data is drawn from a normal population.
Assembly Time for Adults
17 13 18 19 17 21 29 22 16 28 21 15
26 23 24 20 8 17 17 21 32 18 25 22
16 10 20 22 19 14 30 22 12 24 28 11
We will use the data in the table to construct a 95% prediction interval for the next assembly time. We compute the mean and standard deviation of the sample as x̄₃₆ = 19.92 and s_X = 5.73. The 95% prediction interval is given by

(19.92 − t(35, 0.025)(5.73)√(1 + 1/36), 19.92 + t(35, 0.025)(5.73)√(1 + 1/36)) = (8.13, 31.71).

We can write a valid probability statement as P(8.13 < X₃₇ < 31.71) = 0.95. Any data value falling outside the prediction interval (8.13, 31.71) could be considered an outlier at the 95% level.
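The interval above can be reproduced numerically. This sketch (ours, assuming Python with scipy available for the t critical value) recomputes the 95% prediction interval from the assembly-time data:

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t  # t.ppf gives the critical value t(n-1, alpha/2)

times = [17, 13, 18, 19, 17, 21, 29, 22, 16, 28, 21, 15,
         26, 23, 24, 20,  8, 17, 17, 21, 32, 18, 25, 22,
         16, 10, 20, 22, 19, 14, 30, 22, 12, 24, 28, 11]

n = len(times)                     # 36
xbar = mean(times)                 # sample mean, about 19.92
s = stdev(times)                   # sample SD, about 5.73
tcrit = t.ppf(0.975, n - 1)        # t(35, 0.025), about 2.03
half = tcrit * s * sqrt(1 + 1/n)   # half-width of the prediction interval
lo, hi = xbar - half, xbar + half  # about (8.13, 31.71)
print(round(lo, 2), round(hi, 2))
```

Any future assembly time outside (lo, hi) would be flagged as an outlier at this level.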
4.4 PROBLEMS
4.1. An urn contains only black and white marbles with unknown proportions. If a random
sample of size 100 is drawn and it contains 47 black marbles, find the following.
(a) The percentage of black marbles in the urn is estimated as ______ with a standard error of ______.
(b) The SE measures the likely size of the error due to chance in the estimate of the
percentage of black marbles in the urn. (T/F)
(c) Suppose your estimate of the proportion of black marbles in the urn is p̂. Then p̂ is likely to be off from the true proportion of black marbles in the urn by the SE. (T/F)
(d) A 95% CI for the proportion of black marbles in the urn is ______ to ______.

(e) What is a 95% CI for the proportion of black marbles in the sample, or does that
make sense? Explain.
(f) Suppose we know that 53% of the marbles in the urn are black. We take a random sample of 100 marbles and calculate the SE as 0.016. Find the chance that the proportion of black marbles in the sample is between 0.53 − 0.032 and 0.53 + 0.032.
4.2. Suppose a random sample of 100 male students is taken from a university with 546
male students in order to estimate the population mean height. The sample mean is
x̄ = 67.45 inches with an SD of 2.93 inches.
(a) Assuming the sampling is done with replacement, find a 90% CI for the population
mean height.
(b) Assuming the sampling is done without replacement, find a 90% CI.
4.3. Suppose 50 random samples of size 10 are drawn from a population which is normally
distributed with mean 40 and variance 3. A 95% confidence interval is calculated for
each sample.
(a) How many of these 50 CIs do you expect to contain the true population mean μ = 40?
(b) If we define the rv X to be the number of intervals out of 50 which contain the true mean μ = 40, what is the distribution of X? Find P(X = 40), P(X ≥ 40), and P(X > 45).
4.4. Find a conservative estimate of the sample size needed in order to ensure the error in a
poll is less than 3%. Assume we are using 95% confidence.
4.5. Find the 99% confidence limits for the population proportion of voters who favor a
candidate. The sample size is 100, and the sample percentage of voters favoring the
candidate was 55%.
4.6. A random sample is taken from a population which is N(μ, 5). The sample size is 20 and results in the sample mean x̄ = 15.2.
(a) Find the CI for levels of confidence 70, 80, and 90%.
(b) Repeat the problem assuming the sample size is 100.
4.7. Show that a lower 100(1 − α)% one-sided confidence interval for the unknown mean μ with unknown variance is given by (x̄ − t(n − 1, α) s_X/√n, +∞).
4.8. Suppose X₁, …, Xₙ is a random sample from a continuous random variable X with population median m. Suppose that we use the interval with random endpoints (X_min, X_max) as a CI for m.

(a) Find P(X_min < m < X_max), giving the confidence level for the interval (X_min, X_max). Notice that it is not 100% because this involves a random sample.
(b) Find P(X_min < m < X_max) if the sample size is n = 8.
4.9. A random sample of 225 flights shows that the mean number of unoccupied seats is
11.6 with SD D 4.1. Assume this is the population SD.
(a) Find a 90% CI for the population mean.
(b) The 90% CI you found means (choose ONE):
(i) The interval contains the population mean with probability 0.9.
(ii) If repeated samples are taken, 90% of the CIs contain the population mean.
(c) What minimum sample size do you need if you want the error reduced to 0.2?

4.10. Suppose you want to provide an accurate estimate of the proportion of customers preferring one brand of coffee over another. You need to construct a 95% CI for p so that the error is at most 0.015. You are told that preliminary data shows p = 0.35. What sample size should you choose?
4.11. Consider the following data points for fuel mileage in a particular vehicle.

42 36 38 45 41 47 33 38 37 36
40 44 35 39 38 41 44 37 37 49

Assume these data points are from a random sample. Construct a 95% CI for the mean
population mpg. Is there any reason to suspect that the data does not come from a nor-
mal population? What would a lower one-sided CI be, and how would it be interpreted?
4.12. The accuracy of speedometers is checked to see if the SD is about 2 mph. Suppose a random sample of 35 speedometers is checked, and the sample SD is 1.2 mph. Construct a 95% CI for the population variance.
4.13. The ACT exam for an entering class of 535 students had a mean of 24 with an SD of
3.9. Assume this class is representative of future students at this college.

(a) Find a 90% CI for the mean ACT score of all future students.
(b) Find a 90% prediction interval for the ACT score of a future student.

4.14. A random sample of size 32 is taken to assess the weight loss on a low-fat diet. The
mean weight loss is 19.3 pounds with an SD of 10.8 pounds. An independent random
sample of 32 people who are on a low calorie diet resulted in a mean weight loss of
15.1 pounds with an SD of 12.8 pounds. Construct a 95% CI for the mean difference
between a low calorie and a low fat diet. Do not pool the data.
4.15. A sample of 140 LEDs resulted in a mean lifetime of 9.7 years with an SD of 6 months.
A sample of 200 CFLs resulted in a mean lifetime of 7.8 years with an SD of 1.2 years.
Find a 95% and 99% CI for the difference in the mean lifetimes. Is there evidence that
the difference is real? Do not pool the data.
4.16. The SD of a random sample of the service times of 200 customers was found to be 3
minutes.

(a) Find a 95% CI for the SD of the service times of all such customers.
(b) How large a sample is needed in order to be 99.93% confident that the true population SD will not differ from the sample SD by more than 5%?
4.17. Suppose the sample mean number of sick days at a factory is 6.3 with a sample SD of
4.5. This is based on a sample of 25.
(a) Find a 98% CI for the population mean number of sick days.
(b) Calculate the sample size needed so that a 95% CI has an error of no more than
0.5 days.
4.18. A random sample of 150 colleges and universities resulted in a sample mean ACT score
of 20.8 with an SD of 4.2. Assuming this is representative of all future students,
(a) find a 95% CI for the mean ACT of all future students, and
(b) find a 95% prediction interval for the ACT score of a single future student.
4.19. A logic test is given to a random sample of students before and after they completed a
formal logic course. The results are given below. Construct a 95% confidence interval
for the mean difference between the before and after scores.

After 74 83 75 88 84 63 93 84 91 77
Before 73 77 70 77 74 67 95 83 84 75

4.20. Two independent groups, chosen with random assignments, A and B, consist of 100
people each of whom have a disease. An experimental drug is given to group A but not
to group B, which are termed treatment and control groups, respectively. Two simple
random samples have yielded that in the two groups, 75 and 65 people, respectively,
recover from the disease. To study the effect of the drug, build a 95% confidence interval
for the difference in proportions p_A − p_B.

CHAPTER 5

Hypothesis Testing
This chapter is one of the cornerstones of statistics because it allows us to reach a decision based
on an experiment with random outcomes. The basic question in an experiment is whether or
not the outcome is real, or simply due to chance variation. For example, if we flip a coin 100
times and obtain 57 heads, can we conclude the coin is not fair? We know we expect 50 heads,
so is 57 too many, or is it due to chance variation? Hypothesis testing allows us to answer such
questions. This is an extremely important issue, for instance in drug trials in which we need to
know if a drug truly is efficacious, or if the result of the clinical trial is simply due to chance, i.e.,
the possibility that a subject will simply improve or get worse on their own. The purpose of this
chapter is to show how hypothesis testing is implemented.

5.1 A MOTIVATING EXAMPLE


In 2004, the drug company Merck was conducting a clinical trial1 to determine if its drug
VIOXX© had any effect in treating polyps in the colon. VIOXX© is a drug originally designed
to treat chronic inflammation for arthritis patients so this would have been an added benefit
of the drug if the clinical trial turned out positively. The trial involved 2,586 patients in a con-
trolled, randomized, double-blind experiment. In the experiment, 1,299 patients were assigned
to the control group and 1,287 to the treatment group. Among other things, the safety of the
drug was of paramount importance, and the clinical trial monitored the drug for safety. At the
conclusion of the trial, it was observed that 26 of the control group and 46 of the treatment
group experienced a cardiovascular event (CV). Can this be attributed to the drug? Or is it due
to random chance?
The first part of that question is answered by the way the experiment was designed. By
making the control and treatment groups statistically identical except that one group took the
drug and the other did not, we can be reasonably sure that if the difference in CV events is real,
then we can attribute it to the drug. But how do we know the difference is real? Couldn’t it
be due to—simply by chance—assigning more people who were destined to have a CV event
to the treatment group rather than to the control group? To answer that question, because we
randomly assigned subjects to the treatment and control groups, we may apply a probabilistic
method to quantify this. That method is hypothesis testing.

1 Cardiovascular events associated with rofecoxib in a colorectal adenoma chemoprevention trial, R.S. Bresalier et al., March 17, 2005, N. Engl. J. Med., 2005; 352:1092–1102.
What follows is how hypothesis testing would work in this situation. We will introduce
the necessary terminology as we explain the method.
First, let's denote the true population proportion of people who will have a CV event as p_T if they are in the treatment group, and p_C if they are in the control group. Hypothesis testing assumes that the difference between these two quantities should be zero, i.e., that there is no difference between the two groups. We say that this is the Null Hypothesis, and write it as H₀: p_T − p_C = 0. Next, since we observed p̂_T = 0.0357 and p̂_C = 0.0200, the observed difference p̂_T − p̂_C = 0.0157 > 0, and this establishes the Alternative Hypothesis as H₁: p_T − p_C > 0.
What are the chances of observing a difference of 0.0157 in the proportions if, in fact, the difference should be 0 under the assumption of the null hypothesis? The random assignment of subjects to control and treatment groups allows us to use probability to actually answer this question. We will see by the methods developed in this chapter that we will be able to calculate that the chance of observing a difference of 0.0157 (or more) is 0.00753, under the assumption of the null hypothesis that there should be no difference. This value is called the p-value of the test or the level of significance of the test.
Now we are ready to reach a conclusion. Under the assumption there is no difference,
we calculate that the chance of observing a difference of 0.0157 is only 0.7%. But, this is what
actually occurred even though it is extremely unlikely. The more likely conclusion, supported
by the evidence, is that our assumption is incorrect, i.e., it is more likely H0 is wrong.2 The
conclusion is that we reject the null hypothesis in favor of the alternative hypothesis. How small
does the p-value have to be for a conclusion to reject the null? Statisticians use the guide in
Table 5.1 in Section 5.3.2 to reach a conclusion based on the p-value.
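The p-value quoted above can be reproduced with a short computation. The sketch below is our illustration (the two-proportion test statistic is developed later in the book): it uses the pooled standard error under the null and the standard normal cdf written via the error function from the Python standard library.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# VIOXX trial counts from the text
x_t, n_t = 46, 1287   # CV events, treatment group
x_c, n_c = 26, 1299   # CV events, control group

p_t, p_c = x_t / n_t, x_c / n_c           # about 0.0357 and 0.0200
pool = (x_t + x_c) / (n_t + n_c)          # pooled proportion under H0
se = sqrt(pool * (1 - pool) * (1 / n_t + 1 / n_c))
z = (p_t - p_c) / se                      # about 2.43
p_value = 1.0 - phi(z)                    # one-sided, about 0.0075
print(round(z, 2), round(p_value, 4))
```

The pooled-SE version gives roughly 0.0075, which matches the quoted 0.00753 up to rounding conventions.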

Remark 5.1 What does it mean to reject the null? It means that, under the assumption that the null is true, what we actually observed has a very low probability of occurring. So either we just witnessed a very low probability event, or the null is not true. We say that the null is rejected if the p-value of the test is low enough. If the p-value is not low enough, it means that it is plausible the null is true, i.e., there is not enough evidence against it, and therefore we do not reject it.

What we just described is called the p-value approach to hypothesis testing, and the p-
value is essentially the probability we reach a wrong conclusion if we assume the null. Another
approach is called the critical value approach. We’ll discuss this in more detail later, but in this
example here’s how it works.
First, we specify a level of significance α, which is the largest p-value we are willing to accept to reject the null. Say we take α = 0.01. Then, we calculate the z value that gives the proportion α to the right of z under the standard normal curve, i.e., the 99th percentile of the normal curve. In this case z = 2.326. We are choosing the area to the right because our alternative hypothesis is H₁: p_T − p_C > 0. This z-value is called the critical value for a 1% level of significance. Now, when we carry out the experiment, any value of the test statistic

z = (observed − expected)/SE ≥ 2.326

will result in our rejecting the null, and we know the corresponding p-value must be less than α = 0.01. The advantage of this approach is that we know the value of the test statistic we need to reject the null before we carry out the experiment.

Example 5.2 We have a coin which we think is not fair. That means we think p ≠ 0.5, where p is the probability of flipping a head. Suppose we toss the coin 100 times and get 60 heads. The sample proportion of heads we obtain is p̂ = 0.6. Thus, our hypothesis test is H₀: p = 0.5 vs. H₁: p > 0.5 because we got 60% heads. We start off assuming it is a fair coin.
Now suppose we are given a level of significance α = 0.05. We know, using the normal approximation, that the sample proportion P̂ ∼ N(0.5, √(0.5 · 0.5/100)) = N(0.5, 0.05). The critical value for α = 0.05 is the z-value giving 0.05 area to the right of z, and this is z = 1.645. This is the critical value for this test. The value of the test statistic is z = (0.6 − 0.5)/0.05 = 2.0. Consequently, since 2.0 > 1.645, we conclude that we may reject the null at the 5% level of significance.
The p-value approach gives more information since it actually tells us the chance of being wrong assuming the null. In this case the p-value is P(P̂ ≥ 0.6) = normalcdf(0.6, ∞, 0.5, 0.05) = 0.0228, or by standardizing,

P(P̂ ≥ 0.6 | p = 0.5) = P((P̂ − 0.5)/√(0.5 · 0.5/100) ≥ (0.6 − 0.5)/√(0.5 · 0.5/100)) = P(Z ≥ 2.0) = 0.0228.

We have calculated the chance of getting 60 or more heads if it is a fair coin as only about 2%, so we have significant evidence against the null. Again, since 0.02 < 0.05, we reject the null and conclude it is not a fair coin. We could be wrong, but the chance of being wrong is only 2%. Another important point is our choice of null H₀: p = 0.5, which specifies exactly what p should be for this coin if it's fair.
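A numerical check of this example, using only the Python standard library; the exact binomial tail is our addition for comparison with the normal approximation:

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, heads, p0 = 100, 60, 0.5
se = sqrt(p0 * (1 - p0) / n)             # 0.05
z = (heads / n - p0) / se                # 2.0
p_normal = 1.0 - phi(z)                  # about 0.0228

# Exact tail P(S >= 60) for S ~ Bin(100, 0.5), for comparison
p_exact = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n
print(round(z, 2), round(p_normal, 4), round(p_exact, 4))
```

The exact binomial tail is slightly larger than the normal approximation, but both lead to the same conclusion at the 5% level.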

5.2 THE BASICS OF HYPOTHESIS TESTING


To start with, suppose that X₁, X₂, …, Xₙ is a random sample from a random variable X with pdf f_X(x; θ). We would like to devise a test to decide whether we should reject the value θ = θ₀ in favor of some other value of θ. Consider the two hypotheses

H₀: θ = θ₀ vs. H₁: θ ≠ θ₀.   (5.1)

The hypothesis H₀: θ = θ₀ is called the null hypothesis. The hypothesis H₁: θ ≠ θ₀ is called the two-sided alternative hypothesis.
The Critical Value Approach and the Connection with CIs
Let (l(X₁, X₂, …, Xₙ), u(X₁, X₂, …, Xₙ)) be a 100(1 − α)% confidence interval for θ. We are 100(1 − α)% confident that (l, u) contains the true value of θ, and so if we have a result from a random sample that is not in this interval, that is sufficiently good evidence that our assumption about the value of θ may not be a good one. Accordingly, we will reject H₀ if

θ₀ ≤ l(X₁, X₂, …, Xₙ)   (θ₀ appears to be too small)

or if

θ₀ ≥ u(X₁, X₂, …, Xₙ)   (θ₀ appears to be too large).

As before,

P(θ₀ ≤ l(X₁, X₂, …, Xₙ) or θ₀ ≥ u(X₁, X₂, …, Xₙ) | θ = θ₀) = α.

That is, the probability of rejecting the null hypothesis when the null is true (θ = θ₀) is α. The form of the pivotal quantity for the CI under the null hypothesis is called the test statistic.
The value of α is called the significance level (or simply level) of the test, and this is the largest probability we are willing to accept for making an error in rejecting a true null. Rejecting the null hypothesis when the null is true is commonly called a Type I error. A Type II error occurs when we do not reject the null hypothesis when it is false. The probability of making a Type II error is denoted β. To summarize,

α = P(reject H₀ | H₀ is true)   and   β = P(fail to reject H₀ | H₀ is false).
The table may make this easier to remember.

                              Conclusion
                     Retain H₀                      Reject H₀
Actually  H₀ true    Correct                        Type I error; probability α
          H₀ false   Type II error; probability β   Correct

It turns out that the value of β is not fixed like the value of α and depends on the value of θ taken in the alternative hypothesis H₁: θ = θ₁ ≠ θ₀. In general, the value of β is different for every choice of θ₁ ≠ θ₀, and so β = β(θ₁) is a function of θ₁.
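The meaning of α = P(reject H₀ | H₀ is true) can also be seen by simulation. This sketch (our illustration, not from the text) repeatedly samples from a population in which H₀ is true and counts how often a level-0.05 two-sided z-test rejects:

```python
import random
from math import sqrt

random.seed(1)
n, trials, z_crit = 30, 2000, 1.96   # z(0.025) = 1.96 for a 5% two-sided test
rejections = 0
for _ in range(trials):
    # Sample from N(0, 1), so H0: mu = 0 is TRUE in every trial
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(sample) / n
    z = xbar / (1.0 / sqrt(n))       # sigma = 1 is known here
    if abs(z) >= z_crit:
        rejections += 1

rate = rejections / trials           # should be close to alpha = 0.05
print(rate)
```

The observed rejection rate hovers near 0.05; choosing a smaller α directly lowers how often a true null is rejected.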

Remark 5.3 A Type I error rejects a true null, while a Type II error does not reject a false null. In general, making α small is of primary importance. Here's an analogy to explain why this is. Suppose a male suspect has been arrested for murder. He is guilty or not guilty. In the U.S., the null hypothesis is H₀: suspect is not guilty, with alternative H₁: suspect is guilty. A Type I error, measured by α, says that the suspect is found guilty by a jury, but he is really not guilty. A Type II error, measured by β, says that the suspect is found not guilty by the jury, when, in fact, he is guilty. Both of these are errors, but finding an innocent man guilty is considered the more serious error. That's one of the reasons we focus on keeping α small.

The set of hypotheses in (5.1) is called a two-sided test. In such a test, we are interested in deciding if θ = θ₀ or not. Sometimes we are only interested in testing H₀: θ = θ₀ vs. H₁: θ < θ₀ or vs. H₁: θ > θ₀. These types of tests are called one-sided. Often one decides the form of the alternative on the basis of the result of the experiment. For example, in our coin tossing experiment we obtained p̂ = 0.6, so that a reasonable alternative is H₁: p > 0.5.
To construct a test of hypotheses for one-sided tests using the critical value approach, we simply use one-sided confidence intervals instead of two-sided. Specifically, for a given level of significance α, for the alternative hypothesis H₁: θ < θ₀, we will reject H₀: θ = θ₀ in favor of H₁ if θ₀ is not in the 100(1 − α)% confidence interval (−∞, u(X₁, X₂, …, Xₙ)) for θ. In particular, we reject the null hypothesis if θ₀ ≥ u(x₁, …, xₙ) when X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ are the observed sample values. For the alternative hypothesis H₁: θ > θ₀, we reject H₀ if θ₀ ≤ l(x₁, …, xₙ). Finally, the set of real numbers S for which we reject H₀ if θ₀ ∈ S is called the critical region of the test. To summarize, we reject the null hypothesis if

θ₀ ≥ u(x₁, …, xₙ),   for H₁: θ < θ₀
θ₀ ≤ l(x₁, …, xₙ),   for H₁: θ > θ₀
θ₀ ≥ u(x₁, …, xₙ) or θ₀ ≤ l(x₁, …, xₙ),   for H₁: θ ≠ θ₀.

5.3 HYPOTHESES TESTS FOR ONE PARAMETER


In this section we develop tests of hypotheses (both one- and two-sided) for one unknown parameter. We will focus on the normal parameters μ and σ, and on the binomial proportion p. All the hypotheses tests for means or proportions are based on a test statistic of the form (observed − expected)/SE. This always involves calculating the correct SE. In hypothesis testing, the expected in the numerator is always calculated on the basis of the null hypothesis.

5.3.1 HYPOTHESES TESTS FOR THE NORMAL PARAMETERS, CRITICAL VALUE APPROACH
Tests of hypotheses for the parameters μ and σ of a random variable X ∼ N(μ, σ) can be easily derived from the confidence intervals described in Chapter 4. The decision to either reject or not reject the null hypothesis is given in terms of critical values and the values of the respective test statistics for a given random sample from X. The value of a statistic when the data values from a random sample are substituted for the random variables in the statistic is called a score.
Test for μ, σ = σ₀ Known, Null H₀: μ = μ₀
The SE is σ₀/√n. The test statistic is Z = (X̄ − μ₀)/(σ₀/√n) with score z. For the alternative

H₁: μ < μ₀,   reject H₀ if z ≤ −z_α
H₁: μ > μ₀,   reject H₀ if z ≥ z_α        (5.2)
H₁: μ ≠ μ₀,   reject H₀ if |z| ≥ z_{α/2}.
Test for μ, σ Unknown, Null H₀: μ = μ₀
The SE is S_X/√n. The test statistic is T = (X̄ − μ₀)/(S_X/√n) with score t. For the alternative

H₁: μ < μ₀,   reject H₀ if t ≤ −t(n − 1, α)
H₁: μ > μ₀,   reject H₀ if t ≥ t(n − 1, α)        (5.3)
H₁: μ ≠ μ₀,   reject H₀ if |t| ≥ t(n − 1, α/2).

Test for σ², μ = μ₀ Known, Null H₀: σ² = σ₀²
The test statistic is X² = Σ_{i=1}^{n} ((X_i − μ₀)/σ₀)² with score χ². For the alternative

H₁: σ² < σ₀²,   reject H₀ if χ² ≤ χ²(n, 1 − α)
H₁: σ² > σ₀²,   reject H₀ if χ² ≥ χ²(n, α)        (5.4)
H₁: σ² ≠ σ₀²,   reject H₀ if χ² ≤ χ²(n, 1 − α/2) or χ² ≥ χ²(n, α/2).

Test for σ², μ Unknown, Null H₀: σ² = σ₀²
The test statistic is X² = Σ_{i=1}^{n} ((X_i − X̄)/σ₀)² = ((n − 1)/σ₀²) S_X² with score χ². For the alternative

H₁: σ² < σ₀²,   reject H₀ if χ² ≤ χ²(n − 1, 1 − α)
H₁: σ² > σ₀²,   reject H₀ if χ² ≥ χ²(n − 1, α)        (5.5)
H₁: σ² ≠ σ₀²,   reject H₀ if χ² ≤ χ²(n − 1, 1 − α/2) or χ² ≥ χ²(n − 1, α/2).

Recall that χ²(n, α) is the critical value of the χ² distribution with n degrees of freedom. The area under the χ² pdf to the right of the critical value is α.
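The rejection rules in (5.3) can be packaged as a small function. This is our sketch, assuming Python with scipy available; the book itself uses calculator commands such as invT for the same critical values:

```python
from math import sqrt
from scipy.stats import t

def t_test(xbar, s, n, mu0, alpha, alternative):
    """One-sample t test by the critical value approach, following (5.3).
    alternative is 'less', 'greater', or 'two-sided'."""
    score = (xbar - mu0) / (s / sqrt(n))
    df = n - 1
    if alternative == "less":
        reject = score <= -t.ppf(1 - alpha, df)       # compare with -t(n-1, alpha)
    elif alternative == "greater":
        reject = score >= t.ppf(1 - alpha, df)        # compare with t(n-1, alpha)
    else:
        reject = abs(score) >= t.ppf(1 - alpha / 2, df)
    return score, reject

# Example 5.6 numbers: t is about 10.36, well past t(35, 0.05) = 1.69
print(t_test(12.19, 0.11, 36, 12.0, 0.05, "greater"))
```

The same function applied to the summary statistics of Example 5.4 (x̄ = 6.3, s = 1.2, n = 25, μ₀ = 7, two-sided) gives a score near −2.92.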
We present several examples illustrating these tests.

Example 5.4 Suppose we want to test the hypothesis H₀: μ = 7 vs. H₁: μ ≠ 7. A random sample of size 25 from a large population yields the sample mean x̄ = 6.3, with a sample standard deviation of s_X = 1.2. Is the difference between the assumed mean and the sample mean real, or is it due to chance? Assume normal populations and a level of significance of α = 0.05.
To answer the question, we calculate the value of the test statistic T as

t = (x̄ − μ₀)/(s_X/√n) = (6.3 − 7)/(1.2/5) = −2.917.

We are using the T statistic because we do not know the population SD and so must use the sample SD. Since α = 0.05, the area below the critical value must be 0.975. The critical value is therefore t(24, 0.025) = invT(.975, 24) = 2.064. For the two-sided test we reject H₀ if t ∉ (−2.064, 2.064), and our value t = −2.917 falls outside this interval. We conclude that we reject the null at the 0.05 level of significance: the difference between the sample mean and the assumed mean is unlikely to be due to chance. If we want to calculate the p-value, we need P(t(24) ≤ −2.917) + P(t(24) ≥ 2.917) ≈ 0.0075 < α, so there is only a small probability that we make an error by rejecting the null. Had the score fallen inside (−2.064, 2.064), we could not have rejected the null; in that case it is said that we do not reject the null or that the null is plausible. It is never phrased that we accept the alternative.
If we had decided to use the one-sided alternative H₁: μ < 7 on the basis that our observed sample mean is x̄ = 6.3, the p-value would be P(t(24) < −2.917) ≈ 0.0037, and we would again reject the null.

Next we present an example in which we have a two-sided test for the variance.

Example 5.5 A certain plumbing supply company produces copper pipe of various fixed lengths and asserts that the standard deviation of the lengths is 1.3 cm. The mean length is assumed to be unknown. A contractor who is a client of the company decides to test this claim by taking a random sample of 30 pipes and measuring their lengths. The standard deviation of the sample turned out to be s = 1.1 cm. The contractor tests the following hypotheses, and sets the level of the test to be α = 0.01.

H₀: σ² = 1.69 vs. H₁: σ² ≠ 1.69.

Under an assumption of normality of the lengths, we have the test statistic

χ² = (n − 1)s²/σ₀² = (30 − 1)(1.21)/1.69 = 20.763.

The χ² score is therefore 20.763. According to (5.5), we should reject H₀: σ² = 1.69 if χ² ≥ χ²(29, 0.005) = 52.3 or χ² ≤ χ²(29, 0.995) = 13.1. Since χ² = 20.763, we fail to reject the null hypothesis. There is not enough evidence in the data to reject the company's claim that σ = 1.3 cm at the α = 0.01 level.
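A numerical check of this example (our sketch, assuming scipy is available for the χ² critical values):

```python
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 30, 1.1**2, 1.69, 0.01
score = (n - 1) * s2 / sigma0_sq          # about 20.76
lower = chi2.ppf(alpha / 2, n - 1)        # chi2(29, 0.995), about 13.1
upper = chi2.ppf(1 - alpha / 2, n - 1)    # chi2(29, 0.005), about 52.3
reject = score <= lower or score >= upper # False: retain H0
print(round(score, 3), round(lower, 1), round(upper, 1), reject)
```

Since the score lands between the two critical values, the claim σ = 1.3 cm survives the test.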

Example 5.6 At a certain food processing plant, 36 measurements of the volume of tomato paste placed in cans by filling machines were made. A production manager at the plant thinks that the data supports her belief that the mean is greater than 12 oz., causing lower company profits. Because the sample size n = 36 > 30 is large, we assume normality. The sample mean is x̄₃₆ = 12.19 oz. for the data. The standard deviation is unknown, but the sample standard deviation is s_X = 0.11. The manager sets up the test of hypotheses

H₀: μ = 12 vs. H₁: μ > 12.

She sets the level of the test at α = 0.05. We have

t = (12.19 − 12)/(0.11/√36) = 10.36.

According to (5.3), she should reject H₀: μ = 12 if t ≥ t(35, 0.05) = 1.69. The critical region is [1.69, ∞). Since t = 10.36, the manager rejects the null. The machines that fill the cans must be recalibrated to dispense the correct amount of tomato paste.

We end this section with a test for a median.

Example 5.7 Hypothesis test for the median. We introduce a test for the median m of a continuous random variable X. Recall that the median is often a better measure of centrality of a distribution than the mean. Suppose that x₁, x₂, …, xₙ is a random sample of n observed values from X. Consider the test of hypotheses H₀: m = m₀ vs. H₁: m > m₀. Assume that xᵢ ≠ m₀ for every i. Let S be the random variable that counts the number of values greater than m₀ (considered successes). If H₀ is true, then by definition of the median, S ∼ Bin(n, 0.5). If S is too large, for example if S > k for some k, H₀ should be rejected. Since the probability of committing a Type I error must be at most α, we have

α ≥ P(rejecting H₀ | H₀ is true) = P(S > k | H₀ is true)
  = 1 − P(S ≤ k | H₀ is true) = 1 − Σ_{i=0}^{k} C(n, i)(0.5)ⁱ(0.5)ⁿ⁻ⁱ
  = 1 − Σ_{i=0}^{k} C(n, i)(0.5)ⁿ.

Choose the minimum value of k so that Σ_{i=0}^{k} C(n, i)(0.5)ⁿ ≥ 1 − α.
To illustrate the test, consider the following situation. Non-Hodgkins Lymphoma (NHL)
is a cancer that starts in cells called lymphocytes which are part of the body’s immune system.
Twenty patients were diagnosed with the disease in 2007. The survival time (in weeks) for each
of these patients is listed in the table below.
Survival Time (in weeks) for Non-Hodgkins Lymphoma
82 665 162 1210 532
476 487 133 129 230
894 449 310 505 55
252 453 630 551 711

We want to test the hypothesis H₀: m = 520 vs. H₁: m > 520 at the α = 0.05 level and find the p-value of the test. From the data we observe S = 7 data points above 520. Relevant values of Bin(20, 0.5) are shown in the table below.

k      P(S ≤ k | H₀ is true)
7      0.1316
⋮      ⋮
12     0.8684
13     0.9423
14     0.9793
15     0.9941

From this table we see that the smallest k for which P(S > k | H₀ is true) ≤ α is 14. Therefore we reject H₀ if S > 14. Since S = 7, we retain H₀. As for the p-value, we have

p-value = P(S ≥ 7) = 1 − P(S ≤ 6) = 1 − Σ_{i=0}^{6} C(20, i)(1/2)²⁰ ≈ 0.9423.
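The sign test of this example can be carried out with the standard library alone. This sketch (ours) recomputes the count S, the cutoff k, and the upper-tail p-value from Bin(20, 0.5):

```python
from math import comb

times = [82, 665, 162, 1210, 532, 476, 487, 133, 129, 230,
         894, 449, 310, 505, 55, 252, 453, 630, 551, 711]
m0, alpha, n = 520, 0.05, len(times)

S = sum(1 for x in times if x > m0)        # number of values above 520

def cdf(k):
    """P(S <= k) for S ~ Bin(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n

# Smallest k with P(S > k) <= alpha, i.e. cdf(k) >= 1 - alpha
k = min(j for j in range(n + 1) if cdf(j) >= 1 - alpha)
p_value = 1 - cdf(S - 1)                   # P(S >= S_observed)
print(S, k, round(p_value, 4))
```

Since S = 7 is well below the cutoff k = 14, H₀ is retained, and the large tail probability says the data offer no evidence that the median exceeds 520.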

5.3.2 THE p-VALUE APPROACH TO HYPOTHESIS TESTING


The last section introduced the method of hypothesis testing known as the critical value ap-
proach which tells us to reject the null when the value of the test statistic is in the critical region.
In the introduction we discussed the p-value approach, and we will give further details here. It is
important to remember that the two approaches are equivalent, but the p-value approach gives
more information to the researcher.
Let X₁, X₂, …, Xₙ be a random sample from a random variable X with pdf f_X(x; θ). First consider the one-sided test of hypotheses

H₀: θ = θ₀ vs. H₁: θ > θ₀.
Table 5.1: Conclusions for p-values

p-Value Range        Interpretation
0 < p ≤ 0.01         Highly significant (very strong evidence against H₀; reject H₀)
0.01 < p ≤ 0.05      Significant (strong evidence against H₀; reject H₀)
0.05 < p ≤ 0.1       Weakly significant (weak evidence against H₀; reject H₀)
0.1 < p ≤ 1          Not significant (insufficient evidence against H₀; do not reject H₀)

Let Θ denote the test statistic under the null hypothesis, that is, when θ = θ₀, and let θ̂ be the value of Θ when the sample data values X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ are substituted for the random variables. The p-value of θ̂ is

P(Θ ≥ θ̂),

which is the probability of obtaining a value of the test statistic Θ as or more extreme than the value θ̂ that was obtained from the data. For example, suppose that Θ ∼ Z when θ = θ₀. We would reject H₀ if p = P(Z ≥ θ̂) < α, where α is the specified minimum level of significance required to reject the null. We may choose α according to Table 5.1. On the other hand, if p ≥ α, then we retain, or do not reject, H₀.
The p-value of our observed statistic value θ̂ determines the significance of θ̂. If p is the p-value of θ̂, then for all test levels α such that α > p we reject H₀. For all test levels α ≤ p, we retain H₀. Therefore, p can be viewed as the smallest test level for which the null hypothesis is rejected.
If α is not specified ahead of time, we use Table 5.1 to reach a conclusion depending on the resulting p-value.
In a similar way, the alternative H1: θ < θ0 is a one-sided test, and the p-value of the observed test statistic θ̂ is p = P(Θ ≤ θ̂). If p < α, we reject the null. The level of significance of the test statistic θ̂ is the p-value p. Finally, if the alternative hypothesis is two-sided, namely, H1: θ ≠ θ0, then the p-value is given by

    p = P(Θ ≥ θ̂) + P(Θ ≤ −θ̂).

Remark 5.8  If the distribution of Θ is symmetric, the p-value of a two-sided test is p = 2P(Θ ≥ θ̂), assuming θ̂ ≥ 0; if θ̂ < 0, its p-value is 2P(Θ ≤ θ̂).
If the distribution of Θ is not symmetric, such as in the χ² distribution, we calculate P(Θ ≥ θ̂) as well as P(Θ ≤ θ̂), and the p-value of the two-sided test is p = 2 min{P(Θ ≥ θ̂), P(Θ ≤ θ̂)}.
5.3. HYPOTHESES TESTS FOR ONE PARAMETER 115
Example 5.9  Consider Example 4.9 of the preceding chapter. The hypothesis test is H0: μ = 98.6° vs. H1: μ ≠ 98.6°. The test statistic is Z = (X̄ − μ0)/(σ/√n), and the z score is

    z = (98.2 − 98.6)/(0.62/√106) = −6.64.

The two-sided p-value is computed as P(|Z| ≥ 6.64) = 3.14 × 10⁻¹¹, constituting overwhelming evidence against H0.
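A quick numerical check of this p-value needs only the standard library; the following sketch (variable names are ours, not the text's) recomputes the z score and the two-sided p-value:

```python
import math

# Summary data from Example 5.9: n = 106 temperatures, sample mean 98.2,
# hypothesized mean 98.6, known sigma = 0.62.
n, xbar, mu0, sigma = 106, 98.2, 98.6, 0.62

z = (xbar - mu0) / (sigma / math.sqrt(n))       # z score, about -6.64
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * P(Z >= |z|)

print(round(z, 2))
print(p_two_sided)   # about 3.1e-11, overwhelming evidence against H0
```

Here math.erfc(|z|/√2) equals 2P(Z ≥ |z|) and avoids the catastrophic cancellation that computing 2(1 − Φ(|z|)) directly would suffer this far out in the tail.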

Example 5.10  A manufacturer of e-cigarettes (e-cigs) claims that the variance of the nicotine content of its cigarettes is less than 0.77 mg². The sample variance in a random sample of 25 of the company's e-cigs turned out to be 0.41 mg². A health professional would like to know if there is enough evidence to reject the company's claim. The hypothesis test is H0: σ² = 0.77 vs. H1: σ² < 0.77. Under an assumption of normality, the test statistic is

    (n − 1)S_X²/σ0² ~ χ²(n − 1).

Therefore, the χ² score is χ² = 24(0.41)/0.77 = 12.779, and so the one-sided p-value is P(χ²(24) ≤ 12.779) = 0.0303. The result is significant evidence against H0: σ² = 0.77. We may conclude that there is sufficient evidence to say that the variance is indeed less than 0.77.
If we now change the alternative to H1: σ² ≠ 0.77, what is the p-value? For that we need to calculate

    P(χ²(24) ≤ 12.779) = 0.0303  and  P(χ²(24) ≥ 12.779) = 0.9697.

The p-value is therefore p = 2 min{0.0303, 0.9697} = 0.0606. Since p > 0.05, the result is not statistically significant, and we do not reject the null. This example illustrates how a one-sided alternative may lead to rejection of the null while a two-sided alternative does not.
If we use the critical value approach with α = 0.05, we calculate the critical values from P(χ²(24) ≤ a) = 0.025, which implies a = 12.401, and P(χ²(24) ≤ b) = 0.975, which implies b = 39.364. The two-sided critical region is (0, 12.401] ∪ [39.364, ∞). Since our observed value χ² = 12.779 is not in the critical region, we do not reject the null at the 5% level of significance. There is insufficient evidence to conclude that the variance is not 0.77.
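The χ² probabilities in this example can be reproduced without tables. The helper below is our own sketch of the χ² cdf via the power series for the regularized lower incomplete gamma function; it is adequate for the moderate values used here:

```python
import math

def chi2_cdf(x, k):
    """P(chi2(k) <= x) via the power series for the regularized lower
    incomplete gamma function P(k/2, x/2)."""
    a, xx = k / 2.0, x / 2.0
    if xx <= 0:
        return 0.0
    total = term = 1.0
    for n in range(1, 1000):
        term *= xx / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(a * math.log(xx) - xx - math.lgamma(a + 1.0))

# Example 5.10: chi-square score 24 * 0.41 / 0.77 with 24 degrees of freedom.
chi2 = 24 * 0.41 / 0.77
p_left = chi2_cdf(chi2, 24)           # one-sided p-value, about 0.0303
p_two = 2 * min(p_left, 1 - p_left)   # two-sided p-value, about 0.0606

print(round(chi2, 3))
print(round(p_left, 4), round(p_two, 4))
```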

5.3.3 TEST OF HYPOTHESES FOR PROPORTIONS


Let X be a Binomial random variable with parameters n and p. Recall that if np > 5 and n(1 − p) > 5, the Central Limit Theorem can be invoked to obtain the approximation

    (X − np)/√(np(1 − p)) = (X̄ − p)/√(p(1 − p)/n) = Z ~ N(0, 1).
Before deriving the confidence interval for p, the approximate 100(1 − α)% confidence interval

    ( x̄ − z_{α/2}√(p(1 − p)/n),  x̄ + z_{α/2}√(p(1 − p)/n) )

was obtained, but it was noted that this interval cannot be computed from the sample values since p itself appears in the endpoints of the interval. Recall that this problem is solved by replacing p with p̄ = x̄. However, in a hypothesis test we have a null H0: p = p0, in which we assume the given population proportion is p0, and this is exactly what we substitute whenever a value of p is required. Here's how it works.
Consider the two-sided test of hypotheses H0: p = p0 vs. H1: p ≠ p0. We take a random sample and calculate the sample proportion p̄. Since p = p0 under the null hypothesis,

    (X̄ − p0)/√(p0(1 − p0)/n) ~ Z (approximate).

This is where we use the assumed value of p under the null. For a given significance level α, we will reject H0 if

    x̄ = p̄ ≥ p0 + z_{α/2}√(p0(1 − p0)/n)  or  x̄ = p̄ ≤ p0 − z_{α/2}√(p0(1 − p0)/n).

The p-value of the test is

    P( |Z| ≥ |p̄ − p0| / √(p0(1 − p0)/n) ).

We reject H0 if this p-value is less than α.
Example 5.11  One way to cheat at dice games is to use loaded dice. One possibility is to make a 6 come up more often than expected by making the 6-side weigh less than the 1-side. In an experiment to test this possible loading, 800 rolls of a die produced 139 occurrences of a 6. Consider the test of hypotheses

    H0: p = 1/6  vs.  H1: p ≠ 1/6,

where p is the probability of obtaining a 6 on a roll of the die. Computing the value of the test statistic with p̄ = 139/800 = 0.17375, we get

    z = (p̄ − p0)/√(p0(1 − p0)/n) = (139/800 − 1/6)/√((1/6)(5/6)/800) = 0.53759.

The corresponding p-value is P(|Z| ≥ 0.53759) = 0.5909. We do not reject H0, and we do not have enough evidence to conclude that the die is loaded. How many 6s need to come up before we can say it is loaded? Set α = 0.05. Then we need P(|Z| ≥ z) < 0.05, which implies |z| > 1.96. Since z = (p̄ − 0.1667)/0.01317, substitute p̄ = n/800, where n is the number of 6s rolled, and solve the inequality for n. We obtain n ≥ 154 or
n ≤ 112. Therefore, if we roll at least 154 6s or at most 112 6s, we can conclude the die is loaded.
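A sketch of the computation (standard library only; variable names are ours), including the counts at which the result becomes significant:

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Example 5.11: 139 sixes in 800 rolls, H0: p = 1/6.
n, x, p0 = 800, 139, 1 / 6
p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
p_value = 2 * (1 - norm_cdf(abs(z)))

print(round(z, 4), round(p_value, 4))   # about 0.5376 and 0.5909

# Counts of sixes that would be significant at alpha = 0.05:
lo = math.floor(n * (p0 - 1.96 * se))   # at most this many sixes
hi = math.ceil(n * (p0 + 1.96 * se))    # at least this many sixes
print(lo, hi)                           # 112 and 154
```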

One-sided tests for a proportion are easily constructed. For the tests

    H0: p = p0 vs. H1: p < p0  and  H0: p = p0 vs. H1: p > p0,

we reject H0: p = p0 if z ≤ −z_α and if z ≥ z_α, respectively. Using the p-value approach, we reject H0 if the p-value

    P( Z ≤ (p̄ − p0)/√(p0(1 − p0)/n) ) < α  or  P( Z ≥ (p̄ − p0)/√(p0(1 − p0)/n) ) < α.

Example 5.12  Suppose an organization claims that 75% of its members have IQs over 135. Test the hypothesis H0: p = 0.75 vs. H1: p < 0.75. Suppose 10 members are chosen at random and their IQs are measured. Exactly 5 had an IQ over 135. Under the null, we have

    P( Z ≤ (0.5 − 0.75)/√(0.75 × 0.25/10) ) = P(Z ≤ −1.825) = 0.034.

Since 0.034 < 0.05, our result is statistically significant, and we may reject the null. The evidence supports the alternative.
There is a problem with this analysis: np = 10(0.75) = 7.5 > 5, but n(1 − p) = 10(0.25) = 2.5 < 5, so the use of the normal approximation is debatable. Let's use the exact binomial distribution to analyze this instead. If X is the number of members in a sample of size 10 with IQs above 135, we have X ~ Binom(10, 0.75) under the null. Therefore, P(X ≤ 5) = binomcdf(10, 0.75, 5) = 0.078 is the exact probability of getting 5 or fewer members with IQs above 135. Since 0.078 > 0.05, we do not have enough evidence to reject the null, and the result is
not statistically significant.
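The exact binomial computation mirrors the TI binomcdf call; a sketch in Python:

```python
from math import comb

# Example 5.12: X ~ Binom(10, 0.75); exact P(X <= 5).
n, p = 10, 0.75
p_exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6))

print(round(p_exact, 4))   # about 0.0781, not significant at alpha = 0.05
```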

5.4 HYPOTHESES TESTS FOR TWO POPULATIONS


In this section we obtain tests of hypotheses for the difference of the means of two independent samples X1, …, Xm and Y1, …, Yn from the confidence intervals that were derived in the preceding chapter. The underlying assumption is that Xi ~ N(μX, σX) and Yj ~ N(μY, σY), or else that the samples are large enough that we may approximate by normal populations. For a given constant d0, assumed to be the difference in the population means, the tests we consider are of the form

    H0: μX − μY = d0  vs. one of  H1: μX − μY < d0,  H1: μX − μY > d0,  H1: μX − μY ≠ d0.

The test statistic depends on the standard deviations of the populations or the samples. Here is a summary of the results. We use the critical value approach to state the results; computing p-values is straightforward.

SDs of the Populations are Known

The test statistic is Z = ((X̄m − Ȳn) − d0)/√(σX²/m + σY²/n) ~ N(0, 1) when H0 is true. If z is the score, for the alternative

    H1: μX − μY < d0, reject H0 if z ≤ −z_α
    H1: μX − μY > d0, reject H0 if z ≥ z_α
    H1: μX − μY ≠ d0, reject H0 if |z| ≥ z_{α/2}.

SDs of the Populations are Unknown, but Equal

The test statistic is T = ((X̄m − Ȳn) − d0)/(Sp√(1/m + 1/n)), where Sp² is the pooled variance defined by

    Sp² = ((m − 1)SX² + (n − 1)SY²)/(m + n − 2).

When H0 is true, T ~ t(m + n − 2). If t is the score, for the alternative

    H1: μX − μY < d0, reject H0 if t ≤ −t(m + n − 2, α)
    H1: μX − μY > d0, reject H0 if t ≥ t(m + n − 2, α)
    H1: μX − μY ≠ d0, reject H0 if |t| ≥ t(m + n − 2, α/2).

SDs of the Populations are Unknown and not Equal

The test statistic is T = ((X̄m − Ȳn) − d0)/√(SX²/m + SY²/n), which is approximately distributed as t(ν) if H0 is true, where

    ν = ⌊ ( (1/m)r + 1/n )² / ( r²/(m²(m − 1)) + 1/(n²(n − 1)) ) ⌋,   r = sX²/sY².

If t is the score, for the alternative

    H1: μX − μY < d0, reject H0 if t ≤ −t(ν, α)
    H1: μX − μY > d0, reject H0 if t ≥ t(ν, α)
    H1: μX − μY ≠ d0, reject H0 if |t| ≥ t(ν, α/2).
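The degrees-of-freedom formula above is easy to get wrong by hand. A small sketch (the function name and the sample numbers are ours, for illustration only); it uses the algebraically equivalent form in terms of sX²/m and sY²/n:

```python
import math

def welch_df(s2x, m, s2y, n):
    """Floor of the Welch-Satterthwaite degrees of freedom for two samples
    with sample variances s2x, s2y and sizes m, n. Equivalent to the text's
    formula with r = s2x / s2y."""
    rx, ry = s2x / m, s2y / n
    nu = (rx + ry) ** 2 / (rx ** 2 / (m - 1) + ry ** 2 / (n - 1))
    return math.floor(nu)

# Hypothetical samples: s_X^2 = 1.0 with m = 10, s_Y^2 = 4.0 with n = 20.
print(welch_df(1.0, 10, 4.0, 20))   # 27
```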

Paired Samples
If n = m and the two samples X1, …, Xn and Y1, …, Yn are not independent, we consider the differences of the two samples Di = Xi − Yi and treat this as a one-sample t-test because we assume σD is unknown. The test statistic is T = (D̄n − d0)/(SD/√n) ~ t(n − 1) if H0 is true. If t is the score, for the alternative

    H1: μD < d0, reject H0 if t ≤ −t(n − 1, α)
    H1: μD > d0, reject H0 if t ≥ t(n − 1, α)
    H1: μD ≠ d0, reject H0 if |t| ≥ t(n − 1, α/2).

Test for Variances of Two Samples

If we test the relationship between the variances of two independent samples, we consider the ratio r = σX²/σY². We assume the means μX and μY are unknown. We test whether σX² is a multiple of σY², which means we test H0: σX²/σY² = r0 vs. one of the usual alternatives. The test statistic is

    F = (SX²/SY²) · (1/r0) ~ F(m − 1, n − 1)

when H0 is true. If f is the score, for the alternative

    H1: r < r0, reject H0 if f ≤ F(m − 1, n − 1, 1 − α)
    H1: r > r0, reject H0 if f ≥ F(m − 1, n − 1, α)
    H1: r ≠ r0, reject H0 if f ≤ F(m − 1, n − 1, 1 − α/2) or f ≥ F(m − 1, n − 1, α/2).

Example 5.13  In this example, two popular brands of low-fat, Greek-style yogurt, Artemis and Demeter, are compared. Let X denote the weight (in grams) of a container of the Artemis brand yogurt, and let Y be the weight (also in grams) of a container of the Demeter brand yogurt. Assume that X ~ N(μX, σX) and that Y ~ N(μY, σY). Nine measurements of the Artemis yogurt and thirteen measurements of Demeter yogurt were taken with the following results.

    Artemis Brand          Demeter Brand
    21.7  21.0  21.2       21.5  20.5  20.3  21.6  21.7
    20.7  20.4  21.9       21.3  23.0  21.3  18.9  20.0
    20.2  21.6  20.6       20.4  20.8  20.3

Consider the hypotheses

    H0: σX²/σY² = 1  vs.  H1: σX²/σY² ≠ 1.

We will compute the value of the test statistic

    F = (SX²/SY²) · (1/r0) = SX²/SY²

since r0 = 1 in this case. From the two samples we compute sX² = 0.367 and sY² = 1.014. Therefore, f = 0.367/1.014 = 0.3619.
If we set the level of the test to be α = 0.05, then F(8, 12, 0.025) = 4.20 and F(8, 12, 0.975) = 0.28. Since 0.28 < f < 4.20, we cannot reject H0, and we may assume the variances are equal.
Now we want to test whether the mean weights of the two brands of yogurt are the same. Consider the hypotheses

    H0: μX − μY = 0 = d0  vs.  H1: μX − μY ≠ 0.

The pooled sample standard deviation is computed from the two samples as sp = 0.869. The test statistic is

    T = ((X̄m − Ȳn) − d0)/(Sp√(1/m + 1/n)) = (X̄m − Ȳn)/(Sp√(1/m + 1/n)) ~ t(20).

Therefore,

    t = (21.03 − 20.89)/(0.869√(1/9 + 1/13)) = 0.372.

For the same level α = 0.05, we reject H0 if |t| ≥ t(20, 0.025) = 2.086. Since t = 0.372, we retain H0. Also, we may calculate the p-value as P(|T| ≥ 0.372) = 1 − P(|T| < 0.372) = 1 − tcdf(−0.372, 0.372, 20) = 1 − 0.2862 = 0.7138. We cannot reject the null, and we conclude there is
no difference between the weights of the Artemis and Demeter brand yogurts.
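All the numbers in this example can be reproduced from the raw data; small differences from the text come from the text rounding intermediate values. A sketch (standard library only; variable names are ours):

```python
import math
from statistics import mean, variance

artemis = [21.7, 21.0, 21.2, 20.7, 20.4, 21.9, 20.2, 21.6, 20.6]
demeter = [21.5, 20.5, 20.3, 21.6, 21.7, 21.3, 23.0, 21.3, 18.9,
           20.0, 20.4, 20.8, 20.3]
m, n = len(artemis), len(demeter)

s2x, s2y = variance(artemis), variance(demeter)   # sample variances
f = s2x / s2y                                     # F score with r0 = 1

sp = math.sqrt(((m - 1) * s2x + (n - 1) * s2y) / (m + n - 2))  # pooled SD
t = (mean(artemis) - mean(demeter)) / (sp * math.sqrt(1 / m + 1 / n))

print(s2x, s2y)            # about 0.3675 and 1.0141
print(f)                   # about 0.362 (the text's 0.3619 uses rounded variances)
print(sp, t)               # about 0.869 and 0.374 (the text's 0.372 uses rounded means)
```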

5.4.1 TEST OF HYPOTHESES FOR TWO PROPORTIONS


Recall that we could not construct a confidence interval for the difference pX − pY since we did not know the values of pX and pY. We approximated these values by X̄m and Ȳn, respectively. Now consider the hypothesis H0: pX − pY = d0. If H0 is true, it follows that

    ((X̄m − Ȳn) − d0)/√( pX(1 − pX)/m + pY(1 − pY)/n ) ~ Z (approximate).

If we now approximate pX and pY by X̄m and Ȳn respectively, the approximation still holds, and we obtain the test statistic

    Z = ((X̄m − Ȳn) − d0)/√( X̄m(1 − X̄m)/m + Ȳn(1 − Ȳn)/n ) ~ N(0, 1).

So if z is the score, for the alternative

    H1: pX − pY < d0, reject H0 if z ≤ −z_α
    H1: pX − pY > d0, reject H0 if z ≥ z_α
    H1: pX − pY ≠ d0, reject H0 if |z| ≥ z_{α/2}.

As a special case, suppose that d0 = 0. Then H0: pX − pY = d0 becomes H0: pX = pY = p0, where p0 is the common value, and our test statistic becomes

    (X̄m − Ȳn)/√( p0(1 − p0)/m + p0(1 − p0)/n ) = (X̄m − Ȳn)/√( p0(1 − p0)(1/m + 1/n) ) ~ Z (approximate).

We encounter the same problem as before, namely, that we do not know the value of p0. Since we are assuming that the population proportions are the same, it makes sense to pool the proportions (that is, form a weighted average) as in

    P̄0 = (mX̄m + nȲn)/(m + n).

Our test statistic becomes

    (X̄m − Ȳn)/√( P̄0(1 − P̄0)(1/m + 1/n) ) ~ Z (approximate).

If z is the score, for the alternative

    H1: pX < pY, reject H0 if z ≤ −z_α
    H1: pX > pY, reject H0 if z ≥ z_α
    H1: pX ≠ pY, reject H0 if |z| ≥ z_{α/2}.

P-values can be easily computed for all the above tests. As an example, for the test of hypotheses H0: pX = pY vs. H1: pX ≠ pY, the p-value is computed as 2P(Z ≥ |z|), where

    z = (p̄X − p̄Y)/√( p̄0(1 − p̄0)(1/m + 1/n) ),   p̄0 = (mp̄X + np̄Y)/(m + n).

(Note that p̄X = x̄m and p̄Y = ȳn.) The TI calculator command is 2·normalcdf(|z|, ∞). To compute the absolute value, use the sequence MATH → NUM → 1.
Example 5.14  A survey was done among boys and girls, ages 7–11, to assess their interest in being part of an effort to colonize the planet Mars as potential settlers. Of 1,900 boys asked this question, 1,311 expressed interest in becoming settlers, whereas of 2,000 girls asked the same question, 1,440 said they would be interested. Let pB and pG be the proportions of boys and girls, ages 7–11, that are interested in colonizing Mars as settlers. We are interested in determining whether these proportions are the same. Consider the test of hypotheses

    H0: pB = pG  vs.  H1: pB ≠ pG.

Set the level of the test at α = 0.05. The pooled estimate of the common proportion p0 is

    p̄0 = (mp̄B + np̄G)/(m + n) = (1311 + 1440)/3900 = 0.705.

Therefore, the observed value of the test statistic is

    z = (1311/1900 − 1440/2000)/√( 0.705(1 − 0.705)(1/1900 + 1/2000) ) = −2.053.

We reject H0 since |z| = 2.053 ≥ z_{0.025} = 1.96. The p-value of z = −2.053 is easily calculated
to be 0.04.
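A sketch of the pooled two-proportion computation (standard library only; with full precision the score comes out as −2.054 rather than the text's 2.053, which uses the rounded p̄0 = 0.705):

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Example 5.14: 1311 of 1900 boys and 1440 of 2000 girls interested.
m, x = 1900, 1311
n, y = 2000, 1440
pb, pg = x / m, y / n
p0 = (x + y) / (m + n)                 # pooled proportion, about 0.705

z = (pb - pg) / math.sqrt(p0 * (1 - p0) * (1 / m + 1 / n))
p_value = 2 * (1 - norm_cdf(abs(z)))

print(round(z, 3))        # about -2.054
print(round(p_value, 3))  # about 0.04
```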

Example 5.15  We revisit the VIOXX© clinical study introduced at the beginning of the chapter to obtain the solution in terms of the notation and methods of hypothesis testing. Let pC and pT denote the true proportions of subjects that experience cardiovascular events (CVs) when taking the drug. Recall that the test of hypotheses was H0: pT = pC vs. H1: pT > pC. As described in this section, the test statistic is

    Z = (T̄1287 − C̄1299)/√( T̄1287(1 − T̄1287)/1287 + C̄1299(1 − C̄1299)/1299 ).

For the VIOXX© data, the value is

    z = (46/1287 − 26/1299)/√( (46/1287)(1 − 46/1287)/1287 + (26/1299)(1 − 26/1299)/1299 ) = 2.4302.

The p-value is calculated as P(Z ≥ 2.4302) = 0.00755, which is highly significant. It can be interpreted as the probability that the observed difference of sample proportions T̄1287 − C̄1299 is at least
46/1287 − 26/1299 = 0.0157 under the assumption that the population proportions are the same. We reject the null hypothesis. The treatment population proportion appears greater than the control population proportion. What if we pool the proportions, since under the null hypothesis we are assuming that pT = pC? Does that make a difference? The pooled proportion is p̄0 = 72/2586, and so the value of the test statistic is now

    z = (46/1287 − 26/1299)/√( (72/2586)(1 − 72/2586)(1/1287 + 1/1299) ) = 2.4305,

yielding a p-value of P(Z ≥ 2.4305) = 0.00754. Pooling the proportions has no effect on the significance of the result.

Tables summarizing all the tests for the normal parameters and the binomial proportion
are located in Section 5.8.

5.5 POWER OF TESTS OF HYPOTHESES


The power of a statistical test refers to its ability to avoid making Type II errors, that is, not rejecting the null hypothesis when it is false. Recall that the probabilities of making a Type I and a Type II error are denoted α and β, respectively. Specifically, we have

    α = P(rejecting H0 | H0 is true)  and  β = P(not rejecting H0 | H1 is true).

In order to quantify the power of a test, we need to be more specific with the alternative hypothesis. For each θ1 specified in an alternative H1: θ = θ1, we obtain a different value of β. Consequently, β is really a function of the θ which we specify in the alternative. We define, for each θ1 ≠ θ0,

    β(θ1) = P(not rejecting H0 | H1: θ = θ1 ≠ θ0).

Definition 5.16  The power of a statistical test at a value θ1 ≠ θ0 is defined to be

    π(θ1) = 1 − β(θ1) = P(rejecting H0 | H1: θ = θ1 ≠ θ0).

The power of a test is the probability of correctly rejecting a false null. For reasons that will become clear later, we may define π(θ0) = α.
We now consider the problem of how to compute π(θ) for different values of θ ≠ θ0. To illustrate the procedure in a particular case, suppose we take a random sample of size n from a random variable X ~ N(μ, σ0). (σ0 is known.) Consider the set of hypotheses

    H0: μ = μ0  vs.  H1: μ ≠ μ0.
Recall that we reject H0 if |(x̄ − μ0)/(σ0/√n)| ≥ z_{α/2}. If, in fact, μ = μ1 ≠ μ0, the probability of a Type II error is

    β(μ1) = P(not rejecting H0 | μ = μ1)                                                    (5.6)
          = P( |(X̄ − μ0)/(σ0/√n)| < z_{α/2} | μ = μ1 )
          = P( μ0 − z_{α/2}·σ0/√n < X̄ < μ0 + z_{α/2}·σ0/√n | μ = μ1 )
          = P( (μ0 − μ1)/(σ0/√n) − z_{α/2} < (X̄ − μ1)/(σ0/√n) < (μ0 − μ1)/(σ0/√n) + z_{α/2} | μ = μ1 )
          = P( (μ0 − μ1)/(σ0/√n) − z_{α/2} < Z < (μ0 − μ1)/(σ0/√n) + z_{α/2} )   since (X̄ − μ1)/(σ0/√n) ~ Z
          = P( |Z − (μ0 − μ1)/(σ0/√n)| < z_{α/2} ).

We needed to standardize X̄ using μ1 and not μ0 because μ1 is assumed to be the correct value of the mean. Once we have β(μ1), the power of the test at μ = μ1 is then π(μ1) = 1 − β(μ1).
Remark 5.17  Notice that

    lim_{μ1→μ0} π(μ1) = 1 − P(|Z| < z_{α/2}) = 1 − (1 − α) = α,

and so it makes sense to define π(μ0) = α. Also, note that lim_{μ1→∞} π(μ1) = 1 and lim_{μ1→−∞} π(μ1) = 1.
Keep in mind that the designer of an experiment wants both α and β to be small. We want the power of a test to be close to 1 because the power quantifies the ability of a test to detect a false null.
Example 5.18  A data set of n = 106 observations of the body temperature of healthy adults was compiled. The standard deviation is known to be σ0 = 0.62°F. (In this example, degrees are Fahrenheit.) Consider the set of hypotheses

    H0: μ = 98.6°  vs.  H1: μ ≠ 98.6°.

Table 5.2 lists selected values of the power function π(μ) for the above test of hypotheses, computed using the formula derived above with the level set at α = 0.05. For example, if the alternative is H1: μ1 = 98.75°, the power is π(98.75°) = 0.702, and so there is about a 70% chance of rejecting a false null if the true mean is 98.75°.
A graph of the power function π(μ) is given below (Fig. 5.1).
Table 5.2: Values of π(μ1), μ1 ≠ 98.6°

    μ1      98.35  98.40  98.45  98.50  98.55  98.60  98.65  98.70  98.75  98.80  98.85
    π(μ1)   0.986  0.913  0.702  0.382  0.132  0.050  0.132  0.382  0.702  0.913  0.986

[Figure 5.1 plots the power curve π(μ) = P(rejecting H0 | μ ≠ μ0) against μ for H0: μ0 = 98.6°, with the points (98.35°, 0.986), (98.75°, 0.702), and (98.6°, 0.05) marked.]

Figure 5.1: A power curve π(μ).
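The entries of Table 5.2 can be reproduced directly from the formula β(μ1) = P(|Z − (μ0 − μ1)/(σ0/√n)| < z_{α/2}). A sketch (standard library only; the function name and default arguments are ours):

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def power_two_sided(mu1, mu0=98.6, sigma=0.62, n=106, z_half_alpha=1.96):
    """Power of the two-sided z-test H0: mu = mu0 at the alternative mu1."""
    delta = (mu0 - mu1) / (sigma / math.sqrt(n))
    beta = norm_cdf(delta + z_half_alpha) - norm_cdf(delta - z_half_alpha)
    return 1.0 - beta

print(round(power_two_sided(98.75), 3))   # about 0.702, as in Table 5.2
print(round(power_two_sided(98.35), 3))   # about 0.986
```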

5.5.1 FACTORS AFFECTING POWER OF A TEST OF HYPOTHESES


How can we increase the power of a test of hypotheses? This is an important question, since we would like our tests to be as sensitive as possible to the presence of a false null hypothesis. The derivation of the power function π(μ) presented in (5.6) contains important clues. The power function was computed as

    π(μ1) = 1 − P( |Z − (μ0 − μ1)/(σ0/√n)| < z_{α/2} | μ1 ≠ μ0 ).

Assume μ1, μ0, and α are all fixed. If the sample size n increases, then (μ0 − μ1)/(σ0/√n) approaches ±∞, depending upon whether μ0 > μ1 or μ0 < μ1. Consequently,

    lim_{n→∞} P( |Z − (μ0 − μ1)/(σ0/√n)| < z_{α/2} ) = 0,

and so lim_{n→∞} π(μ1) = 1. So one way to increase the power of a (two-sided) statistical test is to increase the sample size. This is the most common and practical way of increasing power.
Example 5.19  In the diagram below, power curves are illustrated for two-sided tests for healthy adult body temperature of the form H0: μ = 98.6° vs. H1: μ ≠ 98.6°, for various choices of sample size.

[Figure 5.2 plots power curves for the two-sided test with sample sizes n = 10, 20, 50, 106, and 150; larger samples give uniformly higher power.]

Figure 5.2: Power curves for a two-sided test and different sample sizes.

Suppose we specify the desired power of a test. Can we find the sample size needed to achieve this? Here's an example of the method for a two-sided test if we specify β. We have from (5.6)

    1 − β = P( |Z − (μ0 − μ1)/(σ0/√n)| ≥ z_{α/2} )
          = P( Z ≥ z_{α/2} + (μ0 − μ1)/(σ0/√n) ) + P( Z ≤ −z_{α/2} + (μ0 − μ1)/(σ0/√n) )
          = P( Z ≥ z_{α/2} + |μ0 − μ1|/(σ0/√n) ) + P( Z ≤ −z_{α/2} + |μ0 − μ1|/(σ0/√n) ).

For statistically significant levels α, the first term in the sum above is close to 0, and so it makes sense to equate the power 1 − β with the second term and solve for n, giving us a sample size that is conservative in the sense that the sum of the two terms above will only be slightly larger than 1 − β. Doing this, we get

    1 − β = P( Z ≤ −z_{α/2} + |μ0 − μ1|/(σ0/√n) )
    ⟹ z_β = −z_{α/2} + |μ0 − μ1|/(σ0/√n)
    ⟹ n = ⌈ ( σ0(z_β + z_{α/2})/(μ0 − μ1) )² ⌉.
If we take N = ⌈( σ0(z_β + z_{α/2})/(μ0 − μ1) )²⌉, then for all n ≥ N, the power at an alternative μ1 ≠ μ0 will be at least 1 − β, approaching 1 as n → ∞.

Example 5.20  Continuing with the body temperature example once more, for a test at level α = 0.05, suppose we specify that if |μ − 98.6°| ≥ 0.15°, then we want π(μ) ≥ 0.90. By symmetry and monotonicity of the power function, we need only require that π(98.75°) = 0.90. Therefore, with z_{α/2} = z_{0.025} = 1.96, z_β = z_{0.1} = 1.28, μ0 = 98.6°, μ1 = 98.75°, and σ0 = 0.62°, we have a sample size

    n = ⌈ ( 0.62(1.28 + 1.96)/0.15 )² ⌉ = 180.

A sample of at least 180 healthy adults must be tested to ensure that the power of the test at all
alternatives μ such that |μ − 98.6°| ≥ 0.15° will be at least 0.90.
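The sample-size calculation can be scripted; statistics.NormalDist supplies the exact z quantiles rather than the rounded 1.96 and 1.28:

```python
import math
from statistics import NormalDist

# Example 5.20: two-sided test at alpha = 0.05 with desired power 0.90
# at mu1 = 98.75 when mu0 = 98.6 and sigma = 0.62.
alpha, beta = 0.05, 0.10
mu0, mu1, sigma = 98.6, 98.75, 0.62

z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_beta = NormalDist().inv_cdf(1 - beta)              # about 1.28

n = math.ceil((sigma * (z_beta + z_half_alpha) / (mu0 - mu1)) ** 2)
print(n)   # 180
```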

5.5.2 POWER OF ONE-SIDED TESTS


Computing power and constructing power curves can be done for one-sided tests as well as for two-sided tests. For example, suppose we take a random sample of size n from a random variable X ~ N(μ, σ0). (σ0 is known.) Consider the set of hypotheses H0: μ = μ0 vs. H1: μ > μ0. At μ = μ1 ≠ μ0,

    π(μ1) = 1 − β(μ1)
          = 1 − P( (X̄ − μ0)/(σ0/√n) < z_α | μ = μ1 )
          = 1 − P( X̄ < μ0 + z_α·σ0/√n | μ = μ1 )
          = 1 − P( (X̄ − μ1)/(σ0/√n) < (μ0 − μ1)/(σ0/√n) + z_α | μ = μ1 )
          = 1 − P( Z < (μ0 − μ1)/(σ0/√n) + z_α )   since (X̄ − μ1)/(σ0/√n) ~ Z
          = 1 − P( Z < z_α + (μ0 − μ1)/(σ0/√n) ).

We have that lim_{μ1→μ0} π(μ1) = 1 − P(Z < z_α) = 1 − (1 − α) = α as before, and so we may define π(μ0) = α. Notice that as μ1 → −∞, (μ0 − μ1)/(σ0/√n) increases to +∞, and so π(μ1) → 0. So for alternative values of μ < μ0, the power is less than α, and the smaller μ gets, the less power the test has. This is of no concern really, since the test is one-sided, and power only becomes meaningful for alternative values of μ greater than μ0.
Remark 5.21  As in two-sided tests, power can be increased by increasing sample size. For a one-sided test, the sample size needed for a given β is n = ⌈( σ0(z_β + z_α)/(μ0 − μ1) )²⌉. (Verify this!)

Example 5.22  The diagram in Figure 5.3 displays power curves for one-sided tests for healthy adult body temperature of the form H0: μ = 98.6° vs. H1: μ > 98.6°, for various choices of sample size. Notice that for μ < 98.6°, the power of the test diminishes rapidly.

[Figure 5.3 plots one-sided power curves for sample sizes n = 10, 20, 50, 106, and 150.]

Figure 5.3: One-sided power curves for different sample sizes.

5.6 MORE TESTS OF HYPOTHESES


If we have data from an experiment and we don't know the distribution of the data, how can we test the conjecture that the data come from, say, a normal distribution or an exponential distribution? Tests involving the pdf (or pmf in the discrete case) itself, and not just the parameters of a known pdf, are called goodness-of-fit tests. A test of hypotheses of this type has the general form

    H0: f_X(x) = f_{0,X}(x)  vs.  H1: f_X(x) ≠ f_{0,X}(x),

where f_{0,X}(x) represents a known pdf.
Another type of test involves testing whether two traits, say A and B, are independent or not, as in H0: A and B are independent vs. H1: A and B are dependent. For example, we might want to test whether increasing rates of addiction to opioids (trait A) are related to a downturn in the economy (trait B), or whether the number of hours that a person spends on social media sites (trait A) is related to the person's educational level (trait B).
Both of these classes of tests use essentially the same statistic, called Pearson's D statistic, introduced by Karl Pearson in 1900. We will describe the distribution of this statistic and show how it is used in hypothesis testing. In preceding sections, we constructed tests for at most two parameters. We will conclude this section with an important multiparameter test that generalizes the two-sample t-test, namely, the analysis of variance, or ANOVA for short.

5.6.1 CHI-SQUARED STATISTIC AND GOODNESS-OF-FIT TESTS


We begin by discussing how the χ² distribution is involved in a goodness-of-fit test. Start with the case in which the population is X ~ Binom(n, p). We know that X is a special case of the multinomial distribution (X1, X2), where X = X1 is the number of successes and X2 the number of failures in n trials. The probability of a success or a failure is p1 = p and p2 = 1 − p, respectively. Since X1 is the sum of n independent Bernoulli trials, we know by the Central Limit Theorem that for large enough n, the following approximations are appropriate:

    (X1 − np1)/√(np1(1 − p1)) ≈ N(0, 1)  and  (X2 − np2)/√(np2(1 − p2)) ≈ N(0, 1).

However, these two distributions are not independent. (Why?) We also know from Chapter 2 that Z ~ N(0, 1) implies Z² ~ χ²(1). Now we calculate

    χ²(1) ≈ D1 = (X1 − np1)²/(np1(1 − p1))
              = (1 − p1)(X1 − np1)²/(np1(1 − p1)) + p1(X1 − np1)²/(np1(1 − p1))
              = (X1 − np1)²/(np1) + (n − X2 − np1)²/(np2)
              = (X1 − np1)²/(np1) + (n(1 − p1) − X2)²/(np2)
              = (X1 − np1)²/(np1) + (np2 − X2)²/(np2)
              = (X1 − np1)²/(np1) + (X2 − np2)²/(np2).

Therefore, the sum of the two related distributions is approximately distributed as χ²(1). If they were independent, D1 would be approximately distributed as χ²(2). This result can be generalized in a natural way. Specifically, if (X1, X2, …, Xk) ~ Multin(n, p1, p2, …, pk), then
it can be shown that, as n → ∞,

    D_{k−1} = Σ_{i=1}^{k} (Xi − npi)²/(npi) ≈ χ²(k − 1),

although the proof is beyond the scope of this text. A common rule of thumb is that n should be large enough so that npi ≥ 5 for each i to guarantee that the approximation is acceptable. The statistic D_{k−1} is called Pearson's D statistic. The subscript k − 1 is one less than the number of terms in the sum.
We now discuss how the statistic D_{k−1} might be used in testing hypotheses. Consider an experiment with a sample space S, and suppose S is partitioned into disjoint events, i.e., S = ∪_{i=1}^{k} Ai, where A1, A2, …, Ak are mutually disjoint subsets such that P(Ai) = pi. Clearly, p1 + p2 + ⋯ + pk = 1. If the experiment is repeated independently n times, and Xi is defined to be the number of times that Ai occurs, then (X1, X2, …, Xk) ~ Multin(n, p1, p2, …, pk). We want to know if the experimental results for the proportion of time event Ai occurs match some prescribed proportion p_{0,i} for each i = 1, 2, …, k. The test of hypotheses for given probabilities p_{0,1}, …, p_{0,k} is

    H0: p1 = p_{0,1}, p2 = p_{0,2}, …, pk = p_{0,k}  vs.  H1: pi ≠ p_{0,i} for at least one i.

This is now set up for the χ² goodness-of-fit test. If H0 is true, then (X1, X2, …, Xk) ~ Multin(n, p_{0,1}, p_{0,2}, …, p_{0,k}), and for large enough n, D_{k−1} is distributed approximately as

    D_{k−1} = Σ_{i=1}^{k} (Xi − np_{0,i})²/(np_{0,i}) ≈ χ²(k − 1).

Since each Xi represents the observed frequency of the observations in Ai and np_{0,i} is the expected frequency of the observations in Ai, if the null hypothesis is true, we would expect the value of D_{k−1} to be small. We should reject the null hypothesis if the value of D_{k−1} appears to be too large. To make sure that we commit a Type I error at most 100α% of the time, we need

    P( D_{k−1} ≥ χ²(k − 1, α) ) = P( Σ_{i=1}^{k} (Xi − np_{0,i})²/(np_{0,i}) ≥ χ²(k − 1, α) ) = α.

The value of the test statistic D_{k−1} with observations X1 = x1, …, Xk = xk is

    d_{k−1} = Σ_{i=1}^{k} (xi − np_{0,i})²/(np_{0,i}).
Example 5.23  A tetrahedral die is tossed 68 times to determine whether the die is fair. The sides are labeled 1–4. Consider the following set of hypotheses.

    H0: the die is fair  vs.  H1: the die is not fair.

The observed and expected frequencies are listed in the table below.

    Side   Observed Frequency   Expected Frequency
    1      22                   0.25 × 68 = 17.0
    2      15                   17.0
    3      19                   17.0
    4      12                   17.0

The value of D3 is

    d3 = (22 − 17)²/17 + (15 − 17)²/17 + (19 − 17)²/17 + (12 − 17)²/17 = 3.4118.

Set α = 0.05. Since d3 < χ²(3, 0.05) = 7.815, the null hypothesis is not rejected, and there is not enough evidence to claim that the die is not fair. In addition, the p-value of the test is
P(D3 ≥ 3.4118) = 0.3324 > 0.05.
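A sketch of the test (standard library only; chi2_cdf is our own series-based helper for the χ² cdf, not anything from the text):

```python
import math

def chi2_cdf(x, k):
    """P(chi2(k) <= x) via the regularized lower incomplete gamma series."""
    a, xx = k / 2.0, x / 2.0
    if xx <= 0:
        return 0.0
    total = term = 1.0
    for n in range(1, 1000):
        term *= xx / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(a * math.log(xx) - xx - math.lgamma(a + 1.0))

# Example 5.23: 68 tosses of a tetrahedral die, H0: each side has p = 1/4.
observed = [22, 15, 19, 12]
expected = [68 * 0.25] * 4

d3 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = 1 - chi2_cdf(d3, len(observed) - 1)

print(round(d3, 4))       # about 3.4118
print(round(p_value, 4))  # about 0.3324, so do not reject H0
```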

Example 5.24  The printer's proofs of a new 260-page probability and statistics book contain typographical errors (or not) on each page. The number of pages on which i errors occurred is given in the table below.

    Num of Errors   0    1    2    3    4   5   6
    Num of Pages    77   90   55   30   5   3   0

Suppose we conjecture that these 260 values resulted from sampling a Poisson random variable X with λ = 2. The random variable X ~ Poisson(λ) if P(X = i) = e^{−λ} λ^i / i!, i = 0, 1, 2, …. The hypothesis test is

    H0: the data is Poisson with λ = 2  vs.  H1: the data is not Poisson with λ = 2.

To apply our method we must first compute the probabilities P(X = i), i = 0, …, 5, and also P(X ≥ 6), since the Poisson random variable takes on every nonnegative integer value. The results are listed below.

    i     Probability e^{−2} 2^i / i!    Expected Frequency 260 × P(X = i)
    0     0.13534                        35.188
    1     0.27067                        70.374
    2     0.27067                        70.374
    3     0.18045                        46.917
    4     0.09022                        23.457
    5     0.03609                        9.383
    ≥6    1 − P(X ≤ 5) = 0.01656         4.306

Since the expected frequency of at least 6 errors is less than 5, we must combine the entries for i = 5 and i ≥ 6. Our revised table is displayed below.

    i     Probability   Expected Frequency
    0     0.13534       35.188
    1     0.27067       70.374
    2     0.27067       70.374
    3     0.18045       46.917
    4     0.09022       23.457
    ≥5    0.05265       13.689

The value of D5 is

    d5 = (77 − 35.188)²/35.188 + (90 − 70.374)²/70.374 + (55 − 70.374)²/70.374
       + (30 − 46.917)²/46.917 + (5 − 23.457)²/23.457 + (3 − 13.689)²/13.689 = 87.484.

Let α = 0.01. Since d5 ≥ χ²(5, 0.01) = 15.086, we reject H0. Another way to see this is by calculating the p-value P(D5 ≥ 87.484) = 2.26 × 10⁻¹⁷ < 0.01. It is extremely unlikely that
the data is generated by a Poisson random variable with λ = 2.
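A sketch of the computation (standard library only; chi2_cdf is our own series-based helper). Note that a p-value as small as the text's 2.26 × 10⁻¹⁷ underflows a double-precision 1 − CDF, so the sketch only confirms that it is essentially zero:

```python
import math

def chi2_cdf(x, k):
    """P(chi2(k) <= x) via the regularized lower incomplete gamma series."""
    a, xx = k / 2.0, x / 2.0
    if xx <= 0:
        return 0.0
    total = term = 1.0
    for n in range(1, 2000):
        term *= xx / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(a * math.log(xx) - xx - math.lgamma(a + 1.0))

# Example 5.24: 260 pages, cells i = 0..4 and "i >= 5", H0: Poisson(2).
observed = [77, 90, 55, 30, 5, 3]                 # last cell combines i >= 5
lam, pages = 2.0, 260
probs = [math.exp(-lam) * lam**i / math.factorial(i) for i in range(5)]
probs.append(1.0 - sum(probs))                    # P(X >= 5)
expected = [pages * p for p in probs]

d5 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = max(0.0, 1 - chi2_cdf(d5, len(observed) - 1))

print(round(d5, 3))    # about 87.48
print(p_value < 1e-6)  # essentially zero at double precision
```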

Example 5.25  Continuing with the previous example, you may question why we took λ = 2.
Actually it was just a guess. A better way is to estimate the value of λ directly from the data
instead of attempting to guess it. Since the expectation of a Poisson random variable X with
parameter λ is E(X) = λ, we can estimate λ from the sample mean as

λ = x̄ = (77·0 + 90·1 + 55·2 + 30·3 + 5·4 + 3·5)/260 = 1.25.

The estimated expected frequencies with this value of λ are listed below.
5.6. MORE TESTS OF HYPOTHESES 133

 i    P(X = i) = e⁻¹·²⁵(1.25)ⁱ/i!   Expected Frequency 260·P(X = i)
 0    0.2865                         74.490
 1    0.3581                         93.106
 2    0.2238                         58.188
 3    0.0933                         24.258
 4    0.0291                          7.566
 5    0.0073                          1.898
≥6    1 − P(X ≤ 5) = 0.0019          0.494

We must combine the entries for i = 4, i = 5, and i ≥ 6 to replace the bottom three rows with
i ≥ 4, P(X ≥ 4) = 0.0383, and expected frequency 9.958.
Since one of the parameters, λ, had to be estimated using the sample values, it turns out
that under the null hypothesis, the D statistic loses a degree of freedom. That is, D4 ∼ χ²(3).
The value of D4 is now computed as

d4 = (77 − 74.490)²/74.490 + (90 − 93.106)²/93.106 + (55 − 58.188)²/58.188
   + (30 − 24.258)²/24.258 + (8 − 9.958)²/9.958 = 2.107.

Take α = 0.01. Since d4 < χ²(3, 0.01) = 11.345 or, calculating P(D4 ≥ 2.107) = 0.55049 > α,
we do not reject H0. It is plausible that the population is described by a Poisson random variable
with λ = 1.25.
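The estimation step folds into the same kind of sketch. The following hypothetical Python fragment estimates λ from the sample mean, recomputes the expected frequencies, and lumps the small cells exactly as in the example; the degrees of freedom drop to 3 because one parameter was estimated.

```python
import math

counts = [77, 90, 55, 30, 5, 3]    # observed frequencies of 0..5 errors
n = sum(counts)                     # 260
lam = sum(i * c for i, c in enumerate(counts)) / n   # sample mean = 1.25

# Expected frequencies for i = 0..3, with all cells i >= 4 lumped together
probs = [math.exp(-lam) * lam**i / math.factorial(i) for i in range(4)]
probs.append(1.0 - sum(probs))      # P(X >= 4)
expected = [n * p for p in probs]

observed = counts[:4] + [counts[4] + counts[5]]   # lump i = 4 and i = 5 (no i >= 6 observed)
d4 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(lam, round(d4, 2))            # 1.25 and about 2.11 (the text's 2.107 reflects rounding)
```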

Remark 5.26  The previous example addresses an important issue. If any of the parameters
of the proposed distribution must be estimated, then the D statistic under H0 loses degrees
of freedom. In particular, if the proposed distribution has r parameters that must be estimated
from the data, then D_{k−1} ∼ χ²(k − 1 − r) under the null hypothesis. The proof is fairly involved
and therefore is omitted.

Example 5.27  Simulation on a computer of a random variable depends on generating random
numbers in [0, 1]. Since computers are deterministic and not stochastic, any algorithm generat-
ing pseudorandom numbers must be tested to see if it actually produces pseudorandom samples
from the uniform random variable on the interval [0, 1]. On one run of such a program, the
following set of 50 numbers was generated.
134 5. HYPOTHESIS TESTING
Pseudorandom Numbers from [0, 1]
0.00418818   0.489868    0.860478    0.531297   0.134487
0.299301     0.0126372   0.00376535  0.0380281  0.00198181
0.640423     0.54803     0.51956     0.143379   0.0504356
0.636537     0.136595    0.229917    0.211021   0.0756147
0.791362     0.651962    0.726103    0.986798   0.128636
0.474316     0.491401    0.693047    0.188199   0.14045
0.47789      0.38617     0.00837938  0.198406   0.33602
0.229248     0.46666     0.398312    0.340956   0.00351918
0.168613     0.858063    0.240602    0.347544   0.155587
0.0405615    0.427129    0.963142    0.886288   0.0283893
How do we know these numbers are really random, i.e., generated correctly from X ∼
Unif[0, 1]? Here's how to apply a goodness-of-fit test for continuous distributions. First, parti-
tion the range of the random variable in some way, for example, into equal-length subintervals.
Let A1 = [0, 0.1], and for 2 ≤ i ≤ 10, let Ai = (0.1(i − 1), 0.1i]. The frequencies of the num-
bers in each of the intervals Ai are given in the following table. The expected frequencies are
calculated assuming they do come from Unif[0, 1].

Interval     Frequency   Expected Frequency
[0.0, 0.1]   11          5
(0.1, 0.2]    9          5
(0.2, 0.3]    5          5
(0.3, 0.4]    5          5
(0.4, 0.5]    6          5
(0.5, 0.6]    3          5
(0.6, 0.7]    4          5
(0.7, 0.8]    2          5
(0.8, 0.9]    3          5
(0.9, 1.0]    2          5
We consider the test of hypotheses

H0: data generated from Unif[0, 1] vs. H1: data not generated from Unif[0, 1].

The value of D9 ∼ χ²(9) is computed from the frequencies in the table as

d9 = (11 − 5)²/5 + (9 − 5)²/5 + (5 − 5)²/5 + (5 − 5)²/5 + (6 − 5)²/5
   + (3 − 5)²/5 + (4 − 5)²/5 + (2 − 5)²/5 + (3 − 5)²/5 + (2 − 5)²/5 = 16.0.

Let α = 0.05. Since P(D9 ≥ 16) ≈ 0.067 > α or, equivalently, 16 < χ²(9, 0.05) = 16.919, we do
not reject H0 that the data is from the uniform distribution on [0, 1]. Notice, however, that the
decision is a close call, and perhaps the experiment should be repeated. The results fall just
short of statistical significance.
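Binning the 50 values and forming the statistic is mechanical; here is a short, illustrative Python check using only the tabulated frequencies (not the raw list).

```python
# Frequencies of the 50 pseudorandom numbers in the ten subintervals of [0, 1]
counts = [11, 9, 5, 5, 6, 3, 4, 2, 3, 2]
expected = 5.0                      # 50 numbers, probability 0.1 per interval
d9 = sum((c - expected) ** 2 / expected for c in counts)
print(d9)  # 16.0
```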

5.6.2 CONTINGENCY TABLES AND TESTS FOR INDEPENDENCE
Consider an experiment in which outcomes have two traits, say A and B. For example, A might
denote yearly income and B educational level. Within each trait, there are a finite number of
mutually exclusive categories. Yearly income might be categorized as low, middle, or high, and
educational level as high school, some college, bachelor's degree, some post-graduate education,
or graduate degree. How do we determine if trait A is independent of trait B? To generalize, let
A1, A2, …, Ar denote the mutually exclusive categories within trait A, and let B1, B2, …, Bc
denote the categories within B. The trait A can be viewed as a random variable with r categorical
responses (the Ai), and likewise for B with c responses. In this view, we are asking if the random
variables A and B are independent. Suppose that in addition to letting A denote the trait, we
also let A denote the event of the experiment containing all outcomes with trait A, and similarly
for B. Let

P(Ai ∩ Bj) = pij.

Perform the experiment n times, and let Xij be the frequency of the occurrence (observation) of
the event Ai ∩ Bj. The values of Xij are usually displayed in what is referred to as a contingency
table. Specifically, Xij is placed in row i and column j. The general form of a contingency table
is displayed below.

                              Trait B
Categories      B1    B2    ⋯   B_{c−1}    Bc    Row Totals
        A1      X11   X12   ⋯   X1(c−1)    X1c   R1
        A2                                       R2
Trait A  ⋮       ⋮     ⋮     ⋱     ⋮         ⋮     ⋮
        A_{r−1}                                  R_{r−1}
        Ar      Xr1   Xr2   ⋯   Xr(c−1)    Xrc   Rr
Column Totals   C1    C2    ⋯   C_{c−1}    Cc    n

The adjective "contingent" is often used to describe the situation when an event can occur only
when some other event occurs first. For example, earning a high salary in the current economy is
contingent upon finding a high-tech job. In this sense, a contingency table's function is to reveal
some type of dependency or relatedness between the traits.
We know that

(X11, X12, …, X1c, …, Xr1, …, Xrc) ∼ Multin(n, p11, p12, …, p1c, …, pr1, …, prc),
and for large enough n, D_{rc−1} is approximately distributed as

D_{rc−1} = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (Xij − n pij)²/(n pij) ∼ χ²(rc − 1).

This result can be used in a test of hypotheses for the independence of the traits (random vari-
ables) A and B. Clearly, traits A and B are independent if P(Ai | Bj) = P(Ai) for every i and
j. We know that independence of events, in this case traits, is equivalent to

P(Ai ∩ Bj) = P(Ai | Bj)P(Bj) = P(Ai)P(Bj)

for every i and j. Consider now the hypotheses

H0: P(Ai ∩ Bj) = P(Ai)P(Bj), i = 1, …, r, j = 1, …, c vs.
H1: P(Ai ∩ Bj) ≠ P(Ai)P(Bj) for some i and j.

If traits A and B are independent, the null should hold. Again, the null is formulated to assume
independence because otherwise we have no way to account for any level of dependence.
Let the row probabilities and column probabilities be denoted, respectively, by pi· = P(Ai)
and p·j = P(Bj). The set of hypotheses above can now be rewritten more compactly as

H0: pij = pi· p·j, i = 1, …, r, j = 1, …, c vs.
H1: pij ≠ pi· p·j for some i and j.

In other words, the null states that the probability of each cell should be the product of the
corresponding row and column probabilities. Under the null hypothesis, D_{rc−1} is approximately
distributed as

D_{rc−1} = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (Xij − n pi· p·j)²/(n pi· p·j) ∼ χ²(rc − 1).

The problem is that we do not know any of the probabilities involved here. Our only course of
action is to estimate them from the sample values. Defining

Xi· = Σⱼ₌₁ᶜ Xij = row sum  and  X·j = Σᵢ₌₁ʳ Xij = column sum,

we can estimate pi· and p·j as

p̂i· = Xi·/n = row sum/n  and  p̂·j = X·j/n = column sum/n

where n is the total number of observations. Since we have estimated unknown parameters, the
number of degrees of freedom of the D statistic is reduced. Once we estimate p̂·j, j = 1, …, c −
1, then p̂·c is fixed. Similarly, once p̂1·, …, p̂_{(r−1)·} are estimated, p̂r· is fixed. So exactly
(r − 1) + (c − 1) = r + c − 2 parameters must be estimated, reducing the number of degrees of
freedom by r + c − 2. But rc − 1 − (r + c − 2) = rc − r − c + 1 = (r − 1)(c − 1). Therefore,
D_{rc−1} is approximately distributed as

D_{rc−1} = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (Xij − n p̂i· p̂·j)²/(n p̂i· p̂·j) ∼ χ²((r − 1)(c − 1)).

If traits A and B are really independent, we would expect the value of D_{rc−1} to be small. As
before, we will reject H0 at level α if the value of D_{rc−1} is too large, specifically, if the value is
at least χ²((r − 1)(c − 1), α). If we use the p-value approach, we would compute

P(D_{rc−1} ≥ d_{rc−1}) = χ²cdf(d_{rc−1}, ∞, (r − 1)(c − 1)).

Finally, as a rule of thumb, the estimated expected frequencies should be at least 5 for each i and
j. If not, rows and columns should be collapsed to achieve this requirement.

Example 5.28  Data is collected to determine if political party affiliation is related to whether
a person opposes, supports, or is indifferent to water use restrictions in a certain South-
western American city that is experiencing a severe drought. A total of 500 adults who belonged
to one of the two major political parties were contacted in the survey. We would like to know
if a person's party affiliation and his or her opinion about water restrictions are related. The
hypotheses to be tested are

H0: party affiliation and water use restriction opinion are independent vs.
H1: party affiliation and water use restriction opinion are dependent.

The results of the survey are presented in the following contingency table.

                                   Response
Categories                  Approves (A)  Opposes (O)  Indifferent (I)  Row Totals
Party        Democrat (D)   138           64           83               285
                            115.14        84.36        85.5
Affiliation  Republican (R) 64            84           67               215
                            86.86         63.64        64.5
Column Totals               202           148          150              500

The estimated probabilities are

p̂D· = 285/500 = 0.57,  p̂R· = 215/500 = 0.43,  p̂·A = 202/500 = 0.404,
p̂·O = 148/500 = 0.296,  p̂·I = 150/500 = 0.3.

The estimated expected frequencies in the table (displayed below the number of observations)
are calculated by taking 500 × row prob. × col. prob. For example, the expected frequency of
Democrats who approve is 500 p̂D· p̂·A = 500 × 0.57 × 0.404 = 115.14. The value of D_{2·3−1} =
D5 is calculated as

d5 = Σ_{i∈{D,R}} Σ_{j∈{A,O,I}} (Xij − n p̂i· p̂·j)²/(n p̂i· p̂·j)
   = (138 − 115.14)²/115.14 + (64 − 84.36)²/84.36 + (83 − 85.5)²/85.5
   + (64 − 86.86)²/86.86 + (84 − 63.64)²/63.64 + (67 − 64.5)²/64.5 = 22.152.

Because of the estimated parameters, D5 ∼ χ²(2). Set α = 0.05. The p-value of 22.152 is
P(D5 ≥ 22.152) = 0.000016 < α, constituting strong evidence against H0. A person's opinion
on water use restrictions during the current drought appears to be dependent on political party
affiliation.
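The expected cell counts follow directly from the row and column totals, so the whole test can be checked mechanically. This is an illustrative Python sketch (the library call `scipy.stats.chi2_contingency`, with `correction=False`, would produce the same statistic):

```python
observed = [[138, 64, 83],    # Democrat: approves, opposes, indifferent
            [64, 84, 67]]     # Republican
row = [sum(r) for r in observed]            # [285, 215]
col = [sum(c) for c in zip(*observed)]      # [202, 148, 150]
n = sum(row)                                # 500

# Expected count for cell (i, j) is row_i * col_j / n
stat = sum((observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(3))
print(round(stat, 3))  # about 22.152, with (2-1)(3-1) = 2 degrees of freedom
```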

Remark 5.29  A note of caution is in order here. The statistic D_{rc−1} is discrete, and we are
using a continuous distribution, namely the χ² distribution, to approximate it. If D_{rc−1} is ap-
proximated by χ²(1) (for example, for a 2 × 2 contingency table) or when at least one of the
estimated expected frequencies is less than 5, a continuity correction has been suggested to im-
prove the approximation, just as we do in using a normal distribution to approximate a binomial
distribution. The suggestion is that the D statistic be corrected as

D_{rc−1} = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (|Xij − n p̂i· p̂·j| − 0.5)²/(n p̂i· p̂·j).

This correction, however, has a tendency to over-correct and may lead to larger Type II errors.

Example 5.30  A sample of 185 prisoners who experienced trials in a certain criminal jurisdic-
tion was taken, and the results are presented in the 2 × 2 contingency table below.

                           Verdict
Categories            Acquitted (A)  Convicted (C)  Row Totals
Offender  Female (F)  39             5              44
                      39.495         4.505
Gender    Male (M)    127            14             141
                      126.45         14.55
Column Totals         166            19             185

The estimated probabilities are

p̂F· = 44/185 = 0.238,  p̂M· = 141/185 = 0.762,
p̂·A = 166/185 = 0.897,  p̂·C = 19/185 = 0.103.

Since one of the estimated expected frequencies is less than 5, and we are working with a
2 × 2 contingency table, we will use the continuity correction. We compute

d3 = (|39 − 39.495| − 0.5)²/39.495 + (|5 − 4.505| − 0.5)²/4.505
   + (|127 − 126.45| − 0.5)²/126.45 + (|14 − 14.55| − 0.5)²/14.55 = 0.000198.

If α = 0.05, then χ²(1, 0.05) = 3.841. We fail to reject H0. The gender of the offender and
whether or not the offender is convicted or acquitted appear not to be related.
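The corrected statistic can be checked the same way. This illustrative sketch follows the book's formula (which squares |O − E| − 0.5 even when that difference is negative) and uses exact expected counts, row × column / n, rather than the rounded probabilities in the table, so the value differs slightly from 0.000198; the conclusion is identical.

```python
observed = [[39, 5],      # female: acquitted, convicted
            [127, 14]]    # male
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
n = sum(row)              # 185

# Continuity-corrected chi-square statistic, as in Remark 5.29
stat = sum((abs(observed[i][j] - row[i] * col[j] / n) - 0.5) ** 2
           / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))
print(stat)  # a tiny value, far below chi^2(1, 0.05) = 3.841
```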

Test for Homogeneity
We end our treatment of χ² tests with a test for homogeneity. In our discussion of the test
for independence, we let A1, A2, …, Ar denote the mutually exclusive categories within trait A,
and B1, B2, …, Bc denote the categories within B. Recall that the trait A can be viewed as a
random variable with r categorical responses (the Ai), and likewise for B with c responses. A
single random sample of size n consists of n pairs of these categorical responses, the first one
coming from A and the second from B. The data is assembled in an r × c contingency table.
In a test for homogeneity, what we are interested in is whether for each category Ai of the trait
A, the distribution of the B responses is the same. The following example should clarify the
question of interest.
Example 5.31  The rash of school shootings across the USA has prompted the call for armed
guards in schools. A poll was conducted across five Midwestern states: Indiana, Illinois, Michi-
gan, Wisconsin, and Ohio. It was decided that a random sample of 100 parents in each state
should be asked whether they approved or disapproved of armed guards in the schools. The
results of the poll are given below in a 5 × 2 contingency table.

                     Approval/Disapproval
Categories     Approves (A)  Disapproves (D)  Row Totals
       IN      65            35               100
       IL      71            29               100
State  MI      78            22               100
       WI      82            18               100
       OH      70            30               100
Column Totals  366           134              500

The A category is the state, the B category the approval/disapproval. We are interested in asking
whether the percentage of parents in each of the Midwestern states that approve or disapprove
of armed guards in the schools is the same. If so, we would say that the distribution of ap-
proval/disapproval across the states is the same.
We now develop the test of homogeneity. Before starting, a remark is in order. Typically,
the test of homogeneity is developed by taking random samples from distinct populations (the A
categories) whose sizes are determined a priori. (This is the case in the example above.) That is,
the sizes of these random samples are determined before sampling is done. In our presentation,
a single sample of size n is taken from a single population. The effect of doing this is that the
sizes of these random samples restricted to each category are themselves random. However,
it can be proved that the test of homogeneity derived under the assumption that the samples
are taken from distinct populations results in exactly the same test that is described here. The
technical details are not of immediate interest and are omitted.
As in our discussion of testing independence, suppose that in addition to letting Ai denote
the category, we also let Ai denote the event in the experiment containing all outcomes within
category Ai, and similarly for Bj. As before, let pij = P(Ai ∩ Bj), pi· = P(Ai), and p·j =
P(Bj). The null hypothesis states that the distribution of the B categories is the same across all
the A categories. Formally,

H0: for each i, P(Bj | Ai) = P(Bj) = p·j for every j vs.
H1: for some i, P(Bj | Ai) ≠ P(Bj) = p·j for some j.

Therefore, under H0,

pij = P(Ai ∩ Bj) = P(Bj | Ai)P(Ai) = P(Bj)P(Ai) = pi· p·j.

These probabilities must be approximated by the data in the contingency table. If Xi· is the
observed frequency of category Ai (row sum), X·j is the observed frequency of category Bj
(column sum), and n is the total number of observations, then

p̂i· = Xi·/n = row sum/n  and  p̂·j = X·j/n = column sum/n.

Therefore, under H0, if Xij is the observed frequency of Ai ∩ Bj, its estimated expected frequency is

n p̂ij = n p̂i· p̂·j = (row sum × column sum)/n.

The above analysis results in the same test statistic as that derived for the test of independence,
namely,

D_{rc−1} = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (Xij − n p̂i· p̂·j)²/(n p̂i· p̂·j) ∼ χ²((r − 1)(c − 1)) (approximate).

We now return to the example introduced above.


Example 5.32 (Example 5.31 Continued)  The estimated probabilities are

p̂·A = 366/500 = 0.732,  p̂·D = 134/500 = 0.268,
p̂IN· = p̂IL· = p̂MI· = p̂WI· = p̂OH· = 0.2.

The estimated expected frequencies are listed in the updated contingency table below.

                     Approval/Disapproval
Categories     Approves (A)  Disapproves (D)  Row Totals
       IN      65            35               100
               73.2          26.8
       IL      71            29               100
               73.2          26.8
State  MI      78            22               100
               73.2          26.8
       WI      82            18               100
               73.2          26.8
       OH      70            30               100
               73.2          26.8
Column Totals  366           134              500

The value of D_{2·5−1} = D9 is approximately distributed as χ²(4). Computing the value of D9 we
get

d9 = Σ_{i∈{IN,IL,MI,WI,OH}} Σ_{j∈{A,D}} (Xij − n p̂i· p̂·j)²/(n p̂i· p̂·j)
   = (65 − 73.2)²/73.2 + (35 − 26.8)²/26.8 + (71 − 73.2)²/73.2 + (29 − 26.8)²/26.8
   + (78 − 73.2)²/73.2 + (22 − 26.8)²/26.8 + (82 − 73.2)²/73.2 + (18 − 26.8)²/26.8
   + (70 − 73.2)²/73.2 + (30 − 26.8)²/26.8
   = 9.3182.

Set α = 0.01. The p-value is P(D9 ≥ 9.3182) = 0.05362 > α. We retain the null. The distri-
butions of approval and disapproval of armed guards in the schools appear to be the same across
the five Midwestern states.
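Since each state's expected split is just 100 × (0.732, 0.268), the computation is easy to replicate. Here is an illustrative Python check:

```python
observed = [[65, 35], [71, 29], [78, 22], [82, 18], [70, 30]]  # IN, IL, MI, WI, OH
col = [sum(c) for c in zip(*observed)]     # [366, 134]
n = sum(col)                               # 500

stat = 0.0
for approves, disapproves in observed:
    m = approves + disapproves             # 100 parents per state
    stat += (approves - m * col[0] / n) ** 2 / (m * col[0] / n)
    stat += (disapproves - m * col[1] / n) ** 2 / (m * col[1] / n)
print(round(stat, 4))  # 9.3182, with (5-1)(2-1) = 4 degrees of freedom
```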

Remark 5.33  The TI calculator can perform the χ² test. Enter the observed values in a list,
say L1, and the expected values in a second list L2. Press STAT → TESTS → χ² GOF-Test.
The calculator will return the value of the χ² statistic as well as the p-value. In addition, it will
return the vector of each term's contribution to the statistic so that it can be determined which
terms contribute the most to the total. For a test of independence/homogeneity, the observed
values are entered into matrix A; the calculator's χ²-Test computes the expected values into
matrix B and returns the value of the statistic as well as the p-value.
5.6.3 ANALYSIS OF VARIANCE
Earlier in the chapter, we presented a method to test the set of hypotheses H0: μX =
μY vs. H1: μX ≠ μY based on the Student's t random variable. In this section, we will de-
scribe a procedure that generalizes the two-sample test to k ≥ 2 samples. Specifically, there are
k treatments resulting in outcomes Xj ∼ N(μj, σ), and suppose that X_{1j}, X_{2j}, …, X_{n_j j} is a
random sample of size nj from Xj, j = 1, 2, …, k. We will assume that the random samples
are independent of one another, and the variances are the same for all the random variables
Xj, j = 1, 2, …, k, with common value σ². The hypothesis test that determines if the treat-
ments result in the same means or if there is at least one difference is

H0: μ1 = μ2 = ⋯ = μk vs.
H1: μj ≠ μ_{j′} for some j ≠ j′.

The test for this set of hypotheses will be based on the F random variable as opposed to the
Student's t when there are only two treatments. The tests will be equivalent in the case k = 2.
The development which follows has traditionally been referred to as the analysis of variance (or
ANOVA for short), and is part of the statistical theory of the design of experiments. In the next
chapter we will also present an ANOVA in connection with linear regression.
We now introduce some specialized notation that is traditionally used in ANOVA. Let
n = n1 + n2 + ⋯ + nk, the total number of random variables across the k random samples. In
addition, let

μ = (1/n) Σⱼ₌₁ᵏ nj μj = Σⱼ₌₁ᵏ (nj/n) μj,

which is clearly a weighted average of all the means μj, j = 1, 2, …, k, since Σⱼ₌₁ᵏ nj/n = 1. If we
assume the null hypothesis, then μ = μ1 = ⋯ = μk, and μ is the common mean. Finally, let

X̄·j = (1/nj) Σᵢ₌₁^{nj} Xij  and  X̄·· = (1/n) Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} Xij.

The quantity X̄·j represents the sample mean of the jth sample, and X̄·· is the mean across all
the samples.
By considering the identity Xij − X̄·· = (Xij − X̄·j) + (X̄·j − X̄··), squaring both
sides, and then taking the sum over i and j, we arrive at the following.

A Fundamental Identity

Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} (Xij − X̄··)² = Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} (Xij − X̄·j)² + Σⱼ₌₁ᵏ nj (X̄·j − X̄··)²
        SST                    SSE within treatments          SSTR between treatments

In the above equation, SST stands for total sum-of-squares and measures the total variation,
SSE stands for error sum-of-squares and represents the sum of the variations within each sam-
ple, and finally, SSTR stands for treatment sum-of-squares and represents the variation across
samples. A considerable amount of algebra is required to verify the identity and is omitted.
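Although the algebra is omitted, the identity is easy to confirm numerically. The sketch below uses small made-up samples (any numbers would do) and checks that SST = SSE + SSTR.

```python
# Three illustrative treatment samples (arbitrary made-up data)
samples = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0]]

n = sum(len(s) for s in samples)
grand = sum(sum(s) for s in samples) / n            # overall mean, X-bar-dot-dot
means = [sum(s) / len(s) for s in samples]          # treatment means, X-bar-dot-j

sst = sum((x - grand) ** 2 for s in samples for x in s)
sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))

print(abs(sst - (sse + sstr)) < 1e-9)  # True: SST = SSE + SSTR
```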

The Test Statistic
Now suppose that the null hypothesis H0: μ1 = μ2 = ⋯ = μk is true. In this case,
X11, …, X_{n₁ 1}, …, X1k, …, X_{n_k k} can be viewed as a random sample from X ∼ N(μ, σ). There-
fore, since a sum of squares of independent standard normals is χ², we have

(1/σ²) Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} (Xij − X̄··)² = (1/σ²) SST ∼ χ²(n − 1).

Similarly, for treatment j,

(1/σ²) Σᵢ₌₁^{nj} (Xij − X̄·j)² ∼ χ²(nj − 1),

and so by independence,

(1/σ²) Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} (Xij − X̄·j)² = (1/σ²) SSE ∼ χ²(n − k)

since

(1/σ²) SSE = Σⱼ₌₁ᵏ [(1/σ²) Σᵢ₌₁^{nj} (Xij − X̄·j)²] ∼ χ²(n1 − 1) + ⋯ + χ²(nk − 1) ∼ χ²(n − k).

Notice that (1/σ²)SSE ∼ χ²(n − k) is true whether or not H0 is true. So we have the distri-
butions of SST and SSE. To complete the picture, we need to know something about how SSTR
is distributed. The way to determine this is to use the theorem from Chapter 2 that if two ran-
dom variables have the same moment generating function, then they have the same distribution.
Since (1/σ²)SST ∼ χ²(n − 1) under H0, its mgf is

M_{(1/σ²)SST}(t) = (1 − 2t)^{−(n−1)/2}.

Also, since (1/σ²)SSE ∼ χ²(n − k), we have M_{(1/σ²)SSE}(t) = (1 − 2t)^{−(n−k)/2}. By indepen-
dence, and since (1/σ²)SST = (1/σ²)SSE + (1/σ²)SSTR,

M_{(1/σ²)SST}(t) = M_{(1/σ²)SSE}(t) · M_{(1/σ²)SSTR}(t).

Therefore,

M_{(1/σ²)SSTR}(t) = M_{(1/σ²)SST}(t) / M_{(1/σ²)SSE}(t) = (1 − 2t)^{−(n−1)/2} / (1 − 2t)^{−(n−k)/2} = (1 − 2t)^{−(k−1)/2},

which is the mgf of a χ² random variable with k − 1 degrees of freedom. Therefore, under H0,

(1/σ²) Σⱼ₌₁ᵏ nj (X̄·j − X̄··)² = (1/σ²)SSTR ∼ χ²(k − 1).

Now consider the expected values of SSE and SSTR. If H0 is true, it is not hard to show
that

E(SSE) = (n − k)σ²  and  E(SSTR) = (k − 1)σ².

In summary, assuming H0, we have

(1/σ²) SSE ∼ χ²(n − k),  (1/σ²) SSTR ∼ χ²(k − 1),
E(SSE) = (n − k)σ²,  and  E(SSTR) = (k − 1)σ².

Therefore, since the ratio of χ² rvs, each divided by its respective degrees of freedom, has an
F-distribution as shown in Section 2.6.3, we have

[(1/σ²)SSTR / (k − 1)] / [(1/σ²)SSE / (n − k)] = [SSTR/(k − 1)] / [SSE/(n − k)] ∼ F(k − 1, n − k).   (5.7)

If H0 is true, since E(SSE) = (n − k)σ² and E(SSTR) = (k − 1)σ², we would expect the ratio
to be close to 1. However, if H0 is not true, then it can be shown (details omitted) that

E(SSTR) = (k − 1)σ² + Σⱼ₌₁ᵏ nj (μj − μ)² > (k − 1)σ²

since the μj's are not the same. The denominator in (5.7) should be about σ² since E(SSE) =
(n − k)σ² even when H0 is not true. The numerator, however, should be greater than σ², and
so the ratio should be greater than 1. Therefore,

F = [SSTR/(k − 1)] / [SSE/(n − k)] ∼ F(k − 1, n − k)

is our test statistic, and we will reject H0 when its observed value, f, is too large. To prevent
committing a Type I error more than 100α% of the time, we will reject H0 when f ≥ F(k −
1, n − k, α).

Remark 5.34  The calculations for ANOVA are simplified using the following formulas:

SST = Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} Xij² − (1/n) (Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} Xij)²

SSTR = Σⱼ₌₁ᵏ Sj²/nj − (1/n) (Σⱼ₌₁ᵏ Σᵢ₌₁^{nj} Xij)²  where  Sj = Σᵢ₌₁^{nj} Xij

SSE = SST − SSTR.

To organize the calculations, ANOVA tables are used. Table entries are obtained by substituting
data values for the Xij in the formulas above.

Source     DF     SS    MSS                  F-statistic    p-Value
Treatment  k − 1  SSTR  MSTR = SSTR/(k − 1)  f = MSTR/MSE   P(F(k − 1, n − k) ≥ f)
Error      n − k  SSE   MSE = SSE/(n − k)    *              *
Total      n − 1  *     *                    *              *

The table is filled in with numerical values from left to right until a p-value is calculated.

Example 5.35  Sweetcorn, as opposed to field corn which is fed to livestock, is for human
consumption. Three samples from three different types of sweetcorn, Sweetness, Allure, and
Montauk, were compared to determine if the mean heights of the plants were the same. The
following table gives the height (in feet) of 17 samples of Sweetness, 12 samples of Allure, and
15 samples of Montauk. Assume normal distributions for each treatment. Each treatment also
has a variance of 0.64 square feet.

  Sweetness       Allure        Montauk
5.48   6.06    7.51   6.73    5.83   5.42
5.21   6.14    6.54   5.85    5.80   6.92
5.08   4.99    6.66   7.28    6.27   5.41
4.14   5.88    5.29   6.83    5.44   6.65
6.56   5.49    5.17   6.77    6.54   5.60
4.81   6.81    7.45   7.25    6.78   6.05
6.70   6.44                   7.19   6.08
6.99   6.66                   6.16
6.37

We list the results of the ANOVA analysis in the following table.

Source     DF   SS      MSS     F-statistic   p-Value
Treatment  2    3.860   1.930   f = 3.543     0.038
Error      41   22.336  0.545   *             *
Total      43   *       *       *             *

If we set α = 0.05, then F(2, 41, 0.05) = 3.226 ≤ f = 3.543, and so we reject H0: μS = μA =
μM. The p-value is calculated from P(F(2, 41) ≥ 3.543) = Fcdf(3.543, ∞, 2, 41) = 0.038 < α.
The data is statistically significant evidence against the null, and we conclude that the mean
heights of the three types of plants are not the same.
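The entries in the ANOVA table can be reproduced with a short script. This is an illustrative Python check using the plant heights; a library call such as `scipy.stats.f_oneway(sweetness, allure, montauk)` would report the same F and p-value.

```python
sweetness = [5.48, 6.06, 5.21, 6.14, 5.08, 4.99, 4.14, 5.88, 6.56, 5.49,
             4.81, 6.81, 6.70, 6.44, 6.99, 6.66, 6.37]
allure = [7.51, 6.73, 6.54, 5.85, 6.66, 7.28, 5.29, 6.83, 5.17, 6.77, 7.45, 7.25]
montauk = [5.83, 5.42, 5.80, 6.92, 6.27, 5.41, 5.44, 6.65, 6.54, 5.60,
           6.78, 6.05, 7.19, 6.08, 6.16]
samples = [sweetness, allure, montauk]

n = sum(len(s) for s in samples)                     # 44
k = len(samples)                                     # 3
grand = sum(sum(s) for s in samples) / n

# Between-treatment and within-treatment sums of squares
sstr = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)
sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
f = (sstr / (k - 1)) / (sse / (n - k))
print(round(sstr, 3), round(sse, 3), round(f, 3))    # about 3.860, 22.336, 3.543
```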

Remark 5.36  To calculate the value F(d1, d2, α) for a given α, d1, and d2 using a TI calcu-
lator, enter the following program as INVF into your calculator:

(a) Input "RT TAIL,", A
(b) Input "D1:", N
(c) Input "D2:", D
(d) solve(1−Fcdf(0,X,N,D)−A, X, 1.5*N, 0, 9999) → X
(e) Disp X
(f) Stop

Press PRGM, then INVF, and you will be prompted for the area to the right of c, i.e., α as in
P(F ≥ c) = α. Then enter the degrees of freedom in the order D1, D2, and press ENTER.
The result is the value of c that gives you α.

The Special Case of Two Samples
Let X11, X21, …, X_{n₁ 1} and X12, X22, …, X_{n₂ 2} be independent random samples from random
variables X1 ∼ N(μ1, σ) and X2 ∼ N(μ2, σ), respectively. (Note that the variances are equal.)
There are two methods we can use to test the hypotheses

H0: μ1 = μ2 vs. H1: μ1 ≠ μ2.

One test is the two-sample t-test described earlier in this chapter, which uses the statistic

T = (X̄1 − X̄2) / [Sp √(1/n1 + 1/n2)] ∼ t(n1 + n2 − 2)

where Sp² is the pooled variance. Or we could use the ANOVA F-test just introduced. Which of
these methods is better and in what sense? To answer this question, we will apply the ANOVA
procedure to the two-sample problem. In this situation, the F statistic becomes

F = MSTR/MSE = [SSTR/(k − 1)] / [SSE/(n − k)] = SSTR / [SSE/(n1 + n2 − 2)]

since k = 2 and n = n1 + n2. Computing SSTR, we get

SSTR = n1 (X̄1 − (n1 X̄1 + n2 X̄2)/(n1 + n2))² + n2 (X̄2 − (n1 X̄1 + n2 X̄2)/(n1 + n2))²
     = n1 (n2 (X̄1 − X̄2)/(n1 + n2))² + n2 (n1 (X̄1 − X̄2)/(n1 + n2))²
     = [(n1 n2² + n2 n1²)/(n1 + n2)²] (X̄1 − X̄2)²
     = [n1 n2/(n1 + n2)] (X̄1 − X̄2)² = (X̄1 − X̄2)² / (1/n1 + 1/n2).

Computing SSE, we obtain

SSE = Σⱼ₌₁² Σᵢ₌₁^{nj} (Xij − X̄·j)² = (n1 − 1)S²_{X1} + (n2 − 1)S²_{X2},

and so

SSE/(n1 + n2 − 2) = [(n1 − 1)S²_{X1} + (n2 − 1)S²_{X2}] / (n1 + n2 − 2),

which is just the pooled variance Sp². Therefore,

SSTR / [SSE/(n1 + n2 − 2)] = (X̄1 − X̄2)² / [Sp²(1/n1 + 1/n2)] ∼ F(1, n1 + n2 − 2).

We reject H0: μ1 = μ2 at level α if

f = (x̄1 − x̄2)² / [sp²(1/n1 + 1/n2)] ≥ F(1, n1 + n2 − 2, α).

However, we know that F(1, m) = (t(m))², and so we reject H0 when

(x̄1 − x̄2)² / [sp²(1/n1 + 1/n2)] ≥ (t(n1 + n2 − 2, α))²  ⟺  |t| ≥ t(n1 + n2 − 2, α)

where t is the value of the statistic

T = (X̄1 − X̄2) / [Sp √(1/n1 + 1/n2)].

This is exactly the condition under which we reject H0 using the T statistic for a two-sample
t-test. The two methods are equivalent in that one of the methods will reject H0 if and only if
the other method does.
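The equivalence can also be confirmed numerically: for any pair of samples, the ANOVA f equals t². Here is a hypothetical Python sketch with made-up data.

```python
import math

x = [5.1, 4.8, 5.6, 5.0, 4.9]          # made-up sample 1
y = [5.9, 6.1, 5.5, 6.4]               # made-up sample 2
n1, n2 = len(x), len(y)
xbar, ybar = sum(x) / n1, sum(y) / n2

# Pooled variance and the two-sample t statistic
s2x = sum((v - xbar) ** 2 for v in x) / (n1 - 1)
s2y = sum((v - ybar) ** 2 for v in y) / (n2 - 1)
sp2 = ((n1 - 1) * s2x + (n2 - 1) * s2y) / (n1 + n2 - 2)
t = (xbar - ybar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# ANOVA F statistic for the same two samples (k = 2)
sstr = (xbar - ybar) ** 2 / (1 / n1 + 1 / n2)
sse = (n1 - 1) * s2x + (n2 - 1) * s2y
f = sstr / (sse / (n1 + n2 - 2))
print(abs(f - t ** 2) < 1e-9)  # True
```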

5.7 PROBLEMS
5.1. An industrial drill bit has a lifetime (measured in years) that is a normal random variable
X ∼ N(μ, 2). A random sample of the lifetimes of 100 bits resulted in a sample mean
of x̄ = 1.3.
(a) Perform a test of hypothesis H0: μ0 = 1.5 vs. H1: μ0 ≠ 1.5 with α = 0.05.
(b) Compute the probability of a Type II error when μ = 2. That is, compute β(2).
(c) Find a general expression for β(μ).
5.2. Suppose a test of hypothesis is conducted by an experimenter for the mean of a normal
random variable X ∼ N(μ, 3.2). A random sample of size 16 is taken from X. If H0:
μ = 42.9 is tested against H1: μ ≠ 42.9, and the experimenter rejects H0 if x̄ is in the
region (−∞, 41.164] ∪ [44.636, ∞), what is the level of the test α?
5.3. Compute the p-values associated with each of the following sample means x̄ computed
from a random sample from a normal random variable X. Then decide if the null
hypothesis should be rejected if α = 0.05.
(a) H0: μ = 120 vs. H1: μ < 120, n = 25, σ = 18, x̄ = 114.2.
(b) H0: μ = 14.2 vs. H1: μ > 14.2, n = 9, σ = 4.1, x̄ = 15.8.
(c) H0: μ = 30 vs. H1: μ ≠ 30, n = 16, σ = 6, x̄ = 26.8.
5.4. An engineer at a prominent manufacturer of high-performance alkaline batteries is at-
tempting to increase the lifetime of its best-selling AA battery. The company's cur-
rent battery functions for 100.3 hours before it has to be recharged. The engineer ran-
domly selects 15 of the improved batteries and discovers that the mean operating time
is x̄ = 105.6 hours. The sample standard deviation is s = 6.25.
(a) Perform a test of hypotheses
H0: μ = 100.3 vs. H1: μ > 100.3 with α = 0.01.
(b) Compute the power function value π(103).
(c) Compute the general form of the power function π(μ).
5.5. Suppose a random sample of size 25 is taken from a normal random variable X ∼
N(μ, σ) with σ unknown. For the following tests and test levels, determine the crit-
ical regions if t represents the value of the test statistic T = (X̄ − μ0)/(s_X/5):
(a) H0: μ = μ0 vs. H1: μ > μ0, α = 0.01.
(b) H0: μ = μ0 vs. H1: μ < μ0, α = 0.02.
(c) H0: μ = μ0 vs. H1: μ ≠ μ0, α = 0.05.
5.6. Compute the p-values associated with each of the following sample means x̄ computed
from a random sample from a normal random variable X ∼ N(μ, σ) where σ is un-
known. Then decide if the null hypothesis should be rejected if α = 0.05.
(a) H0: μ = 90.5 vs. H1: μ < 90.5, s = 9.5, n = 17, x̄ = 85.2.
(b) H0: μ = 20.2 vs. H1: μ > 20.2, s = 6.3, n = 9, x̄ = 21.8.
(c) H0: μ = 35 vs. H1: μ ≠ 35, s = 11.7, n = 20, x̄ = 31.9.
5.7. Consider the calculation of a Type II error for a test for the variance of a random variable
X ∼ N(μ, σ). Derive the general forms of β(σ1²) for the following tests.
(a) H0: σ² = σ0² vs. H1: σ² < σ0² has
β(σ1²) = P(χ²(n − 1) > (σ0²/σ1²) χ²(n − 1, 1 − α)).
(b) H0: σ² = σ0² vs. H1: σ² > σ0² has
β(σ1²) = P(χ²(n − 1) < (σ0²/σ1²) χ²(n − 1, α)).
(c) H0: σ² = σ0² vs. H1: σ² ≠ σ0² has
β(σ1²) = P((σ0²/σ1²) χ²(n − 1, 1 − α/2) < χ²(n − 1) < (σ0²/σ1²) χ²(n − 1, α/2)).
5.8. Pet Friendly, a pet supply company, sells 25-lb bags of a popular brand of cat litter,
OdorGone. When properly filled, the bags have a standard deviation of 1 lb of litter. A
random sample of 20 bags of litter are weighed with the results listed below.

25.23 25.50 26.18 25.44 26.04 26.01 25.30 24.49 25.21 25.68
25.18 25.01 26.09 24.49 24.54 25.12 25.84 24.22 25.14 25.67

(a) Perform a test of hypothesis H0: σ² = 1 vs. H1: σ² > 1 with α = 0.05.
(b) Compute the probability of a Type II error when σ² = 1.5, and then compute
π(1.5).
(c) Determine the general form of π(σ²).
5.9. Compute the p-values associated with each of the following sample variances s_X² com-
puted from a random sample from a normal random variable X ∼ N(μ, σ). Then decide
if the null hypothesis should be rejected if α = 0.05.
(a) H0: σ² = 4.8 vs. H1: σ² < 4.8, s_X² = 3.4, n = 60.
(b) H0: σ² = 2.3 vs. H1: σ² > 2.3, s_X² = 3.1, n = 35.
(c) H0: σ = 9 vs. H1: σ < 9, s_X = 7.1, n = 20.
5.10. An electrical supply company produces devices that operate using a thermostatic control.
The standard deviation of the temperature at which these devices actually operate should
not exceed 2°C. The quality control department tests H0: σ = 2 vs. H1: σ > 2. A
random sample of 30 devices is taken, and it is determined that s = 2.39°C.
(a) Conduct the test with α = 0.05.
(b) Compute the p-value of the test.
(c) Compute the probability of a Type II error when σ = 3. Compute π(3).
5.11. The personnel department at a suburban Chicago skilled nursing facility wants to test
the hypothesis that σ = 2 hours for the time it takes a nurse to complete his or her
round of patients against the alternative hypothesis that σ ≠ 2. A random sample of
the times required by 30 nurses to complete their rounds resulted in a sample standard
deviation of s = 1.8 hours. Perform a test of hypothesis H0: σ = 2 vs. H1: σ ≠ 2 with
α = 0.05.
5.12. The production of plastic sheets used in the construction industry is monitored at a
company's plant for possible fluctuations in thickness (measured in millimeters, mm).
If the variance in the thickness of the sheets exceeds 2.25 square millimeters, there is
cause for concern about product quality. The production process continues while the
variance appears to be smaller than the cutoff. Thickness measurements for a simple
random sample of 10 sheets produced during a particular shift were taken, yielding the
following result.

226, 226, 227, 226, 225, 228, 225, 226, 229, 227

(a) Perform a test of hypotheses H0: σ² = 2.25 vs. H1: σ² < 2.25 with α = 0.05.
(b) Calculate the Type II error β(2).

5.13. Cars intending on making a left turn at a certain intersection are observed. Out of 600 cars in the study, 157 of them pulled into the wrong lane.

(a) Perform a test of hypothesis H0: p0 = 0.30 vs. H1: p0 ≠ 0.30 with α = 0.05, where p0 is the true proportion of cars pulling into the wrong left-turn lane.
(b) Compute the p-value of the test.
(c) Compute β(0.27).
5.14. A die is tossed 800 times. Consider the hypothesis H0: p0 = 1/6 vs. H1: p0 ≠ 1/6, where p0 is the proportion of 6s that appear.

(a) What is the range on the number of times x that a 6 would have to be rolled to reject H0 at the α = 0.05 level?
(b) What is the range on the number of times x that a 6 would have to be rolled to retain H0 at the α = 0.01 level?

5.15. (Small Sample Size) A cosmetics firm claims that its new topical cream reduces the appearance of undereye bags in 60% of men and women. A consumer group thinks the percentage is too high and conducts a test of hypothesis H0: p0 = 0.6 vs. H1: p0 < 0.6. Out of 8 men and women in a random sample, only 3 saw a significant reduction in the appearance of their undereye bags.

(a) Perform a test of hypothesis H0: p0 = 0.6 vs. H1: p0 < 0.6 with α = 0.05 by computing the p-value of the test.
(b) What is the critical region of the test H0: p0 = 0.6 vs. H1: p0 ≠ 0.6 if α = 0.1?

5.16. The director of a certain university tutoring center wants to know if there is a difference between the mean lengths of time (in hours) male and female freshman students study over a 30-day period. The study involved a random sample of 34 female and 29 male students. Sample means and standard deviations are listed in the table below.

Female (X)   m = 34   x̄ = 105.5   s_X = 20.1
Male (Y)     n = 29   ȳ = 90.9    s_Y = 12.2
152 5. HYPOTHESIS TESTING
(a) Can it be assumed that σ_X² = σ_Y²? Test the hypothesis H0: σ_X² = σ_Y² vs. H1: σ_X² ≠ σ_Y² at the α = 0.05 level.
(b) Depending on the outcome of the test in (a), test the hypothesis H0: μ_X = μ_Y vs. H1: μ_X ≠ μ_Y, again at the α = 0.05 level.
(c) What is the p-value of the test in (b)?
5.17. Random samples of sizes m = 11 and n = 10 are taken from two independent normal random variables X and Y, respectively. The samples yield s_X² = 6.8 and s_Y² = 7.1. Perform the following tests.
(a) H0: σ_X² = σ_Y² vs. H1: σ_X² > σ_Y² with α = 0.1.
(b) H0: σ_X² = σ_Y² vs. H1: σ_X² < σ_Y² with α = 0.05.
(c) H0: σ_X² = σ_Y² vs. H1: σ_X² ≠ σ_Y² with α = 0.01.
5.18. A zookeeper at a major U.S. zoo wants to know if polar bears kept in captivity have a lower birth rate than those in the wild. The peak reproductive years are between 15 and 30 years of age in both captive and wild bears. Random samples of the number of cubs born to female captive bears and wild bears (from a certain Arctic population) were taken with the following results.

Captive Bears (X)   Wild Bears (Y)
m = 24              n = 18
x̄ = 19.1            ȳ = 16.3
s_X = 2.3           s_Y = 4.1

The variances of the two populations are unknown but assumed equal.
(a) Perform a test of hypothesis H0: μ_X = μ_Y vs. H1: μ_X > μ_Y with α = 0.05.
(b) Compute the p-value of the test in (a).
5.19. An entomologist is making a study of two species of lady bugs (Coccinellidae). She is interested in whether there is a difference between the number of spots on the carapace of the two species. She takes a random sample of 20 insects from each species and counts the number of spots. The results are presented in the table below.

Species 1 (X)   Species 2 (Y)
m = 20          n = 20
x̄ = 3.8         ȳ = 3.6
s_X = 1.2       s_Y = 1.3

(a) Perform a test of hypothesis H0: μ_X = μ_Y vs. H1: μ_X ≠ μ_Y with α = 0.05.
(b) Compute the p-value of the test in (a).
5.20. Random samples of sizes m = 9 and n = 14 are taken from two independent normal random variables X and Y, respectively. The variances are unknown but assumed equal. Assume that the pooled variance s_p² = 3581.6. If H0: μ_X = μ_Y vs. H1: μ_X ≠ μ_Y is to be tested with α = 0.05, what is the smallest value of |x̄ − ȳ| that will result in the null hypothesis being rejected?
5.21. A nutritionist wishes to study the effects of diet on heart disease and stroke in middle-aged men. He conducts a study of 1,000 randomly selected initially healthy men between the ages of 45 and 60. Exactly half of the men were placed on a restricted diet (X sample) while the other half were allowed to continue their normal diet (Y sample). After an 8-year period, 85 men in the diet group had died of myocardial infarction (heart attack) or cerebral infarction (stroke) while 93 men died of heart attack or stroke in the control group.
(a) Perform a test of hypothesis H0: pX = pY vs. H1: pX ≠ pY with α = 0.05.
(b) What is the p-value of the test in (a)?
(c) Calculate β(−0.05).
5.22. A utility outfielder on the Chicago Cubs had a batting average of 0.276 out of 300 at bats last season and a batting average of 0.220 out of 235 at bats this past season. Management wants to reduce the salary of the player for next season since his performance appears to have degraded. Does management have a sound statistical argument for cutting the player's salary?
(a) Perform a suitable test of hypothesis to either validate or invalidate management's decision.
(b) Compute the p-value of the test in (a).
5.23. A random sample of 550 Californians (X sample) and 690 Iowans (Y sample) were asked if they would like to visit Europe within the next ten years with the result that 61% of Californians and 53% of Iowans would like to take the trip.

(a) Perform a test of hypothesis H0: pX = pY vs. H1: pX > pY with α = 0.01.
(b) Compute the p-value of the test in (a).
(c) Compute β(0.1).
5.24. Ten students at a certain high school are randomly chosen to participate in a study of the effectiveness of taking a course in formal logic on their abstract reasoning capability. The students take a test measuring abstract reasoning before and after taking the course. The results of the two tests are displayed in the following table.
Student i   Score Before Course Xi   Score After Course Yi   Difference Di = Yi − Xi
1           74                       78                      4.0
2           83                       79                      −4.0
3           75                       76                      1.0
4           88                       85                      −3.0
5           84                       86                      2.0
6           63                       67                      4.0
7           93                       93                      0.0
8           84                       83                      −1.0
9           91                       94                      3.0
10          77                       76                      −1.0

(a) Perform a paired t test of hypothesis H0: μ_D = 0 vs. H1: μ_D > 0 with α = 0.05.

(b) What is the p-value of the test in (a)?

5.25. A new diet requires that certain food items be weighed before being consumed. Over the course of a week, a person on the diet weighs ten food items (in ounces). Just to make sure of the weight, she weighs the items on two different scales. The weights indicated on the scales are close to one another, but are not exactly the same. The results of the weighings are given below.

Food Item i   Weight Xi on Scale 1   Weight Yi on Scale 2   Difference Di = Xi − Yi
1             19.38                  19.35                  0.03
2             12.40                  12.45                  −0.05
3             6.47                   6.46                   0.01
4             13.47                  13.52                  −0.05
5             11.23                  11.27                  −0.04
6             14.36                  14.41                  −0.05
7             8.33                   8.35                   −0.02
8             10.50                  10.52                  −0.02
9             23.42                  23.41                  0.01
10            9.15                   9.17                   −0.02

(a) Perform a paired t test of hypothesis H0: μ_D = 0 vs. H1: μ_D ≠ 0 with α = 0.05.

(b) What is the p-value of the test in (a)?

5.26. The table on the left lists the percentages in the population of each blood type in India. The table on the right is the distribution of 1,150 blood types in a small northern Indian town.

Blood Type   Percentage of Population     Blood Type   Number of Residents
O+           27.85                        O+           334
A+           20.8                         A+           207
B+           38.14                        B+           448
AB+          8.93                         AB+          92
O-           1.43                         O-           23
A-           0.57                         A-           12
B-           1.79                         B-           23
AB-          0.49                         AB-          11

Does the town's distribution of blood types conform to the national percentages for India? Test at the α = 0.01 level.
5.27. A six-sided die is tossed independently 180 times. The following frequencies were observed.

Side        1   2    3    4    5    6
Frequency   #   30   30   30   30   60 − #

For what values of # would the null hypothesis that the die is fair be rejected at the α = 0.05 level?

5.28. The distribution of colors in the candy M&Ms has varied over the years. A statistics student conducted a study of the color distribution of M&Ms made in a factory in Tennessee. After the study, she settles on the following distribution of the colors blue, orange, green, yellow, red, and brown at the Tennessee plant.

Color    Percentage (Tennessee)     Color    Number (New Jersey)
Blue     20.7                       Blue     213
Orange   20.5                       Orange   208
Green    19.8                       Green    183
Yellow   13.5                       Yellow   131
Red      13.1                       Red      137
Brown    12.4                       Brown    128

She wanted to see if the same distribution held at another plant located in New Jersey. A sample of 1,000 M&Ms were inspected for color at the New Jersey plant. The results are in the table above right. Is the New Jersey plant's distribution of colors the same as the Tennessee plant's distribution? Test at the α = 0.05 level.
5.29. At a certain fishing resort off the Southeastern coast of Florida, a record is kept of the number of sailfish caught daily over a 60-day period by the guests staying at the resort. The results are in the table below.

Sailfish Caught   0   1    2    3    4   5   6
No. Days          8   14   14   17   3   3   1

An ecologist is concerned about declining fish populations in the area. He proposes that the data follows a Poisson distribution with a Poisson rate λ = 2. Does it appear that the data follows such a distribution? Test at the α = 0.05 level.
5.30. A homeowner is interested in attracting hummingbirds to her backyard by installing a bird feeder customized for hummingbirds. Over a period of 576 days, the homeowner observes the number of hummingbirds visiting the feeder during a certain half-hour period during the afternoon.

Hummingbird Visits   0     1     2    3    4   5
No. Days             229   211   93   35   7   1

(a) Is it possible the data in the table follows a Poisson distribution? Test at the α = 0.05 level. What value of λ should be used?
(b) Apply the test with λ = 0.8.

5.31. A traffic control officer is tracking speeders on Lake Shore Drive in Chicago. In particular, he is recording (from a specific vantage point) the time intervals (interarrival times) between drivers that are speeding. A sample of 100 times (in seconds) are listed in the table below.

Interval              [0, 20)   [20, 40)   [40, 60)   [60, 90)   [90, 120)   [120, 180)   [180, ∞)
No. in the Interval   41        19         16         13         9           2            0

The officer hypothesizes that the interarrival time data follows an exponential distribution with mean 40 seconds. Test his hypothesis at the α = 0.05 level.
5.32. A fisherman, who also happens to be a statistician, is counting the number of casts he has to make in a certain small lake in southern Illinois near his home before his lure is taken by a smallmouth bass. The data below represents the number of casts until achieving 50 strikes while fishing during a recent vacation. The fisherman hypothesizes that the number of casts before a strike follows a geometric distribution. Test his hypothesis at the α = 0.01 level.

Casts to Strike   1   2    3    4   5   6   7   8   9
Frequency         4   13   10   7   5   4   3   3   1
5.33. A criminologist is studying the occurrence of serious injury due to criminal violence for certain professions. A random sample of 490 causes of injury in the chosen professions is taken. The results are displayed in the table. Does it appear that serious injury due to criminal violence and choice of profession are independent? Test at the α = 0.01 level.

                        Police (P)   Cashier (C)   Taxi Driver (T)   Security Guard (S)   Row Totals
Criminal Violence (V)   82           107           70                59                   318
Other Causes (O)        92           9             29                42                   172
Column Totals           174          116           99                101                  490

5.34. For a receiver in a wireless device, like a cell phone, for example, two important characteristics are its selectivity and sensitivity. Selectivity refers to a wireless receiver's capability to detect and decode a desired signal in the presence of other unwanted interfering signals. Sensitivity refers to the smallest possible signal power level at the input which assures proper functioning of a wireless receiver. A random sample of 170 radio receivers produced the following results.

                           Sensitivity
                           Low (LN)   Average (AN)   High (HN)   Row Totals
Selectivity Low (LS)       6          12             12          30
            Average (AS)   33         61             18          112
            High (HS)      13         15             0           28
Column Totals              52         88             30          170

Does it appear from the data that selectivity and sensitivity are dependent traits of a receiver? Test at the α = 0.01 level.
5.35. Consider again the small northern Indian town with 1,150 residents in problem 5.26 that has the following distribution of blood types.

Blood Type      O+    A+    B+    AB+   O-   A-   B-   AB-
No. Residents   334   207   448   92    23   12   23   11

Test the hypothesis that blood type and Rh factor (positive or negative) are independent traits at the α = 0.05 level.
5.36. A random sample of 300 adults in a certain small city in Illinois are asked about their favorite PBS programs. In particular, they are asked about their favorite shows among home improvement shows, cooking shows, and travel shows. The results of the survey are listed in the table below.

                      PBS Program
                      Home Improvement (H)   Cooking (C)   Travel (T)   Row Totals
Gender Male (M)       45                     65            50           160
       Female (F)     55                     45            40           140
Column Totals         100                    110           90           300

Are gender and the genre of television program independent? Test the hypothesis at the α = 0.05 level.
5.37. A ketogenic diet is a type of low-carbohydrate diet whose aim is to metabolize fats into ketone bodies (water-soluble molecules acetoacetate, beta-hydroxybutyrate, and acetone produced by the liver) rather than into glucose as the body's main source of energy. A dietician wishes to study whether the proportion of adult men on ketogenic diets changes with age. She samples 100 men currently on diets in each of five age groups: Group I: 20–25, Group II: 26–30, Group III: 31–35, Group IV: 36–40, and Group V: 41–45. Her results are displayed in the table below. Let p_i, i ∈ {I, II, III, IV, V}, denote the proportion of men on a ketogenic diet in group i. Test whether the proportions are the same across the age groups. Conduct the test at the α = 0.05 level.

                        (I)    (II)   (III)   (IV)   (V)    Row Totals
Diet Ketogenic (K)      26     22     25      20     19     112
     Nonketogenic (N)   74     78     75      80     81     388
Column Totals           100    100    100     100    100    500
5.38. Suppose the result of an experiment can be classified as having one of three mutually exclusive A traits, A1, A2, and A3, and also as having one of four mutually exclusive B traits, B1, B2, B3, and B4. The experiment is independently repeated 300 times with the following results.

                  B1        B2       B3       B4        Row Totals
A Trait A1        25 − 5#   25 − #   25 + #   25 + 5#   100
        A2        25        25       25       25        100
        A3        25 + 5#   25 + #   25 − #   25 − 5#   100
Column Totals     75        75       75       75        300

What is the smallest integer value of # for which the null hypothesis that the traits are independent is rejected? Test at the α = 0.05 level. (Note that 0 ≤ # ≤ 5.)

5.39. A random sample of size n = 10 of top speeds of Indianapolis 500 drivers over the past 30 years is taken. The data is displayed in the following table.

Top Speeds (in mph)
202.2   203.4   200.5   206.3   198.0
203.7   200.8   201.3   199.0   202.5

Test the set of hypotheses

H0: m0 = 200 vs. H1: m0 > 200

at the α = 0.05 level.

5.40. Random samples of size 6 are taken from three normal random variables A, B, and C having equal variances. The results are displayed in the table below.

Variable A (A)   Variable B (B)   Variable C (C)
15.75            12.63            9.37
11.55            11.46            8.28
11.16            10.77            8.15
9.92             9.93             6.37
9.23             9.87             6.37
8.20             9.42             5.66

Test if μ_A = μ_B = μ_C at the α = 0.01 level by carrying out the following steps.

(a) Compute X̄_A, X̄_B, X̄_C, and the grand mean X̄··.
(b) Compute SSTR, SSE, and SST.
(c) Compute MSTR and MSE.
(d) Compute f.
(e) Compute the p-value of f.
(f) Display the ANOVA table.
(g) Test if μ_A = μ_B = μ_C at the α = 0.01 level.

5.41. Alice, John, and Bob are three truck assembly plant workers in Dearborn, MI. The times (in minutes) each requires to mount the windshield on a particular model of truck the plant produces are recorded on five randomly chosen occasions for each worker.

Alice (A)   John (J)   Bob (B)
8           10         11
10          8          10
10          9          10
11          9          9
9           8          9

Assuming equal variances, test if μ_A = μ_J = μ_B at the α = 0.05 level by carrying out the following steps.

(a) Compute X̄_A, X̄_J, X̄_B, and the grand mean X̄··.
(b) Compute SSTR, SSE, and SST.
(c) Compute MSTR and MSE.
(d) Compute f.
(e) Compute the p-value of f.
(f) Display the ANOVA table.
(g) Test if μ_A = μ_J = μ_B at the α = 0.05 level.

5.42. A new drug, AdolLoft, was developed to treat depression in adolescents. A research study was established to assess the clinical efficacy of the drug. Patients suffering from depression were randomly assigned to one of three groups: a placebo group (P), a low-dose group (L), and a normal dose group (N). After six weeks, the subjects completed the Beck Depression Inventory (BDI-II, 1996) assessment, which is composed of questions relating to symptoms of depression such as hopelessness and irritability, feelings of guilt or of being punished, and physical symptoms such as tiredness and weight loss. The results of the study on the three groups of five subjects are given below.

Placebo (P)   Low Dose (L)   Normal Dose (N)
38            31             11
42            23             5
25            22             26
47            19             18
39            8              14

For the BDI-II assessment, a score of 0–13 indicates minimal depression, 14–19 indicates mild depression, 20–28 indicates moderate depression, and 29–63 indicates severe depression. Assuming identical variances, test if μ_P = μ_L = μ_N at the α = 0.01 level.
5.43. Milk from dairies located in central Illinois is tested for Strontium-90 contamination. The dairies are located in three counties: Macon (5 dairies), Sangamon (6 dairies), and Logan (5 dairies). Contamination is measured in picocuries/liter.

Macon (M)   Sangamon (S)   Logan (L)
11.7        8.8            12.1
10.4        9.9            9.5
9.5         11.2           9.0
13.8        10.5           10.3
12.0        9.1            8.7
            8.5

Assuming equal variances, test if μ_M = μ_S = μ_L at the α = 0.05 level.


5.44. Consider the following partially completed ANOVA table.

Source      DF   SS      MSS     F-statistic   p-Value
Treatment   *    2.124   0.708   0.75          *
Error       20   *       *
Total       *    *

Fill in the missing values.


5.45. The police department of a Chicago metropolitan suburb is conducting a study of marksmanship skill among police officers that have served varying lengths of time in the department. Groups of randomly selected officers that have served 5 years, 10 years, 15 years, and 20 years in the department are selected for the study. There are five officers in each group. Each officer is given 75 shots at a target at a distance of 25 yards, and the number of bull's-eyes are recorded.

5 Years (5Y)   10 Years (10Y)   15 Years (15Y)   20 Years (20Y)
60             55               45               42
59             57               39               41
62             64               43               39
56             61               45               43
55             50               46               45

Assuming equal variances, test if μ_5Y = μ_10Y = μ_15Y = μ_20Y at the α = 0.01 level by carrying out the following steps.

(a) Compute X̄_5Y, X̄_10Y, X̄_15Y, X̄_20Y, and the grand mean X̄··.
(b) Compute SSTR, SSE, and SST.
(c) Compute MSTR and MSE.
(d) Compute f.
(e) Compute the p-value of f.
(f) Display the ANOVA table.
(g) Test if μ_5Y = μ_10Y = μ_15Y = μ_20Y at the α = 0.01 level.

5.8 SUMMARY TABLES


The following tables summarize the tests of hypotheses developed in this chapter.
Parameter μ, σ = σ0 known. Test statistic: Z = (X̄ − μ0)/(σ0/√n).
  H0: μ = μ0 vs. H1: μ > μ0. Reject H0 if z ≥ z_α.
  H0: μ = μ0 vs. H1: μ < μ0. Reject H0 if z ≤ −z_α.

Parameter μ, σ unknown. Test statistic: T = (X̄ − μ0)/(S_X/√n).
  H0: μ = μ0 vs. H1: μ > μ0. Reject H0 if t ≥ t(n − 1, α).
  H0: μ = μ0 vs. H1: μ < μ0. Reject H0 if t ≤ −t(n − 1, α).

Parameter σ², μ = μ0 known. Test statistic: χ² = Σ_{i=1}^{n} ((X_i − μ0)/σ0)².
  H0: σ² = σ0² vs. H1: σ² > σ0². Reject H0 if χ² ≥ χ²(n, α).
  H0: σ² = σ0² vs. H1: σ² < σ0². Reject H0 if χ² ≤ χ²(n, 1 − α).

Parameter σ², μ unknown. Test statistic: χ² = Σ_{i=1}^{n} ((X_i − X̄_n)/σ0)² = (n − 1)S_X²/σ0².
  H0: σ² = σ0² vs. H1: σ² > σ0². Reject H0 if χ² ≥ χ²(n − 1, α).
  H0: σ² = σ0² vs. H1: σ² < σ0². Reject H0 if χ² ≤ χ²(n − 1, 1 − α).

One-sided tests for normal parameters
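As a numerical illustration of the first row of this table, here is a minimal Python sketch (our own, not from the text; the data values xbar = 103, μ0 = 100, σ0 = 10, n = 40 are made up) of the one-sided z-test of H0: μ = μ0 vs. H1: μ > μ0 with σ0 known, using the standard normal CDF written in terms of the error function.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def z_test_upper(xbar, mu0, sigma0, n, alpha=0.05):
    """Return (z, p_value, reject) for H0: mu = mu0 vs. H1: mu > mu0."""
    z = (xbar - mu0) / (sigma0 / sqrt(n))
    p = 1.0 - phi(z)            # upper-tail p-value
    return z, p, p <= alpha     # reject H0 when the p-value is at most alpha

# Hypothetical data: sample mean 103 from n = 40 observations, sigma0 = 10.
z, p, reject = z_test_upper(xbar=103.0, mu0=100.0, sigma0=10.0, n=40)
print(round(z, 4), round(p, 4), reject)
```

The same Φ helper gives p-values for any of the z-based tests in these tables; the t and χ² tests need their own distribution tables or a statistics library.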

Parameter μ, σ = σ0 known. Test statistic: Z = (X̄ − μ0)/(σ0/√n).
  H0: μ = μ0 vs. H1: μ ≠ μ0. Reject H0 if |z| ≥ z_{α/2}.

Parameter μ, σ unknown. Test statistic: T = (X̄ − μ0)/(S_X/√n).
  H0: μ = μ0 vs. H1: μ ≠ μ0. Reject H0 if |t| ≥ t(n − 1, α/2).

Parameter σ², μ = μ0 known. Test statistic: χ² = Σ_{i=1}^{n} ((X_i − μ0)/σ0)².
  H0: σ² = σ0² vs. H1: σ² ≠ σ0². Reject H0 if χ² ≥ χ²(n, α/2) or χ² ≤ χ²(n, 1 − α/2).

Parameter σ², μ unknown. Test statistic: χ² = Σ_{i=1}^{n} ((X_i − X̄_n)/σ0)² = (n − 1)S_X²/σ0².
  H0: σ² = σ0² vs. H1: σ² ≠ σ0². Reject H0 if χ² ≥ χ²(n − 1, α/2) or χ² ≤ χ²(n − 1, 1 − α/2).

Two-sided tests for normal parameters


Parameter d = μ_X − μ_Y, with σ_X² = σ²_{0,X} and σ_Y² = σ²_{0,Y} known.
  Test statistic: Z = ((X̄_m − Ȳ_n) − d0)/√(σ²_{0,X}/m + σ²_{0,Y}/n).
  H0: d = d0 vs. H1: d > d0. Reject H0 if z ≥ z_α.
  H0: d = d0 vs. H1: d < d0. Reject H0 if z ≤ −z_α.

Parameter d = μ_X − μ_Y, with σ_X², σ_Y² unknown, σ_X² = σ_Y².
  Test statistic: T = ((X̄_m − Ȳ_n) − d0)/(S_p √(1/m + 1/n)), where
  S_p² = ((m − 1)S_X² + (n − 1)S_Y²)/(m + n − 2) (pooled variance).
  H0: d = d0 vs. H1: d > d0. Reject H0 if t ≥ t(m + n − 2, α).
  H0: d = d0 vs. H1: d < d0. Reject H0 if t ≤ −t(m + n − 2, α).

Parameter d = μ_X − μ_Y, with σ_X², σ_Y² unknown, σ_X² ≠ σ_Y².
  Test statistic: T = ((X̄_m − Ȳ_n) − d0)/√(S_X²/m + S_Y²/n), with degrees of freedom
  ν = (r/m + 1/n)² / (r²/(m²(m − 1)) + 1/(n²(n − 1))), where r = s_X²/s_Y².
  H0: d = d0 vs. H1: d > d0. Reject H0 if t ≥ t(ν, α).
  H0: d = d0 vs. H1: d < d0. Reject H0 if t ≤ −t(ν, α).

Parameter r = σ_X²/σ_Y², with μ_X, μ_Y unknown.
  Test statistic: F = (S_X²/S_Y²)·(1/r0).
  H0: r = r0 vs. H1: r > r0. Reject H0 if f ≥ F(m − 1, n − 1, α).
  H0: r = r0 vs. H1: r < r0. Reject H0 if f ≤ F(m − 1, n − 1, 1 − α).

Parameter μ_D (paired samples), with σ_D² unknown.
  Test statistic: T = (D̄ − d0)/(S_D/√m).
  H0: μ_D = d0 vs. H1: μ_D > d0. Reject H0 if t ≥ t(m − 1, α).
  H0: μ_D = d0 vs. H1: μ_D < d0. Reject H0 if t ≤ −t(m − 1, α).

One-sided tests for the difference of two normal parameters

Parameter d = μ_X − μ_Y, with σ_X² = σ²_{0,X} and σ_Y² = σ²_{0,Y} known.
  Test statistic: Z = ((X̄_m − Ȳ_n) − d0)/√(σ²_{0,X}/m + σ²_{0,Y}/n).
  H0: d = d0 vs. H1: d ≠ d0. Reject H0 if |z| ≥ z_{α/2}.

Parameter d = μ_X − μ_Y, with σ_X², σ_Y² unknown, σ_X² = σ_Y².
  Test statistic: T = ((X̄_m − Ȳ_n) − d0)/(S_p √(1/m + 1/n)), where
  S_p² = ((m − 1)S_X² + (n − 1)S_Y²)/(m + n − 2) (pooled variance).
  H0: d = d0 vs. H1: d ≠ d0. Reject H0 if |t| ≥ t(m + n − 2, α/2).

Parameter d = μ_X − μ_Y, with σ_X², σ_Y² unknown, σ_X² ≠ σ_Y².
  Test statistic: T = ((X̄_m − Ȳ_n) − d0)/√(S_X²/m + S_Y²/n), with degrees of freedom
  ν = (r/m + 1/n)² / (r²/(m²(m − 1)) + 1/(n²(n − 1))), where r = s_X²/s_Y².
  H0: d = d0 vs. H1: d ≠ d0. Reject H0 if |t| ≥ t(ν, α/2).

Parameter r = σ_X²/σ_Y², with μ_X, μ_Y unknown.
  Test statistic: F = (S_X²/S_Y²)·(1/r0).
  H0: r = r0 vs. H1: r ≠ r0. Reject H0 if f ≥ F(m − 1, n − 1, α/2) or f ≤ F(m − 1, n − 1, 1 − α/2).

Parameter μ_D (paired samples), with σ_D² unknown.
  Test statistic: T = (D̄ − d0)/(S_D/√m).
  H0: μ_D = d0 vs. H1: μ_D ≠ d0. Reject H0 if |t| ≥ t(m − 1, α/2).

Two-sided tests for the difference of two normal parameters


Parameter p (approximate test). Test statistic: Z = (X̄_n − p0)/√(p0(1 − p0)/n).
  H0: p = p0 vs. H1: p ≠ p0. Reject H0 if |z| ≥ z_{α/2}.
  H0: p = p0 vs. H1: p > p0. Reject H0 if z ≥ z_α.
  H0: p = p0 vs. H1: p < p0. Reject H0 if z ≤ −z_α.

Parameter d = p_X − p_Y (approximate test).
  Test statistic: Z = ((X̄_m − Ȳ_n) − d0)/√(X̄_m(1 − X̄_m)/m + Ȳ_n(1 − Ȳ_n)/n).
  H0: d = d0 vs. H1: d ≠ d0. Reject H0 if |z| ≥ z_{α/2}.
  H0: d = d0 vs. H1: d > d0. Reject H0 if z ≥ z_α.
  H0: d = d0 vs. H1: d < d0. Reject H0 if z ≤ −z_α.

Parameter d = p_X − p_Y (approximate test, null difference zero).
  Test statistic: Z = (X̄_m − Ȳ_n)/√(P̄0(1 − P̄0)(1/m + 1/n)), where
  P̄0 = (mX̄_m + nȲ_n)/(m + n) (pooled proportion).
  H0: d = 0 vs. H1: d ≠ 0. Reject H0 if |z| ≥ z_{α/2}.
  H0: d = 0 vs. H1: d > 0. Reject H0 if z ≥ z_α.
  H0: d = 0 vs. H1: d < 0. Reject H0 if z ≤ −z_α.

Tests for binomial proportions
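As an illustration of the pooled two-proportion test in the last table, here is a short Python sketch (our own helper functions, not part of the text) applied to the counts of Problem 5.21: 85 deaths out of 500 in the diet group and 93 out of 500 in the control group.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_prop_z(x, m, y, n):
    """Pooled z statistic and two-sided p-value for H0: pX = pY."""
    px, py = x / m, y / n
    p0 = (x + y) / (m + n)                          # pooled proportion
    z = (px - py) / sqrt(p0 * (1 - p0) * (1/m + 1/n))
    p_two = 2.0 * (1.0 - phi(abs(z)))               # two-sided p-value
    return z, p_two

# Counts from Problem 5.21 (diet group vs. control group).
z, p = two_prop_z(x=85, m=500, y=93, n=500)
print(round(z, 3), round(p, 3))
```

The large p-value here would mean the observed difference in death rates is not statistically significant at α = 0.05, consistent with what the test in that problem is probing.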

CHAPTER 6

Linear Regression
6.1 INTRODUCTION AND SCATTER PLOTS
In this chapter we discuss one of the most important topics in statistics. It provides us with a way to determine an algebraic relationship between a variable x and a random variable Y that depends on x, and ultimately to use that relationship to predict one variable from knowledge of the other. For instance, is there a relationship between a student's GPA in high school and the
GPA in college? Is there a connection between a father’s intelligence and a daughter’s? Can
we predict the price of a stock from the S&P 500 index? These are all matters regression can
consider.
We have a data set of pairs of points {(x_i, y_i), i = 1, 2, …, n}. The first step is to create a scatterplot of the points. For example, we have the plots in Figures 6.1 and 6.2.

Figure 6.1: Scatterplot of x-y data. Figure 6.2: Scatterplot of x-y data with line fit to the data.

It certainly looks like there is some kind of relationship between x and y, and it looks like it could be linear. By eye we can draw a line that looks like it would be a pretty good fit to the data. The questions we will answer in this chapter are as follows.
• How do we measure how good using a line to approximate the data will be?
• How do we find the line which approximates the data as well as possible?
• How do we use the line to make predictions and quantify the errors?
First we turn to the problem of finding the best fit line to the data.
170 6. LINEAR REGRESSION
6.2 INTRODUCTION TO REGRESSION
If we have no information about an rv Y, the best estimate for Y is EY. The reason for this is the following fact:

min_a E(Y − a)² = E(Y − EY)² = Var(Y).

In other words, EY minimizes the mean square distance of the values of Y to any number a. We have seen this to be true earlier, but here is a short recap.
First consider the real-valued function f(a) = E(Y − a)². To minimize this function, take a derivative and set it to zero:

f′(a) = −2E(Y − a) = 0 ⟹ a = EY.

Since f″(a) = 2 > 0, a = EY provides the minimum. We conclude that if we have no information about Y's distribution and we have to guess something about Y, then a = EY is the best guess.
Now suppose we know that there is another rv X which is related to Y, and we assume that the relationship is linear. Think of X as the independent variable and Y as the dependent variable, i.e., X is the input and Y is the response. We would like to describe precisely the linear relationship between X and Y. To do so, we will find constants a, b to minimize the following function giving the mean squared distance of Y to a + bX:

f(a, b) = E(Y − a − bX)².

To minimize this function (which depends on two variables), take partial derivatives and set them to zero:

f_a = −2E(Y − a − bX) = 0, and f_b = −2E[X(Y − a − bX)] = 0.

Solving these two simultaneous equations for a, b, we get

b = (E(XY) − EX·EY)/(E(X²) − (EX)²) = (E(XY) − EX·EY)/Var(X) and a = EY − b·EX. (6.1)

Now we rewrite these solutions using the covariance. Recall that the covariance of X, Y is given by

Cov(X, Y) = E(X − EX)(Y − EY) = E(XY) − EX·EY,

so we can rewrite the slope parameter as b = Cov(X, Y)/σ_X². We have

b = Cov(X, Y)/σ_X² and a = EY − b·EX.

We can also rewrite the slope b using the correlation coefficient ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y). We have

b = Cov(X, Y)/σ_X² = (Cov(X, Y)/(σ_X σ_Y)) · (σ_Y/σ_X) = ρ(X, Y) · σ_Y/σ_X.

This gives us the result that b = ρ σ_Y/σ_X. We summarize these results.

Proposition 6.1 The minimum of f(a, b) = E(Y − a − bX)² over all possible constants a, b is provided by b = ρ(X, Y) σ_Y/σ_X, a = EY − b·EX. The minimum value of f is then given by

f(a, b) = E(Y − a − bX)² = (1 − ρ²)σ_Y².

Proof. We have already shown the first part, and all we have to do is find the value of the function f at the point that provides the minimum. We first plug in the value of a = EY − b·EX and then rearrange:

E(Y − a − bX)² = E[(Y − EY) + b(EX − X)]²
             = E[(Y − EY)² + 2b(EX − X)(Y − EY) + b²(EX − X)²]
             = Var(Y) + b²·Var(X) − 2b·Cov(X, Y)
             = σ_Y² + ρ²(σ_Y²/σ_X²)·σ_X² − 2ρ(σ_Y/σ_X)·ρσ_Xσ_Y, using b = ρ σ_Y/σ_X,
             = σ_Y² + ρ²σ_Y² − 2ρ²σ_Y² = (1 − ρ²)σ_Y².
Remark 6.2 The regression line, or least squares fit line, we have derived is written as

Y = a + bX = EY − b·EX + bX = EY + b(X − EX) ⟹ Y − EY = ρ (σ_Y/σ_X)(X − EX).

This shows that the regression line always passes through the point (EX, EY) and has slope ρ σ_Y/σ_X. Consequently, a one standard deviation increase in X from the mean results in a ρσ_Y unit increase from the mean (or decrease if ρ < 0) in Y.
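The population formulas b = Cov(X, Y)/σ_X² and a = EY − b·EX have direct sample analogues. Here is a minimal Python sketch (our own illustration, with made-up data) that computes the least squares line from data and confirms the line passes through the point of means.

```python
def least_squares_line(xs, ys):
    """Sample analogue of b = Cov(X,Y)/Var(X), a = EY - b*EX."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - xbar) ** 2 for x in xs) / n
    b = cov / var_x          # slope
    a = ybar - b * xbar      # intercept; line passes through (xbar, ybar)
    return a, b

# Points lying exactly on y = 2x + 1 should recover a = 1 and b = 2.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

For data with noise the recovered (a, b) is the least squares fit rather than an exact relationship, in line with Proposition 6.1.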

Example 6.3 Suppose we know that for a rv Y, σ_Y² = 10, and the correlation coefficient between X, Y is ρ = 0.5. If we ignore X and try to guess Y, the best estimate is EY, which will give us an error of E(Y − EY)² = Var(Y) = 10. If we use the information on the correlation between X, Y, we have instead

E[Y − (a + bX)]² = (1 − ρ²)σ_Y² = (1 − 0.25)·10 = 7.5
and the error is cut by 25%.

What we have shown is that the best approximation to the rv Y by a linear function a + bX of a rv X is given by the rv W = a + bX with the constants a, b given by b = ρ σ_Y/σ_X and a = EY − b·EX. It is not true that Y = a + bX, but rather that W = a + bX is a rv with mean square distance E(Y − W)² = (1 − ρ²)σ_Y², and this is the smallest possible such distance for any possible constants a, b.

Remark 6.4 Since E(Y − a − bX)² = (1 − ρ²)σ_Y² ≥ 0, it must be true that |ρ| ≤ 1. Also, if |ρ| = 1, then the only possibility is that Y = a + bX, so that Y is exactly a linear function of X.
Now we see that ρ is a quantitative measure of how well a line approximates Y. The closer ρ is to ±1, the better the approximation. When |ρ| = 1 we say that Y is perfectly linearly correlated with X. When ρ = 0 the error of approximation by a linear function is the largest possible error, and we say that Y is uncorrelated with X.
As a general rule, if |ρ| ≤ 0.5 the correlation is weak, and when |ρ| ≥ 0.8 we say the linear correlation is strong.

There is another interpretation of ρ² through the equation E(Y − a − bX)² = (1 − ρ²)σ_Y² ≥ 0. Rearranging this we have

ρ² = 1 − E(Y − a − bX)²/σ_Y² = (σ_Y² − E(Y − a − bX)²)/σ_Y².

It is common to refer to ρ² as the coefficient of determination. It has the interpretation that it represents the proportion of the total variation in the Y-values explained by the linear relationship itself. For instance, if σ_Y² = 1500 and E(Y − a − bX)² = 1100, then, by Proposition 6.1, ρ² = 0.2667, so that 26.67% of the variation in the Y-values is explained by the linear relationship, while about 73% is unexplained. Another way to interpret ρ² is that it represents the total proportion of the variation in the Y-values which is reduced by taking into account the predictor value X.
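The numerical claim above is a one-line computation; the following Python snippet simply re-derives ρ² from the numbers used in the text.

```python
# Coefficient of determination from the text's numbers:
# Var(Y) = 1500, mean squared residual E(Y - a - bX)^2 = 1100.
var_y = 1500.0
mse = 1100.0
rho_sq = 1.0 - mse / var_y   # proportion of variation explained by the line
print(round(rho_sq, 4))
```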

6.2.1 THE LINEAR MODEL WITH OBSERVED X
So far we have assumed that (X, Y) are both random variables with some joint distribution. In many applications X is not a random variable but a fixed observable variable, and Y is a random variable whose mean depends in a linear way on the observed X = x. The linear model assumes

Y = a + bx + ε, where ε ~ N(0, σ).

The random variable ε represents the random noise at the observation x. (See Figure 6.3.) The mean of Y will change in a linear way with x, and for each x the Y-values will distribute according to a normal distribution with standard deviation σ. We have with this model

EY = E(a + bx + ε) = a + bx, and SD(Y) = √E(Y − a − bx)² = √E(ε²) = σ.

If we knew σ, then we are saying that for each observed value of x, we have Y ~ N(a + bx, σ). Thus, for each fixed observed value x, the mean of Y is a + bx and the Y-values are normally distributed with SD given by σ.
Now suppose the data pairs (x_i, y_i), i = 1, 2, …, n, are observations from (x_i, Y_i), where Y_i = a + b·x_i + ε_i, with independent errors ε_i ~ N(0, σ).

Example 6.5 Suppose the relationship between interest rates and the value of real estate
is given by a simple linear regression model with the regression line Y = 137.5 − 12.5x + ε,
where x is the interest rate and Y is the value of the real estate. We suppose the noise term
ε ~ N(0, 10). Then for any fixed interest rate x₀, the real estate will have the distribution
N(−12.5x₀ + 137.5, 10). For instance, if x₀ = 8, E[Y] = 37.5, and then

    P(Y > 45 | x₀ = 8) = normalcdf(45, ∞, −12.5(8) + 137.5, 10) = 0.2266.

The mean value of real estate when interest rates are 8% is 37.5 and there is about a 22% chance
the real estate value will be above 45.
Notice that because the slope of the regression line is negative, higher interest rates will
give lower values of real estate. Suppose Y₁ = −12.5x₁ + 137.5 + ε₁ is an observation when
x₁ = 10 and Y₂ = −12.5x₂ + 137.5 + ε₂ is an observation when x₂ = 9. By properties of the
normal distribution, Y₁ − Y₂ ~ N(−12.5(x₁ − x₂), 14.14). What are the chances real estate
values will be higher with rates at 10% rather than at 9%? That is, what is P(Y₁ > Y₂)? Here's the
answer:

    P(Y₁ − Y₂ > 0) = normalcdf(0, ∞, −12.5, 14.14) = 0.1883.

There is an 18.83% chance that values will be higher at 10% interest than at 9%. Next, suppose
the value of real estate at 9% is actually 35. What percentile is this? That's easy because we are
assuming Y ~ N(25, 10), so that P(Y < 35 | x = 9) = normalcdf(−∞, 35, 25, 10) = 0.841 and
so 35 is the 84th percentile of real estate values when interest rates are 9%. This means that 84%
of real estate values are below 35 when interest rates are 9%.
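The calculator computations in Example 6.5 can be reproduced with a short Python sketch; `normal_cdf` below is a stdlib stand-in (built from `math.erf`) for the TI-83's `normalcdf`:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(N(mu, sigma) <= x): a stand-in for normalcdf(-infinity, x, mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

a, b, sigma = 137.5, -12.5, 10.0   # the model of Example 6.5

# P(Y > 45 | x = 8): Y ~ N(a + 8b, sigma) = N(37.5, 10).
p_above_45 = 1.0 - normal_cdf(45, a + b * 8, sigma)

# P(Y1 - Y2 > 0) for x1 = 10, x2 = 9: the difference is N(-12.5, sqrt(200)).
p_diff = 1.0 - normal_cdf(0, b * (10 - 9), sqrt(sigma**2 + sigma**2))

# Percentile of the value 35 when x = 9: Y ~ N(25, 10).
pct_35 = normal_cdf(35, a + b * 9, sigma)

print(round(p_above_45, 4), round(p_diff, 4), round(pct_35, 3))
```

The three printed values match the 0.2266, 0.1883, and 0.841 computed in the example.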

The term regression implies a return to a less developed state. The question naturally
arises as to why this term is applied to linear regression analysis. Informally, Galton, the first
developer of regression analysis, noticed that tall fathers had tall sons, but not quite so tall as
the father. He termed this as regression to the mean, implying that the height of the sons is
regressing more toward what the mean height should be for men of a certain age. It was also
noticed that students who did very well on an exam would not do quite so well on a retake of
the exam. They were regressing to the mean.
The mathematical explanation for this is straightforward from the model Y = a + bx + ε,
ε ~ N(0, σ). Suppose we fix the input x and take a measurement Y. This measurement itself
follows a normal distribution with mean a + bx and SD σ. Suppose the measurement is Y = y₁
and this measurement is 1.5 standard units above average. That would put it at the 93.32nd
percentile. If that's a test score, it's a good score. On another measurement Y = y₂ for the
same x, 93.32% of the observations are below y₁. This means there is a really good chance the
second observation will be below y₁. There is some chance that y₂ > y₁ (about 7%), but it's unlikely
compared to the chance that y₂ < y₁. That is what regression to the mean refers to. The regression
fallacy is attributing the change in scores to something important rather than just to the chance
variability around the mean.

6.2.2 ESTIMATING THE SLOPE AND INTERCEPT FROM DATA


Now suppose we have observed data points (x_i, y_i), i = 1, 2, …, n. If we take the least squares
regression line

    Y − E[Y] = ρ (σ_Y/σ_X) (X − E[X])

and replace each statistical quantity with its estimate using data, we get the best fit line for the
data as

    y − ȳ = r (s_Y/s_X) (x − x̄),   x̄ = (1/n) Σᵢ x_i,   ȳ = (1/n) Σᵢ y_i,

with the sample correlation coefficient

    r = [ (1/(n − 1)) Σᵢ (x_i − x̄)(y_i − ȳ) ] / (s_X s_Y) = (1/(n − 1)) Σᵢ [(x_i − x̄)/s_X] [(y_i − ȳ)/s_Y],

where the usual formulas for the sample variances are

    s_Y² = (1/(n − 1)) Σᵢ (y_i − ȳ)²,   s_X² = (1/(n − 1)) Σᵢ (x_i − x̄)²

(all sums run over i = 1, …, n). This says that the sample correlation coefficient is calculated by
converting each pair (x_i, y_i) to standard units and then (almost) averaging the products of the
standard units.
If we wish to write the regression equation in slope-intercept form, we have

    y = â + b̂ x,  where  â = ȳ − b̂ x̄,  b̂ = r (s_Y/s_X).
This is the regression equation derived from the probabilistic model Y = a + bx + ε. If we
ignore the probability aspects, we can derive the same equations as follows. The proof is a calculus
exercise.

Proposition 6.6 Given data points (x_i, y_i), i = 1, 2, …, n, set f(a, b) = Σᵢ (y_i − a − b x_i)².
Then the minimum of f over all a ∈ ℝ, b ∈ ℝ is achieved at b̂ = r (s_Y/s_X), â = ȳ − b̂ x̄, and the
minimum is f(â, b̂) = (n − 1)(1 − r²) s_Y².

Example 6.7 Suppose we choose 11 families randomly and we let x_i = height of brother,
y_i = height of sister. We have the summary statistics x̄ = 69, ȳ = 64, s_X = 2.72, s_Y = 2.569.
The correlation coefficient is given by r = 0.558. The equation of the regression line is therefore

    y − 64 = 0.558 (2.569/2.72)(x − 69) = 0.527(x − 69).

If a brother's height is actually 69 + 2.72, then the sister's mean height will be
64 + 0.558(2.569) = 65.43. The minimum squared error is f(27.637, 0.527) = 10(1 −
0.558²)(2.569²) = 45.448. Any other choice of a, b will result in a larger error.

Example 6.8 Suppose we know that a student scored in the 82nd percentile of the SAT exam
in high school and we know that the correlation between high school SAT scores and first-year
college GPA is ρ = 0.9. Assuming that high school SAT scores and GPA scores are normally
distributed, what will this student's predicted percentile GPA be?
This problem seems to not provide enough information. Shouldn't we know the means
and SDs of the SAT and GPA scores? Actually we do have enough information to solve it.
First, rewrite the regression equation as

    (y − ȳ)/s_Y = r (x − x̄)/s_X,

where x = SAT and y = GPA. Knowing that the student scored in the 82nd percentile of SAT
scores, which is assumed normally distributed, tells us that this student's SAT score in standard
units is (x − x̄)/s_X = invNorm(0.82) = 0.9154. Therefore, the student's GPA score in standard
units is (y − ȳ)/s_Y = 0.9(0.9154) = 0.8239. Therefore, the student's predicted percentile for GPA is
normalcdf(−∞, 0.8239) = 0.795, or the 79.5th percentile.
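The standard-units calculation of Example 6.8 can be checked with Python's stdlib `statistics.NormalDist`, which plays the role of the calculator's `invNorm` and `normalcdf`:

```python
from statistics import NormalDist

std_normal = NormalDist()          # standard normal, mean 0 and sd 1

rho = 0.9                          # SAT-GPA correlation from Example 6.8
sat_z = std_normal.inv_cdf(0.82)   # SAT score in standard units: invNorm(0.82)
gpa_z = rho * sat_z                # regression in standard units: z_y = rho * z_x
gpa_pct = std_normal.cdf(gpa_z)    # back to a percentile

print(round(sat_z, 4), round(gpa_z, 4), round(gpa_pct, 3))
```

This prints 0.9154, 0.8239, and 0.795, matching the example.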
6.2.3 ERRORS OF THE REGRESSION
Next, we need to go further into the use of the regression line to predict y-values for given x-
values and to determine how well the line approximates the data. We will present an ANOVA
for linear regression to decompose the variation in the dependent variable.
We use the notation

    SST = E(Y − E[Y])² = σ_Y²,   SSE = E(Y − a − bX)²,   SSR = E(a + bX − E[Y])².

SST is the total variation in the Y-values and is decomposed in the next proposition into the
variation of the Y-values from the fitted line (SSE) and the variation of the fitted values about the
mean of Y (SSR).

Proposition 6.9 Let b* = ρ σ_Y/σ_X and a* = E[Y] − b* E[X]. Then

    SST ≡ σ_Y² = E(Y − E[Y])² = E(Y − a* − b* X)² + E(a* + b* X − E[Y])² ≡ SSE + SSR.   (6.2)

Proof.

    σ_Y² = E(Y − E[Y])² = E(Y − a* − b* X + a* + b* X − E[Y])²
        = E(Y − a* − b* X)² + 2 E[(Y − a* − b* X)(a* + b* X − E[Y])] + E(a* + b* X − E[Y])²
        = SSE + 0 + SSR.

The middle term is zero because, using a* = E[Y] − b* E[X],

    E[(Y − a* − b* X)(a* + b* X − E[Y])] = E[b* (Y − E[Y])(X − E[X])] − (b*)² E(X − E[X])²
        = ρ (σ_Y/σ_X) ρ σ_X σ_Y − ρ² (σ_Y²/σ_X²) σ_X² = ρ² σ_Y² − ρ² σ_Y² = 0.   □

When we have data points (x_i, y_i), SSE is the variation of the residuals, Σᵢ (y_i − a* − b* x_i)², and
SSR is the variation due to the regression, Σᵢ (a* + b* x_i − ȳ)². Using this decomposition we can
get some important consequences.

Proposition 6.10 We have (a) SSE = (1 − ρ²) SST, and (b) ρ² = SSR/SST = 1 − SSE/SST.
[Figure 6.3: Model for linear regression. For a fixed input x₀ the responses are distributed N(ax₀ + b, σ) about the fitted point on the line y = ax₀ + b; the residual is the vertical gap between a data point (x₁, y₁) and its fitted point.]

Proof. To show (a), using Proposition 6.1,

    SSE = E(Y − a* − b* X)² = (1 − ρ²) σ_Y² = (1 − ρ²) SST.

Using (a),

    SST = SSE + SSR = SST(1 − ρ²) + SSR  ⟹  ρ² = SSR/SST = (SST − SSE)/SST = 1 − SSE/SST.   □

Now suppose we are given a line y = a + bx and we have data points {(x_i, y_i)}. We will
set b̂ = r (s_Y/s_X) and â = ȳ − b̂ x̄ as the coefficients of the best fit line when we have the data.

Definition 6.11 The fitted values are ŷ_i = â + b̂ x_i. These are the points on the regression
line for the associated x_i. Residuals are ε_i = y_i − ŷ_i, the difference between the observed data
values and the fitted values.

It is always true for the regression line y = â + b̂ x that Σᵢ ε_i = 0, since â = ȳ − b̂ x̄ (Fig-
ure 6.3). Therefore, basing errors on the sum of the residuals won't work.
We have the observed quantities y_i, the calculated quantities ŷ_i = ŷ(x_i), and the resid-
uals ε_i = y_i − ŷ_i, which give the amount by which each observed value differs from the fitted
value. The residuals are labeled ε_i because ε_i = y_i − â − b̂ x_i.
Another way to think of the ε_i's is as observations from the normal distribution giving
the errors at each x_i. We have chosen the line so that

    Σᵢ ε_i² = Σᵢ (y_i − ŷ_i)² = Σᵢ (y_i − â − b̂ x_i)²

is the minimum possible.
In the preceding definitions of SST, SSE, and SSR, when we have data points (x_i, y_i) we
replace Y by y_i, X by x_i, and ρ by r. We have:

(a) Error sum of squares, deviation of y_i from ŷ_i: SSE = Σᵢ (y_i − ŷ_i)² = Σᵢ ε_i².

(b) Regression sum of squares, deviation of ŷ_i from ȳ: SSR = Σᵢ (ŷ_i − ȳ)². This is the amount
of total variation in the y-values that is explained by the linear model.

(c) Total sum of squares, deviation of data values y_i from ȳ: SST = Σᵢ (y_i − ȳ)². In other
words, SST/(n − 1) is the sample variance of the y-values.

The algebraic relationships between the quantities SST, SSR, SSE become

    SST = SSR + SSE,   r² = SSR/SST = 1 − SSE/SST,   and   SSE = (1 − r²) SST.

We will use the notation

    S_yy = Σᵢ (y_i − ȳ)² = SST,   S_xx = Σᵢ (x_i − x̄)²,   S_xy = Σᵢ (x_i − x̄)(y_i − ȳ),
    â = ȳ − b̂ x̄,   b̂ = r (s_Y/s_X) = S_xy/S_xx.

The expression for the slope follows from the computation

    b̂ = r (s_Y/s_X) = [ (1/(n − 1)) Σᵢ (x_i − x̄)(y_i − ȳ) / (s_X s_Y) ] (s_Y/s_X) = S_xy/S_xx.

Remark 6.12 For computation, the following formulas can simplify the work:

    S_xx = Σ x_i² − n x̄²,   S_yy = Σ y_i² − n ȳ²,   S_xy = Σ x_i y_i − n x̄ ȳ.
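A quick numerical check of Remark 6.12's shortcut formulas. The data set below is made up, chosen only for illustration:

```python
# Arbitrary illustrative data, not from the text.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Definitional sums of squares
sxx_def = sum((x - xbar) ** 2 for x in xs)
syy_def = sum((y - ybar) ** 2 for y in ys)
sxy_def = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

# Shortcut versions from Remark 6.12
sxx = sum(x * x for x in xs) - n * xbar ** 2
syy = sum(y * y for y in ys) - n * ybar ** 2
sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar

slope = sxy / sxx     # the slope b-hat = Sxy/Sxx
print(sxx, syy, sxy, slope)
```

Both ways of computing S_xx, S_yy, S_xy agree, and the slope here is S_xy/S_xx = 7/5 = 1.4.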
The Estimate of σ²
Recall that we assumed Y = a + bx + ε, where ε ~ N(0, σ). For fixed x, the mean of
the data given x is E(Y|x) = a + bx, and Var(Y) = σ². The variance σ² measures the spread of
the data around the mean a + bx. The estimate of σ² is given by

    s² = Σᵢ (y_i − ŷ_i)² / (n − 2) = SSE/(n − 2).

The sample value s is called the standard error of the estimate and represents the deviation of
the y data values from the corresponding fitted values of the regression line. We will see later
that S² = (1/(n − 2)) Σᵢ (Y_i − â − b̂ x_i)² is an unbiased estimator of σ², and S² will be associated
with a χ²(n − 2) random variable. The degrees of freedom will be n − 2 because of the involvement
of the two estimated parameters â, b̂.
The value of s measures how far above or below the data points are from the regression
line. Since we are assuming a linear model with noise which is normally distributed, we can say
that roughly 68% of the data points will lie within the band created by two lines parallel to the
regression line, one at distance s above the line and one at distance s below. If the band is instead
drawn at distance 2s on each side, it will contain roughly 95% of all the data points. Any data
point lying outside the 2s band is considered an outlier.
Remark 6.13 We have shown in Proposition 6.6 that given data points (x_i, y_i), i = 1, 2, …, n,
if we set

    f(a, b) = Σᵢ (y_i − a − b x_i)²,

then the minimum of f over all a ∈ ℝ, b ∈ ℝ is achieved at b̂ = r (s_Y/s_X), â = ȳ − b̂ x̄, and the
minimum is f(â, b̂) = (n − 1)(1 − r²) s_Y². It is obvious from the definitions that

    f(â, b̂) = (n − 1)(1 − r²) s_Y² = Σᵢ (y_i − â − b̂ x_i)² = Σᵢ ε_i² = SSE.

Thus, a simple formula for the SE of the regression is

    s = √( SSE/(n − 2) ) = s_Y √( (n − 1)/(n − 2) ) √(1 − r²).

Example 6.14 The table contains data for the pairs (height of father, height of son) as well as the
predicted (mean) height of the son for each given height of the father. The resulting difference of
the observed height of the son from the prediction, i.e., the residual, is also listed. The residuals
should not exhibit a consistent pattern but be both positive and negative. Otherwise, a line may
be a bad fit to the data.

X, Father      65     63     67     64     68     62
Y, Son         68     66     68     65     69     66
ŷ, predicted   66.79  65.84  67.74  66.31  68.22  65.36
ε, residuals   1.21   0.16   0.26   -1.31  0.78   0.64

X, Father      70     66     68     67     69     71
Y, Son         68     65     71     67     68     70
ŷ, predicted   69.17  67.27  68.22  67.74  68.69  69.65
ε, residuals   -1.17  -2.27  2.78   -0.74  -0.69  0.35

The regression line is y = 35.82480 + 0.476377x and the sample correlation coefficient
is r = 0.70265. Also, s_X = 2.7743, x̄ = 66.67, s_Y = 1.8809, ȳ = 67.583. The coefficient of
determination is r² = 0.4937, which means about 49% of the variation in the y-values is explained
by the regression. The standard error of the estimate of σ is s = 1.40366, which may be calculated
using s = 1.8809 · √(11/10) · √(1 − 0.4937).
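The numbers in Example 6.14 can be reproduced with a short least-squares sketch; `fit_line` implements b̂ = S_xy/S_xx, â = ȳ − b̂ x̄, and the standard error s of Remark 6.13:

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r, s)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope b-hat = Sxy/Sxx
    a = ybar - b * xbar           # intercept a-hat = ybar - b-hat*xbar
    r = sxy / sqrt(sxx * syy)     # sample correlation coefficient
    sse = syy - b * sxy           # SSE = SST - SSR, with SSR = b-hat*Sxy
    s = sqrt(sse / (n - 2))       # standard error of the estimate
    return a, b, r, s

fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons    = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
a, b, r, s = fit_line(fathers, sons)
print(round(a, 4), round(b, 6), round(r, 5), round(s, 5))
```

The output reproduces the intercept 35.8248, slope 0.476377, r = 0.70265, and s = 1.40366 of the example.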

6.3 THE DISTRIBUTIONS OF â AND b̂

If we look at the model Y = a + bx + ε, ε ~ N(0, σ), from a probabilistic point of view, we
derived that b̂ = r(x, Y)(s_Y/s_x) and â = Ȳ − b̂ x̄. Our point of view is that â and b̂ are estimated
from a random sample Y₁, …, Y_n associated with the fixed deterministic values x₁, …, x_n. Thus,
from this point of view, â and b̂ are random variables and we need to know their distributions
to estimate various errors. The following theorem summarizes the results.

Theorem 6.15 The slope b̂(x₁, …, x_n, Y₁, …, Y_n) and intercept â(x₁, …, x_n, Y₁, …, Y_n) both
have a normal distribution. In fact,

    â ~ N( a, σ √( Σᵢ x_i² / (n S_xx) ) ),   and   b̂ ~ N( b, σ/√S_xx ).

Proof. We will only show the result for b̂. From the fact that Σᵢ (x_i − x̄) = n x̄ − n x̄ = 0, we have

    b̂ = S_xY/S_xx = Σᵢ (x_i − x̄)(Y_i − Ȳ)/S_xx = Σᵢ [(x_i − x̄)/S_xx] Y_i.
Remember that Y_i here is random while x_i is deterministic. This computation shows that b̂ is a
linear combination of independent normally distributed random variables, and therefore b̂ also
has a normal distribution. Now to calculate the mean and SD of b̂ we compute

    E[b̂] = Σᵢ [(x_i − x̄)/S_xx] E[Y_i] = Σᵢ [(x_i − x̄)/S_xx] (a + b x_i) = b,

where we use the facts Σᵢ (x_i − x̄)/S_xx = 0 and Σᵢ (x_i − x̄) x_i = Σᵢ (x_i − x̄)² = S_xx. To find the
SD of b̂, we have from the independence of the random variables Y_i,

    Var[b̂] = Σᵢ [(x_i − x̄)/S_xx]² Var(Y_i) = σ² (1/S_xx²) S_xx = σ²/S_xx.

For â = Ȳ − b̂ x̄, we have E(â) = E[Ȳ] − b x̄ = a + b x̄ − b x̄ = a. Also, assuming Ȳ and b̂
are independent rvs (which we skip showing), we see that Var(â) = Var(Ȳ) + x̄² Var(b̂) =
σ²/n + x̄² σ²/S_xx. A little algebra gives the result for the variance. Finally, Ȳ and b̂ are normal and
independent so that â is also normal.   □
The problem with this result is that the distributions depend on the unknown parameter
σ. As we usually do in statistics, we replace σ² with its estimate s² = (1/(n − 2)) Σᵢ (y_i − ŷ_i)².
The random variable analogue of this is

    S² = (1/(n − 2)) Σᵢ (Y_i − â − b̂ x_i)² = (1/(n − 2)) Σᵢ ε_i² = SSE/(n − 2).

Since ε_i ~ N(0, σ), i = 1, 2, …, n, are independent and normally distributed, a sum of
squares of normals has a χ² distribution. The numerator of S² seems to be χ²(n). However,
there are two parameters â and b̂ in this expression, so the degrees of freedom actually turns out
to be n − 2, not n. That is,

    (n − 2) S²/σ² = SSE/σ² ~ χ²(n − 2).

This means that if we replace σ by S in the distributions of â and b̂, we have the following.

Theorem 6.16

    (â − a) / ( S √( Σᵢ x_i² / (n S_xx) ) ) ~ t(n − 2)   and   (b̂ − b) / ( S/√S_xx ) ~ t(n − 2).
Proof. We have the standardized random variables

    (â − a)/SD(â) = (â − a) / ( σ √( Σᵢ x_i²/(n S_xx) ) ) ~ N(0, 1)   and
    (b̂ − b)/SD(b̂) = (b̂ − b) / ( σ/√S_xx ) ~ N(0, 1).

Therefore, since S = σ √( χ²(n − 2)/(n − 2) ),

    (â − a) / ( S √( Σᵢ x_i²/(n S_xx) ) ) = [ (â − a) / ( σ √( Σᵢ x_i²/(n S_xx) ) ) ] / (S/σ)
        = N(0, 1) / √( χ²(n − 2)/(n − 2) ) = t(n − 2).

The computation for b̂ is similar.   □

As usual, when we replace σ by the sample standard deviation, the normal distribution gets
changed to a t-distribution.

6.4 CONFIDENCE INTERVALS FOR SLOPE AND
    INTERCEPT AND HYPOTHESIS TESTS
Now that we have the standard errors for the slope and intercept of a linear regres-
sion, we may construct confidence intervals for these parameters and perform hypothesis
tests. With an abuse of notation, we will use â = â((x₁, y₁), …, (x_n, y_n)) = ȳ − b̂ x̄ and b̂ =
b̂((x₁, y₁), …, (x_n, y_n)) = r (s_Y/s_X) to denote the intercept and slope, respectively, when we have
data points (x_i, y_i), i = 1, 2, …, n. That is, they are not random variables in discussing CIs.

Confidence Intervals: The 100(1 − α)% confidence interval for the intercept a is

    â ± t(n − 2, α/2) SE(â) = â ± t(n − 2, α/2) s √( Σ x_i² / (n S_xx) ),

and for the slope b is

    b̂ ± t(n − 2, α/2) SE(b̂) = b̂ ± t(n − 2, α/2) s/√S_xx.

Example 6.17 Consider the data from Example 6.14 concerning heights of fathers and sons.
We calculate 95% CIs for a and b. We have â = 35.8248, b̂ = 0.4764. Using Theorem 6.16,
with SE(â) = s √( Σ x_i²/(n S_xx) ), we calculate s = √(SSE/(n − 2)) = √(19.702/10) = 1.4036.
Since Σ x_i² = 53418 and x̄ = 66.67, we have S_xx = 53418 − 12 x̄² = 84.667 and then SE(b̂) =
1.4036/√84.667 = 0.1525. For 95% confidence, we have t(10, 0.025) = invT(0.975, 10) =
2.228, and then the CI for the slope is 0.4764 ± 2.228 · 0.1525, i.e., (0.136, 0.816). The 95%
CI for the intercept is 35.8248 ± 2.228 · 1.4036 · √( 53418/(12 · 84.667) ) = 35.8248 ± 22.675.
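A sketch of the confidence-interval arithmetic of Example 6.17. The critical value t(10, 0.025) = 2.228 is taken from the text as a constant, since the Python standard library has no inverse-t function:

```python
from math import sqrt

# Summary numbers from Examples 6.14 and 6.17 (father/son heights, n = 12).
n, sum_x2, xbar = 12, 53418, 800 / 12
a_hat, b_hat, s = 35.8248, 0.476377, 1.40366
t_crit = 2.228                        # t(10, 0.025) = invT(0.975, 10), from the text

sxx = sum_x2 - n * xbar ** 2          # shortcut formula of Remark 6.12
se_b = s / sqrt(sxx)                  # standard error of the slope
se_a = s * sqrt(sum_x2 / (n * sxx))   # standard error of the intercept

ci_slope = (b_hat - t_crit * se_b, b_hat + t_crit * se_b)
ci_intercept = (a_hat - t_crit * se_a, a_hat + t_crit * se_a)
print(ci_slope, ci_intercept)
```

This reproduces the slope CI (0.136, 0.816) and the intercept margin 22.675.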

Hypothesis Tests for Slope and Intercept
Once we know the distributions of the slope and intercept, it is straightforward to do hypothesis
tests on them. If we specify an intercept a₀ or a slope b₀, the test of the hypothesis that the
parameter takes on the specified value is summarized as follows.

• Test for a: H₀: a = a₀, a₀ specified.

  Test Statistic: t_a0 = (â − a₀)/SE(â). This is the observed value of the intercept converted to
  standard units.

  H₁: a < a₀  then reject the null if t_a0 ≤ −t(n − 2, α)
  H₁: a > a₀  then reject the null if t_a0 ≥ t(n − 2, α)
  H₁: a ≠ a₀  then reject the null if |t_a0| ≥ t(n − 2, α/2).

• Test for b: H₀: b = b₀, b₀ specified.

  Test Statistic: t_b0 = (b̂ − b₀)/SE(b̂). This is the observed value of the slope converted to
  standard units.

  H₁: b < b₀  then reject the null if t_b0 ≤ −t(n − 2, α)
  H₁: b > b₀  then reject the null if t_b0 ≥ t(n − 2, α)
  H₁: b ≠ b₀  then reject the null if |t_b0| ≥ t(n − 2, α/2).

• We can use the test for the slope to determine if there is a linear relationship between the
  x and Y variables. This is based on the fact that b = ρ σ_Y/σ_X, so that ρ = 0 if and only
  if b = 0. Therefore, to test for a linear relationship the null is H₀: b = 0. The alternative
  hypothesis, that there is a linear relationship, is H₁: b ≠ 0. The test statistic is t₀ = (b̂ − 0)/SE(b̂).
  The null is rejected at level α if |t₀| > t(n − 2, α/2).
[Figure 6.4: World record 100 meter times, 1964–2009. Figure 6.5: Residuals.]

In all cases we may calculate the p-values. For instance, if H₁: b < b₀ in the test for the slope
H₀: b = b₀, the p-value is P( t(n − 2) ≤ t_b0 ).

Example 6.18 Figure 6.4 is a plot of the world record 100 meter times (for men) measured
from 1963 until 2009 (Figure 6.5 is a plot of the residuals). There are 24 data points, with the last
data point (46, 9.58) corresponding to (2009, 9.58). This point beat the previous world record
(2008, 9.69) by 0.11 seconds!
The regression line is given by y = 10.0608 − 0.00743113x with a correlation coefficient
of r = −0.91455. It is easy to calculate SSR = 0.256318, SSE = 0.050139, SST = 0.306457, and
SE(â) = 0.0227721, SE(b̂) = 0.000700655. The standard error of the estimate of the regression
is s = √(SSE/22) = 0.047735. Also, S_xx = 4641.625. The 95% CI for the slope is −0.0074313 ±
t(22, 0.025) · 0.047735/√4641.625, which gives (−0.00888, −0.005978). If we test the hypothesis
H₀: b = 0, the test statistic is t = −0.00743113/(0.047735/√4641.625) = −10.606. Since
t(22, 0.005) = 2.819, we will reject the null even at α = 0.01. Observe also that if we project the
linear model into the future, in the year 3316 the world record time would be zero.

ANOVA for Linear Regression
We have encountered this topic when we considered hypothesis testing in Chapter 5. Analysis of
variance (ANOVA) in regression is a similar method to decompose the variation in the y-values
into the components determined by the source of the variation. For instance, the simple formula
SST = SSR + SSE we derived earlier is an example of the decomposition. As in Chapter 5, we
will exhibit the ANOVA in tabular form:

Source of    Degrees of   Sum of Squares         Mean Square         F-statistic
Variation    Freedom
Regression   1            SSR = Σᵢ (ŷ_i − ȳ)²    MSR = SSR/1         MSR/MSE = f(1, n − 2)
Residuals    n − 2        SSE = Σᵢ (y_i − ŷ_i)²   MSE = SSE/(n − 2)
Total        n − 1        SST = Σᵢ (y_i − ȳ)²

The degrees of freedom for SST is n − 1 because SST = Σᵢ (y_i − ȳ)² is the sum of n
squares subject to one constraint, Σᵢ (y_i − ȳ) = 0, which eliminates one degree of freedom.
By definition, if we take a sum of squares and divide by its degrees of freedom we call that
the Mean Square. Therefore,

    MST = SST/(n − 1),   MSR = SSR/1,   MSE = SSE/(n − 2).

Since s² = SSE/(n − 2) is the estimate of σ², we have MSE = s².
Next consider the ratio

    MSR/MSE = SSR/(SSE/(n − 2)) = b̂² S_xx/s² = ( b̂/(s/√S_xx) )² = ( b̂/SE(b̂) )².

Notice that b̂/SE(b̂) ~ t(n − 2), i.e., it has a t-distribution with n − 2 degrees of freedom. This
is the test statistic for the hypothesis H₀: b = 0. The square of a t-distribution with k de-
grees of freedom is an F(1, k)-distributed random variable. Therefore, the last column says that
MSR/MSE has an F-distribution with degrees of freedom (1, n − 2). This gives us the value of
the test statistic squared in the hypothesis test H₀: b = 0 against H₁: b ≠ 0, and we may reject
H₀ if F > f(1, n − 2, α) at level α.

Example 6.19 A delivery company records the following delivery times depending on miles
driven.

Distance   2     2     2     5     5     5     10    10    10    15    15    15
Time       10.2  14.6  18.2  20.1  22.4  30.6  30.8  35.4  50.6  60.1  68.4  72.1

The regression line becomes y = 4.54677 + 3.94728x. The statistics for the regression
line are summarized in the table.

     Estimate   Standard Error   t-Statistic   p-Value
â    4.54677    3.8823           1.17115       0.268687
b̂    3.94728    0.412684         9.5649        2.386 × 10⁻⁶

For example, SE(b̂) = 0.412684 = s/√S_xx, since s = √(11/10) · 21.4932 · √(1 − 0.949455²) =
7.076 and S_xx = 294. The value of the t-statistic for the hypothesis test H₀: b = 0 against H₁:
b ≠ 0 is t = 9.5649, which gives a two-sided p-value of 2.386 × 10⁻⁶. This is highly statistically
significant and we have strong evidence that distance and time are highly correlated.
Also, using the table it is easy to construct confidence intervals for the slope and inter-
cept. For example, a 99% CI for the slope is 3.94728 ± t(10, 0.005) · 0.412684 = 3.94728 ±
3.16927 · 0.412684.
The ANOVA table is

Source       DF   SS       MS       F-statistic   p-Value
Regression   1    4580.82  4580.82  91.4873       2.386 × 10⁻⁶
Residuals    10   500.705  50.0705
Total        11   5081.52

We see that the p-value for the F-statistic gives the same result as the hypothesis test on the
slope.
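The ANOVA table of Example 6.19 can be rebuilt from the raw data. The p-values require an F-distribution, which is not in the Python standard library, so this sketch stops at the F-statistic:

```python
# Raw data of Example 6.19 (delivery distances and times).
dist = [2, 2, 2, 5, 5, 5, 10, 10, 10, 15, 15, 15]
time = [10.2, 14.6, 18.2, 20.1, 22.4, 30.6, 30.8, 35.4, 50.6, 60.1, 68.4, 72.1]

n = len(dist)
xbar, ybar = sum(dist) / n, sum(time) / n
sxx = sum((x - xbar) ** 2 for x in dist)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(dist, time))
b = sxy / sxx                      # slope b-hat = Sxy/Sxx
a = ybar - b * xbar                # intercept a-hat

sst = sum((y - ybar) ** 2 for y in time)
ssr = b * sxy                      # SSR = b-hat^2 * Sxx = b-hat * Sxy
sse = sst - ssr
mse = sse / (n - 2)
f_stat = (ssr / 1) / mse           # MSR/MSE ~ F(1, n-2) under H0: b = 0

print(round(a, 5), round(b, 5), round(ssr, 2), round(sse, 3), round(f_stat, 4))
```

The output matches the table: intercept 4.54677, slope 3.94728, SSR = 4580.82, SSE = 500.705, and F = 91.4873.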

Remark 6.20 When we have a random sample X₁, …, X_n from a normal population, the random
variable T = (X̄ − μ₀)/(S/√n) has a Student's t-distribution with degrees of freedom matching
those of S². Here, in the regression setting, (n − 2) S²/σ² ~ χ²(n − 2). So now the T variable may
be rewritten as

    T = (X̄ − μ₀)/(S/√n) = [ (X̄ − μ₀)/(σ/√n) ] / √(S²/σ²) = Z/√(S²/σ²),

and then

    T² = (Z²/1)/(S²/σ²) = χ²(1) / ( χ²(n − 2)/(n − 2) ) = F(1, n − 2),

since Z² ~ χ²(1). Therefore, t²(n − 2, α/2) = f(1, n − 2, α). We have to use α/2 for the
t-distribution because T is squared, i.e., P( F(1, n − 2) > f(1, n − 2, α) ) = P( |T(n − 2)| >
t(n − 2, α/2) ) = α. That's why the two-sided test has p-value P( F(1, n − 2) > f ).

6.4.1 CONFIDENCE AND PREDICTION BANDS
The use of the regression line is to predict the mean response E(Y|x) to a given input. However,
we may want to predict the actual response (and not the mean) to the given input x. Denote the
given input as x_p. We want to estimate the mean μ_p = a + b x_p, and we want to predict the
actual response y_p to this given x_p. For instance, a pediatrician may want to know the mean
height for a 7-year-old (this is a mean estimation problem), but the parent of a 5-year-old may
want to predict the height of her specific child when the child becomes 7.¹

¹ In general, predictions of y for a given x using a regression line are only valid in the range of the data of the x-values.

We denote the predicted value of the rv Y by y_p and the estimate of E(Y_p) by μ_p. In the
absence of other information, the best estimates of both will be given by y_p = μ_p = â + b̂ x_p;
the difference is that the errors of these estimates are not the same. In particular, we will have a
confidence interval for μ_p, but a prediction interval (abbreviated PI) for y_p.

Proposition 6.21 Let x_p be a particular value of the input x. The standard error for the mean
value associated with x_p, namely μ_p = â + b̂ x_p, is SD(Y_p) = σ √( 1/n + (x_p − x̄)²/S_xx ). If σ
is unknown and we replace σ by s, the standard error is SD(Y_p) = s √( 1/n + (x_p − x̄)²/S_xx ).
The 100(1 − α)% Confidence Interval for the conditional mean of the response corre-
sponding to x_p is

    μ_p ± t(n − 2, α/2) s √( 1/n + (x_p − x̄)²/S_xx ).

A future predicted value of Y corresponding to x_p, namely y_p = â + b̂ x_p, has standard error
s √( 1 + 1/n + (x_p − x̄)²/S_xx ) when σ is unknown.
The 100(1 − α)% Prediction Interval for the observed value of the response correspond-
ing to x_p is

    y_p ± t(n − 2, α/2) s √( 1 + 1/n + (x_p − x̄)²/S_xx ).

We will not derive these results but simply note that the only difference between them
is the additional 1 inside the square root. This makes the PI for a response wider than the CI
for the mean response, to reflect the additional uncertainty in predicting a single response rather
than the mean response.
If we let x_p vary over the range of values for which linearity holds, we obtain confidence
curves and prediction curves which band the regression line. The bands are narrowest at the
point (x̄, ȳ) on the regression line. Extrapolating the curves beyond that point results in ever-
widening bands, and beyond the range of the observed data, linearity and the bands may not
make sense. Making predictions beyond the range of the data is a bad idea in general.
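A sketch of Proposition 6.21's two intervals, illustrated on the delivery data of Example 6.19 at x_p = 10; as before, t(10, 0.025) = 2.228 is supplied as a constant since the standard library has no inverse-t:

```python
from math import sqrt

xs = [2, 2, 2, 5, 5, 5, 10, 10, 10, 15, 15, 15]
ys = [10.2, 14.6, 18.2, 20.1, 22.4, 30.6, 30.8, 35.4, 50.6, 60.1, 68.4, 72.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b = sxy / sxx
a = ybar - b * xbar
sse = sum((y - ybar) ** 2 for y in ys) - b * sxy
s = sqrt(sse / (n - 2))            # standard error of the estimate

t_crit = 2.228                     # t(10, 0.025), from a t-table or invT(0.975, 10)
x_p = 10
mu_p = a + b * x_p                 # best estimate of both the mean and the response
ci_half = t_crit * s * sqrt(1 / n + (x_p - xbar) ** 2 / sxx)      # CI for the mean
pi_half = t_crit * s * sqrt(1 + 1 / n + (x_p - xbar) ** 2 / sxx)  # PI for a response

print(round(mu_p, 2), round(ci_half, 3), round(pi_half, 3))
```

The PI half-width is several times the CI half-width, reflecting the extra 1 inside the square root.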

Example 6.22 The following table exhibits the data for 15 students giving the time to complete
a test, x, and the resulting score, y.

[Figure 6.6: Plots for Example 6.22 — the data, the data minus the outlier, the confidence and prediction bands, and the residuals.]

index      1   2   3   4   5   6   7
time (x)   59  49  61  52  61  52  48
score (y)  50  95  73  59  98  84  78

index      8   9   10  11  12  13  14  15
time (x)   53  68  57  49  70  62  52  10
score (y)  65  79  84  46  90  60  57  15

The data has summary statistics x̄ = 57.3571, ȳ = 72.7143, s_X = 7.17482, s_Y = 16.7398,
and r = 0.2046.
Figure 6.6 has a scatterplot of the data. As soon as we see the plot we see that point 15
is an outlier and is either a mistake in recording or the student gave up and quit. We need to
remove this point and we will consider it dropped. Figure 6.6 shows the data points with the
outlier removed and the fitted regression line. This line has equation

    y = 45.34 + 0.4773x.
A plot of the data, the fitted line, the mean confidence bands (95% confidence), and the
single prediction bands shows that the prediction bands are much wider than the confidence
bands, reflecting the uncertainty in prediction. It is also clear that the use of these bands should
be restricted to the range of the data.
The equations of the bands here are given by

    y = 0.477x + 45.337 ± 2.179 √( 0.435x² − 49.86x + 1450.66 )                    (6.3)

for the 95% confidence bands, and

    y = 0.477x + 45.337 ± 2.179 √( 0.435x² − 49.86x + 1741.53 )                    (6.4)

for the 95% prediction bands. This means that for a given input x_p = test time, the predicted
test score would be 45.34 + 0.477x_p. The 95% CI for this mean predicted test score is given in
(6.3) and the 95% PI for this particular x_p is given in (6.4), with x = x_p. For example, suppose a
student takes 50 minutes to complete the test. According to our linear model, we would predict
a mean score of 69.2. The 95% CI for this mean score is [54.70, 83.70]. On the other hand, the
95% PI for the particular score associated with a test time = 50 is [29.31, 109.1], which is a very
wide interval and doesn't even make sense if the maximum score is 100.
If we test the hypothesis H₀: b = 0 against the alternative H₁: b ≠ 0, we get the result
that the p-value is p = 0.4829, which means that the null hypothesis is plausible, i.e., it may not
be reasonable to predict test score from test time.
We have the ANOVA table

Source       DF   SS       MS       F-statistic   p-Value
Regression   1    152.469  152.469  0.524191      0.482936
Residuals    12   3490.39  290.866
Total        13   3642.86

Finally, we plot the residuals in order to determine if there is a pattern which would inval-
idate the model. The summary statistics for this regression tell us the correlation coefficient is
r = 0.2046, so the coefficient of determination is r² = 0.0419 and only about 4% of the variation
in the scores is explained by the linear regression line. The 95% CIs for the slope and intercept
are (−0.959108, 1.91375) and (−37.6491, 128.322), respectively.

Remark 6.23 One important point of the previous example, and of linear regression in general,
is the identification and elimination of outliers. Because we are using a linear model, a single
outlier can drastically affect the slope, invalidating the results. To identify possible outliers, use
the following method.
Using all the data points, calculate the regression line and the estimate s of σ. Suppose the
regression line is y = a + bx. Now consider the two lines y = a + bx ± 2s. In other words,
we shift the regression line up and down by twice the SD of the residuals. Any data point
lying outside this band around the regression line is considered an outlier. Remove any such
points and recalculate. We use twice the SD to account for approximately 95% of reasonable
data values.

6.4.2 HYPOTHESIS TEST FOR THE CORRELATION COEFFICIENT
The correlation coefficient depends on a random sample (X_i, Y_i) of size n from a joint nor-
mal distribution. The random variable giving the correlation for this random sample is R =
R(X₁, …, X_n, Y₁, …, Y_n),

    R = Σᵢ (X_i − X̄)(Y_i − Ȳ) / √( Σᵢ (X_i − X̄)² · Σᵢ (Y_i − Ȳ)² ).

To analyze R we would be faced with the extremely difficult job of determining its distribution.
The only case when this is not too hard is when we want to consider the hypothesis test H₀:
ρ = 0, because we know that this hypothesis is equivalent to testing H₀: b = 0, i.e., that the
slope of the regression line is zero. This is due to the formula b = ρ σ_Y/σ_X.
We will work with the sample values (x_i, y_i), x̄, ȳ, s_x, s_y, and r. Now we have seen that
r² = b̂² S_xx/S_yy = SSR/SST = 1 − SSE/SST, so that

    r = b̂ (s_x/s_y) = b̂ √(S_xx/S_yy) = b̂ √(S_xx/SST),   and   1 − r² = SSE/SST = (n − 2)s²/SST.

We also know that the test statistic for H₀: b = 0 is t = (b̂ − 0)/SE(b̂), distributed as t(n − 2).
Since SE(b̂) = s/√S_xx, we have

    t = b̂/SE(b̂) = b̂ √S_xx / s = b̂ √(S_xx/SST) · √( (n − 2) SST / ((n − 2) s²) ).

But using the formulas for r and 1 − r², we have

    b̂ √(S_xx/SST) · √( (n − 2) SST / ((n − 2) s²) ) = r √(n − 2) / √(1 − r²),

since b̂ √(S_xx/SST) = r and (n − 2) s² = SSE = (1 − r²) SST. We conclude that the test statistic
for H₀: ρ = 0 is

    t = r √(n − 2) / √(1 − r²),   and it is distributed as t(n − 2).
[Figure 6.7: Height vs. esteem — scatterplot with fitted line, and residuals.]

More precisely, the rv T = R √(n − 2) / √(1 − R²) ~ t(n − 2).
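The statistic is a one-liner in code; the values below are checked against Examples 6.24 and 6.25, which follow:

```python
from math import sqrt

def corr_t_stat(r, n):
    """t-statistic for H0: rho = 0, namely t = r*sqrt(n-2)/sqrt(1-r^2),
    distributed as t(n-2) under the null."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Example 6.24 below (height vs. self-esteem): n = 20, r = 0.730636.
t24 = corr_t_stat(0.730636, 20)
# Example 6.25 below: n = 18, r = 0.32.
t25 = corr_t_stat(0.32, 18)

print(round(t24, 5), round(t25, 2))
```

This reproduces the t-statistics 4.54009 and 1.35 found in those examples.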
Example 6.24 Does a person's self-esteem depend on their height? (See Figure 6.7.) The fol-
lowing table gives data from 20 people.

Person  Height  Self Esteem    Person  Height  Self Esteem
1       68      4.1            11      68      3.5
2       71      4.6            12      67      3.2
3       62      3.8            13      63      3.7
4       75      4.4            14      62      3.3
5       58      3.2            15      60      3.4
6       60      3.1            16      63      4.0
7       67      3.8            17      65      4.1
8       68      4.1            18      67      3.8
9       71      4.3            19      63      3.4
10      69      3.7            20      61      3.6

The sample correlation coefficient is r = 0.730636. The fitted regression equation is
y = −0.866269 + 0.07066x. The 95% CIs for the slope and intercept are [0.0379, 0.10336] and
[−3.00936, 1.27682], respectively. The calculations for the slope and intercept are shown in the
next table.

               Estimate    Standard Error   t-Statistic   p-Value
â = Intercept  −0.866269   1.02007          −0.849223     0.406911
b̂ = Slope      0.0706616   0.0155639        4.54009       0.000253573
192 6. LINEAR REGRESSION
The ANOVA table is
Source DF SS MS F-statistic p-Value
Regression 1 1.84144 1.84144 20.6124 0.000253573
Error 18 1.60806 0.0893366
Total 19 3.4495

To test the hypothesis $H_0\colon \rho = 0$ against $H_1\colon \rho \ne 0$, we see that the t statistic for the correlation (and also the slope) is
$$t(18) = \frac{0.73\sqrt{20-2}}{\sqrt{1-0.73^2}} = 4.54009,$$
which results in a p-value² of 0.00025, i.e., $P(|t(18)| \ge 4.54009) = 0.00025$. Thus, we have high statistical significance and plenty of evidence that the correlation (and the slope) is not zero. Incidentally, the residual plot shows that there may be an outlier at person 12.

Example 6.25 Suppose we calculate $r = 0.32$ from a sample of size $n = 18$. First we perform the test $H_0\colon \rho = 0$, $H_1\colon \rho > 0$. The statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.32\sqrt{16}}{\sqrt{1-0.32^2}} = 1.35.$$
For 16 degrees of freedom we have $P(t(16) > 1.35) = 0.0979$, so we do not reject the null.

Next, we find the sample size necessary in order to conclude that $r = 0.32$ differs significantly from 0 at the $\alpha = 0.05$ level. In order to reject $H_0\colon \rho = 0$ with a two-sided alternative, we would need $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} \ge t(n-2, 0.025)$. The sample size needed is the solution of the equation
$$t(n-2, 0.025) = \frac{0.32\sqrt{n-2}}{\sqrt{1-0.32^2}} = 0.33776\sqrt{n-2}$$
for $n$. In general, this cannot be solved exactly. By trial and error, for $n = 38$ we have $t(36, 0.025) = 2.02809$ and $0.33776\sqrt{38-2} = 2.02656$, which just misses; for $n = 39$, $t(37, 0.025) = 2.026 < 0.33776\sqrt{39-2} = 2.0541$. Consequently, the first $n$ which works is $n = 39$.
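The trial-and-error search in Example 6.25 is easy to automate; a sketch using `scipy.stats.t`:

```python
from math import sqrt
from scipy.stats import t

r = 0.32
c = r / sqrt(1 - r ** 2)   # ≈ 0.33776, so the statistic is c*sqrt(n - 2)

# Find the smallest n with c*sqrt(n-2) >= t(n-2, 0.025)
n = 3
while c * sqrt(n - 2) < t.ppf(0.975, n - 2):
    n += 1
print(n)   # 39
```

The loop terminates because the statistic grows like $\sqrt{n}$ while the critical value decreases toward $z_{0.025}$.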

Testing $H_0\colon \rho = \rho_0$ Against $H_1\colon \rho \ne \rho_0$
It is possible to show that
$$Z = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right), \qquad\text{with inverse}\qquad R = \frac{e^{2Z}-1}{e^{2Z}+1},$$
²The p-value is obtained using a TI-83 as 2*tcdf(4.54009, 999, 18) = 0.0002.
has an approximate normal distribution with mean $\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$ and $SD = \frac{1}{\sqrt{n-3}}$. This means we can base a hypothesis test of $H_0\colon \rho = \rho_0$ on the test statistic
$$z = \frac{\dfrac{1}{2}\ln\dfrac{1+r}{1-r} - \dfrac{1}{2}\ln\dfrac{1+\rho_0}{1-\rho_0}}{\sqrt{1/(n-3)}},$$
which comes from the usual formula $z = \frac{\text{observed} - \text{expected}}{SE}$. Then we proceed to reach a conclusion based on our choice of alternative:
$H_1\colon \rho < \rho_0$: reject the null if $z \le -z_\alpha$;
$H_1\colon \rho > \rho_0$: reject the null if $z \ge z_\alpha$;
$H_1\colon \rho \ne \rho_0$: reject the null if $|z| \ge z_{\alpha/2}$.

Using the statistic $z$, we may also construct a $100(1-\alpha)\%$ CI for $\rho$ by using
$$\frac{1}{2}\ln\frac{1+r}{1-r} \pm z_{\alpha/2}\,\frac{1}{\sqrt{n-3}}$$
and then transforming the endpoints back with the inverse transformation. Here's an example.
Example 6.26 A sample of size $n = 20$ resulted in a sample correlation coefficient of $r = 0.626$. The 95% CI for $\frac{1}{2}\ln\frac{1+r}{1-r} = 0.7348$ is $0.7348 \pm 1.96 \cdot \frac{1}{\sqrt{17}} = 0.7348 \pm 0.4754 = (0.2594, 1.2102)$. Using the inverse transformation, we get the 95% CI for $\rho$ given by
$$\left(\frac{e^{2(0.2594)} - 1}{e^{2(0.2594)} + 1},\ \frac{e^{2(1.2102)} - 1}{e^{2(1.2102)} + 1}\right) = (0.2537, 0.8367).$$
Suppose a second random sample of size $n = 15$ results in a sample correlation coefficient of $r = 0.405$. Based on this, we would like to test $H_0\colon \rho = 0.626$ against $H_1\colon \rho \ne 0.626$. Since our sample correlation coefficient $0.405 \in (0.2537, 0.8367)$, we cannot reject the null; this is not sufficient evidence to conclude the correlation coefficient is not 0.626.

Suppose another random sample led to a correlation coefficient (sample size $n = 24$) of $r = 0.75$. We now want to test $H_0\colon \rho = 0.626$, $H_1\colon \rho > 0.626$. Here we calculate
$$z = \frac{\dfrac{1}{2}\ln\dfrac{1+0.75}{1-0.75} - \dfrac{1}{2}\ln\dfrac{1+0.626}{1-0.626}}{\sqrt{1/(24-3)}} = 1.0913.$$
Since $P(Z > 1.0913) = 0.1375$ we still cannot reject the null. If we had specified $\alpha = 0.05$, then $z_{0.05} = 1.645$ and since $1.645 > 1.0913$, we cannot reject the null at the 5% level of significance.
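The Fisher-transformation calculations in Example 6.26 can be sketched in Python; note that `math.atanh` is exactly $\frac{1}{2}\ln\frac{1+r}{1-r}$ and `math.tanh` is its inverse:

```python
from math import atanh, tanh, sqrt
from scipy.stats import norm

# 95% CI for rho from n = 20, r = 0.626
n, r = 20, 0.626
z0 = atanh(r)                             # ≈ 0.7348
half = norm.ppf(0.975) / sqrt(n - 3)      # ≈ 1.96/sqrt(17) = 0.4754
lo, hi = tanh(z0 - half), tanh(z0 + half)
print(lo, hi)                             # ≈ (0.2537, 0.8367)

# One-sided test of H0: rho = 0.626 vs H1: rho > 0.626 with n = 24, r = 0.75
z = (atanh(0.75) - atanh(0.626)) * sqrt(24 - 3)
p = 1 - norm.cdf(z)
print(z, p)                               # ≈ 1.0913 and p-value ≈ 0.1375
```

The hyperbolic-tangent shortcut avoids typing the exponential form of the inverse transformation by hand.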
194 6. LINEAR REGRESSION
Example 6.27 This example shows that we can compare the correlations from two independent random samples. All we need note is that the SE for the difference is given by $SE = \sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}$.

Suppose we took two independent random samples of sizes $n_1 = 28$, $n_2 = 35$ and calculate the correlation coefficient of each sample to be $r_1 = 0.5$, $r_2 = 0.3$. We want to test $H_0\colon \rho_1 = \rho_2$, $H_1\colon \rho_1 \ne \rho_2$. The test statistic is
$$z = \frac{\dfrac{1}{2}\ln\dfrac{1+0.5}{1-0.5} - \dfrac{1}{2}\ln\dfrac{1+0.3}{1-0.3}}{\sqrt{\dfrac{1}{28-3} + \dfrac{1}{35-3}}} = 0.8985.$$
Since $P(Z > 0.8985) = 0.184$, the p-value is $2(0.184) = 0.368$ and we cannot reject the null.
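The two-sample computation in Example 6.27 can be sketched as follows:

```python
from math import atanh, sqrt
from scipy.stats import norm

n1, n2, r1, r2 = 28, 35, 0.5, 0.3
se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # SE of the difference of z-values
z = (atanh(r1) - atanh(r2)) / se
p = 2 * (1 - norm.cdf(abs(z)))            # two-sided p-value
print(z, p)                               # z ≈ 0.898, p ≈ 0.37
```

With a p-value this large, the two sample correlations are entirely consistent with equal population correlations.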

6.5 PROBLEMS
6.1. Show that the equations for the minimum of $f(a, b)$ in Proposition 6.6 are
$$\frac{\partial f}{\partial a} = -2n(\bar{y} - a - b\bar{x}) = 0,$$
$$\frac{\partial f}{\partial b} = -2\sum_{i=1}^{n}\left(x_i y_i - a x_i - b x_i^2\right) = 0.$$
Solve the first equation for $a$ and substitute into the second to find the formulas for $\hat{a}$ and $\hat{b}$ in Proposition 6.6.
6.2. We have seen that if we consider $x$ as the independent and $y$ as the dependent variable, the regression line is $y - \bar{y} = r\frac{s_y}{s_x}(x - \bar{x})$. What is the regression line if we assume instead that $y$ is the independent variable and $x$ is the dependent variable? Derive the equation by minimizing $f(a, b) = \sum_{i=1}^{n}(x_i - a - b y_i)^2$. Find the value of $f$ at the optimal $\hat{a}, \hat{b}$.
6.3. If the regression line with dependent $Y$ is $Y = a + bX$ and the line with dependent $X$ is $X = c + dY$, derive that $b \cdot d = \rho^2$. Then, given the two lines $Y = a + 0.476X$, $X = c + 1.036Y$, find $\rho$.
6.4. If $Y_i = 1.1 + 2.5x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, 1.7)$, is a regression model with independent errors, find
(a) The distribution of $Y_1 - Y_2$ when $Y_1$ corresponds to $x_1 = 3$ and $Y_2$ corresponds to $x_2 = 4$.
(b) $P(Y_1 > Y_2)$.
6.5. Given the data in the table, find the equation of the regression line in the form $y - \bar{y} = r\frac{s_y}{s_x}(x - \bar{x})$ and also with $x$ as the dependent variable. Find the minimum value of $f(a, b) = \sum_{i=1}^{n}(x_i - a - b y_i)^2$ with dependent variable $x$ and with dependent variable $y$.

x 1 4 2.2 3.7 4.8 6 6.7 7.2


y 5 4.6 3.8 4.7 5.2 5.9 6 7.8

6.6. Math and verbal SAT scores at a university have the following summary statistics: $\overline{MSAT} = 570$, $SD_{MSAT} = 110$ and $\overline{VSAT} = 610$, $SD_{VSAT} = 120$. Suppose the correlation coefficient is $r = 0.73$.
(a) If a student scores 690 on the MSAT, what is the prediction for the VSAT score?
(b) If a student scores 700 on the VSAT, what is the prediction for the MSAT score?
(c) If a student scores in the 77th percentile for the MSAT, what percentile will she
score in the VSAT?
(d) What is the standard error of the estimate of the regression?
6.7. We have the following summary statistics for a linear regression model relating the heights of sisters and brothers: $\bar{B} = 68$, $SD_B = 2.4$, and $\bar{S} = 62$, $SD_S = 2.2$, $n = 32$. The correlation coefficient is $r = 0.26$.
(a) What percentage of sisters were over 68 inches?
(b) Of the women who had brothers who were 72 inches tall, what percentage were
over 68 inches tall?
6.8. Suppose the correlation between the educational levels of brothers and sisters in a city is 0.8. Both brothers and sisters averaged 12 years of school with an SD of 2 years.
(a) What is the predicted educational level of a woman whose brother has completed
18 school years?
(b) What is the predicted educational level of a brother whose sister has completed 16
years of school?
6.9. In a large biology class the correlation between midterm grades and final grades is about
0.5 for almost every semester. Suppose a student’s percentile score on the midterm is
(a) 4%   (b) 75%   (c) 55%   (d) unknown
Predict the student’s final grade percentile in each case.
6.10. In many real applications it is known that the $y$-intercept must be zero. Derive the least squares line through the origin that minimizes $f(b) = \sum_{i=1}^{n}(y_i - b x_i)^2$ for given data points $\{(x_i, y_i)\}_{i=1}^{n}$.
6.11. Find the equations, but do not solve, for the best quadratic approximation to the data points $\{(x_i, y_i)\}_{i=1}^{n}$. That is, find the equations for $a, b, c$ which minimize $f(a, b, c) = \sum_{i=1}^{n}(y_i - a - b x_i - c x_i^2)^2$. Now find the best quadratic approximation to the data
$(-3, 7.5), (-2, 3), (-1, 0.5), (0, 1), (1, 3), (2, 6), (3, 14)$.

6.12. This problem shows how to get the estimates of $a$, $b$, and $\sigma$ by maximizing a function called the likelihood function. Recall that we assume $\varepsilon = Y - a - bx \sim N(0, \sigma)$. Define, for the data points $\vec{x} = (x_1, \ldots, x_n)$ and $\vec{y} = (y_1, \ldots, y_n)$,
$$L(a, b, \sigma;\ \vec{x}, \vec{y}) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_i - a - b x_i)^2}{2\sigma^2}\right).$$
This is called the likelihood function. It is a measure of the probability $P(Y_1 = y_1, \ldots, Y_n = y_n)$ that we obtain the sample data $(y_1, \ldots, y_n)$ from the experiment. Maximizing this function is based on the idea that we want to determine the parameters $a, b, \sigma$ which make obtaining the data we did obtain as likely (or probable) as possible.
(a) Take the log of $L$, say $G(a, b, \sigma;\ \vec{x}, \vec{y}) = \ln L(a, b, \sigma;\ \vec{x}, \vec{y})$. Use calculus to find $\frac{\partial G}{\partial a}$, $\frac{\partial G}{\partial b}$, and $\frac{\partial G}{\partial \sigma}$. Set them to zero and show that the equations for $a$, $b$, and $\sigma$ become
$$\frac{\partial G}{\partial a} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - a - b x_i) = 0,$$
$$\frac{\partial G}{\partial b} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - a - b x_i)x_i = 0,$$
$$\frac{\partial G}{\partial \sigma} = -\frac{1}{\sigma^3}\left(n\sigma^2 - \sum_{i=1}^{n}(y_i - a - b x_i)^2\right) = 0.$$
(b) What is the connection of the first two equations with Proposition 6.6?
(c) The third equation gives the estimator for $\sigma^2$, $s^2 = \frac{1}{n}\sum_i (y_i - a - b x_i)^2$. Assuming
$$\frac{1}{\sigma^2}S^2 = \frac{1}{\sigma^2}\sum_i (Y_i - a - b x_i)^2 \sim \chi^2(n-2),$$
find $E(S^2/\sigma^2)$ and then find an unbiased estimator of $\sigma^2$ using $s^2 = \frac{1}{n}\sum_i (y_i - a - b x_i)^2$.

6.13. In the following table we have predicted values from a regression line model and the actual data values. Calculate the residuals and the standard error of the estimate of the regression, $s$.

$y_i$        55  64  48  49  58
$\hat{y}_i$  62  61  49  51  50

6.14. The table contains Consumer Price Index data for 2008–2018 for gasoline and food.

Year  08    09     10    11    12   13    14   15     16    17    18
Gas   34.5  -40.4  51.3  13.4  9.7  -1.5  0.1  -35.4  -7.3  20.3  8.5
Food  4.8   5.2    -0.2  1.8   4.2  1.6   1.1  3.1    0.9   -0.1  1.6

Find the correlation coefficient and the regression equation. Find the standard error of the estimate, $s$.
6.15. In a study to determine the relationship between income and IQ, we have the following
summary statistics:

mean income = 95,000, SD = 38,000; and mean IQ = 105, SD = 12; $r = 0.6$.

(a) Find the regression equation for predicting income from IQ.
(b) Find the regression equation for predicting IQ from income.
(c) If the subjects in the data are followed for a year and everyone’s income goes up by
10%, find the new regression equation for predicting income from IQ.
6.16. In a sample of 10 Home Runs in Major League Baseball, the following summary statis-
tics were computed:

Mean Standard Deviation


Distance 416.6 35.88
SpeedOffBat 104.26 3.26
Apex 95.3 25.74

We are also given that the correlation between SpeedOffBat and Distance is 0.098, the correlation between Apex and Distance is -0.058, and the correlation between SpeedOffBat and Apex is 0.3977.
(a) Find the regression lines for predicting distance from SpeedOffBat and from Apex.
(b) Find the ANOVA tables for each case.
(c) Test the hypotheses that the correlation coefficients in each case are zero.
6.17. The table gives the duration (in days) of an ulcer for each stage (grade) of the ulcer.

Stage(x) 4 3 5 4 4 3 3 4 6 3
Days(y) 18 6 20 15 16 15 10 18 26 15
Stage(x) 3 4 3 2 3 2 2 3 5 6
Days(y) 8 16 17 6 7 7 8 11 21 24

Find the ANOVA table and test the hypothesis that the slope of the regression line is
zero.
6.18. Consider the data:

x -1 0 2 -2 5 6 8 11 12 -3
y -5 -4 2 -7 6 9 13 21 20 -9

Find the regression line and test the hypothesis that the slope of the line is zero.
6.19. The following table gives the speed of a car, x; and the stopping distance, y; when the
brakes are applied with full force.

x km/h 10 30 50 70 90 110 120


y meters 0.7 6.2 17.2 33.8 55.8 83.4 99.2

Fit a regression line to this data and plot the line and the data points on the same plot.
Also plot the residuals. Test the hypothesis that the data is uncorrelated.
6.20. In the hypothesis testing chapter we considered testing $H_0\colon \mu_1 = \mu_2$ from two independent random samples. We can set this up as a linear regression problem using the following steps.
• The sample sizes are $n_1$ and $n_2$. The $y$ data values are the observations from the two samples, labeled $y_1, \ldots, y_{n_1}, y_{n_1+1}, \ldots, y_{n_1+n_2}$. The $x$-data values are defined by
$$x_i = \begin{cases} 1 & \text{if } y_i \text{ comes from sample 1} \\ 0 & \text{if } y_i \text{ comes from sample 2,} \end{cases} \qquad i = 1, 2, \ldots, n_1 + n_2.$$
• Calculate the regression line for the data values $\{(x_i, y_i)\}_{i=1}^{n_1+n_2}$.
(a) Use the formulas for $\hat{a}, \hat{b}$ and show that $\hat{a} = \bar{y}_2$ and $\hat{b} = \bar{y}_1 - \bar{y}_2$. Here, $\bar{y}_1$ is the sample mean for sample 1 and $\bar{y}_2$ is the sample mean for sample 2.
(b) Show that the MSE for regression is the same as the pooled estimate $s_p^2$ of $\sigma^2$ with $n_1 + n_2 - 2$ degrees of freedom.
(c) Show that the regression t-test of $H_0\colon b = 0$ is the same as the pooled variances t-test of $H_0\colon \mu_1 = \mu_2$.
6.21. Given the following data set

x -6 -2 2 2 4
y 12 8 6 2 2

solve the following without the use of a calculator or computer.


(a) Plot the scatter diagram, and indicate whether $x, y$ appear linearly related.
(b) Find the regression equation for the data.
(c) Plot the regression equation and the data on the same graph. Does the line appear to provide a good fit for the data points?
(d) Compute SSE and $s^2$.
(e) Estimate the expected value of $Y$ when $x = 1$.
(f) Find the correlation coefficient $r$ and the coefficient of determination.
6.22. A study of middle- to upper-level managers is undertaken to investigate the relationship between salary level, $Y$, and years of work experience, $X$. A random sample of 20 managers is chosen with the following results (in thousands of dollars):
$$\sum x_i = 235, \quad \sum y_i = 763, \quad S_{xx} = 485.75, \quad S_{yy} = 2236.1, \quad S_{xy} = 886.85.$$

It is further assumed that the relationship is linear.


(a) Find $\hat{a}$, $\hat{b}$ and the estimated regression equation.
(b) Find the correlation coefficient, $r$.
(c) Find $r^2$ and interpret its value.
(d) Suppose the errors have distribution $\varepsilon \sim N(0, 50)$. Find $P(Y > 100)$ when $x = 12$.
6.23. An analysis of family income and energy use produced the following table from a ran-
dom sample of 25 families:

Value SE t -statistic p-Value


Intercept 82.036 2.054 39.94 0
Slope 0.93051 0.05727 16.25 0
$r^2 = 0.92$
The ANOVA table is

Source DF SS MS F-statistic p-Value


Regression 7,626.6 264.02 0
Error 23
Total 8,291
(a) Fill in all the missing entries.
(b) Find the regression equation.
(c) Is there sufficient evidence to conclude that X and Y are linearly related? Do the
hypothesis test.
(d) Find a confidence interval and a prediction interval for families with an annual
income of \$40,000. Assume that $\bar{x} = 35{,}000$ and $s_X^2 = 3{,}000$.
6.24. You want to predict the cardiac output for a level of exercise of 750 kg-m per minute. The regression line is $y = 4.97 + 0.0133x$ with data statistics $s = 0.68$, $\sum x_i = 9000$, $\sum y_i = 219.5$, $\sum x_i^2 = 6{,}300{,}000$, $\sum y_i^2 = 2812.05$, and $\sum x_i y_i = 128{,}790$. There are 20 data points. Find the predicted cardiac output and find a 95% prediction interval. The value $x = 750$ is within the range of the data points.
6.25. Given the following summary statistics of data
$$n = 17, \quad \sum x = 660, \quad \sum x^2 = 35990, \quad \sum y = 5712, \quad \sum y^2 = 2243266, \quad \sum xy = 188429,$$
calculate $S_{xx}$, $S_{xy}$, $S_{yy}$, $\hat{b}$, $\hat{a}$, and $s^2$.

APPENDIX A

Answers to Problems
A.1 ANSWERS TO CHAPTER 1 PROBLEMS
1.1 p D 0:3 and p D 3=7.
1.2 (a) P .A \ B/ D 1=12: (b) P .Ac [ B/ D 9=12.
1.4 (a) Let C D cigarettes, Cg D cigars. Then P .C c \ Cgc / D 0:64:
(b) P .Cg \ C c / D :04:

1.5 (a) $A \cap B^c \cap C^c$
(b) $A \cap C \cap B^c$
(c) $A \cup B \cup C$
(d) $(A \cap B) \cup (A \cap C) \cup (B \cap C)$
(e) $A \cap B \cap C$
(f) $A^c \cap B^c \cap C^c = (A \cup B \cup C)^c$
(g) $(A \cup B \cup C) \cap (A \cap B \cap C)^c$
(h) $(A \cap B^c \cap C^c) \cup (A^c \cap B \cap C^c) \cup (A^c \cap B^c \cap C) \cup (A^c \cap B^c \cap C^c)$ or $((A \cap B) \cup (B \cap C) \cup (A \cap C))^c$
(i) $((A \cap B) \cap C^c) \cup ((A \cap C) \cap B^c) \cup ((B \cap C) \cap A^c)$
(j) $A \cup B \cup C \cup (A^c \cap B^c \cap C^c) = S$.

1.7 P .A/ D 2=3:


 
1.9 Solve x 2 2 D 2 x 2
4
to get x D 7:
1.10 $P(C^c \cap D) = P(D) - P(C \cap D) = 0.2$.
1.11 Call the two outcomes $C_1, C_2$. Then $p = (\sqrt{5} - 1)/2 = 0.618$.
1.12 $P(\text{1st H on toss 5}) = (1-p)^4 p$. $P(\text{5 tosses to 2 H's}) = 4(1-p)^3 p^2$.
1.14 $P(A) = \frac{3}{9}$.
1.16 $P(A \cap B) \ge 0.9 + 0.9 - 1 = 0.8$.
1.17 P .other toss is Hjone is H/ D 1=3:
1.18 This hand has the pattern AABBC where A, B, and C are from distinct kinds. The
probability is 0.047539.
1.19 One way to do this is with truth tables. For example:
A B A\B .A \ B/c Ac Bc Ac [ B c
0 0 0 1 1 1 1
0 1 0 1 1 0 1
1 0 0 1 0 1 1
1 1 1 0 0 0 0

In the table 0 indicates an outcome is not in the event and a 1 indicates it is. Notice that
all possibilities are covered in this table and that the column .A \ B/c and Ac [ B c are
identical. Therefore, the two events must be the same.

$$P(A^c \cap B^c) = 1 - P(A \cup B) = 1 - P(A) - P(B) + P(A \cap B),$$
$$P((A \cap B^c) \cup (A^c \cap B)) = P(A \cap B^c) + P(A^c \cap B) = P(A) + P(B) - 2P(A \cap B).$$

1.21 The answer is no.


1.22 P .H / D 1=2:
1.23 (a) P .C2 > C1 / D :47058:
(b) P .C1 D C2 / D 3=51:
1.24 Let W D fteam wins the gameg: P .W / D 0:25:

1.25 P .coin is HHj5 H’s/ D 0:8:


1.26 (a) P .C / D 1=2: The events are pairwise independent.
(b) A; B; C are not mutually independent.
1.27 (a) P .A [ B/ D 1=2.
(b) P .B/ D 1=3:
1.28 P .Ac \ B \ C / D 1=8:
1.29 P .B/ D 1=2:
1.30 P .Win/ D 2=7:
1.31 (a) P .DjT / D 0:2.
0:729.:04/
(b) P .DjT T T / D D 0:9.
0:729.:04/ C :003375.:96/
1.32 P .at least 1 6/ D 11=36; P .at least 1 6jfaces different/ D 10=36:
1
1.33 P .NS jD/ D :
4
1.34 (i) $P(A \cap B \cap C) = .5(.8)(.9)$; $P((A \cap B \cap C^c) \cup (A \cap B^c \cap C) \cup (A \cap B \cap C)) = .04 + .09 + .36$; $P(A^c \cap B^c \cap C^c) = .5(.2)(.1)$. (ii) $P(A \cap B \cap C^c) + P(A \cap C \cap B^c) + P(B \cap C \cap A^c)$. (iii) $P((A \cup B \cup C)^c) = P(A^c)P(B^c)P(C^c)$.

1.35 Let $x$ be the number of red balls in the second box; $x = 11$.
1.37 (a) $P(7H) = \binom{10}{7} 0.4^7\, 0.6^3 \cdot \frac{1}{2} + \binom{10}{7} 0.7^7\, 0.3^3 \cdot \frac{1}{2}$.
(b) Let $A$ = first toss is H; then $P(7H \mid A) = \binom{9}{6} 0.4^6\, 0.6^3 \cdot \frac{4}{11} + \binom{9}{6} 0.7^6\, 0.3^3 \cdot \frac{7}{11}$.

1.38 $P(A \mid B) = \sum_i P(A \cap E_i \mid B) = \sum_i \dfrac{P(A \cap B \cap E_i)}{P(B \cap E_i)} \cdot \dfrac{P(B \cap E_i)}{P(B)}$.

1.39 If we calculate the proportion of male math majors at each university, we get 0.2 at
university 1 and 0.3 at university 2. For females it is 0:15 < 0:2 at university 1 and
0:25 < 0:3 at university 2. These inequalities are reversed in the amalgamated table.

Amalgamated Math Major Other Total Proportion


Males 230 370 1100 0.209
Females 1150 3850 5000 0.23

1.40 Let R = recover, D = took the drug. We have $P(R \mid M \cap D) = .27 < P(R \mid M \cap D^c) = .33$ and $P(R \mid F \cap D) = .642 < P(R \mid F \cap D^c) = .66$, so the recovery probability is lower for both males and females on the drug. However, $P(R \mid D) = 0.538 > P(R \mid D^c) = .44$, so that amalgamated, the drug is better.

A.2 ANSWERS TO CHAPTER 2 PROBLEMS


2.1 $P(X=0) = \frac{6}{36}$, $P(X=1) = \frac{10}{36}$, $P(X=2) = \frac{8}{36}$, $P(X=3) = \frac{6}{36}$, $P(X=4) = \frac{4}{36}$, $P(X=5) = \frac{2}{36}$. Then $P(0 < X \le 3) = \frac{24}{36}$. Also, $P(1 \le X < 3) = \frac{18}{36}$.

2.2 (a) $P(X=1) = 1/4$, $P(X=2) = 1/6$, $P(X=3) = 1/12$.
(b) $P(1/2 < X < 3/2) = F(3/2) - F(1/2) = 1/2$.

2.3 (a) $P(e^X \le x) = P(X \le \ln x) = F_X(\ln x)$ if $x > 0$ and 0 otherwise.
(b) $P(aX + b \le x) = P\left(X \le \frac{x-b}{a}\right) = F_X((x-b)/a)$ if $a > 0$; $P(aX + b \le x) = P\left(X \ge \frac{x-b}{a}\right) = 1 - F_X((x-b)/a) + P\left(X = \frac{x-b}{a}\right)$ if $a < 0$.
2.4 (a) $c = \frac{n(n+1)}{2}$.
(b) $c = 2$.
2.5 (a) $F(x) = \begin{cases} 0, & x \le 0 \\ \frac{3}{4}x, & 0 \le x \le 1 \\ \frac{3}{4}, & 1 \le x \le 2 \\ \frac{3}{4} + \frac{1}{4}(x-2), & 2 \le x \le 3 \\ 1, & x > 3. \end{cases}$

2.7 (a) $c = 1$.
(b) $F_X(x) = \begin{cases} 0, & x \le -3 \\ \frac{1}{2}(x+3)^2, & -3 < x \le -2 \\ \frac{1}{2}, & -2 \le x \le 2 \\ 1 - \frac{1}{2}(3-x)^2, & 2 < x \le 3 \\ 1, & x > 3. \end{cases}$

2.8 (a) $P(X \le 0.55) = 0.595$.
(b) $\int_0^c f(x)\,dx = 0.5 \Longrightarrow c = 0.5$.
(c) $\int_0^c f(x)\,dx = 0.75 \Longrightarrow c = 0.646447$.

2.9 Using a calculator we have P .X  c/ D 0:9 and c D invNorm.0:9; 12; 4/ D 17:1262:

2.10 0:364577.
2.11 $P(X \le 70) = \text{normalcdf}(0, 70.5, 75, \sqrt{100(.75)(.25)}) = 0.149348$ using the normal approximation. The exact value using the binomial is $P(X \le 70) = \text{binomcdf}(100, 0.75, 70) = 0.149541$.

2.12 Using the binomial distribution we have (a) $P(X = 3) = \text{binompdf}(2000, 0.001, 3) = 0.180537$; (b) $P(X \ge 2) = 1 - P(X \le 1) = 1 - \text{binomcdf}(2000, 0.001, 1) = 0.59412$. Using the normal distribution, (a) $P(X = 3) \approx \text{normalcdf}(2.5, 3.5, 2, 1.4135) = 0.217469$; (b) $P(X \ge 2) \approx \text{normalcdf}(2.5, \infty, 2, 1.4135) = 0.36177$. Using the Poisson approximation, (a) $P(X = 3) = \text{poissonpdf}(2, 3) = 0.180447$; (b) $P(X \ge 2) = 1 - P(X \le 1) = 1 - \text{poissoncdf}(2, 1) = 0.59399$. Poisson is a much better approximation, but note that $n \cdot p = 2000(0.001) = 2 < 5$, so there is no real justification for using the normal approximation anyway.
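The comparison in 2.12 can be checked directly; a sketch with `scipy.stats`:

```python
from scipy.stats import binom, poisson, norm

n, p = 2000, 0.001
exact = binom.pmf(3, n, p)        # exact binomial P(X = 3)
pois = poisson.pmf(3, n * p)      # Poisson approximation, lambda = 2
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
# normal approximation of P(X = 3) with continuity correction
normal = norm.cdf(3.5, mu, sd) - norm.cdf(2.5, mu, sd)
print(exact, pois, normal)        # ≈ 0.180537, 0.180447, 0.2175
```

The Poisson value agrees with the exact answer to four decimal places, while the normal approximation is far off, exactly as the answer key notes.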
2.13 Let $X_i$ be the number of occurrences of each outcome in a single game, $i = 1, 2, \ldots, 6$.
$$P(X_1 = 1, X_2 = 2, X_3 = 0, X_4 = 2, X_5 = 0, X_6 = 0) = \binom{5}{1, 2, 0, 2, 0, 0}(0.662)^1(0.052)^2(0.213)^0(0.018)^2(0.009)^0(0.046)^0 = 0.0000174.$$
2.14 $\dfrac{(r-k)(n-k)}{(k+1)(N-r-n+k+1)}$.
2.15 (a) X  Binom.8; 0:1/ H) P .X D 2/ D 0:1488; Y  Poisson.0:8/ H) P .Y D
2/ D 0:143785:
(b) X  Binom.10; 0:95/ H) P .X D 9/ D 0:315125; Y  Poisson.9:5/ H)
P .Y D 9/ D 0:130003:
2.16 (a) P .X D 1/ D binompdf.50; 0:01; 1/ D 0:305559.
(b) P .X  1/ D 0:394994:
(c) P .X  2/ D 0:0894353:
2.19 P .1=2 < X  3=4/ D 5=16: The pdf of X is f .x/ D 2x; 0 < x < 1; and 0 otherwise.
2.20 (a) The largest area possible is 12 : ; f.x; y/ j x 2 Œ2; 3; y 2 Œ1; 3=2g:
(b) F .a/ D P .A  a/ D P .h  2a/ D 2a 1; 21  a  1:
8
<2; 1  a  1
(c) f .a/ D 2
:0 otherwise:

2.22 (a) P .X < 4:5/ D 0:25:


(b) 0.
(c) continuous.
2.23 P .X > 5/ D 0:3678:
2.24 m D 1=0:2 ln.0:5/ D 3:46574:
2.25 P .Z  1:28155/ D 0:9; P . 1:64485 < Z < 1:64485/ D 0:9:
2.26 (a) P .X > 2/ D 0:090718:
(b) P .X > 10jX > 9/ D P .X > 1/ D 0:301194:
4=9
2.27 P .X > 8/ D e :
2.28 3333:33.
2.30 (a) Denote them by SA and SB: Then E.SA/ D 200; SD.SA/ D 300 and E.SB/ D
225; SD.SB/ D 736:122: The coefficient of variations are CA D 3=2 D 1:5; CB D
736:122=225 D 3:27: Stock A is a better deal.
(b) The coefficient of variation measure how much risk (measured by SD) there is
relative to the amount of expected return measured by .
(c) A is still better because Eg.A/ D 18:228; SD.g.A// D 17:852; C.g.A// D
8:2304=18:228 D 0:4515 and Eg.B/ D 10:31; SD.g.B// D 17:852; C.g.B// D
17:852=10:31 D 1:7315: A still has less risk relative to the return expected.
2.31 (a) $P(X < 1/2) = \frac{1}{4}$. $P(1/4 < X \le 1/2) = 3/16$. $P(X < 3/4 \mid X > 1/2) = 5/12$.
(b) $EX = \frac{2}{3}$, $SD(X) = 0.236$, $E[e^{tX}] = \dfrac{2 + 2e^t(t-1)}{t^2}$, $t \ne 0$.
2.32 (a) $EX = \frac{1}{5}$, $\text{Var}[X] = \frac{1}{25}$, $\text{med}[X] = \frac{1}{5}\ln 2$.
2.33 $M(t) = \dfrac{pe^t}{1 - e^t(1-p)}$, for $e^t(1-p) < 1$, i.e., $t < -\ln(1-p)$;
$EX = \dfrac{1}{p}$, $EX^2 = \dfrac{2-p}{p^2}$, $\text{Var}(X) = \dfrac{1-p}{p^2}$.
t
2.34 M.t / D e .e 1/
; EX D ; Var.X/ D :
2.35 (a) P .X D 2/ D :09; P .X D 1/ D :24; P .X D 0/ D :34; P .X D 1/ D
:24; P .X D 2/ D :09:; P .X  0/ D 0:67:
(b) EX D 0:
2.36 9=2.
35
2.37 MD.X/ D .
18
x 2 1
2.38 When fX .x/ D e ; x > 0; we have MD.X/ D e
: When fX .x/ D b a
; a  x  b;
b a
we have MD.X/ D 4
:
2.39 (a) 0:32:
(b) x:85 D 72:614:
nC1 n2 1
2.40 (a) EX D : Var.X/ D :
2 12
r r
(b) EX D : Var.X/ D :
r C1 .r C 1/2 .r C 2/
 r
pe t r r.1 p/
2.41 MN .t/ D : EN D ; and Var.N / D :
1 e t .1 p/ p p2
k p N n
2.42 Normal with mean np D n ; and SD D np.1 p/ :
N N 1
2.43 (a) P .X D 2/ D p 2 . P .X D 3/ D 2p 2 .1 p/. P .X D 4/ D 3p 2 .1 p/2 .
(b) There are 2 H’s with the last toss resulting in a H.
2
(c) EX D :
p
2.44 2=3:

2.45 E.2 C X /2 D 14: and Var.4 C 3X/ D 45:

2.46 (a) P .X  2/ D 0:0351.


(b) P .X  1/ D 0:9649.
(c) 0:29166.
6
 94
 100
 100 10
2.47 P .X D 0/ D 6 0
0 94 10
10=100 10
10. EX D 10.:06/, Var.X/ D :06.:94/10 999
.

2 71
2.48 c D 49
: Y D total amount the insurance company has to pay out, EY D :
490
2.49 EX D 2: EŒX.X 1/ D 3:

2.50 (a) a D A=2:


1
(b)  D a
ln 2.

2.51  2 D 22:58845.

2.52 8 8
<x C 1 0  x  1; <y C 1 0  y  1;
fX .x/ D 2 fY .y/ D 2
:0 otherwise :0 otherwise:

The rvs are not independent. E.X C Y / D 76 

2.53 P .X < 5/  normalcdf.0:4:5; 8; 2:5922/ D 0:08746. The exact value is P .X < 5/ D


binomcdf.50; 0:16; 4/ D 0:08078:
1
2.54 Take k D 2 in P .jX j < 2 /  1 4
D 34 :

2.55 Chebychev gives P .jX 0j  2/  14 : The exact probability is P .jX j  2/ D P .X D


2/ C P .X D 2/ D 41 :

2.56 P .X D 1/ D 0:26695:
A.3 ANSWERS TO CHAPTER 3 PROBLEMS
3.1 (a) 25 samples of size 2.

Sample mean 0 1/2 1 3/2 2 5/2 3 7/2 4


Probability 0.04 0.08 0.12 0.16 0.20 0.16 0.12 0.08 0.04
(b) E.X / D 2: Var.X/ D 1:
3.2 $E(\bar{X}) = 15/5$ and $SD(\bar{X}) = 2.059/\sqrt{2}$. Without replacement, $SD(\bar{X}) = (2.059/\sqrt{2})\sqrt{3/4}$.
3.3 We have $\mu = 70$, $\sigma^2 = 540$. Also, $P(\bar{X} = x) = 0.04, 0.14, 0.1225, 0.18, 0.315, 0.2025$ for $x = 40, 47.5, 55, 67.5, 75, 95$, respectively. Using this distribution calculate $E(\bar{X}) = 70 = \mu$ and $\text{Var}(\bar{X}) = 270 = \sigma^2/2$.
3.4 (a) $E(\bar{X}) = 6$ and $SD(\bar{X}) = 1.15$. (b) $\mu = 6$, $\sigma = 1.15\sqrt{9} = 3.45$.

3.5 (a) The mean number of defects in a sample of 2 is $2 \cdot \frac{5}{3}$. The SE for the total number of defects in a sample of size 2 is $\sqrt{2} \cdot 1.491 \cdot 0.8944 = 1.886$.
(b) $\frac{1}{15}$.
3.6 (a) $E(S_{25}) = 2637.5$, $\text{Var}(\text{Sum}) = 3025$, $SD(\text{Sum}) = 55$. We ignore the correction factor since $\sqrt{(1000-25)/999} = 0.9879$.
(b) $E(\bar{X}) = 105.5$, $SD(\bar{X}) = \sigma/\sqrt{25} = 11/5 = 2.2$.
(c) $P(98 \le \bar{X} \le 104) = 0.24735$. Therefore, if $N$ = number of sample means in this range, $N \sim \text{Binomial}(150, 0.24735)$ and $E(N) = 37.102$.
(d) $P(\bar{X} < 97.5) = 0.0000138$, so the expected number of sample means less than 97.5 will be approximately 0.
3.7 P .X1 C    C X50 < 10/  0:1444.
3.8 (a) P .4:4 < X < 5:2/ D 0:6898.
(b) 85th percentile D 5:345.
(c) P .X > 7/ D 0:022:
3.9 P .X  8/  normcdf.0; 8:5; 16; 3:66/ D 0:0202:
p p
3.10 (a) A  N.25; 10= 50/; B  N.25; 10= 100/:
(b) P .19  A  26/ D 0:7602 and P .19  B  26/ D 0:8413.
3.11 n D 25:
3.12 P .jX 5j  0:5/  normalcdf.4:5; 5:5; 5; 1/ D 0:3829:
3.13 (a) $P(X \le n/2) \approx \text{normalcdf}(0, np + 0.5, np, \sqrt{np(1-p)})$;
$P(X = n/2) \approx \text{normalcdf}(np - 0.5, np + 0.5, np, \sqrt{np(1-p)})$.
Here's the table.

Tosses n   P(X≤n/2) Exact   P(X≤n/2) Approx.   P(X=n/2) Exact   P(X=n/2) Approx.
10         0.623            0.6233             0.2461           0.2482
20         0.5881           0.5885             0.1762           0.1769
40         0.5627           0.5628             0.1254           0.1256
60         0.5513           0.5514             0.1026           0.1027

(b) When $n = 4$, $p = 0.5$ we have $P(X = 2) = 0.375$ exactly and $P(X = 2) \approx \text{normalcdf}(1.5, 2.5, 2, 1) = 0.3829$.
3.14 The exact answer is P .X  10/ D 0:0713: The approximate answer is P .X  10/ 
0:0667:
3.15 (a) Approximately normalcdf.47:5; 72:5; 60; 5:477/ D 0:9775 or another way
normalcdf.0:4; 0:6; 0:5; 5:477=120/ D 0:9715 using the proportions. The exact
number is binomcdf.120; 0:5; 72/ binomcdf.120; 0:5; 47/ D 0:9779: Therefore,
if 500 people do this, we expect about 489 people to get between 40 and 60%
heads.
(b) The chance we would get 453 people (or less) is P .X  453/ 
normalcdf.0; 453; 488:75; 3:316/  0:
3.16 0:0241:

3.17 (a) $-1, \ldots, -1, +8, \ldots, +8$, with 4 $+8$'s and 34 $-1$'s.
(b) $\mu = 0.0526$, $\sigma = 2.76203$.
(c) $P(N \ge 4) = 0.266$.
(d) $P(W \ge 0) \approx 0.46207$.

3.19 (a) 0.9744. (b) 0.09857. (c) 0.88958. (d) 1:782.

3.20 (a) $EY = 8$ and $\text{Var}(Y) = 16$.
(b) $P(Y > 15.507) = 0.05$, $P(Y < 3.489) = 0.1$, $P(Y < 13.361) = 0.9$, $P(Y > 2.733) = 0.95$.
(c) $P(3.489 < Y < 13.361) = 0.8$.
(d) $P(\chi^2(1) < 0.0855) = 0.23$; and $P(Z^2 < b) = P(-\sqrt{b} < Z < \sqrt{b}) = 0.23 \Longrightarrow \sqrt{b} = 0.29237$. Then $b = 0.0855$.
(c) P .3:489 < Y < 13:361/ D 0:8:
3.21 P .PO  0:5325/  normalcdf.0:5325; 1; 0:51; 0:02499/ D 0:1839:
3.22 P .jt9 j > 3:1622/ D 0:00575:
3.23 With replacement: P .t79  5:96/ D 3:34  10 8 ; i.e., virtually no chance. Without
replacement: P .t79  6:95/  0:
3.24 P .X  1200/ D 0:315427.
3.25 P .:45  PO  :48/ D 0:654612:
3.26 Sample 1: pO D 0:2; SE D 0:0632: Sample 2: pO D 0:25; SE D 0:0684: Sample 3: pO D
0:325; SE D 0:0740:
2
O D 41 2 2 D
3.27 E O D ; Var./ 2
:
3.28 P .jXj  0:01/ D 0:52050:
3.29 n  2075:
3.30 (a) tcdf . 1:923; 1:923; 99/ D 0:94265: (b) n  152:
3.31 P .t48  2:25=.6=7// D 0:00579.

3.32 (a) P .X 125 Y 125  160/ D 0:9772. (b) P .X 125 Y 125  250/ D 0:0062:

3.33 P .PA PB  0:1/ D 0:15865:

3.34 (a) 0:05826. (b) 0:1952033.

3.35 0:071349.

3.36 (a) 0:02275. (c) 0:453: (e) Y  2 .80/:


(b) 0:3085. (d) c D 1:667:

A.4 ANSWERS TO CHAPTER 4 PROBLEMS

4.1 (a) 0.47, 0.0499. (b) T. (c) T. (d) (0.37218, 0.56782). (e) Doesn't make sense. (f) 0.47866.

4.2 (a) The 90% CI is .66:964; 67:936/: (b) .67:01; 67:89/.

4.3 (a) 47.5.


(b) X  Binom.50; 0:95/: P .X D 40/  0; P .X  45/ D 0:1036; P .X > 40/  1.
4.4 n  .1:96=0:03/2 =4 D 1067:11.
4.5 .0:42185; 0:67815/.
4.6 (a) .14:041; 16:359/, .13:767; 16:633/, .13:361; 17:039/.
(b) .14:682; 15:718/, .14:559; 15:841/, .14:378; 16:022/.

4.7 The pivot is $T = \dfrac{\bar{X} - \mu}{s_X/\sqrt{n}}$ and we start with $P(-t(n-1, \alpha/2) \le T \le t(n-1, \alpha/2)) = 1 - \alpha$. Solve the inequality for $\mu$ to get the result.
4.8 (a) $P(X_{\min} \le m \le X_{\max}) = 1 - P(X_{\min} > m) - P(X_{\max} < m)$. Now $P(X_{\min} > m) = P(X_1 > m, \ldots, X_n > m) = P(X > m)^n = (1/2)^n$, and $P(X_{\max} < m) = P(X_1 < m, \ldots, X_n < m) = P(X < m)^n = (1/2)^n$, so $P(X_{\min} \le m \le X_{\max}) = 1 - 2(1/2)^n = 1 - (1/2)^{n-1}$.
(b) By the first part this is $1 - (1/2)^7 = 0.9921875$.

4.9 (a) .11:15; 12:05/. (b) (b). (c) 1138.

4.10 n  884:
4.11 95% CI is $(37.874, 41.826)$. The histogram of the data is skewed right because there are some cars with high mpg. A lower 95% CI is $\left(39.85 - 1.729\,\frac{4.221187}{\sqrt{20}},\ \infty\right) = (38.218, \infty)$. We are 95% confident that the mean mpg is at least 38.218 mpg.
 
4.12 $\left(\dfrac{(n-1)s_X^2}{\chi^2(n-1, \alpha/2)},\ \dfrac{(n-1)s_X^2}{\chi^2(n-1, 1-\alpha/2)}\right) = (34 \cdot 4/51.9659,\ 34 \cdot 4/19.80625) = (2.6171, 6.8665)$.
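The chi-square interval in 4.12 can be sketched in Python (here $n = 35$, $s^2 = 4$, $\alpha = 0.05$, matching the numbers above):

```python
from scipy.stats import chi2

n, s2, alpha = 35, 4.0, 0.05
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)   # 136 / 51.966
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)       # 136 / 19.806
print(lo, hi)   # ≈ (2.6171, 6.8665)
```

Note that the larger chi-square quantile produces the lower endpoint, which is why the quantiles appear "swapped" in the formula.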

4.13 (a) .23:722; 24:278/. (b) .17:568; 30:432/.

4.14 The mean difference is 4.2 pounds; the CI is . 1:721; 10:121/ with df D 60:292: There
is not enough evidence to conclude the difference is real.
4.15 95% CI is .1:7134; 2:0866/ with df D 284:865: 99% CI is .1:654; 2:146/. Since 0 is not
in either CI, there is evidence that the difference is real.
4.16 The 99% CI for $\sigma$ is $s \pm \dfrac{3s}{\sqrt{2n}}$. The percentage error in the SD is $\dfrac{3s/\sqrt{2n}}{s} = 300\,\dfrac{1}{\sqrt{2n}}\,\%$. If we want this no more than 5% we need $300\,\dfrac{1}{\sqrt{2n}} \le 5 \Longrightarrow n \ge 1800$.
4.17 (a) .4:057; 8:543/. (b) n  346:

4.18 (a) .20:122; 21:478/. (b) .12:473; 29:126/.

4.19 d D 3:7; s D 4:945; CI is .0:16238; 7:238/:


4.20 The CI is . 0:0263; 0:22626/:

A.5 ANSWERS TO CHAPTER 5 PROBLEMS


5.1 (a) $z = 1.0$. Retain $H_0$.
(b) $\beta(2) = 0.2946$.
(c) $\beta(\mu) = P(5.54 - \mu < Z < 9.46 - \mu)$. The power function is then $\pi(\mu) = 1 - P(5.54 - \mu < Z < 9.46 - \mu)$, and $\pi(7.5) = 0.05$.
5.2 ˛ D 0:03.

5.3 (a) The p-value is 0.054. Retain $H_0$. (b) The p-value is 0.121. Retain $H_0$. (c) The p-value is 0.032. Reject $H_0$.

5.4 (a) Reject $H_0$ if $t = \dfrac{105.6 - 100.3}{6.25/\sqrt{15}} = 3.2843 \ge t(14, 0.01) = 2.624$. Reject $H_0$.
(b) $\beta(103) = 0.8211 \Rightarrow \pi(103) = 1 - \beta(103) = 1 - 0.8211 = 0.1789$.
(c) The general form of $\pi(\mu)$ is $\pi(\mu) = 1 - P(t(14) < 64.778 - 0.61968\mu)$.
1 P .t .14/ < 64:778 0:61968/:

5.5 (a) Reject $H_0$ if $t \ge t(24, 0.01) = 2.492$. (b) Reject $H_0$ if $t \le -t(24, 0.02) = -2.172$. (c) Reject $H_0$ if $t \le -t(24, 0.025) = -2.064$ or $t \ge t(24, 0.025) = 2.064$.

5.6 (a) p-value = 0.0176. Reject $H_0$. (b) p-value = 0.234. Retain $H_0$. (c) p-value = $2P(t(19) \ge 1.1849) = 2(0.1253) = 0.2506$. Retain $H_0$.


5.7 (a) We have $\alpha = P\left((n-1)S^2/\sigma_0^2 \le \chi^2(n-1, 1-\alpha)\right)$. Therefore by the definition of Type II error,
$$\beta(\sigma_1^2) = P\left((n-1)S^2/\sigma_1^2 > (\sigma_0^2/\sigma_1^2)\,\chi^2(n-1, 1-\alpha)\right).$$
The remaining parts are similar.


5.8 (a) $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{19(0.33167)}{1^2} = 6.3017$. Reject $H_0$ if $\chi^2 = 6.3017 \ge \chi^2(19, 0.05) = 30.1$. Retain $H_0$.
(b) $\beta(1.5) = 0.6094$, $\pi(1.5) = 0.3906$.
(c) $\pi(\sigma^2) = 1 - P\left(\chi^2(19) < \dfrac{30.1}{\sigma^2}\right)$.

5.9 (a) p-value = 0.0438. Reject $H_0$. (b) p-value = 0.0847. Retain $H_0$. (c) p-value = 0.107. Retain $H_0$.

5.10 (a) $\chi^2 = 41.413$. Reject $H_0$ if $\chi^2 = 41.413 \ge \chi^2(29, 0.05) = 42.557$. Retain $H_0$.
(b) p-value = 0.0633.
(c) $\beta(3) = P(\chi^2(29) < 18.914) = 0.0765 \Longrightarrow \pi(3) = 0.9235$.
5.11 $\chi^2 = \dfrac{29(1.8)^2}{2^2} = 23.49$. Reject $H_0$ if $\chi^2 \le \chi^2(29, 0.975) = 16.047$ or $\chi^2 \ge \chi^2(29, 0.025) = 45.722$. Retain $H_0$.
5.12 (a) Since $\bar{x} = 226.5$, $s^2 = 1.61$, test $H_0\colon \sigma^2 = 2.25$ vs. $H_1\colon \sigma^2 < 2.25$. The test statistic is $\dfrac{n-1}{\sigma_0^2}S^2 \sim \chi^2(n-1) \Longrightarrow \dfrac{10-1}{2.25}(1.61) = 6.44$. The p-value of the test is $P(\chi^2(9) \le 6.44) = 0.3047$. Retain $H_0$. It is plausible that the variance of the thickness is 2.25. Note that the critical region is $(0, 3.325)$, where $3.325 = \chi^2(9, 0.95)$.
(b) $\beta(2) = P\left(\dfrac{(n-1)S^2}{\sigma_0^2} > 3.325 \,\middle|\, \sigma^2 = 2\right) = P\left(\chi^2(n-1) > 3.325 \cdot \dfrac{2.25}{2}\right) = P(\chi^2(9) > 3.74) = 0.927$.
5.13 (a) z D 2:049: We reject H0 .
(b) p-value D 0:0405:
(c) ˇ.0:27/ D 0:6434.
5.14 (a) $\text{invNorm}(0.975, 800(1/6), \sqrt{800(1/6)(5/6)}) = 153.99 \Longrightarrow x = 154$, or $\text{invNorm}(0.025, 800(1/6), \sqrt{800(1/6)(5/6)}) = 112.67 \Longrightarrow x = 112$. Therefore $x \le 112$ or $x \ge 154$.
(b) $107 \le x \le 160$.
5.15 Note that np₀ = 8(0.6) = 4.8 < 5 and n(1 − p₀) = 8(0.4) = 3.2 < 5. The normal approximation is not appropriate in this instance. The binomial distribution must be used directly. Let X be the number of successes.
(a) p-value = P(X ≤ 3) = binomcdf(8, 0.6, 3) = 0.17367.
(b) The critical region when α = 0.1 is x ≤ 2 or x = 8.
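The exact binomial computation in 5.15(a) can be mirrored with `scipy.stats.binom` (a sketch; `binomcdf(8, 0.6, 3)` above is the equivalent TI-calculator call).

```python
from scipy.stats import binom

n, p0, x_obs = 8, 0.6, 3
# Both n*p0 and n*(1-p0) must be at least 5 for the normal approximation
print(round(n * p0, 1), round(n * (1 - p0), 1))  # 4.8 and 3.2, so use the exact binomial
p_value = binom.cdf(x_obs, n, p0)                # P(X <= 3) under H0: p = 0.6
print(round(p_value, 5))                         # 0.17367
```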
5.16 (a) Reject H0 if f = s_X²/s_Y² = 20.1²/12.2² = 2.7144 ≥ F(33, 28, 0.025) = 2.089 or f ≤ F(33, 28, 0.975) = 0.489. Reject H0.
(b) Because we rejected H0 in the test in (a), equal variances cannot be assumed. To test equality of means, the degrees of freedom of the t distribution is given by

ν = ⌊ (20.1²/34 + 12.2²/29)² / ( (20.1²)²/(34²(33)) + (12.2²)²/(29²(28)) ) ⌋ = 55.

Reject H0 if |t| = |x̄ − ȳ|/√(s_X²/m + s_Y²/n) = |105.5 − 90.9|/√(20.1²/34 + 12.2²/29) = 3.5395 ≥ t(55, 0.025) = 2.004. Reject H0.
(c) p-value = 2P(t(55) ≥ 3.5395) = 2(0.000412) = 0.000824.
5.17 (a) Reject H0 if f = s_X²/s_Y² = 6.8/7.1 = 0.95775 ≥ F(10, 9, 0.1) = 2.4163. Retain H0.
(b) Reject H0 if f = s_X²/s_Y² = 6.8/7.1 = 0.95775 ≤ F(10, 9, 0.95) = 0.3311. Retain H0.
(c) Reject H0 if f = s_X²/s_Y² = 6.8/7.1 = 0.95775 ≥ F(10, 9, 0.005) = 6.4172 or f = 0.95775 ≤ F(10, 9, 0.995) = 0.16757. Retain H0.

5.18 (a) Since the variances are assumed equal, we compute the pooled variance as s_p² = 10.186. Reject H0 if

t = (19.1 − 16.3) / √(10.186(1/24 + 1/18)) = 2.8137 ≥ t(40, 0.05) = 1.684.

Reject H0.
(b) p-value = P(t(40) ≥ 2.8137) = 0.00378.
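A sketch of the pooled-variance t statistic in 5.18, taking s_p² = 10.186 as given; the one-sided p-value uses `scipy.stats.t`:

```python
import math
from scipy.stats import t as t_dist

m, n = 24, 18
xbar, ybar = 19.1, 16.3
sp2 = 10.186                                   # pooled variance from the data

t = (xbar - ybar) / math.sqrt(sp2 * (1 / m + 1 / n))
p_value = t_dist.sf(t, df=m + n - 2)           # one-sided, df = 40
print(round(t, 4), round(p_value, 5))          # 2.8137 and about 0.00378
```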
5.19 (a) To test equality of means, the degrees of freedom of the t-distribution is given by ν = 37. Reject H0 if

|t| = |3.8 − 3.6| / √(1.2²/20 + 1.3²/20) = 0.50556 ≥ t(37, 0.025) = 2.0262.

Retain H0.
(b) p-value = 2P(t(37) ≥ 0.50556) = 2(0.30808) = 0.61616.
5.20 Reject H0 if

|t| = |x̄ − ȳ| / (s_p √(1/m + 1/n)) = |x̄ − ȳ| / (√3581.6 √(1/9 + 1/14)) = |x̄ − ȳ| / 25.5692 ≥ t(21, 0.025) = 2.0796

⟹ |x̄ − ȳ| ≥ 2.0796(25.5692) = 53.1737.

The smallest value of |x̄ − ȳ| resulting in H0 being rejected is 53.1737.


5.21 (a) The pooled proportion is calculated as p̄₀ = (85 + 93)/1000 = 0.178. Reject H0 if

|z| = |85/500 − 93/500| / √(p̄₀(1 − p̄₀)(1/500 + 1/500)) = 0.6614 ≥ z_0.025 = 1.96.

Retain H0.
(b) p-value = 2P(Z ≥ 0.6614) = 2(0.2541) = 0.5082.
(c) β(−0.05) = 0.45743.
5.22 (a) Let X and Y denote the samples of at-bats last season and this season, respectively. Consider the hypotheses H0: p_X = p_Y vs. H1: p_X > p_Y with α = 0.05. The pooled proportion is calculated as p̄₀ = (300(0.276) + 235(0.220))/535 = 0.2514. Reject H0 if

z = (0.276 − 0.220) / √(p̄₀(1 − p̄₀)(1/300 + 1/235)) = 1.4818 ≥ z_0.05 = 1.645.

Retain H0.
(b) p-value = 0.069.
5.23 (a) The pooled proportion is calculated as p̄₀ = (550(0.61) + 690(0.53))/1240 = 0.56548. Reject H0 if

z = (0.61 − 0.53) / √(p̄₀(1 − p̄₀)(1/550 + 1/690)) = 2.8234 ≥ z_0.01 = 2.326.

Reject H0.
(b) p-value = P(Z ≥ 2.8234) = 0.002376.
(c) We show how to do this in general for a two-proportion, one-sided test. In our case H1: p_X − p_Y > 0. Suppose we have p_X − p_Y = θ > 0. Then with SE = √(X̄(1 − X̄)/n_X + Ȳ(1 − Ȳ)/n_Y),

β(θ) = P( X̄ − Ȳ < z_α √(P̄₀(1 − P̄₀)(1/n_X + 1/n_Y)) )

     = P( (X̄ − Ȳ − θ)/SE < (z_α √(P̄₀(1 − P̄₀)(1/n_X + 1/n_Y)) − θ)/SE )

     ≈ P( Z < (z_α √(P̄₀(1 − P̄₀)(1/n_X + 1/n_Y)) − θ)/SE )

     ≈ P( Z < (z_α √(p̄₀(1 − p̄₀)(1/n_X + 1/n_Y)) − θ) / √(p̄_X(1 − p̄_X)/m + p̄_Y(1 − p̄_Y)/n) )

(when sample values are substituted for the rvs).

Alternatively, find the critical region for given α first using x = invNorm(1 − α, 0, √(p̄₀(1 − p̄₀)(1/n_X + 1/n_Y))). Then find the area to the left of x under the normal curve using normalcdf(−∞, x, θ, SE).

Given p_X − p_Y = θ = 0.1, we have SE = 0.02817, and x = invNorm(0.99, 0, 0.02833) = 0.06591. Then β(0.1) = normalcdf(−∞, 0.06591, 0.1, 0.02817) = 0.11311.
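The two-proportion z test of 5.23(a)–(b) can be sketched numerically with `scipy.stats.norm`, using the pooled-proportion standard error employed throughout this section:

```python
import math
from scipy.stats import norm

n1, n2 = 550, 690
p1, p2 = 0.61, 0.53

p0 = (n1 * p1 + n2 * p2) / (n1 + n2)    # pooled proportion, 0.56548
se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = norm.sf(z)                    # one-sided p-value
print(round(p0, 5), round(z, 4), round(p_value, 6))
```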
5.24 (a) d̄ = 0.5, s_D² = 7.8333. Reject H0 if

t = (0.5 − 0.0) / (√7.8333 / √10) = 0.56493 ≥ t(9, 0.05) = 1.8331.

Retain H0.
(b) p-value = 0.29296.
5.25 (a) d̄ = 0.02, s_D² = 0.0008222. Reject H0 if

t = (0.02 − 0.0) / (√0.0008222 / √10) = 2.2057 ≥ t(9, 0.025) = 2.2622 or −2.2057 ≤ −2.2622.

Retain H0.
(b) p-value = 2(0.027414) = 0.054828.
5.26 Test H0: pO+ = 0.2785, pA+ = 0.208, …, pAB− = 0.0049 vs. H1: the proportions are not the same. Expected frequencies:

Blood Type   Expected Number of Residents with Blood Type
O+           1150 × 0.2785 = 320.28
A+           1150 × 0.208  = 239.20
B+           1150 × 0.3814 = 438.61
AB+          1150 × 0.0893 = 102.70
O−           1150 × 0.0143 = 16.445
A−           1150 × 0.0057 = 6.555
B−           1150 × 0.0179 = 20.585
AB−          1150 × 0.0049 = 5.635

Value of D7 ~ χ²(7): d7 = 19.934. Reject H0 if d7 = 19.934 ≥ χ²(7, 0.01) = 18.475. Reject H0.
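A sketch of how the expected counts and the χ²(7, 0.01) critical value in 5.26 can be reproduced with `scipy.stats.chi2`; the observed counts live in the problem statement rather than the answer, so only the expected side is shown here.

```python
from scipy.stats import chi2

# Hypothesized blood-type proportions quoted in the answer
props = {"O+": 0.2785, "A+": 0.208, "B+": 0.3814, "AB+": 0.0893,
         "O-": 0.0143, "A-": 0.0057, "B-": 0.0179, "AB-": 0.0049}
n = 1150

expected = {bt: n * p for bt, p in props.items()}   # e.g. O+ -> 1150 * 0.2785 = 320.275
crit = chi2.ppf(1 - 0.01, df=len(props) - 1)        # chi^2(7, 0.01) = 18.475
print(expected["O+"], round(crit, 3))
```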
5.27 Test H0: the die is fair vs. H1: the die is not fair.
Expected frequencies: (180)(1/6) = 30.
Value of D5 ~ χ²(5): d5 = (1/15)(θ − 30)². Reject H0 if (1/15)(θ − 30)² ≥ χ²(5, 0.05) = 11.07 ⟹ θ ≤ 17.114 or θ ≥ 42.886. Reject H0 if θ ≤ 17 or θ ≥ 43.
5.28 Test H0: pB = 0.207, pO = 0.205, …, pBr = 0.124 vs. H1: the percentages at the two plants are not the same. Expected frequencies:

Color    Expected Number of M&Ms
Blue     207
Orange   205
Green    198
Yellow   135
Red      131
Brown    124

Value of D5 ~ χ²(5): d5 = 1.8765. Reject H0 if d5 = 1.8765 ≥ χ²(5, 0.05) = 11.07. Retain H0.
5.29 Test H0: the data follows a Poisson distribution with λ = 2 vs. H1: the data does not follow a Poisson distribution with λ = 2.
Expected frequencies:

Sailfish Caught   Expected Number of Days
0                 60 × e^(−2)2⁰/0! = 8.1201
1                 60 × e^(−2)2¹/1! = 16.24
2                 60 × e^(−2)2²/2! = 16.24
3                 60 × e^(−2)2³/3! = 10.827
4                 60 × (1 − Σ_{k=0}^{3} e^(−2)2ᵏ/k!) = 8.5726

Value of D4 ~ χ²(4): d4 = 4.4277. Reject H0 if d4 = 4.4277 ≥ χ²(4, 0.05) = 9.4877. Retain H0.
5.30 Test H0: the data follows a Poisson distribution vs. H1: the data does not follow a Poisson distribution.
(a) The Poisson rate λ can be estimated from the 575 observations as

λ̄ = ((229)(0) + (211)(1) + (93)(2) + (35)(3) + (7)(4))/575 = 0.922.

Expected frequencies:

Hummingbird Visits   Expected Number of Days
0                    576 × e^(−0.922)(0.922)⁰/0! = 229.088
1                    576 × e^(−0.922)(0.922)¹/1! = 211.219
2                    576 × e^(−0.922)(0.922)²/2! = 97.372
3                    576 × e^(−0.922)(0.922)³/3! = 29.926
4                    576 × (1 − Σ_{k=0}^{3} e^(−0.922)(0.922)ᵏ/k!) = 8.394

Value of D4 ~ χ²(3): d4 = 1.075. Reject H0 if d4 = 1.075 ≥ χ²(3, 0.05) = 7.8147. Retain H0.
(b) Now suppose λ = 0.8. Expected frequencies:

Hummingbird Visits   Expected Number of Days
0                    576 × e^(−0.8)(0.8)⁰/0! = 258.813
1                    576 × e^(−0.8)(0.8)¹/1! = 207.051
2                    576 × e^(−0.8)(0.8)²/2! = 82.820
3                    576 × e^(−0.8)(0.8)³/3! = 22.085
4                    576 × (1 − Σ_{k=0}^{3} e^(−0.8)(0.8)ᵏ/k!) = 5.230

Value of D4 ~ χ²(4): d4 = 13.780. Reject H0 if d4 = 13.780 ≥ χ²(4, 0.05) = 9.4877. Reject H0.

5.31 H0: the data follows an exponential distribution with mean 40 seconds vs. H1: the data does not follow an exponential distribution with mean 40 seconds. Probabilities of an interarrival time falling in each of the intervals are listed in the following tables.

Interval     P(time falling in interval)
[0, 20)      e^(−(1/40)(0)) − e^(−(1/40)(20)) = 0.393
[20, 40)     e^(−(1/40)(20)) − e^(−(1/40)(40)) = 0.239
[40, 60)     e^(−(1/40)(40)) − e^(−(1/40)(60)) = 0.145
[60, 90)     e^(−(1/40)(60)) − e^(−(1/40)(90)) = 0.118
[90, 120)    e^(−(1/40)(90)) − e^(−(1/40)(120)) = 0.055
[120, 180)   e^(−(1/40)(120)) − e^(−(1/40)(180)) = 0.039
[180, ∞)     e^(−9/2) = 0.011

Interval     Expected No. Interarrival Times in Interval
[0, 20)      39.3
[20, 40)     23.9
[40, 60)     14.5
[60, 90)     11.8
[90, 120)    5.5
[120, ∞)     5.0

Value of D5 ~ χ²(5): d5 = 5.3826. Reject H0 if d5 = 5.3826 ≥ χ²(5, 0.05) = 11.07. Retain H0. The possibility that the data is from an exponential distribution with λ = 1/40 cannot be eliminated.
5.32 H0 W data follows a geometric distribution vs. H1 W data does not follow a geomet-
ric distribution. If X is the random variable that counts the number of casts, and
if X is geometric, then P .X D x/ D .1 p/x 1 p . We have the estimate for p as
pN D 1=3:82 D 0:26178:
Expected frequencies:

Number of Casts
Expected Frequency
Until a Strike
1 .50/ .0:26178/ D 13:089
2 .50/ .1 0:26178/.0:26178/ D 9:6626
3 .50/ .1 0:26178/2 .0:26178/ D 7:1331
4 .50/ .1 0:26178/3 .0:26178/ D 5:2658
X1
5 .50/ .1 0:26178/i .0:26178/ D 14:8495
i D4

We combine the cells for i  5 strikes so that all the expected frequencies are at least
5. Value of D4  2 .3/: d4 D 9:277. Reject H0 if d4 D 9:277  2 .3; 0:01/ D 11: 345:
Retain H0 . However, if ˛ D 0:05, the null hypothesis can be rejected since
d4 D 9:277  2 .3; 0:05/ D 7: 814 7:
At that level, it is likely that the data is not following a geometric distribution.
5.33 Test H0: injury due to criminal violence and choice of profession are independent vs. H1: injury due to criminal violence and choice of profession are not related.
Estimated row and column probabilities:

p̂V· = 318/490 = 0.64898, p̂O· = 172/490 = 0.35102, p̂·P = 174/490 = 0.3551,
p̂·C = 116/490 = 0.23673, p̂·T = 99/490 = 0.20204, p̂·S = 101/490 = 0.20612.

Estimated frequencies:

490 p̂V· p̂·P = 490(318/490)(174/490) = 112.922,   490 p̂V· p̂·C = 490(318/490)(116/490) = 75.280,
490 p̂V· p̂·T = 490(318/490)(99/490) = 64.249,     490 p̂V· p̂·S = 490(318/490)(101/490) = 65.546,
490 p̂O· p̂·P = 490(172/490)(174/490) = 61.077,    490 p̂O· p̂·C = 490(172/490)(116/490) = 40.718,
490 p̂O· p̂·T = 490(172/490)(99/490) = 34.751,     490 p̂O· p̂·S = 490(172/490)(101/490) = 35.453.

Value of D7 ~ χ²(3): d7 = 65.526. Reject H0 if d7 = 65.526 ≥ χ²(3, 0.01) = 11.345. Reject H0.
5.34 Test H0: selectivity and sensitivity are independent vs. H1: selectivity and sensitivity are not independent.
Estimated row and column probabilities:

p̂LS = 30/170 = 0.17647, p̂AS = 112/170 = 0.65882, p̂HS = 28/170 = 0.16471,
p̂LN = 52/170 = 0.30588, p̂AN = 88/170 = 0.51765, p̂HN = 30/170 = 0.17647.

Estimated frequencies: (Note that all estimated frequencies are at least 5 except the frequency of high selectivity and high sensitivity. However, it is acceptably close to 5.)

170 p̂LS p̂LN = 170(30/170)(52/170) = 9.176,     170 p̂LS p̂AN = 170(30/170)(88/170) = 15.529,
170 p̂LS p̂HN = 170(30/170)(30/170) = 5.294,     170 p̂AS p̂LN = 170(112/170)(52/170) = 34.258,
170 p̂AS p̂AN = 170(112/170)(88/170) = 57.976,   170 p̂AS p̂HN = 170(112/170)(30/170) = 19.765,
170 p̂HS p̂LN = 170(28/170)(52/170) = 8.565,     170 p̂HS p̂AN = 170(28/170)(88/170) = 14.495,
170 p̂HS p̂HN = 170(28/170)(30/170) = 4.9411.

Value of D8 ~ χ²(4): d8 = 18.012. Reject H0 if d8 = 18.012 ≥ χ²(4, 0.01) = 13.277. Reject H0.
5.35 Test H0: rH factor and blood type are independent traits vs. H1: rH factor and blood type are not independent.
Reformulate the table in terms of rH factor.

              O    A    B    AB   Totals
Positive rH   344  207  448  92   1091
Negative rH   23   12   23   11   69
Totals        367  219  471  103  1150

Value of D7 ~ χ²(3): d7 = 5.2709. Reject H0 if d7 = 5.2709 ≥ χ²(3, 0.05) = 7.8147. Retain H0.
5.36 Test H0: gender and genre of programming are independent vs. H1: gender and genre of programming are dependent. Value of D5 ~ χ²(2): d5 = 4.4334. Reject H0 if d5 = 4.4334 ≥ χ²(2, 0.05) = 5.9915. Retain H0.
5.37 Test H0: pI = pII = pIII = pIV = pV vs. H1: the proportions are not the same.
Estimated row and column probabilities:

p̂K = 112/500 = 0.224, p̂N = 388/500 = 0.776, p̂i = 100/500 = 0.2 for i ∈ {I, II, III, IV, V}.

Estimated frequencies:

500 p̂K p̂i = 500(112/500)(100/500) = 22.4,
500 p̂N p̂i = 500(388/500)(100/500) = 77.6.

Value of D9 ~ χ²(4): d9 = 2.140. Reject H0 if d9 = 2.140 ≥ χ²(4, 0.05) = 9.4877. Retain H0.
5.38 Test H0: the A and B traits are independent vs. H1: the A and B traits are dependent.
Estimated row and column probabilities:

p̂Ai· = 100/300, i = 1, 2, 3;   p̂·Bj = 75/300, j = 1, 2, 3, 4.

Estimated frequencies:

300 p̂Ai· p̂·Bj = 300(100/300)(75/300) = 25, i = 1, 2, 3, j = 1, 2, 3, 4.

Value of D11 ~ χ²(6): d11 = (104/25)θ² ≥ χ²(6, 0.05) = 12.59 ⟹ θ = 1.74. The smallest integer value of θ for which the null hypothesis is rejected is θ = 2.
5.39 Let S denote the number of speeds greater than 200 mph. From the table, S = 8. Set α = 0.05. We choose the minimum value of k such that Σ_{i=1}^{k} C(10, i)(0.5)¹⁰ ≥ 0.95. The relevant section of the binomial table for n = 10 and p = 0.5 is given below.

k   P(S ≤ k | H0 is true)
6   0.82715
7   0.94434
8   0.98828
9   0.99805

We choose k = 8. We reject H0 if S > 8. Since S = 8, we retain H0. The p-value is P(S ≥ 8) = Σ_{i=8}^{10} C(10, i)(0.5)¹⁰ = 0.05468.
5.40 H0: μ_A = μ_B = μ_C vs. H1: the means are different.
(a) x̄·· = 9.6717, x̄_A = 10.968, x̄_B = 10.68, x̄_C = 7.3667.
(b) SSE = 52.61, SSTR = 48.061, SST = SSE + SSTR = 100.67.
(c) MSTR = SSTR/2 = 48.061/2 = 24.031, MSE = SSE/15 = 52.61/15 = 3.5073.
(d) f = MSTR/MSE = 24.031/3.5073 = 6.8517.
(e) p-value = P(F(2, 15) ≥ 6.8517) = 0.008.
(f)

Source     DF  SS      MSS     F-statistic  p-Value
Treatment  2   48.061  24.031  f = 6.8517   0.008
Error      15  52.61   3.5073  *            *
Total      17  100.67  *       *            *

(g) Reject H0 at the α = 0.01 level. The three random variables A, B, and C do not have the same mean.
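Steps (c)–(e) of 5.40 follow directly from the sums of squares (a sketch using `scipy.stats.f`; differences in the last digit come from rounding MSTR and MSE first, as the answer does):

```python
from scipy.stats import f as f_dist

sstr, sse = 48.061, 52.61
df_tr, df_err = 2, 15

mstr, mse = sstr / df_tr, sse / df_err
f = mstr / mse
p_value = f_dist.sf(f, df_tr, df_err)    # P(F(2, 15) >= f)
print(round(f, 4), round(p_value, 3))    # about 6.8515 and 0.008
```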
5.41 H0: μ_A = μ_J = μ_B vs. H1: the means are not equal.
(a) x̄·· = 9.4, x̄_A = 9.6, x̄_J = 8.8, x̄_B = 9.8.
(b) SSE = 10.8, SSTR = 2.8, SST = SSE + SSTR = 10.8 + 2.8 = 13.6.
(c) MSTR = SSTR/2 = 2.8/2 = 1.4, MSE = SSE/12 = 10.8/12 = 0.9.
(d) f = MSTR/MSE = 1.4/0.9 = 1.5556.
(e) p-value = P(F(2, 12) ≥ 1.5556) = 0.2508.
(f)

Source     DF  SS    MSS  F-statistic  p-Value
Treatment  2   2.8   1.4  f = 1.5556   0.2508
Error      12  10.8  0.9  *            *
Total      14  13.6  *    *            *

(g) Retain H0 at the α = 0.05 level. Alice, Bob, and John appear to have the same performance level on installing windshields.
5.42 H0: μ_P = μ_L = μ_N vs. H1: the means are not equal.

Source     DF  SS        MSS       F-statistic   p-Value
Treatment  2   1484.933  742.4667  f = 11.26657  0.001761
Error      12  790.8     65.9      *             *
Total      14  2275.7    *         *             *

Reject H0 at the α = 0.01 level. Dosage levels seem to have an effect on depression.
5.43 H0: μ_M = μ_S = μ_L vs. H1: the means are not equal.

Source     DF  SS       MSS     F-statistic  p-Value
Treatment  2   10.0882  5.0441  f = 2.76338  0.099985
Error      13  23.7293  1.8253  *            *
Total      15  33.818   *       *            *

We retain H0 at the α = 0.05 level. It appears the dairies have the same level of Strontium-90 contamination in their milk.
5.44 The solution is broken up into several steps.
(a) dfSSTR = 3.
(b) MSE = MSTR/f = 0.708/0.75 = 0.9444.
(c) SSE = (20)(0.9444) = 18.88.
(d) n − 1 = 23 ⟹ n = 24, and n − k = 20 ⟹ 24 − k = 20 ⟹ k = 4.
(e) p-value = P(F(3, 20) ≥ 0.75) = 0.5351.
(f) Completed table:

Source     DF  SS      MSS    F-statistic  p-Value
Treatment  3   2.124   0.708  f = 0.75     0.5351
Error      20  18.88   0.944  *            *
Total      23  21.004  *      *            *

5.45 H0: μ_5Y = μ_10Y = μ_15Y = μ_20Y vs. H1: the means are different.
(a) x̄·· = 50.35, x̄_5Y = 58.4, x̄_10Y = 57.4, x̄_15Y = 43.6, x̄_20Y = 42.0.
(b) SSE = 201.6, SSTR = 1149.0, SST = SSE + SSTR = 1350.6.
(c) MSTR = SSTR/3 = 1149.0/3 = 383.0, MSE = SSE/16 = 201.6/16 = 12.6.
(d) f = 30.397.
(e) p-value = P(F(3, 16) ≥ 30.397) = 7.6621 × 10⁻⁷.
(f)

Source     DF  SS      MSS    F-statistic  p-Value
Treatment  3   1149.0  383.0  f = 30.397   7.6621 × 10⁻⁷
Error      16  201.6   12.6   *            *
Total      19  1350.6  *      *            *

(g) Reject H0 at the α = 0.01 level. Marksmanship skill differs according to how many years served.
A.6 ANSWERS TO CHAPTER 6 PROBLEMS
6.2 (x − x̄) = r(s_X/s_Y)(y − ȳ); b̂ = r s_X/s_Y, â = x̄ − b̂ȳ. Finally, f(â, b̂) = (n − 1)(1 − r²)s_X².
6.3 We have b = ρ(σ_Y/σ_X) and d = ρ(σ_X/σ_Y) ⟹ b·d = ρ². Then using the two lines given, ρ = 0.7027.
p
6.4 (a) Y1 D 8:6 C "1 ; Y2 D 11:1 C "2 H) Y1 Y2  N. 2:5; 1:7 2/:
p
(b) P .Y1 > Y2 / D normalcdf.0; 1; 2:5; 1:7 2/ D 0:1492.
6.5 Since x̄ = 4.45, s_x = 2.167, ȳ = 5.375, s_y = 1.2104, and r = 0.7907, we have

(y − 5.375) = 0.7907(1.2104/2.167)(x − 4.45) = 0.4416(x − 4.45)

and (x − 4.45) = 0.7907(2.167/1.2104)(y − 5.375) = 1.4156(y − 5.375). The minimum of f with dependent variable y is 7(1 − 0.7907²)(1.2104²) = 3.844, and with dependent variable x it is 7(1 − 0.7907²)(2.167²) = 12.32.
6.6 (a) We have VSAT − 610 = 0.73(120/110)(MSAT − 570). Therefore, if MSAT = 690, VSAT = 705.56.
(b) MSAT = 570 + 0.73(110/120)(700 − 610) = 630.23.
(c) 0.738846 = (MSAT − 570)/110 = 0.73(VSAT − 610)/120 ⟹ z = (VSAT − 610)/120 = 1.01211 ⟹ normalcdf(−∞, z) = 0.842.
(d) Se = √(1 − r²) SD_VSAT = 82.0136.
6.7 (a) We have the model S ~ N(a + bB, σ) and σ ≈ s_e = √(31/30) SD_S √(1 − 0.26²) = 2.159. So normalcdf(68, ∞, 62, 2.159) = 0.00273.
(b) The regression equation is S = 45.7933 + 0.2383B. If B = 72, then S = 62.9509 and S ~ N(62.9509, s_e), so normalcdf(68, ∞, 62.9509, 2.159) = 0.00967.
6.8 (a) 16.8. (b) 15.2.
6.9 (a) 19%. (b) 63.20%. (c) 52.50%. (d) 50%.
6.10 Consider f(b) = E(Y − bX)². We have f′(b) = −2E[(Y − bX)X] = 0 ⟹ b = E(XY)/E(X²). Also f″(b) = 2EX² > 0. Using observations, we have b̂ = Σxᵢyᵢ / Σxᵢ².
6.11 In matrix form,

[ n      Σxᵢ    Σxᵢ²  ] [a]   [ Σyᵢ    ]
[ Σxᵢ    Σxᵢ²   Σxᵢ³  ] [b] = [ Σxᵢyᵢ  ]
[ Σxᵢ²   Σxᵢ³   Σxᵢ⁴  ] [c]   [ Σxᵢ²yᵢ ]

For the given data, the equation of the least squares quadratic is y = 0.57 + x + 1.107x².
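The 3×3 normal equations above can be formed and solved directly. The data points below are hypothetical stand-ins (the book's data for 6.11 is not reproduced in this answer), and the result is cross-checked against `numpy.polyfit`:

```python
import numpy as np

# Hypothetical sample points, only to illustrate the normal equations
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.5, 2.7, 7.1, 13.4, 22.2])

# Design matrix for y = a + b x + c x^2; A^T A and A^T y give the 3x3 system
A = np.vstack([np.ones_like(x), x, x**2]).T
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b, c)

# numpy.polyfit returns the same coefficients, highest power first
print(np.polyfit(x, y, 2))
```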
6.12 E(S²/σ²) = (n − 2)/n, so an unbiased estimator for σ² using the s² of the problem is (n/(n − 2))s².
6.14 Let x denote the gas CPI and y the food CPI. The data gives x̄ = 4.836, ȳ = 2.182, s_X = 26.95, s_Y = 1.88. The regression equation is food CPI = 2.323 − 0.0293(gas CPI). The correlation coefficient is r = −0.42.
6.17 Here is the ANOVA table:

Source of Variation  DF  SS      MS            F-statistic       p-Value
Regression           1   570.04  MSR = 570.04  77.05 = F(1, 18)  0.0
Residuals            18  133.16  MSE = 7.4
Total                19  703.2

The last column gives the p-value for the observed F value of 77.05. The null is rejected and there is strong evidence that the slope of the regression line is not zero. Since r(s_y/s_x) is the slope, we can use this to conclude that there is strong evidence the correlation is not zero.
6.18 We get the regression line and correlation coefficient y = 3.10091 + 2.02656x, r = 0.99641. We want to test H0: β = 0 vs. H1: β ≠ 0. We get the test statistic t = 33.2917. With n − 2 = 8 degrees of freedom, the p-value is essentially 0.
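The slope test statistic in 6.18 can be recovered from r alone via t = r√(n − 2)/√(1 − r²) (a sketch; with n = 10 so that n − 2 = 8, the rounded r reproduces the quoted statistic up to the last digit):

```python
import math
from scipy.stats import t as t_dist

r, n = 0.99641, 10
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # equivalent to b-hat / SE(b-hat)
p_value = 2 * t_dist.sf(abs(t), df=n - 2)        # two-sided
print(round(t, 2), p_value)                      # about 33.29; p-value essentially 0
```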
6.24 We have ŷ(750) = 14.95 and the 95% PI is 14.95 ± 1.492.
6.25 We have Syy = 2243266 − 5712²/17 = 324034 and

Sxx = 35990 − 660²/17 = 10366.5,   Sxy = 188429 − (660 × 5712)/17 = −33331.

Then b̂ = Sxy/Sxx = −3.22 and α̂ = ȳ − b̂x̄ = 461.01. Finally, s² = (1/(n − 2))(Syy − S²xy/Sxx) = 14457.7.

Authors’ Biographies
EMMANUEL BARRON
Professor Barron received his B.S. (1970) in Mathematics from the University of Illinois at
Chicago and his M.S. (1972) and Ph.D. (1974) from Northwestern University in Mathematics
specializing in partial differential equations and differential games. After receiving his Ph.D.,
Dr. Barron was an Assistant Professor at Georgia Tech, and then became a Member of Tech-
nical Staff at Bell Laboratories. In 1980 he joined the Department of Mathematical Sciences
at Loyola University Chicago, where he is a Professor of Mathematics and Statistics. Professor
Barron has published over 70 research papers, and he has also authored the book Game Theory:
An Introduction, 2nd Edition in 2013. Professor Barron has received continuous research funding
from the National Science Foundation and the Air Force Office of Scientific Research.
Dr. Barron has taught Probability and Statistics to undergraduates and graduate students since
1974.

JOHN DEL GRECO


A native of Cleveland, Ohio, Dr. John G. Del Greco holds a B.S. in Mathematics from John
Carroll University, an M.A. in Mathematics from the University of Massachusetts, and a Ph.D.
in Industrial Engineering from Purdue University. Before joining Loyola’s faculty in 1987,
Dr. Del Greco worked as a systems analyst for Micro Data Base Systems, Inc., located in
Lafayette, Indiana. His research interests include applied graph theory, operations research,
network flows, and parallel algorithms, and his publications have appeared in such journals as
Discrete Mathematics, Discrete Applied Mathematics, the Computer Journal, Lecture Notes in Com-
puter Science, and Algorithmica. He has been teaching Probability and Statistics at all levels in
the Department of Mathematics and Statistics at Loyola for the past 20 years.

Index

χ² with one degree of freedom, 38 confidence level, 77


2-way table, 6 contingency table, 6
50th percentile, 32 continuity correction, 30
Continuous Distributions, 28
Bayes’ Rule, 9 Exponential, 28
Best of seven series, 25 Normal, 28
bootstrap method, 85 Uniform, 28
correlation coefficient, 170
Central Limit Theorem, 45
Counting, 12
Chebychev’s Inequality, 47 Combinations, 13
coefficient of determination, 172 Permutations, 12
conditional independence, 11 covariance, 170
conditional probability critical values for χ², 80
definition, 5 cumulative distribution function (cdf ), 22
confidence interval
lower, 88 Discrete Random Variables, 23
upper, 87 distributions
Confidence Intervals Binomial, 23
mean,  known, 81 Discrete Uniform, 23
confidence interval for p , 85 Geometric, 23
definition, 77 Hypergeometric, 24
mean,  unknown, 83 Multinomial, 24
one-sided, 87 Negative Binomial, 24
proportion, 84 Poisson, 23
sample size, 82, 86
summary for Normal, 84 equally likely outcomes, 4
Two samples event, 1
expected value, 31
variances known, 89
Variance, 83 goodness-of-fit, 128
confidence intervals for slope and intercept,
182 hypothesis testing
tests for mean, 110 p-value approach, 106
alternative, 107 percentile, 32
and confidence intervals, 108 pivotal quantities, 79
critical region, 109 power of a test, 123
null, 107 probability density function (pdf ), 20
one sided for proportions, 117 probability function, 2, 7
one- and two-sided tests, 109 probability mass function (pmf ), 19
p-value approach, 113 properties of correlation coefficient, 172
paired samples, 119 quartiles, 32
power, 123
proportions, 115 random sample, 59
test for median, 112 random variables, 19
tests for variance, 110 Bernoulli and Binomial, 19
two populations, 117 Binomial, 23
Type I and Type II errors, 108 continuous, 20
variances of two samples, 119 discrete, 19
Discrete Uniform, 23
independence, 8 Normal, 21
independent random variables, 41 regression
line using data points, 174
joint cumulative distribution function, 38 slope-intercept form, 174
joint distributions, 38 best estimate, 170
correlation coefficient, 170
Law of Total Probability, 3, 7 distributions of slope and intercept, 180
least squares fit line, 171 error sum of squares, 178
level of significance, 106, 108 errors, 176
likelihood function, 196 mean square error of regression, 171
linear model, 172 prediction interval, 187
regression line equation, 171
marginal cdf, 39 regression sum of squares, 178
marginal density, 39 slope of regression line, 170
median, 32 SSE, SSR, SST, 178
moment-generating functions, 34 switching variables, 194
moments of a rv, 34 total sum of squares, 178
Multiplication rule, 5 regression fallacy, 174
mutually exclusive, 2 residuals, 177

Normal approximation to Binomial, 30 sample correlation coefficient, 174


sample size, 86
outlier, 179 sample space, 1
partition, 9 Sum rule
scatter plot, 169 disjoint, 3
sensitivity, 10 general, 3
significance
p-values, 106 Type I and Type II errors, 108
slope regression line, 170 Type II error, 123
specificity, 10
SST, SSE, SSR, 176 uncorrelated, 43, 172
standard deviation, 32
Standard Normal rv, 29 variance, 32
