StudyGuide001 2015 4 B STA1502

This document provides an orientation and study guide for STA1502 Statistical Inference I. It includes: 1) An introduction to the course, which focuses on statistical inference using information from samples to draw conclusions about populations. 2) An outline of the 5 study units covering chapters from the course textbook on topics like comparing populations, analysis of variance, chi-squared tests, regression, and time series analysis. 3) Recommendations to read the textbook and study guide together, complete exercises independently before checking solutions, and maintain a positive attitude toward regularly studying statistics.

STA1502/1

Department of Statistics

STA1502
Statistical inference I

Study guide for STA1502


CONTENTS
ORIENTATION

STUDY UNIT 1
1.1 Introduction
1.2 Inference about the Difference Between Two Population Means: Independent Samples
1.3 Observational and Experimental Data
1.4 Inference about the Difference Between Two Population Means: Matched Pairs Experiment
1.5 Inference about the Ratio of Two Variances
1.6 Self-correcting Exercises for Unit 1
1.7 Solutions to Self-correcting Exercises for Unit 1
1.8 Learning Outcomes

STUDY UNIT 2
2.1 Introduction
2.2 Inference about the Difference Between Two Population Proportions
2.3 One-Way Analysis of Variance
2.4 Multiple Comparisons
2.5 Analysis of Variance experimental designs (read only)
2.6 Randomized Block (two-way) Analysis of Variance
2.7 Self-correcting Exercises for Unit 2
2.8 Solutions to Self-correcting Exercises for Unit 2
2.9 Learning Outcomes

STUDY UNIT 3
3.1 Chi-squared test
3.2 Chi-squared goodness-of-fit test
3.3 Chi-squared test of a Contingency Table
3.4 Summary of test on nominal data

STUDY UNIT 4
4.1 Simple linear regression and correlation
4.2 Estimating the coefficients
4.3 Error variable: required conditions
4.4 Assessing the model
4.5 Using the regression equation
4.6 Regression diagnostics

STUDY UNIT 5
5.1 Nonparametric statistics
5.2 Wilcoxon Rank Sum Test
5.3 Sign test and Wilcoxon signed rank sum test

STUDY UNIT 6
6.1 Time series analysis and time series forecasting
6.2 Components of time series and smoothing possibilities
6.3 Smoothing techniques
6.4 Trend and seasonal effects
6.5 Introduction to forecasting
6.6 Forecasting models

ORIENTATION
Welcome
Welcome to STA1502. This module is the second of the first-year statistics courses. STA1501
and STA1502 form the first-year Statistics course for students from the College of Economic and
Management Sciences. If you are a BSc student in the College of Science, Engineering and
Technology, the three modules STA1501, STA1502 and STA1503 form the first year in Statistics.

In the preceding module, STA1501, we treated probability and probability distributions. Unless
one has a proper understanding of the laws of probability, the mechanisms underlying statistical data
analysis will not be understood properly. Probability theory is the tool that makes statistical inference
possible. In STA1502 we consider applications of the probability distributions. You learned
in STA1501 that the shape of the normal distribution is determined by the value of the mean
$\mu$ and the variance $\sigma^2$, whilst the shape of the binomial distribution is determined by the sample size
$n$ and the probability of a success $p$. These critical values are called parameters. Most often we
do not know the values of the parameters and thus we cannot "utilise" these distributions (i.e.
use the mathematical formula to draw a probability density graph or compute specific probabilities)
unless we somehow estimate these unknown parameters. It makes perfect logical sense that, to
estimate the value of an unknown population parameter, we compute a corresponding or comparable
characteristic of the sample.

The objective of this module is to focus on the issues related to prediction and inference in statistics.
It is therefore called Statistical Inference, and the "I" in the title indicates that it is a module at
the first level. We draw inference about a population (a complete set of data) based on the limited
information contained in a sample. In dictionary terms, inference is the act or process of inferring;
to infer means to conclude or judge from premises or evidence; meaning to derive by reasoning.
In general, the term implies a conclusion based on experience or knowledge. More specifically in
statistics, we have as evidence the limited information contained in the outcome of a sample and
we want to conclude something about the unknown population from which the sample was drawn.
The set of principles, procedures and methods that we use to study populations by making use of
information obtained from samples is called statistical inference.

Learning outcomes
There are very specific outcomes for this module, listed below. Throughout your study of this module
you must come back to this page, sit back and reflect upon them, think them through, digest them
into your system and feel confident in the end that you have mastered the following outcomes:
· Describing the behaviour of sample statistics in repeated sampling, focussing on sampling
distributions of the sample mean and the sample proportion.
· Evaluating the reliability of estimates of the population parameters with the use of the Central
Limit Theorem and the sampling distributions of the corresponding sample statistics.
· Considering point and interval estimators for single or compound population parameters.
· Basic concepts of large-sample statistical estimation and hypothesis testing involving population
means and proportions.
· Small-sample tests and confidence intervals for population means and proportions.
· Employing three different non-parametric tests to compare two populations of ordinal or interval data
when normality cannot be accepted.
· Applying the classical time series and its decomposition into trend, seasonal and random
variation.
· Measuring long-term trend using regression analysis and seasonal variation by computing
seasonal indexes.
· Describing four forecasting techniques, including the autoregressive model.

The prescribed textbook

For this module you have to study certain sections from six chapters of the prescribed textbook:

Keller, G. (2009) International Student Edition (8th edition) Managerial Statistics, South Western, a part of Cengage Learning.

Chapter 13: INFERENCE ABOUT COMPARING TWO POPULATIONS

Chapter 14: ANALYSIS OF VARIANCE (not 14.5 and 14.6)
Chapter 15: CHI-SQUARED TESTS
Chapter 16: SIMPLE LINEAR REGRESSION AND CORRELATION
Chapter 19: NONPARAMETRIC STATISTICS (only 19.1 and 19.2)
Chapter 20: TIME SERIES ANALYSIS AND FORECASTING

The study guide

The study guide may be better described as a textbook guide because it guides you through the
textbook in a systematic way. It is no substitute for the textbook, where the different topics are
explained in detail. You have to use the two together: the guide supplements the textbook with additional
exercises and longer explanations, but does not repeat the basic theoretical knowledge. This study
guide serves as an interactive workbook, where spaces are provided for your convenience. Should
you so prefer, you are welcome to write and reference your solutions in your own book or file, if the
space we supply is insufficient or not to your liking.

Study Units and workload

We realise that you might feel overwhelmed by the volumes and volumes of printed matter that
you have to absorb as a student! How do you eat an elephant? Bite by bite! We have divided
the 6 chapters of the textbook into 6 study units or "sessions". Make very sure about the sections
indicated in each study unit since some sections of the textbook are excluded and we do not want
you frustrated by working through unnecessary work. Regular contact with statistics will ensure that
your study becomes personally rewarding.

Try to work through as many of the exercises as possible

Doing exercises on your own will not only enhance your understanding of the work, but it will give you
confidence as well. Feedback is given immediately after the activity to help you check whether you
understand the specific concept. The activities are designed (i.e. specific exercises are selected) so
that you can reflect on a concept discussed in the textbook. You can only obtain maximum benefit
from this activity-feedback process if you discipline yourself not to peep at the solution before you
have attempted it on your own!

Final word: Attitude

We know that many of you have some "math anxiety" to deal with, but we will do our best to make
your statistics understandable and not too theoretical. Studying statistics is sometimes not "exciting"
or "fun" but keep in mind that the considerable effort to master the content of this module can be very
rewarding. We claim that knowledge of statistics will enable you to make effective decisions in your
business and to conduct quantitative research into the many larger and detailed data sources that
are available. Statistical literacy will enable you to understand statistical reports you might encounter
as a manager in your business.

We are there to assist you in a process where you shift yourself from a supported school learner to
an independent learner. Studying through distance education is neither easy nor quick. There will
be times when you feel frustrated and discouraged and then only your attitude will pull you through!
You are the master of your own destiny.

In a paper by Sue Gordon¹ (1995) from the University of Sydney, the following metaphor is given:
"The learning of statistics is like building a road. It’s a wonderful road, it will take you to places you
did not think you could reach. But when you have constructed one bit of road you cannot sit back and
think ‘Oh, that’s a great piece of road!’ and stop at that. Each bit leads you on, shows the direction
to go, opens the opportunity for more road to be built. And furthermore, the part of the road that
you built a few weeks ago, that you thought you were finished with, is going to develop pot holes
the instant you turn your back on it. This is not to be construed as failure on your part, this is not
inadequacy. This is just part of road building. This is what learning statistics is about: go back and
repair, go on and build, go back and repair."

¹ Gordon, Sue (1995) A Theoretical Approach to Understanding Learners of Statistics. Journal of Statistics Education
v. 3, n. 3. University of Sydney.

A few logistical problems

(You can skip the following section if you have read through it when you did STA1501.)

Decimal comma or point?

We realise that in the South African schooling system commas are used to indicate the decimal digit
values. You have been penalised at school for using a point. Now we sit between two fires: the
school system and common practice in calculators and computers! Most computer packages use
decimal points (ignoring the option to change it) and Keller (the author) also uses the decimal point
in our textbook (Statistics for Management and Economics). Therefore we use the decimal point in
our study guide, assignments and examination.

Role of computers and statistical calculators:

The emphasis in the textbook is well beyond the arithmetic of calculating statistics and the focus is
on the identification of the correct technique, interpretation and decision making. This is achieved
with a flexible design giving both manual calculations and computer steps.

Every statistical technique that needs computation is illustrated in a three-step approach:

Step 1 MANUALLY
Step 2 EXCEL
Step 3 MINITAB

It is a good idea that you initially go through the laborious manual computations to enhance your
understanding of the principles and mathematics but we strongly urge you to manage the Excel
computations because using computers reflects the real world outside. The additional advantage of
using a computer is that you can do calculations for larger and more realistic data sets. Whether
you use a computer program or a statistical calculator as tool for your calculations is irrelevant to us.
However, the emphasis in this module will always be on the interpretation and how to articulate the
results in report writing.

CD Appendixes and a Study Guide are provided on the CD-ROM (included in the textbook) in PDF
format. The slide shot below is just to give you an idea of some of the topics covered. Although it will
not be to your disadvantage if you do not use the CD, we encourage you to try your best to have at
least a few sessions on a computer. Statistical software makes statistics exciting, so play around
on the computer should you have access!

Some Key Terms/Symbols

Sampling distribution of the sample proportion

Standard error of the proportion
Sampling distribution of the difference between two sample means
Standard error of the difference between two means
Pooled variance estimator
Matched pairs experiment
Degrees of freedom
Pooled proportion estimator
Response variable
Sum of squares for error
Multinomial experiment
Least squares method
Distribution–free methods
Random variation
Trend analysis

STUDY UNIT 1
1.1 Introduction
You should not attempt this module without knowledge of the contents of STA1501, as it continues
with the follow-up chapters of the same textbook. Chapter 2 and Chapters 4 to 12 were
covered in STA1501 and we now continue with Chapter 13. In Chapter 12 you learnt about statistical
inference for a single population and derived hypothesis tests and confidence intervals from the
information contained in a single sample. You did this for

• the population mean $\mu$

• the population variance $\sigma^2$

• the population proportion $p$

In this study unit we will focus on statistical inference for two populations and derive hypothesis
tests and confidence intervals from the information contained in two separate samples. Recall how
a confidence interval is derived for $(\mu_1 - \mu_2)$ using the sampling distribution of $(\bar{X}_1 - \bar{X}_2)$. Similar
to the practical problems with inference for a single population mean, $\mu$, you will understand that we
again work with a t-distribution because of the more realistic set-up where we assume that both the
population variances are unknown and we have to estimate them.

1.2 Inference about the difference between two means: Independent samples

STUDY
Keller Chapter 13 Inference about comparing two populations
13.1 Inference about the difference between two means: independent samples

Make sure that you understand figure 13.1 of Keller: Note that we need subscripts to distinguish
between the parameters of two different variables!
We are now sampling from two independent populations where the means of the populations are
our focus.

The derivation of the test statistic is based on three assumptions:

1. We have two independent populations from which we draw small random samples.
2. Both populations have normal distributions.
3. Both populations have the same variance, i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

In statistical notation we summarise this as follows:

If we have a random sample of size $n_1$ from a $n(\mu_1; \sigma_1^2)$ population and an independent random
sample of size $n_2$ from a $n(\mu_2; \sigma_2^2)$ population, then

$$\tilde{\sigma}^2 = s_{\text{pooled}}^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

is the pooled estimate of the unknown common variance, assuming that $\sigma_1^2 = \sigma_2^2$.

[ :-) I like to add the subscript "pooled" to remind me that it is a combined/composed variance
and not the subscript consisting only of "p" as Keller does!]

The test statistic is

$$t_{(\bar{x}_1 - \bar{x}_2)} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_{\text{pooled}}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$

which has a t-distribution with $\nu = (n_1 + n_2 - 2)$ degrees of freedom.

[ :-) I like to add the subscript "$(\bar{x}_1 - \bar{x}_2)$" to t to remind me that it is a different t-statistic from
what we used in chapter 12 of Keller ]
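
The two formulas above can be sketched in a few lines of Python. This is only an illustration under the equal-variance assumption; the function names and the small data set are hypothetical and do not come from Keller.

```python
from math import sqrt


def pooled_variance(sample1, sample2):
    """Pooled estimate of the common variance of two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations around each sample's own mean.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    return (ss1 + ss2) / (n1 + n2 - 2)


def pooled_t_statistic(sample1, sample2, d0=0.0):
    """Return (t, degrees of freedom) for H0: mu1 - mu2 = d0."""
    n1, n2 = len(sample1), len(sample2)
    mean_difference = sum(sample1) / n1 - sum(sample2) / n2
    standard_error = sqrt(pooled_variance(sample1, sample2) * (1 / n1 + 1 / n2))
    return (mean_difference - d0) / standard_error, n1 + n2 - 2


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
group_a = [1, 2, 3]
group_b = [2, 4, 6]
t_value, df = pooled_t_statistic(group_a, group_b)
print(f"pooled variance = {pooled_variance(group_a, group_b):.4f}")
print(f"t = {t_value:.4f} with {df} degrees of freedom")
```

The resulting t-value still has to be compared with a critical value from the t-table, as described next.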

The test statistic can be used directly to perform a hypothesis test or be manipulated to create a
lower and an upper bound for the confidence interval.

The null hypothesis $H_0: (\mu_1 - \mu_2) = D_0$ may be tested at the $100\alpha\%$ level of significance against one
of the following alternatives:

(i) $H_1: (\mu_1 - \mu_2) \neq D_0$ or
(ii) $H_1: (\mu_1 - \mu_2) < D_0$ or
(iii) $H_1: (\mu_1 - \mu_2) > D_0$

The symbol $D_0$ implies a known, specified difference under $H_0$ and is usually (mostly!)
the value 0, indicating that we are testing $H_0: \mu_1 = \mu_2$.

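To obtain a $(1 - \alpha)100\%$ confidence interval estimate for the difference between the two
population means, $(\mu_1 - \mu_2)$, we compute

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,(n_1 + n_2 - 2)} \sqrt{s_{\text{pooled}}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

$t_{\alpha/2;\,(n_1 + n_2 - 2)}$ is obtained from Table 4 (see Appendix B, Keller) as

$$P\left( -t_{\alpha/2;\,(n_1 + n_2 - 2)} \le t \le t_{\alpha/2;\,(n_1 + n_2 - 2)} \right) = (1 - \alpha)$$

[Figure: density curve of the $t_{n_1 + n_2 - 2}$ distribution, with area $1 - \alpha$ between $-t_{\alpha/2;\,n_1 + n_2 - 2}$ and $t_{\alpha/2;\,n_1 + n_2 - 2}$ and area $\alpha/2$ in each tail.]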
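
As a rough sketch, the interval can be computed from summary statistics once the critical value has been read from the t-table. The function name and the numbers below are hypothetical; only the formula is taken from the text above.

```python
from math import sqrt


def pooled_confidence_interval(mean1, mean2, pooled_var, n1, n2, t_critical):
    """Confidence interval for mu1 - mu2 under the equal-variance assumption.

    t_critical is t_{alpha/2; n1+n2-2}, looked up in a t-table by the caller.
    """
    margin = t_critical * sqrt(pooled_var * (1 / n1 + 1 / n2))
    difference = mean1 - mean2
    return difference - margin, difference + margin


# Hypothetical summary statistics; 2.145 is the tabulated t_{0.025; 14}.
lower, upper = pooled_confidence_interval(10.0, 8.0, 4.0, 8, 8, t_critical=2.145)
print(f"95% confidence interval: ({lower:.3f}; {upper:.3f})")
```

Passing the critical value in as an argument mirrors the manual method, where it is looked up in Table 4 rather than computed.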

After you have studied section 13.1 of chapter 13 of the textbook you should try and work through
activities 1.1 and 1.2 to enhance your understanding of a hypothesis test for the difference between
two population means.

Activity 1.1
Say whether the following statements are correct or incorrect and try to rectify the incorrect
statements to make them true.

(a) $s_{\text{pooled}}^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$

.............................................................................. ..............................................................................

.............................................................................................................................................................

(b) If we derive a confidence interval for $(\mu_1 - \mu_2)$ we use $SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
but if we test $H_0: \mu_1 = \mu_2$ we use $SE = \sqrt{s_{\text{pooled}}^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$.

.............................................................................................................................................................

.............................................................................................................................................................

(c) In a one-tailed test for the difference between two population means, (µ1 − µ2 ), if the null
hypothesis is rejected when the alternative hypothesis, H1 : µ1 < µ2 is false, a Type I error
is committed.

.............................................................................................................................................................

Feedback

(a) Correct. With a little algebraic manipulation it follows from the definitions

$$s_1^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2}{n_1 - 1} \quad \text{and} \quad s_2^2 = \frac{\sum (x_{2i} - \bar{x}_2)^2}{n_2 - 1}$$

that $(n_1 - 1)s_1^2 = \sum (x_{1i} - \bar{x}_1)^2$ and that $(n_2 - 1)s_2^2 = \sum (x_{2i} - \bar{x}_2)^2$.

(b) Incorrect. We use $SE = \sqrt{s_{\text{pooled}}^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$ for both the hypothesis test and the confidence
interval!

You will find that in most of the exercises on this section, whether they are for an assignment, the
examination or exercises in Keller, the information you have to work with will either be

• raw data for two samples, or

• summarised data given in a table format as

                      Population
                      1              2
Sample size           $n_1$          $n_2$
Sample mean           $\bar{x}_1$    $\bar{x}_2$
Sample variance       $s_1^2$        $s_2^2$

There could be "variations" on the theme of summarised data where computed sums are given
instead of sample statistics, e.g. $\sum x_{1i}$ instead of $\bar{x}_1$, or $\sum x_{1i}^2$ and $\sum x_{1i}$ instead of $s_1^2$.
In the case of raw data, you must try to have at least a Scientific Pocket Calculator with Statistical
Functions that will enable you to compute the sample statistics:
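
If you prefer software to a calculator, the conversion from given sums to sample statistics might look like the sketch below. The function name and the three observations are hypothetical; the formula used is the standard computational form $s^2 = \left( \sum x^2 - (\sum x)^2 / n \right) / (n - 1)$.

```python
def stats_from_sums(n, sum_x, sum_x_squared):
    """Sample mean and sample variance from n, sum(x) and sum(x^2)."""
    mean = sum_x / n
    # Computational formula: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
    variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)
    return mean, variance


# Hypothetical raw data, reduced to the sums an exercise might give you.
data = [2, 4, 6]
mean, variance = stats_from_sums(len(data), sum(data), sum(x * x for x in data))
print(f"mean = {mean}, variance = {variance}")
```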

Activity 1.2
Psychologists have claimed that the scores on a tolerance measurement scale have a normal
distribution. Suppose that this scale is administered to two independent random samples of males
and females and their tolerance towards other road users is measured. (The higher the score, the
more tolerant you are.) The following scores were obtained:

Males: 12 8 11 14 10
Females: 15 12 14 11 13 14 12

(a) Test $H_0: \mu_{\text{males}} = \mu_{\text{females}}$ against the alternative $H_1: \mu_{\text{males}} \neq \mu_{\text{females}}$.

Use $\alpha = 0.01$ and assume that $\sigma_1^2 = \sigma_2^2$.

............................... ............................... ............................... ............................... ...............................

.............................................................................................................................................................

............................... ............................... ............................... ............................... ...............................

.............................................................................................................................................................

.............................................................................................................................................................

(b) Compute a 99% confidence interval for the difference $(\mu_{\text{males}} - \mu_{\text{females}})$. How do you interpret
this interval?

.............................................................................................................................................................

............................... ............................... ............................... ............................... ...............................

.............................................................................................................................................................

.............................................................................................................................................................
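Feedback

(a) Step 1:
We have to test $H_0: \mu_{\text{males}} = \mu_{\text{females}} \Longrightarrow H_0: (\mu_1 - \mu_2) = 0$
against $H_1: \mu_{\text{males}} \neq \mu_{\text{females}} \Longrightarrow H_1: (\mu_1 - \mu_2) \neq 0$.

Step 2:
We use the test statistic

$$t_{(\bar{x}_1 - \bar{x}_2)} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_{\text{pooled}}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \sim t_{(n_1 + n_2 - 2)}.$$

$$\bar{x}_1 = \frac{\sum x_{1i}}{n_1} = \frac{55}{5} = 11; \qquad \bar{x}_2 = \frac{\sum x_{2i}}{n_2} = \frac{91}{7} = 13;$$

$$s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{20 + 12}{5 + 7 - 2} = \frac{32}{10} = 3.2$$

Hence,

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_{\text{pooled}}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(11 - 13) - 0}{\sqrt{3.2 \left( \frac{1}{5} + \frac{1}{7} \right)}} = \frac{-2}{\sqrt{1.0971}} = -1.9094.$$

Step 3:
Find the critical values.
From Table 4 (see Appendix B, Keller) we find $t_{\alpha/2;\,(n_1 + n_2 - 2)} = t_{0.01/2;\,(5 + 7 - 2)} = t_{0.005;\,10} = 3.169$, which
means we will reject $H_0$ if $t \ge 3.169$ or if $t \le -3.169$.

Since $-3.169 < -1.9094 < 3.169$ we cannot reject the null hypothesis, and conclude that there is not
a significant difference between the means of the males and the females.

(b)
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,(n_1 + n_2 - 2)} \sqrt{s_{\text{pooled}}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = (11 - 13) \pm (3.169) \sqrt{3.2 \left( \frac{1}{5} + \frac{1}{7} \right)}$$

$$= -2 \pm (3.169) \sqrt{1.0971} = -2 \pm 3.3194 = (-5.3194;\ 1.3194).$$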

We are 99% confident that the unknown difference $(\mu_{\text{males}} - \mu_{\text{females}})$ lies between $-5.3194$
and $1.3194$. The interval $(-5.3194;\ 1.3194)$ includes the null value 0, so at the 99% confidence level
we cannot rule out that the mean for the males is the same as the mean for the females.

[Extra explanation: We translate the phrase "the mean for the males is the same as the mean for
the females" as $\mu_{\text{males}} = \mu_{\text{females}}$, which is in general $\mu_1 = \mu_2$. But if $\mu_{\text{males}} = \mu_{\text{females}}$, it implies
that $(\mu_1 - \mu_2) = 0$.
So, to judge whether $\mu_{\text{males}} = \mu_{\text{females}}$ is plausible, we have to check whether zero is included in the confidence
interval. ]
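
The test in (a) and the interval in (b) can be checked against each other with a short script. This is only a sketch that reuses the summary figures from the feedback above; the variable names are our own, and the critical value 3.169 is typed in from the t-table rather than computed.

```python
from math import sqrt

# Summary figures taken from the feedback on Activity 1.2.
mean_males, mean_females = 11.0, 13.0
n_males, n_females = 5, 7
pooled_var = 3.2
t_critical = 3.169  # t_{0.005; 10} from the t-table

standard_error = sqrt(pooled_var * (1 / n_males + 1 / n_females))
difference = mean_males - mean_females

# Two-sided test of H0: mu1 - mu2 = 0.
t_value = difference / standard_error
reject_h0 = abs(t_value) >= t_critical

# 99% confidence interval for mu1 - mu2.
lower = difference - t_critical * standard_error
upper = difference + t_critical * standard_error
zero_inside_interval = lower < 0.0 < upper

print(f"t = {t_value:.4f}, reject H0: {reject_h0}")
print(f"99% CI = ({lower:.4f}; {upper:.4f}), contains 0: {zero_inside_interval}")
```

Both routes use the same standard error and the same critical value, which is why $H_0$ is not rejected exactly when 0 lies strictly inside the interval.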

(c) We conclude from questions (a) and (b) that using a two-sided confidence interval and performing
a two-sided hypothesis test must always lead to the same conclusion because it is a different
"juggle" of the same information! This is indeed the case with this exercise!

You will find that in most of the exercises on this section, whether they are for an assignment, the
examination or exercises in Keller, we will simply state: " Assume that.....blah-blah-blah" and then we
conveniently take care of the assumptions of normality and equal variances! But, strictly speaking,
we should have first checked whether these conditions are met before we proceed with the test.
There exist additional preliminary tests where we can formally test for normality and for the equality
of variances. The tests for normality are covered in detail in your second-year statistics syllabus.
Most statistical packages will provide you with a statistical test to formally test $H_0: \sigma_1^2 = \sigma_2^2$. In
the module STA2601 you will be formally introduced to the statistical package JMP. In case you do
not continue with statistics but anyhow apply your first-year knowledge using a statistical package of
your own choice, be aware that most statistical software packages will automatically include a test
for the equality of variances when you request to do a test for means! (This also happens when you
request to do an ANOVA test for means – a procedure you will learn about in the following study unit.)
The output for the test for the equality of variances will be a so-called F-test. An F-test, in general, is
basically the ratio of two quantities – in this application two variances. The p-value associated with
the F-test could be interpreted exactly like you have learned to do for any other test. If it is significant
(i.e. p-value < α) you will reject $H_0: \sigma_1^2 = \sigma_2^2$.
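
To make "the ratio of two variances" concrete, the sketch below computes the ratio of the two sample variances for the data of Activity 1.2. It is an illustration only: the helper name is our own, the choice of which variance goes in the numerator is an assumption, and the p-value itself is not computed here because it requires an F-table or a statistical package.

```python
def sample_variance(values):
    """Sample variance with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Data from Activity 1.2.
males = [12, 8, 11, 14, 10]
females = [15, 12, 14, 11, 13, 14, 12]

# Assumed convention for this sketch: variance of sample 1 over variance of sample 2.
f_ratio = sample_variance(males) / sample_variance(females)
degrees_of_freedom = (len(males) - 1, len(females) - 1)
print(f"F = {f_ratio:.2f} with {degrees_of_freedom} degrees of freedom")
```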

1.3 Observational and Experimental Data

STUDY
Keller Chapter 13 Inference about comparing two populations
13.2 Observational and experimental data

Although this is a section of less than two pages, it is vitally important to grasp what Keller wants to
convey and to always keep this in mind whenever you interpret results.

1.4 Inference about the Difference Between Two Population Means: Matched Pairs Experiment

STUDY
Keller Chapter 13 Inference about comparing two populations
13.3 Inference about the Difference Between Two Population Means: Matched Pairs Experiment

Have you noticed that, when we derived the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$, we used the fact that
$E(\bar{x}_1 - \bar{x}_2) = (\mu_1 - \mu_2)$ (the minus sign stays), but that
$\operatorname{var}(\bar{x}_1 - \bar{x}_2) = \left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)$ (the minus sign disappears)?

(Yes, there is a plus sign even though you might expect a minus sign!) In other words, if we create
a new variable by subtracting two variables, the variance of this new variable will – provided they
are independently distributed – be the sum of the variances of the two original variables.

Strictly speaking there is (in general) a third term that takes care of the dependency between the two
variables. We did not even bother to mention it in section 1.1 because this dependency term falls
away if we assume that X and Y are independent.
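
For reference, the general rule behind this remark can be written out as follows. This is the standard identity for the variance of a difference of two random variables $X$ and $Y$; it is stated here as background and is not quoted from Keller.

$$\operatorname{var}(X - Y) = \operatorname{var}(X) + \operatorname{var}(Y) - 2\operatorname{cov}(X, Y)$$

If $X$ and $Y$ are independent, then $\operatorname{cov}(X, Y) = 0$ and the third term falls away, leaving $\operatorname{var}(X - Y) = \operatorname{var}(X) + \operatorname{var}(Y)$.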

However, if we cannot assume that we have two samples from two independent populations, we
have a problem with $\operatorname{var}(\bar{x}_1 - \bar{x}_2)$. Using

$$\tilde{\sigma}^2 = s_{\text{pooled}}^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

is not valid anymore!
So, whenever there is a "connectedness" between one set of values (sample 1) and the second
set of values (sample 2), we could take care of the dependency by treating the data as matched
pairs. We remove the dependency by reducing the two samples to one set of scores. This would
immediately imply that $n_1 = n_2$.

Thus, we create a single random sample by taking the paired differences di = x1i − x2i . With a little
adaptation (and imagination) we are now back to the set-up discussed in STA1501 (depending on
whether we consider the sample as having a known or unknown population variance!) i.e. go back
to Keller regarding the topics:

11.2 Testing the Population Mean when the Population Standard Deviation is Known
and
12.1 Inference about a Population Mean when the Standard Deviation is Unknown

Comparing the means of two dependent data sets is always a separate choice (or sub-menu
in computer jargon) of the test procedures available for testing means (main-menu in computer
jargon) in any statistical software package. It is generally known as a “paired samples t-test” and
observations of a single sample, obtained by first taking the differences, are used.

Now the formula for the test statistic is

$$t = \frac{\bar{x}_D - 0}{s_D / \sqrt{n_D}}$$

where

$\bar{x}_D$ = mean difference between the paired observations $= \dfrac{1}{n_D} \sum d_i$
$s_D$ = standard deviation of the differences $d_i$
$n_D$ = number of paired observations.

For dependent observations, the hypothesis test for the difference between the two means therefore
boils down to the hypothesis test for a single sample.
H0 : µX = µY is the same as H0 : µD = 0 .

It is interesting to note that in the paired observations test, the degrees of freedom are half of what
they are if the samples are not paired. (When the samples are not paired two kinds of variation are
present: differences among the groups and differences among the subjects.)
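
A minimal sketch of the paired procedure is given below: the two samples are first reduced to one set of differences, and the single-sample t-statistic is then computed on those differences. The function name and the four pairs of values are hypothetical.

```python
from math import sqrt


def paired_t_statistic(sample1, sample2, d0=0.0):
    """Return (t, degrees of freedom) for H0: mu_D = d0 using paired data."""
    if len(sample1) != len(sample2):
        raise ValueError("matched pairs require samples of equal size")
    # Reduce the two dependent samples to one sample of differences.
    differences = [x1 - x2 for x1, x2 in zip(sample1, sample2)]
    n = len(differences)
    mean_d = sum(differences) / n
    var_d = sum((d - mean_d) ** 2 for d in differences) / (n - 1)
    t = (mean_d - d0) / (sqrt(var_d) / sqrt(n))
    return t, n - 1


# Hypothetical measurements on the same four subjects.
first = [10, 12, 9, 11]
second = [12, 15, 9, 14]
t_value, df = paired_t_statistic(first, second)
print(f"t = {t_value:.4f} with {df} degrees of freedom")
```

With four pairs there are 3 degrees of freedom, compared with 6 if the same eight values had been treated as two independent samples.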

Activity 1.3
Say whether the following statements are correct or incorrect and try to rectify the incorrect
statements to make them true.

(a) Repeated measurements from the same individuals constitute an example of data collected from
a matched pairs experiment.

.............................................................................. ..............................................................................

.............................................................................................................................................................

(b) The number of degrees of freedom associated with the t-test, when the data are gathered from a
matched pairs experiment with 8 pairs, is 7.

.............................................................................................................................................................

(d) In comparing two population means of interval data, we must decide whether the samples are
independent (in which case the parameter of interest is µ1 − µ2 ) or matched pairs (in which case
the parameter is µD ) in order to select the correct test statistic.

.............................................................................................................................................................

(e) When comparing two population means using data that are gathered from a matched pairs
experiment, the test statistic for µD has a Student t-distribution with ν = nD − 1 degrees of
freedom, provided that the differences are normally distributed.

.............................................................................................................................................................

.............................................................................................................................................................

Feedback

(a) Correct.
(b) Correct.
(c) Incorrect. We may say that the matched pairs produce a smaller estimated SE because we
eliminate the often considerable variability due to individual variation in the separate samples.
(d) Correct.
(e) Correct.

Activity 1.4
Suppose that person A believes that sons, upon maturity, are in general taller than their fathers.
Person B, on the other hand, argues that the opposite is true. In order to investigate this issue, we
measure the heights of a random sample of nine father-son pairs. The following are the results (in
cm):

Pair 1 2 3 4 5 6 7 8 9
Son 185 173 168 178 188 173 165 183 175
Father 180 175 160 178 183 175 160 173 178

(a) Perform the appropriate test to solve this issue. Use $\alpha = 0.05$.

.............................................................................................................................................................

(b) Find a 95% confidence interval estimate for µD, the mean difference in heights of sons
and fathers.

.............................................................................................................................................................

.............................................................................................................................................................

Feedback

We have dependent (paired) observations and we need to work with the differences of the pairs,

d_i = height of son − height of father.

(a) Hypothesis test: We have to test

\[ H_0 : \mu_D = 0 \quad \text{vs} \quad H_1 : \mu_D \neq 0 \]

where

Pair   1    2    3    4    5    6    7    8    9
d_i    5   −2    8    0    5   −2    5   10   −3

The test statistic is

\[ t = \frac{\bar{x}_D - 0}{s_D / \sqrt{n}} \]

where

\[ \bar{x}_D = \frac{\sum d_i}{9} = \frac{26}{9} = 2.889 ; \]

\[ s_D^2 = \frac{\sum d_i^2 - n\bar{x}_D^2}{n - 1} = \frac{256 - 9(2.889)^2}{8} = 22.611 ; \]

\[ s_D = \sqrt{22.611} = 4.755 . \]

Therefore,

\[ t = \frac{2.889 - 0}{4.755 / \sqrt{9}} = 1.8227 . \]

Decision rule
Since $t \sim t_{n-1}$ we will reject $H_0$ if $t \le -t_{0.025;\,8}$ or if $t \ge t_{0.025;\,8}$.
From Table 4 (see Appendix B, Keller) $t_{0.025;\,8} = 2.306$.
Since 1.8227 < 2.306 we cannot reject $H_0$. The heights of sons and fathers do not differ
significantly at the 5% level of significance.

(b) For a 95% confidence interval we need $t_{\alpha/2;\,n-1} = t_{0.05/2;\,9-1} = t_{0.025;\,8} = 2.306$.

The interval is computed as

\[ \bar{x}_D \pm t_{\alpha/2;\,n-1} \frac{s_D}{\sqrt{n}} = 2.889 \pm (2.306) \frac{4.755}{\sqrt{9}} = 2.889 \pm 3.655 = (-0.766;\ 6.544). \]

We are 95% confident that the mean difference in heights of fathers and sons is between −0.766
and 6.544. (Sons seem to be taller than their fathers, but not significantly.)
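
The calculations in Activity 1.4 can be checked with a short script. This is only a sketch using the Python standard library; the variable names are our own, and the critical value 2.306 is simply copied from Table 4 rather than computed.

```python
import math
import statistics

# Heights (cm) of the nine father-son pairs from Activity 1.4.
sons = [185, 173, 168, 178, 188, 173, 165, 183, 175]
fathers = [180, 175, 160, 178, 183, 175, 160, 173, 178]

# Paired differences d_i = height of son - height of father.
d = [s - f for s, f in zip(sons, fathers)]
n = len(d)

d_bar = statistics.mean(d)     # sample mean of the differences
s_d = statistics.stdev(d)      # sample standard deviation (divisor n - 1)
se = s_d / math.sqrt(n)        # estimated standard error of d_bar
t_stat = (d_bar - 0) / se      # test statistic under H0: mu_D = 0

T_CRIT = 2.306                 # t_{0.025; 8}, assumed from Table 4 (not computed here)
reject_h0 = abs(t_stat) >= T_CRIT
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)

print("differences:", d)
print("mean, sd, t:", round(d_bar, 3), round(s_d, 3), round(t_stat, 3))
print("reject H0:", reject_h0)
print("95% CI:", tuple(round(limit, 3) for limit in ci))
```

Small differences in the last decimal place compared with the hand calculation are due to rounding of intermediate values.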

Activity 1.5
Question 1
In testing the hypothesis $H_0 : \mu_D = 5$ vs. $H_1 : \mu_D > 5$, two random samples from two
dependent normal populations produced the following statistics: $\bar{x}_D = 9$, $n_D = 20$, and $s_D = 7.5$.
What conclusion can we draw at the 1% significance level?

.............................................................................................................................................................

Question 2
Promotional Campaigns
The general manager of a chain of fast food chicken restaurants wants to determine how effective
their promotional campaigns are. In these campaigns “20% off” coupons are widely distributed.
These coupons are only valid for one week. To examine their effectiveness, the executive records
the daily gross sales (in R1000’s) in one restaurant during the campaign and during the week after
the campaign ends. The data are shown below.

Day          Sales during campaign   Sales after campaign
Sunday               18.1                    16.6
Monday               10.0                     8.8
Tuesday               9.1                     8.6
Wednesday             8.4                     8.3
Thursday             10.8                    10.1
Friday               13.1                    12.3
Saturday             20.8                    18.9

(a) Can they infer at the 5% significance level that sales increase during the campaign?

.............................................................................................................................................................

(b) Find the 95% confidence interval for the difference in sales during the week.

.............................................................................................................................................................

.............................................................................................................................................................

Feedback

Question 1
\[ t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = \frac{9 - 5}{7.5 / \sqrt{20}} = 2.385 \]

Decision rule
Since $t \sim t_{n_D - 1}$ we will reject $H_0$ if $t \ge t_{0.01;\,20-1} = 2.539$ (from Table 4, Appendix B, Keller).
Since 2.385 < 2.539 we cannot reject $H_0$ at the 1% level of significance.
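
When only summary statistics are given, as in Question 1, the test statistic can be computed directly from them. The helper below is a hypothetical illustration (the function name is our own), and the critical value is taken from Table 4 as quoted above.

```python
import math


def paired_t_from_summary(d_bar, s_d, n, mu0):
    """Return the t statistic for H0: mu_D = mu0 from summary statistics."""
    return (d_bar - mu0) / (s_d / math.sqrt(n))


# Summary statistics from Activity 1.5, Question 1.
t_stat = paired_t_from_summary(d_bar=9, s_d=7.5, n=20, mu0=5)

T_CRIT = 2.539                  # t_{0.01; 19}, assumed from Table 4 (not computed here)
reject_h0 = t_stat >= T_CRIT    # right-tailed test, H1: mu_D > 5

print(round(t_stat, 3), reject_h0)
```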

Question 2
We have dependent (paired) observations and we need to work with the differences of the pairs.

d_i = sales during campaign − sales after campaign.

Day    Sun   Mon   Tue   Wed   Thu   Fri   Sat
d_i    1.5   1.2   0.5   0.1   0.7   0.8   1.9

(a) Hypothesis test: We have to test

\[ H_0 : \mu_D = 0 \quad \text{vs} \quad H_1 : \mu_D > 0 . \]

The test statistic is

\[ t = \frac{\bar{x}_D - 0}{s_D / \sqrt{n}} \]

where

\[ \bar{x}_D = \frac{\sum d_i}{7} = \frac{6.7}{7} = 0.95714 ; \]

\[ s_D^2 = \frac{\sum d_i^2 - n\bar{x}_D^2}{n - 1} = \frac{8.69 - 7(0.95714)^2}{6} = 0.37953 ; \]

\[ s_D = \sqrt{0.37953} = 0.61606 . \]

Therefore,

\[ t = \frac{0.95714 - 0}{0.61606 / \sqrt{7}} = 4.111 . \]

Decision rule
Since $t \sim t_{n-1}$ we will reject $H_0$ if $t \ge t_{0.05;\,6}$.
From Table 4 (see Appendix B, Keller) $t_{0.05;\,6} = 1.943$.
Since 4.111 > 1.943 we reject $H_0$. Yes, they may infer at the 5% significance level that sales
increase during the campaign.

(b) For a 95% confidence interval we need $t_{\alpha/2;\,n-1} = t_{0.05/2;\,7-1} = t_{0.025;\,6} = 2.447$.

The interval is computed as

\[ \bar{x}_D \pm t_{\alpha/2;\,n-1} \frac{s_D}{\sqrt{n}} = 0.95714 \pm (2.447) \frac{0.61606}{\sqrt{7}} = 0.957 \pm 0.57 = (0.387;\ 1.527). \]

We are 95% confident that the mean difference in sales is between 0.387 and 1.527 thousand
rand.

(c) We estimate that the daily sales during the campaign increase on average between 0.387 and
1.527 thousand rand.
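
The same answers follow from the shortcut formula for the variance used above, $s_D^2 = (\sum d_i^2 - n\bar{x}_D^2)/(n-1)$. The sketch below applies it to the sales data with the Python standard library only; the names are our own and both critical values are copied from Table 4.

```python
import math

# Daily gross sales (in R1000's) from Activity 1.5, Question 2.
during = [18.1, 10.0, 9.1, 8.4, 10.8, 13.1, 20.8]
after = [16.6, 8.8, 8.6, 8.3, 10.1, 12.3, 18.9]

# Differences, rounded to one decimal to remove floating-point noise.
d = [round(x - y, 1) for x, y in zip(during, after)]
n = len(d)

sum_d = sum(d)                          # sum of the differences
sum_d2 = sum(v * v for v in d)          # sum of the squared differences
d_bar = sum_d / n
var_d = (sum_d2 - n * d_bar ** 2) / (n - 1)   # shortcut formula for s_D^2
s_d = math.sqrt(var_d)
se = s_d / math.sqrt(n)

t_stat = d_bar / se                     # H0: mu_D = 0
T_CRIT_ONE_SIDED = 1.943                # t_{0.05; 6}, assumed from Table 4
T_CRIT_TWO_SIDED = 2.447                # t_{0.025; 6}, assumed from Table 4

reject_h0 = t_stat >= T_CRIT_ONE_SIDED  # right-tailed test, H1: mu_D > 0
ci = (d_bar - T_CRIT_TWO_SIDED * se, d_bar + T_CRIT_TWO_SIDED * se)

print(round(sum_d, 1), round(sum_d2, 2), round(t_stat, 3), reject_h0)
print(tuple(round(limit, 3) for limit in ci))
```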

1.5 Inference about the Ratio of Two Variances

STUDY
Keller Chapter 13 Inference about comparing two populations
13.4 Inference about the Ratio of two variances

The interest in this section is in the variability of two populations, using the F-tables. It is a small but
significant section in Keller.

You have to know that

· the sample variance is an unbiased, consistent estimator of the population variance.

· sampling took place independently from two normal populations.

· the statistic $S_1^2 / S_2^2$ is the estimator of the parameter $\sigma_1^2 / \sigma_2^2$.

· $S_1^2 / S_2^2$ is F distributed when $\sigma_1^2 = \sigma_2^2$.

· with some mathematical manipulation of previous knowledge on chi-squared distributed quantities
we derive that $\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$ has an F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

The hypothesis testing follows the same pattern as you have had in previous sections, namely
- define the null and alternative hypotheses according to the information given in the question (they
have to involve the parameter $\sigma_1^2 / \sigma_2^2$)

- know that $F = S_1^2 / S_2^2$ is the test statistic with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom

- the LCL is $\frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}}$ with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$

- the UCL is $\frac{S_1^2}{S_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1}$ with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$

- find the cut-off value for the rejection region from the F-table.

Note that, in general, the table values $F_{\alpha/2,\nu_1,\nu_2} \neq F_{\alpha/2,\nu_2,\nu_1}$.

You must therefore make sure that you know what to use for the upper and lower limits in the
examination and read off the correct value from the table.
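
As a reminder of how a lower-tail value is obtained from a table that lists only upper-tail values, the following standard property of the F-distribution may help. It is stated here in our own notation, with $A$ denoting the upper-tail area:

\[ F_{1-A,\,\nu_1,\,\nu_2} = \frac{1}{F_{A,\,\nu_2,\,\nu_1}} \]

Note that the degrees of freedom swap places. For a two-tailed test at level $\alpha$ with test statistic $F = S_1^2 / S_2^2$, the rejection region is therefore

\[ F > F_{\alpha/2,\,\nu_1,\,\nu_2} \quad \text{or} \quad F < \frac{1}{F_{\alpha/2,\,\nu_2,\,\nu_1}} . \]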

If you are not sure about finding these table-values, page back to Chapter 8, 8.4 Other Continuous
Distributions in Keller, where the F-distribution is explained in detail. This section formed part of the
STA1501 (STS1113) syllabus. The advantage of using the same textbook for the modules STA1501
and STA1502 is that you can go back to previous knowledge whenever needed.

Study the example Testing the quality of two-bottle filling in detail so you can understand the
procedure of this ratio of variances test.

Activity 1.6
Question 1
In constructing a 90% interval estimate for the ratio of two population variances, $\sigma_1^2 / \sigma_2^2$, two independent
samples of sizes 40 and 60 are drawn from the populations. If the sample variances are 515 and 920,
then the lower confidence limit is:

1. 0.244
2. 0.352
3. 0.341
4. 0.890
5. 0.918

Question 2

An experimenter is concerned that variability of responses using two different experimental
procedures may not be the same. He randomly selects two samples of 16 and 14 responses from
two normal populations and gets the statistics $S_1^2 = 55$ and $S_2^2 = 118$, respectively.

a) Do the sample variances provide enough evidence at the 10% significance level to infer that the
two population variances differ?

b) Estimate with 90% confidence the ratio of the two population variances.

c) Describe what the interval estimate tells you and briefly explain how to use the interval estimate
to test the hypotheses.

Feedback

Question 1

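The formula for the LCL is $\frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}}$ and you have to substitute the correct values into this
formula.

\[ \frac{S_1^2}{S_2^2} = \frac{515}{920} = 0.55978 \]

Go to the F-table with heading 0.05 (because α = 0.1 and you need α/2) and where the values for 40
and 60 meet (the tabulated degrees of freedom closest to $\nu_1 = 39$ and $\nu_2 = 59$), you will read off the value 1.59.

\[ \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}} = 0.55978 \cdot \frac{1}{1.59} = 0.352, \]

which is option 2.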

Question 2
a) $H_0 : \frac{\sigma_1^2}{\sigma_2^2} = 1$ versus $H_1 : \frac{\sigma_1^2}{\sigma_2^2} \neq 1$

Rejection region: $F > F_{0.05,15,13} = 2.53$ or $F < F_{0.95,15,13} = \frac{1}{F_{0.05,13,15}} = \frac{1}{2.45} \approx 0.408$

Test statistic: $F = \frac{55}{118} = 0.466$

Conclusion: Since 0.408 < 0.466 < 2.53, don't reject the null hypothesis. No, the sample variances don't
provide enough evidence at the 10% significance level to infer that the two population variances differ.

b) The 90% confidence interval for the ratio of the two population variances:

\[ LCL = \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}} = \frac{55}{118} \cdot \frac{1}{F_{0.05,15,13}} = 0.466 \cdot \frac{1}{2.53} = 0.1842 \]

\[ UCL = \frac{S_1^2}{S_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1} = \frac{55}{118} \cdot F_{0.05,13,15} = 0.466 \cdot 2.45 = 1.1417 \]

c) We estimate that the ratio $\frac{\sigma_1^2}{\sigma_2^2}$ lies between 0.1842 and 1.1417. Since the hypothesized value 1 is
included in the 90% interval estimate, we fail to reject the null hypothesis at α = 0.10.
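
The test and the interval estimate in Question 2 can be reproduced in a few lines. This is a minimal sketch: the two F-table values are entered by hand as quoted in the feedback, not computed, and the variable names are our own.

```python
# Sample statistics from Activity 1.6, Question 2.
s1_sq, n1 = 55, 16
s2_sq, n2 = 118, 14

# Upper-tail critical values for alpha/2 = 0.05, assumed from the F-table.
F_05_15_13 = 2.53   # F_{0.05, 15, 13}
F_05_13_15 = 2.45   # F_{0.05, 13, 15}

f_stat = s1_sq / s2_sq

# Two-tailed rejection region; the lower limit uses the reciprocal property.
upper_crit = F_05_15_13
lower_crit = 1 / F_05_13_15
reject_h0 = f_stat > upper_crit or f_stat < lower_crit

# 90% confidence limits for sigma_1^2 / sigma_2^2.
lcl = f_stat / F_05_15_13
ucl = f_stat * F_05_13_15

print(round(f_stat, 3), round(lower_crit, 3), reject_h0)
print(round(lcl, 4), round(ucl, 4), lcl <= 1 <= ucl)
```

The last line prints whether the hypothesized ratio 1 falls inside the interval, which is the confidence-interval version of the same two-tailed test.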

1.6 Self-correcting Exercises for Unit 1

Question 1
Do EXERCISE 13.1 of chapter 13 Keller.

Question 2
Do EXERCISE 13.7 of chapter 13 Keller.

Question 3
Do EXERCISE 13.41 of chapter 13 Keller.

Question 4
Do EXERCISE 13.43 of chapter 13 Keller.

1.7 Solutions to Self-correcting Exercises for Unit 1

Question 1
Solution to 13.1

Assume equal variances.

To obtain a (1 − α)100% confidence interval estimate for the difference between the two
population means, $(\mu_1 - \mu_2)$, we compute

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,n_1+n_2-2} \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

(a) $t_{\alpha/2;\,n_1+n_2-2} = t_{0.025;\,25+25-2} = 2.009$

\[ s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(25 - 1)129^2 + (25 - 1)141^2}{25 + 25 - 2} = 18261 \]

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,n_1+n_2-2} \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = (524 - 469) \pm 2.009 \sqrt{18261 \left( \frac{1}{25} + \frac{1}{25} \right)} = 55 \pm 76.7869 \]

(b)
\[ s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(25 - 1)255^2 + (25 - 1)260^2}{25 + 25 - 2} = 66312.5 \]

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,n_1+n_2-2} \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = (524 - 469) \pm 2.009 \sqrt{66312.5 \left( \frac{1}{25} + \frac{1}{25} \right)} = 55 \pm 146.33 \]

(c) The interval widens if we increase the standard deviations.

(d) Now $t_{\alpha/2;\,n_1+n_2-2} = t_{0.025;\,100+100-2} = 1.972$ and

\[ s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(100 - 1)129^2 + (100 - 1)141^2}{100 + 100 - 2} = 18261 . \]

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,n_1+n_2-2} \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = (524 - 469) \pm 1.972 \sqrt{18261 \left( \frac{1}{100} + \frac{1}{100} \right)} = 55 \pm 37.686 \]

(e) The interval narrows if we increase the sample sizes.
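
Parts (a), (b) and (d) repeat the same calculation with different inputs, so they can be checked with one small function. The function below is a hypothetical helper of our own; the t-values are those quoted from Table 4 in the solution.

```python
import math


def pooled_interval(x1_bar, x2_bar, s1, s2, n1, n2, t_crit):
    """Return (pooled variance, point estimate, margin of error)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, x1_bar - x2_bar, margin


# (a) original standard deviations, n1 = n2 = 25
sp2_a, diff_a, margin_a = pooled_interval(524, 469, 129, 141, 25, 25, 2.009)
# (b) larger standard deviations, same sample sizes
sp2_b, diff_b, margin_b = pooled_interval(524, 469, 255, 260, 25, 25, 2.009)
# (d) original standard deviations, n1 = n2 = 100
sp2_d, diff_d, margin_d = pooled_interval(524, 469, 129, 141, 100, 100, 1.972)

for label, sp2, diff, margin in [("a", sp2_a, diff_a, margin_a),
                                 ("b", sp2_b, diff_b, margin_b),
                                 ("d", sp2_d, diff_d, margin_d)]:
    print(label, sp2, diff, round(margin, 3))
```

Comparing the three margins shows the two effects described in (c) and (e): larger standard deviations widen the interval, larger samples narrow it.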

Question 2
Solution to 13.7

Step 1:
We have to test H0 : (µ1 − µ2 ) = 0
against H1 : (µ1 − µ2 ) < 0.

Step 2:
We use the test statistic

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(351.5 - 381.83) - 0}{\sqrt{\frac{(6 - 1)6767.5 + (6 - 1)6653.4}{6 + 6 - 2} \left( \frac{1}{6} + \frac{1}{6} \right)}} = -0.43 \]

Step 3:
Find the critical values.
From Table 4 (see Appendix B, Keller) we find $t_{\alpha;\,n_1+n_2-2} = t_{0.10;\,6+6-2} = t_{0.10;\,10} = 1.372$, which
means we will reject $H_0$ if $t < -1.372$.
Since −0.43 < −1.372 we reject the null hypothesis, and conclude that the manager should
choose to use guards.
(Please note: Using statistical software you will find the p-value = 0.0795. Since p < α = 0.10 we
reject H0 .)
(Also note: Using statistical software you will find the Two-tail F-test: F = 1.24, p-value = 0.8194; =⇒
cannot reject H0 : σ 21 = σ 22 =⇒ it is valid to use the equal-variances test statistic.)

Question 3
Solution to 13.41

We have dependent (paired) observations and we need to work with the differences of the pairs.

d_i = ABS − non ABS.

Speed    20     25     30     35     40     45     50     55
d_i     0.2    0.1   −0.3   −0.2   −0.5   −0.2   −0.2   −0.3

Hypothesis test: We have to test

H0 : µD = 0 vs H1 : µD < 0.

(Note that this depends on how you defined the difference: If ABS brakes are more effective
(implying fewer seconds!) than non-ABS brakes, it implies that (ABS − non ABS) would be a negative
value under the alternative hypothesis.)

The test statistic is

\[ t = \frac{\bar{x}_D - 0}{s_D / \sqrt{n}} \]

where

\[ \bar{x}_D = \frac{\sum d_i}{8} = \frac{-1.4}{8} = -0.175 ; \]

\[ s_D^2 = \frac{\sum d_i^2 - n\bar{x}_D^2}{n - 1} = \frac{0.60 - 8(-0.175)^2}{7} = 0.050714 ; \]

\[ s_D = \sqrt{0.050714} = 0.2252 . \]

Therefore,

\[ t = \frac{-0.175 - 0}{0.225 / \sqrt{8}} = -2.199 . \]

Decision rule
Rejection region: We will reject $H_0$ if $t < -t_{\alpha;\,n-1}$.
From Table 4 (see Appendix B, Keller) $t_{0.05;\,7} = 1.895$.
Since −2.199 < −1.895 we reject $H_0$. There is enough evidence at the 5% significance level that
ABS brakes are more effective (implying fewer seconds!) than non-ABS brakes.

Question 4
Solution to 13.43

We have dependent (paired) observations and we need to work with the differences of the pairs.

d_i = current fertilizer − new fertilizer.

Plot    1    2    3    4    5    6    7    8    9   10   11   12
d_i    −4   −4    2   −1    2    2   −4   −5    2   −3    3   −2

(a) Hypothesis test: We have to test

H0 : µD = 0 vs H1 : µD < 0.

(Note that this depends on how you defined the difference: If the new fertilizer is more effective
than the current fertilizer, it implies that the difference in crop yields of (current fertilizer − new
fertilizer) would be negative under the alternative hypothesis.)

The test statistic is

\[ t = \frac{\bar{x}_D - 0}{s_D / \sqrt{n}} \]

where

\[ \bar{x}_D = \frac{\sum d_i}{12} = \frac{-12}{12} = -1.00 ; \]

\[ s_D^2 = \frac{\sum d_i^2 - n\bar{x}_D^2}{n - 1} = \frac{112 - 12(-1.00)^2}{11} = 9.0909 ; \]

\[ s_D = \sqrt{9.0909} = 3.0151 . \]

Therefore,

\[ t = \frac{-1.00 - 0}{3.02 / \sqrt{12}} = -1.15 . \]

Decision rule
Rejection region: We will reject $H_0$ if $t < -t_{\alpha;\,n-1}$ since $t \sim t_{n-1}$.
From Table 4 (see Appendix B, Keller) $t_{0.05;\,11} = 1.796$.
Since −1.15 > −1.796 we cannot reject $H_0$. They may not infer at the 5% significance level that
the new fertilizer is more effective than the current fertilizer.

(b) For a 95% confidence interval we need $t_{\alpha/2;\,n-1} = t_{0.05/2;\,12-1} = t_{0.025;\,11} = 2.201$.

The interval is computed as

\[ \bar{x}_D \pm t_{\alpha/2;\,n-1} \frac{s_D}{\sqrt{n}} = -1.00 \pm (2.201) \frac{3.02}{\sqrt{12}} = -1.00 \pm 1.92 = (-2.92;\ 0.92). \]

We are 95% confident that the mean difference in crop yields is between −2.92 and 0.92.
(c) The differences are required to be normally distributed.
(d) No, the histogram of the differences is bimodal.
(e) The data are experimental.
(f) The experimental design should be independent samples.

1.8 Learning Outcomes

Use the following learning objectives as a checklist after you have completed this study unit to
evaluate the knowledge you have acquired.

Can you

· calculate the small-sample SE of $(\bar{x}_1 - \bar{x}_2)$ under the assumption that $\sigma_1^2 = \sigma_2^2$?

· perform a small-sample statistical test for the difference between two population means in the
case of independent random samples?

· derive a small-sample confidence interval for the difference between two population means
(µ1 − µ2 ) in the case of independent random samples?

· explain the difference between independent samples and dependent samples?

· apply Student’s t-distribution to a paired difference test?

· perform a small-sample statistical test for the difference between two population means in the
case of dependent random samples?

· derive a small-sample confidence interval for the difference between two population means
(µ1 − µ2 ) in the case of dependent random samples?

· use a confidence interval estimator to test hypotheses for the ratio of two variances when two
independent samples are drawn from normal populations?

Key Terms/Symbols
t-distribution
F-distribution
degrees of freedom
dependent and independent random samples
paired difference test

STUDY UNIT 2
2.1 Introduction
In this study unit we tie up some loose ends. We continue our inference about comparing two
populations, but we shift from means and variances to proportions. In the last section
we move back to means but extend the comparison to more than two populations.

2.2 Inference about the Difference Between Two Population Proportions

STUDY
Keller Chapter 13 Inference about comparing two populations
13.5 Inference about the Difference Between Two Population Proportions

We are now sampling from two independent populations, in each of which a certain proportion of the
population has a given attribute.

If $\hat{p}_1 = \frac{x_1}{n_1}$ is the proportion in a random sample of size $n_1$ from a population with parameter $p_1$ and
$\hat{p}_2 = \frac{x_2}{n_2}$ is the proportion in a random sample of size $n_2$ from a second independent population with
parameter $p_2$, we use the test statistic

\[ Z = \frac{\left( \frac{x_1}{n_1} - \frac{x_2}{n_2} \right) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]

which has an approximate n(0; 1) distribution, to test the null hypothesis $H_0 : p_1 = p_2$.

$\bar{p}$ is called the "pooled estimate" and we compute it as

\[ \bar{p}_{pooled} = \frac{\text{total number of successes in both samples}}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} . \]

Please note that, similar to the argument for the one-sample case, which we treated in STA1501
(STS1113) and Keller, chapter 12, the SE expression for the hypothesis test is not the same as the
SE expression which we will use when we derive a confidence interval for p1 − p2 . Computing a
pooled estimate makes sense only under the assumption that p1 = p2 (in other words "case 1" or
the hypothesis H0 : p1 − p2 = 0), an assumption that is absent when we construct a confidence interval.

You must not be confused by the very "rare case" or "case 2" which Keller talks about. For this case
2 scenario the null hypothesis is H0 : p1 − p2 = D and the SE expression for the hypothesis test is
exactly the same as the SE expression which we will use when we derive a confidence interval for
p1 − p2 .

Please also note that in the one-sample case in STA1501 (STS1113) our rule of thumb was that np
and n(1 − p) must be greater than 5 for the inference to be valid. We extend these conditions to two
samples meaning that n1 p1 ; n1 (1 − p1 ) ; n2 p2 and n2 (1 − p2 ) must all be greater than 5 for the
inference to be "good".

After you have studied section 13.5 of chapter 13 of the textbook you should try and work through
activities 2.1 and 2.2 to enhance your understanding of a large sample test of hypotheses for the
difference between two binomial proportions.

Activity 2.1
Say whether the following statements are correct or incorrect and try to rectify the incorrect
statements to make them true.
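(a) If we derive a confidence interval for $(p_1 - p_2)$ we use

\[ SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]

but if we test $H_0 : p_1 = p_2$ we use

\[ SE = \sqrt{\bar{p}(1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \quad \text{with} \quad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} . \]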

.............................................................................. ..............................................................................

.............................................................................................................................................................

(b) In testing a hypothesis about the difference between two population proportions (p1 − p2 ) , the z
test statistic measures how close the computed sample difference between two proportions has
come to the hypothesized value of zero.

.............................................................................................................................................................

.............................................................................................................................................................

(c) In a one-tailed test for the difference between two population proportions (p1 − p2 ), if the null
hypothesis is rejected when the alternative hypothesis, H1 : p1 > p2 , is false, a Type I error is
committed.

.............................................................................................................................................................

(d) If we derive a confidence interval for $(p_1 - p_2)$, we use

\[ SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]

and if we test $H_0 : p_1 - p_2 = 0.15$, we will also use

\[ SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]

for the z test statistic.

.............................................................................. ..............................................................................

.............................................................................................................................................................

Feedback

(a) Correct.
(b) Correct.
(c) Correct.
(d) Correct.

Activity 2.2
A seed distributor, called Easy Grow Seeds, claims that 75% of a specific variety of maize, called
Golden Glow, will germinate. A random sample of n1 = 300 seeds was selected from this batch
and 207 germinated. Denote the population proportion of seeds that germinate as p1 . Suppose that
a second, independent seed distributor, called Seeds of All Kinds, claims that 80% of their stock of
the same variety of maize, Golden Glow, will germinate. (Denote this population proportion of
seeds that germinate as p2 .) From this population we draw a random sample of size n2 = 200 and
the number of seeds that germinate in this sample is 153.

Test $H_0 : p_1 = p_2$ against $H_1 : p_1 \neq p_2$ at the 10% level of significance.

To draw a final conclusion show

(a) the use of critical values

(b) computation of the p-value

............................... ............................... ............................... ............................... ...............................

.............................................................................................................................................................

............................... ............................... ............................... ............................... ...............................

.............................................................................................................................................................

.............................................................................................................................................................

Feedback

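To test the null hypothesis $H_0 : p_1 = p_2 \Rightarrow H_0 : (p_1 - p_2) = 0$.

We use the test statistic

\[ Z = \frac{\left( \frac{x_1}{n_1} - \frac{x_2}{n_2} \right) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} , \]

which has an approximate n(0; 1) distribution.

\[ \bar{p}_{pooled} = \frac{\text{total number of successes in both samples}}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{207 + 153}{300 + 200} = 0.72 \]

\[ SE_{pooled}(p_1 - p_2) = \sqrt{\bar{p}(1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{0.72(1 - 0.72) \left( \frac{1}{300} + \frac{1}{200} \right)} = 0.040988 \]

\[ Z = \frac{\left( \frac{x_1}{n_1} - \frac{x_2}{n_2} \right) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{\left( \frac{207}{300} - \frac{153}{200} \right) - 0}{0.040988} = \frac{-0.075}{0.040988} = -1.8298 \]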

(a) Find the critical values:

For a two-tailed test with α = 0.10, we will reject $H_0$ if $|Z| > z_{\alpha/2} = z_{0.05}$. (This implies we will reject
$H_0$ if $Z \ge 1.645$ or if $Z \le -1.645$.)
From TABLE 3 we find $z_{\alpha/2} = z_{0.05} = 1.645$.

Since $|Z| = |-1.8298| = 1.8298 > 1.645$ we reject $H_0$. It seems likely that the two populations
do not have the same proportions.

Extra explanation:
With a confidence interval our focus is on the inside of the probability statement and with a
hypothesis test our focus is on the outside of the probability statement. For example, for a 90%
confidence interval

P (−1.645 ≤ Z ≤ 1.645) = 0.90

which implies that

P (Z ≤ −1.645) + P (Z ≥ 1.645) = α

[Figure: standard normal curve for the two-sided hypothesis test using α = 0.10, with rejection regions of area 0.05 each below −1.645 and above 1.645.]

(b) Compute the p-value:

Since the alternative hypothesis is two-tailed we need to double the probability of observing a
value of the test statistic at least as extreme as the one computed.
p-value = 2 × P (Z ≤ −1.8298) = 2(0.0336) = 0.0672
Since 0.0672 < 0.10 (the p-value < α) we reject H0 and come to the same conclusion!
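
Both routes to the conclusion in Activity 2.2 can be verified with the Python standard library, whose `statistics.NormalDist` class provides the standard normal cdf and its inverse. The sketch below uses our own variable names; because it does not round Z to two decimals before looking up the probability, the p-value may differ from the table-based answer in the fourth decimal.

```python
import math
from statistics import NormalDist

# Sample results from Activity 2.2.
x1, n1 = 207, 300   # Easy Grow Seeds
x2, n2 = 153, 200   # Seeds of All Kinds

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate under H0: p1 = p2

se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat - 0) / se_pooled

alpha = 0.10
std_normal = NormalDist()                     # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # z_{alpha/2}
p_value = 2 * std_normal.cdf(-abs(z))         # two-tailed p-value

print("pooled p:", p_pooled, "SE:", round(se_pooled, 6), "Z:", round(z, 4))
print("critical value:", round(z_crit, 3), "reject:", abs(z) > z_crit)
print("p-value:", round(p_value, 4), "reject:", p_value < alpha)
```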

2.3 One-Way Analysis of Variance

In section 1.2 of study unit 1 we compared the means of two independent samples. What happens
when we have more than two independent samples? We perform a test for means called analysis
of variance, or AN-O-VA! Keller explains this technique very well in section 14.1 of the textbook. If
you are happy with the technique and understand how to apply it, you can go directly to the activities
to assess your understanding of ANOVA. If not, you can work through my alternative explanation.

STUDY
Keller Chapter 14 Analysis of Variance (not 14.5 and 14.6)
14.1 One-Way Analysis of Variance

The idea behind ANOVA

What many students find confusing is why a test for means is called analysis of variance! The secret
of ANOVA is that it was developed from the fact that we can make three different estimates of σ 2
from the data. What do we mean by this?

Suppose we mix the data observations of the k different groups in one big box and disregard for a
moment which score belongs to which group. Even as one big sample, the scores are not the same!
They vary from a smallest score to a largest score – hence we have variability.

Consider $(n_1 + n_2 + n_3 + \dots + n_k) = n$ as one big sample of n values. Now, suppose that
$H_0 : \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ is true! Then the k "mixed-together" groups can be considered as one big random
sample from the same population! This can of course only be true under the original assumption
of equal variances, i.e. that $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$ for the k populations. We denote this common
population variance by $\sigma^2$ (say).
From estimation theory we learn that the most efficient, unbiased point estimator of a population
variance, $\sigma^2$ in general, is given by the sample variance.

In other words if $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, it follows that $E\left( \frac{\sum (x_i - \bar{x})^2}{n - 1} \right) = \sigma^2$.

So, how do we apply this to our new set-up?

Sorting out the statistical notation:

If we combine the k samples to form a single sample of size n, then the variation of the n individual
scores from a single overall sample mean is called the "Total Variance". How must we write this
down?

An expansion of the notation $\frac{\sum (x_i - \bar{x})^2}{n - 1}$ to elegantly accommodate different $x_i$-values from different
samples would be to use an x with a double subscript ij instead of just i as well as double
summation instead of single summation. You might wonder why on earth we would like to do this,
but the beauty of this notation is that it allows us to keep track of every single observation from every
possible sample!

So, for a k sample data set we write $x_{ij}$ where j = 1 or 2 or 3 or ... k (the number of samples) and
$i = 1; 2; 3; \dots; n_j$ (the size of the individual sample).

[:-) This means if j = 1 we list the first group as $x_{11}; x_{21}; x_{31}; \dots$ up to $x_{n_1 1}$ (for our first data set) and
if j = 2 we list the second group as $x_{12}; x_{22}; x_{32}; \dots$ up to $x_{n_2 2}$ (for our second data set) etc. up to
$x_{1k}; x_{2k}; x_{3k}; \dots$ up to $x_{n_k k}$ (for our k-th data set).]

Even though we momentarily consider the data as one sample, we can still calculate (k + 1) possible
different means. There is of course the overall mean of all the observations, indicated as $\bar{x}$, and then
there are also $\bar{x}_1$ and $\bar{x}_2$ etc. up to $\bar{x}_k$ for the k respective group means.

\[ \bar{x} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n} \]

\[ \bar{x}_1 = \frac{\sum_{i=1}^{n_1} x_{i1}}{n_1} , \qquad \bar{x}_2 = \frac{\sum_{i=1}^{n_2} x_{i2}}{n_2} , \qquad \dots , \qquad \bar{x}_k = \frac{\sum_{i=1}^{n_k} x_{ik}}{n_k} \]

Three possible estimates for the population variance $\sigma^2$:

[1] The total sum of squares of deviations from the overall mean is given as SSTotal.
Let $SSTotal = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$ then

\[ E\left[ \frac{SSTotal}{n - 1} \right] = \sigma^2 . \]

[2] Let $SSWithin = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$ then

\[ E\left[ \frac{SSWithin}{n - k} \right] = \sigma^2 . \]

$\frac{SSWithin}{n - k}$ provides an accurate estimate of $\sigma^2$, whether or not the population means are
equal.

[ :-) Please note that if we rewrite $S_{pooled}^2$ (which was defined in section 1.2 of study unit 1) in
terms of our "new double notation" we will have

\[ S_{pooled}^2 = \frac{\sum_{j=1}^{2} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n_1 + n_2 - 2} . \]

But if $n_1 + n_2 = n$ then $S_{pooled}^2 = \frac{SSWithin}{n - 2}$.]
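
The identity between the pooled variance and SSWithin/(n − 2) is easy to confirm numerically. The two small samples below are hypothetical values invented purely for this illustration; any two samples would do.

```python
import statistics

# Hypothetical data, invented only to illustrate the identity.
group_1 = [12.0, 15.0, 11.0, 14.0]
group_2 = [9.0, 13.0, 10.0, 12.0, 11.0]
n1, n2 = len(group_1), len(group_2)

# Pooled variance written in the form used in section 1.2.
s1_sq = statistics.variance(group_1)   # divisor n1 - 1
s2_sq = statistics.variance(group_2)   # divisor n2 - 1
pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def ss_about_own_mean(values):
    """Sum of squared deviations of a sample from its own mean."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)


# SSWithin in the double-subscript notation, with k = 2 groups.
ss_within = ss_about_own_mean(group_1) + ss_about_own_mean(group_2)
ms_within = ss_within / (n1 + n2 - 2)

print(pooled, ms_within)
```

The two printed values agree up to floating-point rounding, because $(n_j - 1)s_j^2$ is exactly the within-group sum of squares of group j.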
n−2

In many applications $\sigma^2$ is considered as a measure of "error", hence SSWithin = SSError and
if we divide by the degrees of freedom we call $\frac{SSWithin}{n - k} = \frac{SSError}{n - k}$ the Mean Square Error.

[3] Let $SSBetween = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$ then

\[ E\left[ \frac{SSBetween}{k - 1} \right] = \sigma^2 . \]

Why the third expression is a possible estimate of $\sigma^2$ is more tricky to explain, and it makes intuitive
sense (and it simplifies matters) if the sample sizes are equal. Assume that
$n_1 = n_2 = \dots = n_k = n_j$ (say). Under the assumption of the null hypothesis, $H_0 : \mu_1 = \mu_2 = \dots = \mu_k$,
the means of the different groups are actually k estimates of the overall population mean µ but the
means (when considered as variables) have a smaller variance than individual observations when
we compute their deviations from the overall mean. [ :-) Think back to what you learned about the
sampling distribution of a sample mean: for a sample of size n it has variance $\frac{\sigma^2}{n}$.] In other words, when we compute a
sample variance for the k observed means (which are now considered as a sample of size k), this
sample variance is an estimate of the value $\frac{\sigma^2}{n_j}$.

This means that $\frac{\sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2}{k - 1}$ is an estimate of $\frac{\sigma^2}{n_j}$, which we write as

\[ E\left[ \frac{\sum_{j=1}^{k} (\bar{x}_j - \bar{x})^2}{k - 1} \right] = \frac{\sigma^2}{n_j} \]

and multiplying both sides by $n_j$ leads us to our final estimate, i.e.

\[ E\left[ \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2}{k - 1} \right] = \sigma^2 . \]

In the true jargon of experimental design, the different groups/samples are considered to be different
levels of a treatment, hence SSBetween = SSTreatment, which measures the variation between
samples. If we divide by the degrees of freedom we call $\frac{SSTreatment}{k - 1}$ the Mean Square Treatment.
This provides an accurate estimate of $\sigma^2$ only if the population means are equal.

Where does the F-distribution get into the picture? If there is no difference between the means we
would expect the ratio $\frac{\text{estimate 3}}{\text{estimate 2}}$ to be close to one. According to statistical distribution theory the
ratio $\frac{\text{estimate 3}}{\text{estimate 2}}$ has a so-called F-distribution. A-ha, and here we have the makings of a hypothesis
test! We can compare the computed F-value with a critical value obtained from a Critical Values of
F-table. (See Keller, Table 6 Appendix B.) If the computed value of the test statistic deviates "too
much" from 1 we will become suspicious of H0 .

\[ F = \frac{\text{estimated population variance based on the variation among the sample means}}{\text{estimated population variance based on the variation within each of the samples}} = \frac{MSTreatment}{MSError} \sim F_{\nu_1;\,\nu_2} \]

\[ \nu_1 = k - 1 , \qquad \nu_2 = n - k \]

The F-distribution
The F-distribution has two parameters, also called degrees of freedom. In any F-table with critical
values you will need to know these two values, often indicated as $\nu_1$ and $\nu_2$. For very small values
of $\nu_1$ and $\nu_2$ the density function does not look like the typical "skewed-to-the-right-normal" density
function.

[Figure: density of the $F_{2;\,11}$ distribution. Example of an F-distribution with $\nu_1 = 2$ and $\nu_2 = 11$ df]

[Figure: density of the $F_{4;\,55}$ distribution. Example of an F-distribution with $\nu_1 = 4$ and $\nu_2 = 55$ df]

[ :-) Please Note:

For an ANOVA test your critical region will always look like a right-sided test even though it
is a two-sided test! This means you use "all of α” on the right side.

This principle, where the focus is on variances but the test statistic is actually sensitive to
differences between means, applies even to two groups. It is important to note that the ANOVA
test for the case where k = 2, i.e. when we test $H_0 : \mu_1 = \mu_2$, is only valid for a two-sided
alternative, i.e. $H_1 : \mu_1 \neq \mu_2$. (For a specific application "k" will be replaced with "2" or
"3" or whatever, where "k" = number of samples.) ]

Thus, $F = \frac{MSTreatment}{MSError} \sim F_{\nu_1 = k-1;\ \nu_2 = n-k}$ and $H_0 : \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ will be rejected at the
α-level of significance if $F > F_{\alpha;\,\nu_1;\,\nu_2}$.

If we reject the null hypothesis, we conclude that at least two means differ. The "extension of
ANOVA" to be able to conclude which means are responsible for the differences, is called multiple
comparisons. This is treated in section 14.2 of Keller. We will discuss this soon.

Activity 2.3
The marketing manager of a pizza chain is in the process of examining some of the demographic
characteristics of her customers. In particular, she would like to investigate the belief that the ages
of the customers of pizza parlors, hamburger huts, and fast-food chicken restaurants are different.
As an experiment, the ages of eight customers randomly selected of each of the restaurants are
recorded and listed below. Assume that we know from previous analyses that the ages are normally
distributed with the same variances.

Customers’ Ages
Pizza Hamburger Chicken
23 26 25
19 20 28
25 18 36
17 35 23
36 33 39
25 25 27
28 19 38
31 17 31

(a) State whether the following calculations are correct or incorrect.

(i) x̄ = 26.833; x̄1 = 25.5; x̄2 = 24.125; x̄3 = 30.875
(ii) SSTotal = 1067.344
(iii) SSWithin = 863.760
(iv) SSBetween = 203.584

[:-) Always keep in mind that small differences could be due to rounding errors!]


(b) Set up an ANOVA table


(c) Do these data provide enough evidence at the 5% significance level to infer that there are
differences in ages among the customers of the three restaurants?


Feedback

(a) (i) Correct.

$$\bar{x} = \frac{\sum_{j=1}^{3}\sum_{i=1}^{8} x_{ij}}{8+8+8} = \frac{204+193+247}{24} = 26.833$$

$$\bar{x}_1 = \frac{\sum_{i=1}^{8} x_{i1}}{n_1} = \frac{204}{8} = 25.5; \quad \bar{x}_2 = \frac{193}{8} = 24.125; \quad \bar{x}_3 = \frac{247}{8} = 30.875$$

(ii) Correct. $SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x})^2 = 1067.344$

(iii) Correct. $SS_{Within} = SS_{Error} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2 = 268.0 + 332.88 + 262.88 = 863.76$

(iv) Correct. $SS_{Between} = SS_{Treatment} = \sum_{j=1}^{k} n_j(\bar{x}_j-\bar{x})^2 = 8(25.448) = 203.584$

(b)
Source of Variation SS df MS F F0.05;2;21
Treatments 203.584 2 101.792 2.475 3.47
Error 863.760 21 41.131
Total 1067.344 23

(c) $F = \dfrac{MSTr}{MSE} = \dfrac{101.792}{41.131} = 2.475 < F_{0.05;2;21} = 3.47 \implies$ we cannot reject $H_0: \mu_1 = \mu_2 = \mu_3$.
The data do not provide enough evidence at the 5% significance level to infer that there are
differences in ages among the customers of the three restaurants.
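
If you would like to check such a table by computer, the calculation can be sketched as follows. This is only a minimal Python sketch using the standard library; the function name one_way_anova and the variable names are our own choices for illustration, and the critical value 3.47 is simply copied from the table in (b) rather than computed.

```python
# One-way ANOVA for the customer ages of Activity 2.3 (standard library only).
ages = {
    "Pizza":     [23, 19, 25, 17, 36, 25, 28, 31],
    "Hamburger": [26, 20, 18, 35, 33, 25, 19, 17],
    "Chicken":   [25, 28, 36, 23, 39, 27, 38, 31],
}


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    all_values = [x for group in groups for x in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]

    # Between-treatments variation: n_j * (group mean - grand mean)^2, summed.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-treatments variation: squared deviations from each group's own mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ss_between, ss_within, k - 1, n - k, ms_between / ms_within


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(ages.values()))

F_CRITICAL = 3.47  # F(0.05; 2; 21), copied from the feedback table (assumption)
print(f"SSBetween = {ss_b:.3f} (df = {df_b})")
print(f"SSWithin  = {ss_w:.3f} (df = {df_w})")
print(f"F = {f_stat:.3f}, reject H0: {f_stat > F_CRITICAL}")
```

Small differences from the hand calculation in the third decimal are due to rounding of the intermediate means.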

:-) Do you agree that doing an ANOVA manually is usually arduous work?
To appreciate the assistance of a computer even more, and to understand the workings of ANOVA,
you can try to do the next activity.
You will notice that this activity challenges you to manipulate your computational formulae implying
that you understand what you do!

Activity 2.4
Do Keller exercise 14.1.


Feedback

(a) $\bar{x} = \dfrac{5(10)+5(15)+5(20)}{5+5+5} = 15$

$SS_{Treatment} = \sum_{j=1}^{k} n_j(\bar{x}_j-\bar{x})^2 = 5(10-15)^2 + 5(15-15)^2 + 5(20-15)^2 = 250$

$SS_{Error} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2 = \sum_{j=1}^{k}(n_j-1)s_j^2 = (5-1)50 + (5-1)50 + (5-1)50 = 600$

ANOVA Table

Source of Variation   Sum of squares   df           Mean Squares                  F
Treatments            250              k − 1 = 2    SSTr/(k − 1) = 250/2 = 125    MSTr/MSE = 125/50 = 2.50
Error                 600              n − k = 12   SSE/(n − k) = 600/12 = 50
Total                 850              n − 1 = 14

(b) $\bar{x} = \dfrac{10(10)+10(15)+10(20)}{10+10+10} = 15.0$ (the same value!)

$SS_{Treatment} = \sum_{j=1}^{k} n_j(\bar{x}_j-\bar{x})^2 = 10(10-15)^2 + 10(15-15)^2 + 10(20-15)^2 = 500$ (this value
increased).

$SS_{Error} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2 = \sum_{j=1}^{k}(n_j-1)s_j^2 = (10-1)50 + (10-1)50 + (10-1)50 = 1350$
(this value increased even more).

ANOVA Table

Source of Variation   Sum of squares   df           Mean Squares                  F
Treatments            500              k − 1 = 2    SSTr/(k − 1) = 500/2 = 250    MSTr/MSE = 250/50 = 5.00
Error                 1350             n − k = 27   SSE/(n − k) = 1350/27 = 50
Total                 1850             n − 1 = 29

(c) The F -statistic increased!
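
The effect of the sample size on F can also be seen by letting a computer work from the summary statistics alone. The following is a minimal Python sketch; the function name anova_from_summary is hypothetical and chosen by us, and the inputs are the sample means (10, 15, 20), the common sample variance (50) and the sample sizes used in (a) and (b) above.

```python
# One-way ANOVA from summary statistics (means, variances, sample sizes).
def anova_from_summary(means, variances, sizes):
    """Return (SS_treatment, SS_error, F) for k independent samples."""
    n = sum(sizes)
    k = len(means)
    grand_mean = sum(nj * m for nj, m in zip(sizes, means)) / n
    ss_treatment = sum(nj * (m - grand_mean) ** 2 for nj, m in zip(sizes, means))
    # SSE = sum of (n_j - 1) * s_j^2, as in the feedback above.
    ss_error = sum((nj - 1) * v for nj, v in zip(sizes, variances))
    f_stat = (ss_treatment / (k - 1)) / (ss_error / (n - k))
    return ss_treatment, ss_error, f_stat


part_a = anova_from_summary([10, 15, 20], [50, 50, 50], [5, 5, 5])
part_b = anova_from_summary([10, 15, 20], [50, 50, 50], [10, 10, 10])
print("n_j = 5: ", part_a)
print("n_j = 10:", part_b)
```

With the same means and variances, doubling each sample size leaves MSE unchanged but doubles MSTr, which is why F doubles.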


2.4 Multiple comparisons.

STUDY
Keller Chapter 14 Analysis of variance
14.2 Multiple comparisons

Performing an analysis of variance test to determine whether differences exist between two or more
population means is a good start, but not nearly enough for a practical application where it is
necessary to identify which treatment means are responsible for the differences. The statistical
method used to determine this is called multiple comparisons. We will consider three methods for
this purpose, namely

· Fisher’s least significant difference (LSD) method, which is used if you want to find areas for further
investigation.
· The Bonferroni method, which is used if you want to identify two or three pairwise comparisons.
· Tukey’s method, which is used when you want to consider all possible population combinations.

These three methods are discussed in Keller. Make sure that you understand them and can apply
the knowledge. The formulas for the three methods are different, but you need not remember them.
In fact, rather go through activity 2.5 and its solution to see how the three methods are applied.

As your knowledge of statistics expands, lengthy calculations will interest you less and less, seeing
that your interest should move to the actual statistical analysis. There is a very delicate balance
between the importance of the calculation and the statistical analysis: if the calculation is incorrect,
the analysis has no meaning. Still, you are being trained to make a meaningful and correct analysis.
Once you understand the method applied in the calculation, that part can be taken over by statistical
software. This is why most statisticians start to use statistical software for their calculations at an
early stage. We are introducing students at second level in STA2601 to the software package JMP. It
is therefore advisable for you to take note of the given Excel and Minitab printouts in Keller. Try to do
them yourself if you have access to Excel or Minitab and if you do not have access, study them and
note what information they supply and how to interpret it. No professional statistician can function
properly without knowledge of and using statistical software.

Activity 2.5
Question 1
An investor studied the percentage rates of return of three different types of mutual funds. Random
samples of percentage rates of return for five periods were taken from each fund. The results appear
in the table below:
Mutual Funds Percentage Rates
Fund 1 Fund 2 Fund 3
12 4 9
15 8 3
13 6 5
14 5 7
17 4 4
Use Tukey’s method with α = .05 to determine which population means differ.

Question 2
Do Keller exercise 14.21.


Feedback

Question 1
ω = 2.684; x̄1 = 14.2; x̄2 = 5.4; x̄3 = 5.6

Fund i   Fund j   |x̄i − x̄j|   Significant?
1        2        8.8          Yes
1        3        8.6          Yes
2        3        0.2          No
It is clear that the mean percentage rate of return for mutual fund 1 is significantly different from that
of the other two mutual funds.

Question 2
a) α = .05; $t_{\alpha/2,\,n-k} = t_{.025,\,27} = 2.052$

$$LSD = t_{\alpha/2}\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} = 2.052\sqrt{700\left(\frac{1}{10}+\frac{1}{10}\right)} = 24.28$$

Treatment Means Difference

_________________________________________________
i = 1, j = 2 128.7 101.4 27.3
i = 1, j = 3 128.7 133.7 − 5.0
i = 2, j = 3 101.4 133.7 − 32.3
_________________________________________________

Conclusion: µ2 differs from µ1 and µ3 because |27.3| > 24.28 and |−32.3| > 24.28

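b) $C = \dfrac{3(2)}{2} = 3$; $\alpha_E = .05$; $\alpha = \dfrac{\alpha_E}{C} = 0.0167$; $t_{\alpha/2,\,n-k} = t_{.0083,\,27} = 2.552$ (from Excel)

$$LSD = t_{\alpha/2}\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)} = 2.552\sqrt{700\left(\frac{1}{10}+\frac{1}{10}\right)} = 30.20$$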

Treatment Means Difference

_________________________________________________
i = 1, j = 2 128.7 101.4 27.3
i = 1, j = 3 128.7 133.7 − 5.0
i = 2, j = 3 101.4 133.7 − 32.3
_________________________________________________

Conclusion: µ2 differs from µ3 because only | −32.3 |> 30.20

c) $q_{\alpha}(k, \nu) = q_{.05}(3, 37) \approx 3.44$

$$\omega = q_{\alpha}(k, \nu)\sqrt{\frac{MSE}{n_g}} = 3.44\sqrt{\frac{700}{10}} = 28.781$$

Treatment Means Difference

_________________________________________________
i = 1, j = 2 128.7 101.4 27.3
i = 1, j = 3 128.7 133.7 − 5.0
i = 2, j = 3 101.4 133.7 − 32.3
_________________________________________________

Conclusion: µ2 differs from µ3 because only | 32.3 |> 28.781
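
To see the three methods side by side, the comparisons above can be sketched in a few lines of Python. This is an illustration only: the critical values 2.052, 2.552 and 3.44 are copied from the feedback above (they are not computed here), and the names thresholds and significant_pairs are our own.

```python
# Multiple comparisons for Keller 14.21, using the values quoted in the feedback.
from itertools import combinations
from math import sqrt

means = {1: 128.7, 2: 101.4, 3: 133.7}  # treatment means
MSE = 700
n_per_group = 10

T_LSD = 2.052         # t(.025; 27), from the feedback (assumption: not computed)
T_BONFERRONI = 2.552  # t(.0083; 27), from the feedback
Q_TUKEY = 3.44        # studentized range value used in the feedback

pair_se = sqrt(MSE * (1 / n_per_group + 1 / n_per_group))
thresholds = {
    "LSD": T_LSD * pair_se,
    "Bonferroni": T_BONFERRONI * pair_se,
    "Tukey": Q_TUKEY * sqrt(MSE / n_per_group),
}


def significant_pairs(threshold):
    """Return the pairs (i, j) whose absolute mean difference exceeds threshold."""
    return [
        (i, j)
        for i, j in combinations(sorted(means), 2)
        if abs(means[i] - means[j]) > threshold
    ]


for method, threshold in thresholds.items():
    print(f"{method}: threshold = {threshold:.2f}, "
          f"significant pairs = {significant_pairs(threshold)}")
```

Notice that the more conservative methods (Bonferroni and Tukey) have larger thresholds than LSD, so fewer pairs are declared different.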


2.5 Analysis of variance experimental designs

READ
Keller Chapter 14 Analysis of variance
14.3 Analysis of variance experimental designs

In this section an overview is given of two experimental designs and different concepts are described.
Read through the three paragraphs - most probably a few times to get a proper overview of single and
multifactor designs; independent samples; randomized block designs; repeated measures; two-way
analysis of variance for fixed and random effects.

2.6 Randomized Block(two-way) Analysis of Variance

STUDY
Keller Chapter 14 Analysis of variance
14.4 Randomized block(two-way) analysis of variance

The calculations for this type of analysis are so time consuming that Keller gives only computer printouts
in the explanations. This way you can learn about the method and its application. In the examination
we will not be testing your calculation skills, but your knowledge of the process and the analysis
itself.
When moving from considering within treatments variation to looking at the treatment means and the
differences between them we are designing a randomized block experiment. Total variation is then
partitioned into three different sources, namely

SS(Total) = SST + SSB + SSE

Total variation = Sum of squares for treatments + Sum of squares for blocks + Sum of squares for error

With this design we can test not only whether the treatment means differ, but also whether the block
means differ. Of course, if the block means do not differ, it implies that a randomized block analysis
was not the correct choice!

Compare the two test statistics

$F = \dfrac{MST}{MSE}$ with ν1 = k − 1; ν2 = n − k − b + 1 degrees of freedom

and

$F = \dfrac{MSB}{MSE}$ with ν1 = b − 1; ν2 = n − k − b + 1 degrees of freedom.
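
As a quick check on the error degrees of freedom, here is a short worked step. It assumes that every treatment is observed exactly once in every block, so that n = kb:

$$\nu_2 = n - k - b + 1 = kb - k - b + 1 = (k-1)(b-1)$$

For example, with k = 5 treatments and b = 7 blocks we have n = 35, so ν2 = 35 − 5 − 7 + 1 = 24 = (5 − 1)(7 − 1).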

Study the example in Keller and give special attention to the interpretation of the results.

Activity 2.6
Question 1
Do question 14.31 in Keller

Question 2
A partial ANOVA table in a randomized block design is shown below, where the treatments refer to
different high blood pressure drugs, and the blocks refer to different groups of men with high blood
pressure. Use the given ANOVA table to answer the questions:

Source of Variation SS df MS F
__________________________________________________________
Treatments 6,720 4 1,680 14.6087
Blocks 3,120 6 520 4.5217
Error 2,760 24 115
__________________________________________________________
Total 12,600 34

a) Can we infer at the 5% significance level that the treatment means differ?

b) Can we infer at the 5% significance level that the block means differ?


Feedback

Question 1

ANOVA Table
Source       Degrees of Freedom   Sum of Squares   Mean Squares   F
Treatments   2                    100              50.00          24.04
Blocks       6                    50               8.33           4.00
Error        12                   25               2.08
Total        20                   175

a) Rejection region: F > Fα,k−1,n−k−b+1 = F0.05,2,12 = 3.89

Conclusion: F = 24.04, p-value = .0001.
There is enough evidence to conclude that the treatment means differ.

b) Rejection region: F > Fα,b−1,n−k−b+1 = F0.05,6,12 = 3.00

F = 4.00, p-value = .0197.
There is enough evidence to conclude that the block means differ.

Question 2

a) H0 : µ1 = µ2 = µ3 = µ4 = µ5 versus:
Ha : At least two means differ
Rejection region: F > Fα,ν 1 ,ν 2 = F0.05,4,24 = 2.78
Test statistic: F = 14.6087
Conclusion: Reject the null hypothesis. Yes, at least two of the treatment means differ.

b) H0 : µ1 = µ2 = µ3 = µ4 = µ5 = µ6 = µ7 versus:
Ha : At least two block means differ.
Rejection region: F > Fα,ν 1 ,ν 2 = F0.05,6,24 = 2.51
Test statistic: F = 4.5217
Conclusion: Reject the null hypothesis. Yes, at least two of the block means differ.
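
The mean squares and F-values in the table of Question 2 can be recovered from the SS and df columns alone. Below is a minimal Python sketch of that step; the dictionary layout and variable names are our own, and the two critical values are copied from the feedback above rather than computed.

```python
# Completing a randomized block ANOVA table from its SS and df columns.
anova_rows = {
    "Treatments": (6720, 4),
    "Blocks": (3120, 6),
    "Error": (2760, 24),
}

# Mean square = sum of squares / degrees of freedom.
mean_squares = {source: ss / df for source, (ss, df) in anova_rows.items()}

f_treatments = mean_squares["Treatments"] / mean_squares["Error"]
f_blocks = mean_squares["Blocks"] / mean_squares["Error"]

F_CRIT_TREATMENTS = 2.78  # F(0.05; 4; 24), from the feedback (assumption)
F_CRIT_BLOCKS = 2.51      # F(0.05; 6; 24), from the feedback (assumption)

print(f"Treatments: F = {f_treatments:.4f}, reject H0: {f_treatments > F_CRIT_TREATMENTS}")
print(f"Blocks:     F = {f_blocks:.4f}, reject H0: {f_blocks > F_CRIT_BLOCKS}")
```

Both tests share the same denominator, MSE, which is why the error row appears only once in the table.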

2.7 Self-correcting Exercises for Unit 2

Question 1
Do Keller: Exercise 13.68.

Question 2
Do Keller: Exercise 13.73.

Question 3
Consider the following ANOVA table:

Source of Variation   Sum of squares   df   Mean Squares   F
Treatments            128              4    32             2.963
Error                 270              25   10.8
Total                 398              29

Say whether the following statements are true or false.

(a) The total number of observations in all the samples is 30.

(b) The within-treatments variation stands for the sum of squares for error.
(c) In one-way analysis of variance, if all the sample means are equal, then the sum of squares for
treatments will be zero.
(d) Rejection region, at the 1% level of significance, for this one-way analysis of variance is where
F > Fα;k−1;n−k = F0.01;4;25 .

(e) Assume that the above ANOVA is applied to independent samples taken from normally distributed
populations with equal variances. If the null hypothesis is rejected, then we can infer that at least
two population means differ.

Question 4
Do Keller: Exercise 14.5

2.8 Solutions to Self-correcting Exercises for Unit 2

Question 1
(Solution to Keller: Exercise 13.68)

Test the null hypothesis H0 : (p1 − p2 ) = 0 vs H1 : (p1 − p2 ) ≠ 0.

$$\hat{p}_{pooled} = \frac{x_1+x_2}{n_1+n_2} = \frac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2} = \frac{225(0.60)+225(0.55)}{225+225} = 0.575$$

[:-) This was tricky and mean and something you simply had to figure out on your own!]

$$Z = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(0.60-0.55)-0}{\sqrt{0.575(1-0.575)\left(\frac{1}{225}+\frac{1}{225}\right)}} = 1.0728$$

(a) p-value = 2P (Z > 1.07) = 2(1 − 0.8577) = 0.2846

(b) Now

$$\hat{p}_{pooled} = \frac{x_1+x_2}{n_1+n_2} = \frac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2} = \frac{225(0.95)+225(0.90)}{225+225} = 0.925$$

$$Z = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(0.95-0.90)-0}{\sqrt{0.925(1-0.925)\left(\frac{1}{225}+\frac{1}{225}\right)}} = 2.0135$$

=⇒ p-value = 2P (Z > 2.01) = 2(1 − 0.9778) = 0.0444.

(c) The p-value decreases.

(d)

$$\hat{p}_{pooled} = \frac{x_1+x_2}{n_1+n_2} = \frac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2} = \frac{225(0.10)+225(0.05)}{225+225} = 0.075$$

Note that $\hat{p}_{pooled}(1-\hat{p}_{pooled})$ is the same value as in (b) =⇒ z is the same for both expressions =⇒ the
p-value will be exactly the same as in question (b).

p-value = 2P (Z > 2.01) = 2(0.5 − 0.4778) = 0.0444.

(e) The p-value decreases.


Question 2
(Solution to Keller: Exercise 13.73)

(a) Test the null hypothesis H0 : (p1 − p2 ) = 0 vs H1 : (p1 − p2 ) > 0. (If popularity decreases
=⇒ p1 > p2 .)

$$\hat{p}_{pooled} = \frac{x_1+x_2}{n_1+n_2} = \frac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2} = \frac{1100(0.56)+800(0.46)}{1100+800} = 0.51789$$

[:-) Keep in mind that an observed percentage is always p̂!]

$$Z = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(0.56-0.46)-0}{\sqrt{0.51789(1-0.51789)\left(\frac{1}{1100}+\frac{1}{800}\right)}} = 4.3070$$

Reject H0 if z > z0.05 = 1.645. Since 4.3070 > 1.645 =⇒ we reject the null hypothesis and
conclude his popularity decreased.

(b) For this question we have to test the null hypothesis H0 : (p1 − p2 ) = 0.05 vs H1 : (p1 − p2 ) > 0.05.

(If popularity decreases by more than 5% =⇒ p1 − p2 > 0.05.)

Now a pooled proportion does not exist and SE must be computed differently because under H0 p1 ≠ p2 :

$$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.56(1-0.56)}{1100}+\frac{0.46(1-0.46)}{800}} = 0.023119$$

$$Z = \frac{(\hat{p}_1-\hat{p}_2)-(p_1-p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} = \frac{(0.56-0.46)-0.05}{0.023119} = 2.1627$$

Reject H0 if z > z0.05 = 1.645. Since 2.1627 > 1.645 =⇒ we reject the null hypothesis and
conclude his popularity decreased by more than 5%.

(c) If we derive a confidence interval for (p1 − p2 ) we use

$$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.023119.$$

$(\hat{p}_1-\hat{p}_2) \pm z_{\alpha/2}\,SE = (0.56-0.46) \pm 1.96(0.023119) = 0.10 \pm 0.045313 = (0.054687;\ 0.14531)$
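
The difference between the pooled and the unpooled standard error is easy to lose sight of, so here is a minimal Python sketch of the three calculations for exercise 13.73. The function names pooled_se and unpooled_se are our own, and the value 1.96 is the usual table value for a 95% interval.

```python
# Large-sample inference about p1 - p2 for Keller 13.73 (standard library only).
from math import sqrt


def pooled_se(p1_hat, n1, p2_hat, n2):
    """Standard error when H0 states p1 = p2 (the proportions are pooled)."""
    pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    return sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


def unpooled_se(p1_hat, n1, p2_hat, n2):
    """Standard error when H0 states a non-zero difference, or for an interval."""
    return sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)


p1_hat, n1, p2_hat, n2 = 0.56, 1100, 0.46, 800
diff = p1_hat - p2_hat

# (a) H0: p1 - p2 = 0, so the pooled standard error is used.
z_a = (diff - 0) / pooled_se(p1_hat, n1, p2_hat, n2)

# (b) H0: p1 - p2 = 0.05, so the unpooled standard error is used.
se = unpooled_se(p1_hat, n1, p2_hat, n2)
z_b = (diff - 0.05) / se

# (c) 95% confidence interval for p1 - p2.
Z_975 = 1.96
ci = (diff - Z_975 * se, diff + Z_975 * se)

print(f"(a) z = {z_a:.4f}")
print(f"(b) SE = {se:.6f}, z = {z_b:.4f}")
print(f"(c) 95% CI = ({ci[0]:.5f}; {ci[1]:.5f})")
```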

Question 3
The statements are all correct.

Question 4
(Solution to Keller 14.5)

H0 : µ1 = µ2 = µ3

H1 : At least two means differ.

           Brand 1   Brand 2   Brand 3
Mean       1.333     2.50      2.667
Variance   1.87      2.30      1.47

Grand mean = 2.167

$SSTr = \sum_{j=1}^{k} n_j(\bar{x}_j-\bar{x})^2 = 6(1.333-2.167)^2 + 6(2.5-2.167)^2 + 6(2.667-2.167)^2 = 6.339$

$SSE = \sum_{j=1}^{k}(n_j-1)s_j^2 = (6-1)(1.87) + (6-1)(2.30) + (6-1)(1.47) = 28.2$

ANOVA Table

Source of Variation   Sum of squares   df           Mean Squares                       F
Treatments            6.339            k − 1 = 2    SSTr/(k − 1) = 6.339/2 = 3.1695    MSTr/MSE = 3.1695/1.88 = 1.686
Error                 28.200           n − k = 15   SSE/(n − k) = 28.200/15 = 1.88
Total                 34.539           n − 1 = 17

Rejection region: F > Fα;k−1;n−k = F0.01;2;15 = 6.36.

Cannot reject H0 : µ1 = µ2 = µ3 . There is not enough evidence to conclude that differences exist
between the three brands.

2.9 Learning Outcomes

Use the following learning outcomes as a checklist after you have completed this study unit to
evaluate the knowledge you have acquired.

Can you

· define SE for (p̂1 − p̂2 ) under the assumption that p1 = p2 ?

· perform a Large-sample statistical test for p1 − p2 ?

· derive a Large-sample confidence interval for p1 − p2 ?

· demonstrate an understanding of the different parts of a statistical test:

- null hypothesis
- alternative hypothesis
- test statistic and its p-value
- rejection region =⇒ critical values
- significance levels
- conclusion

· demonstrate an understanding of the connections between the concepts significance level and
p-value?

· interpret computer output regarding inferences about an F-test for two population variances

· define the following concepts

- within-treatments variation
- sum of squares for error
- between-treatments variation
- rejection region =⇒ critical values for an ANOVA test

· differentiate between one- and two-way analysis of variance experimental designs as well as
randomized block designs?

· perform statistical tests for H0 : µ1 = µ2 = µ3 = ......µk

· understand the three multiple comparison methods

· interpret computer output regarding inferences about an ANOVA test for more than two population
means

Key Terms/Symbols
degrees of freedom
F-test for two population variances
ANOVA-test
within-treatments variation
sum of squares for error
between-treatments variation
SS Within
SS Between
SS Blocks
SS Error
SS Treatment
overall mean

STUDY UNIT 3
3.1 Chi–square test

It is just as important to consider the sampled population as it is to know the data type of your
sample. What do you want to know about a specific population or populations? In the earlier study
units we were always interested in the parameters of the population, which implied that we had some
information about the population (e.g. we knew that it was normally, or approximately so, distributed).
What we have discussed so far implied so-called parametric techniques, where we considered the
statistics of a sample to predict the parameters of the distribution describing the population. In the first
part of this study unit we consider other very important parametric techniques, namely chi-squared
tests. In the second part of this unit we then venture into something new, addressing the dilemma
when one cannot make assumptions about the shape of the sampled population. As statisticians
we are often faced with this reality. Do you think that it is still possible to use a random sample
drawn from such a population and make a sensible analysis and even predictions about that sampled
population? Yes! You are going to see that there are also nonparametric techniques that you can use
if you do not know about the distribution of the sampled population. As usual, apart from explaining
the methods, the necessary conditions under which these alternatives apply will also be described.
Of course, the correct technique for the particular data type stays important.
The first part of this study unit covers two applications of the continuous chi-squared distribution,
which is the technique applicable if the data is nominal. In STA1501 you heard about this distribution
and here hypothesis tests will be discussed and the conditions for their application. Only the chi-
squared goodness-of-fit test and the chi-squared test of a contingency table form part of the contents
of this module (the test for normality is therefore not included). In the second part of this study unit
you will be introduced to three nonparametric techniques. You will see that the sampled populations
are nonnormal and that dependence and independence of the samples play an important role. The
techniques you have to know for this module are the Wilcoxon rank sum test for ordinal or interval
data from two independent samples, the sign test for ordinal data in the form of matched pairs and
lastly the Wilcoxon signed rank test for interval data, also in the form of matched pairs. There are
other nonparametric tests in the prescribed book, but they are not included in the contents of this
module. Remember about them because you never know if you may need to use one of them in
future. Then you simply take Keller and read up about them!
As you study these different tests, please do not be discouraged by all the different definitions that
are given and are used in the manual examples. Remember that we are statisticians and we do not
want to test your memory, but your knowledge of the different procedures and their conditions. In the
examination you will be given a list of formulas from which you can select the one you need (should
we ask a question in an examination paper where you need a formula).

3.2 Chi-squared goodness-of-fit test

STUDY
Keller Chapter 15 Chi-squared tests
15.1 Chi-Squared Goodness-of-Fit Test
◦ Test statistic
◦ Required condition

In distance learning the pronunciation of words or symbols is often a problem. If you wonder about
the word "chi" or its symbol χ, think of the words "pie" or "sky" in English, because "chi" rhymes with
them. The ch is pronounced as a k, which means that you actually say "kai".
For the symbol χ2 you say "kai-square".

Recall the knowledge given to you in STA1501 about a binomial experiment and the binomial
distribution. Just a reminder - the prefix bi- refers to two, while the prefix multi- refers to many.

Chi-square is a family of distributions commonly used for significance testing. A chi-square test
(also chi-squared or χ2 test) is any statistical hypothesis test in which the sampling distribution of
the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this
is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be
made to approximate a chi-square distribution as closely as desired by making the sample size large
enough. A number of tests exist, but you are required to focus only on this one.

Below is a table illustrating the similarities and differences between a binomial and a multinomial
experiment.
Binomial experiment consists of                          Multinomial experiment consists of
a fixed number n of trials                               a fixed number n of trials
two possible outcomes per trial                          k categories (cells) of outcomes per trial
constant probability outcomes p and 1 − p                constant probabilities pi for each cell i
two probabilities p (success) and 1 − p (failure)        k probabilities pi and p1 + p2 + ... + pk = 1
different independent trials                             different independent trials
x successes in n trials                                  observed frequencies fi of outcomes in cell i
expected value µ = np                                    expected frequencies ei = npi

The discussion in STA1501 on the chi-squared distribution was very brief. In this section you are
going to learn more about different tests where the test statistic has a chi-squared distribution.

The Chi-squared distribution

· is a family of continuous probability distributions

· is represented by a different positively skewed curve of which the shape is determined by the
number of degrees of freedom
· ranges between 0 and ∞
· is used to describe nominal data (you can make a mental link between the nomial as in binomial
and multinomial if you have difficulty to remember that χ2 analysis is on nominal data)

There are many interesting and practical applications of the chi-squared distribution. Researchers
are also very keen to use a chi-squared test and we hope that you will now study research results
and see if the conditions for application of this distribution are satisfied. The purpose of an analysis
can be to determine if the sample is from a specified population or the interest can be to determine
if there is a relationship between two populations, e.g. between predicted values and actual values.
An example of the latter: suppose a telecommunications company, interested in customer care, is
uncertain about the continuation or not of a specific product. They decide to ask customers if they
would like the service to continue for the next year or not (this would be categorical or nominal data).
The recorded data (two categories of ’yes’ and ’no’) can be saved and the product continued for a
year. Then data (’yes’ or ’no’) can again be collected and a chi-square analysis can be made to see if
there is a relationship between what the people said and what they actually did. If the null hypothesis
is rejected, it indicates that there is a relationship between the two populations. In this scenario the
managers can then decide to use data where customers say what they are going to do, because the
data are reliable enough for their planning.
If you study the examples in the book and in the activities, see if you understand the following
comment: Samples should not be too large for applications of the chi-squared test, and in practice,
analysts carefully study the distribution of the items in the chi-square table and do not only rely on
the numerical value of the test.

Goodness-of-fit test
Make sure that you understand the hypothesis testing procedure and the sampling distribution of the
test statistic for the goodness-of-fit test.

Test statistic
How would you express the formula for the test statistic of the goodness-of-fit test in your own words?

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

The procedure is:

Square the difference between the observed and expected frequency and divide it by the expected
frequency for each cell. Add all these answers and it gives you the formula for the test statistic of the
chi-squared goodness-of-fit test.

Is that not easier to remember than the formula itself? It tells you exactly what to do. Can you explain
it to someone else?
If you are still not so sure, we illustrate with the words:

Square $(\ldots)^2$ the difference $(f_i - e_i)^2$ between the observed frequency $f_i$ and expected frequency $e_i$

and divide it $\dfrac{(f_i - e_i)^2}{e_i}$ by the expected frequency $e_i$ for each cell.

Add $\sum_{i=1}^{k}$ all these answers and it gives you the formula for the test statistic of the
chi-squared goodness-of-fit test.
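
The same recipe can be written as a few lines of code. The sketch below is in Python; the function name chi_squared_statistic and the small data set (100 observations in three cells) are made up purely for illustration and do not come from Keller.

```python
# Chi-squared goodness-of-fit statistic: sum of (f_i - e_i)^2 / e_i over all cells.
def chi_squared_statistic(observed, probabilities):
    """Return the test statistic for observed cell counts and hypothesised p_i."""
    n = sum(observed)
    expected = [n * p for p in probabilities]  # e_i = n * p_i
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Hypothetical example: three cells with H0: p1 = 0.25, p2 = 0.50, p3 = 0.25.
observed = [30, 50, 20]
probabilities = [0.25, 0.50, 0.25]

stat = chi_squared_statistic(observed, probabilities)
print(f"chi-squared = {stat:.4f}")
```

Here the expected frequencies are 25, 50 and 25, so the three cells contribute 1, 0 and 1 to the statistic.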

Activity 3.1
Question 1
Employee absenteeism has become a serious problem which cannot be ignored. The personnel
department at a university decided to record the weekdays during which lecturers in the Faculty of
Humanities in a sample of 300 called in sick over the past several months. Determine if the given
data suggests that absenteeism is higher on some days of the week than on others.
From existing medical evidence the following information is specified in the null hypothesis for the
consecutive days of the week:
Monday p1 = 0.3, Tuesday p2 = 0.1, Wednesday p3 = 0.2, Thursday p4 = 0.2, Friday p5 = 0.2

Day of the week   Monday   Tuesday   Wednesday   Thursday   Friday
Number absent     84       24        56          64         72

Question 2
In a goodness-of-fit test, suppose that a sample showed that the observed frequency fi and expected
frequency ei were equal for each cell i. Then, the null hypothesis is

1. rejected at α = 0.05 but is not rejected at α = 0.25

2. not rejected at α = 0.05 but is rejected at α = 0.25
3. rejected at any level

4. not rejected at any level

5. the same as the difference between fi and ei

Question 3
The critical value in a goodness-of-fit test with 6 degrees of freedom, considered at the 5%
significance level, is

1. equal to 18.5476
2. equal to 12.6
3. equal to 0.872085
4. always greater than the test statistic
5. always less than the test statistic

Question 4
A chi-squared goodness-of-fit test is always conducted as

1. a lower-tail test
2. an upper-tail test
3. a two-tailed test
4. a measure of the size of the cells
5. any of the above

Question 5
Five statements are given below. Only one of them is a true statement. Which option is true?

1. For a chi-squared distributed random variable with 10 degrees of freedom and a level of
significance of 0.025, the chi-squared table value is 20.4831. The computed value of the test
statistic is 16.857. This will lead us to reject the null hypothesis.
2. Whenever the expected frequency of a cell is less than 5, one remedy for this condition is to
decrease the size of the sample.
3. For a chi-squared distributed random variable with 12 degrees of freedom and a level of
significance of 0.05, the chi-squared value from the table is 21.0. The computed value of the
test statistics is 25.1687. This will lead us to reject the null hypothesis.
4. The chi-squared goodness-of-fit test can be used for any type of data.
5. In a multinomial experiment the probability pi that the outcome will fall into cell i can change from
one trial to the next.

3.3 Chi-squared test of a Contingency Table

STUDY
Keller Chapter 15 Chi-squared tests
15.2 Chi-Squared Test of a Contingency Table
◦ Test statistic
◦ Rejection region and p-value
◦ Rule of five

You need to realize that there are many similarities between the two χ2 -tests in this chapter, and that
there are also definite differences.

In statistics, contingency tables are used to record and analyse the relationship between two or
more variables, most usually categorical variables. Suppose that we have two variables, sex (male
or female) and handedness (right- or left-handed). We observe the values of both variables in a
random sample of 100 people. Then a contingency table can be used to express the relationship
between these two variables, as follows:
Right-handed Left-handed TOTAL
Male 43 9 52
Female 44 4 48
TOTAL 87 13 100
The figures in the right-hand column and the bottom row are called marginal totals and the figure
in the bottom right-hand corner is the grand total. The table allows us to deduce at a glance that
the proportion of men who are right-handed is about the same as the proportion of women who are
right-handed. However the two proportions are not identical and the statistical significance of the
difference between them can be tested statistically using one of a number of available methods. In
our case we will use a nonparametric method called a Pearson’s chi-square test. In this case the
entries provided in the table must represent a random sample from the population contemplated in
the null hypothesis. If the proportions of individuals in the different columns vary between rows (and,
therefore, vice versa) we say that the table shows contingency between the two variables. If there is
no contingency, we say that the two variables are independent.
If we make a table of comparisons it might help you to remember the different principles involved and
the calculation methods.

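Comparison of the χ2 Goodness-of-Fit Test and the χ2 Test of a Contingency Table

Data
  Goodness-of-fit test: only applicable for nominal data produced by a multinomial experiment.
  Contingency table test: only applicable for nominal data arranged in a contingency table.

Required condition (both tests)
  Expected value for each cell ≥ 5 (Rule of five).

Purpose
  Goodness-of-fit test: test if the observed frequencies are consistent with the probabilities listed in Ho.
  Contingency table test: test if two variables are related, that is, test for evidence to conclude (infer) that
    ◦ two classifications of a population are related (under Ho they are statistically independent, i.e. unrelated)
    ◦ differences exist among two or more populations.

Test statistic (both tests)
  χ2 = Σ (fi − ei)² / ei, summed over the cells i = 1, ..., k.

Cells
  Goodness-of-fit test: data are classified into k categories.
  Contingency table test: a contingency table with r rows and c columns consists of k = r × c cells.

Expected frequencies
  Goodness-of-fit test: the expected frequency for each category is ei = n pi.
  Contingency table test: the expected frequency of the cell in row i and column j is
    eij = (total of row i × total of column j) / sample size.

Degrees of freedom
  Goodness-of-fit test: ν = k − 1.
  Contingency table test: ν = (r − 1)(c − 1).

Probabilities
  Goodness-of-fit test: the probabilities pi are given.
  Contingency table test: the pi are calculated assuming Ho is true.

Null hypothesis
  Goodness-of-fit test: Ho lists values for the probabilities pi.
  Contingency table test: Ho states that the two variables are independent.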

The manual calculation of the χ2 -values for the contingency table is rather cumbersome, but not that
complex!
Make sure that you understand the process of

· calculating the expected frequencies for each cell - multiply total of row and total of column and
divide by the grand total
· writing the given (observed) frequencies and calculated (expected) frequencies next to each other
for each cell in a new contingency table
· calculation of the test statistic, which involves only this last contingency table for each cell: subtract
the two frequencies, square the answer, then divide by the calculated (expected) frequency

If you calculate these values with Excel or Minitab it is of course not so complex, but remember that,
at this first-year level, you have to know the "how" of the process itself and not only the interpretation
of the χ2 and p−values.
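
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name chi_squared_contingency is our own (hypothetical) choice, and the data are the sex and handedness frequencies from the contingency table earlier in this section.

```python
# Observed frequencies from the handedness table in section 3.3.
# Rows: Male, Female. Columns: Right-handed, Left-handed.
observed = [[43, 9],
            [44, 4]]


def chi_squared_contingency(table):
    """Return (expected frequencies, chi-squared statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Step 1: e_ij = (total of row i) x (total of column j) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Steps 2 and 3: pair each observed frequency with its expected frequency,
    # then add up (f - e)^2 / e over all the cells
    statistic = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


expected, statistic, df = chi_squared_contingency(observed)
print(expected)
print(round(statistic, 3), df)
```

For these data the expected frequencies are 45.24, 6.76, 41.76 and 6.24 (all at least 5, so the rule of five is satisfied) and the test statistic is about 1.78 with 1 degree of freedom. This is smaller than the table value 3.84 for α = 0.05, so the null hypothesis of independence is not rejected.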

3.4 Summary of tests on nominal data

STUDY
Keller Chapter 15 Chi-squared tests
15.3 Summary of tests on nominal data

This section emphasises the contexts in which the various chi–square tests apply. Study the entire
section in the prescribed book, and especially understand Table 15.1 of the prescribed book.

Activity 3.2
Question 1
Do question 15.22 in Keller.

Question 2
The number of degrees of freedom for a contingency table with 5 rows and 7 columns is

1. 35
2. 12
3. 10
4. 24
5. 30

Question 3
In a chi-squared test of a contingency table, the test statistic value was χ2 = 12.678, and the critical
value at α = 0.025 was 14.4494. Thus,

1. the number of degrees of freedom was not 6

2. we fail to reject the null hypothesis at α = 0.025
3. we reject the null hypothesis at α = 0.025
4. we don’t have enough evidence to accept or reject the null hypothesis at α = 0.025
5. we should decrease the level of significance in order to reject the null hypothesis

Question 4
Which of the following statements is/are false?

1. A chi-squared test for independence is applied to a contingency table with 3 rows and 4 columns
for two qualitative variables. The degrees of freedom for this test must be 12.
2. A chi-squared test for independence with 10 degrees of freedom results in a test statistic of 17.894.
Using the chi-squared table, the most accurate statement that can be made about the p-value for
this test is that 0.05 < p-value < 0.10.
3. In a chi-squared test of independence, the value of the test statistic was 15.652, and the critical
value at α = 0.025 was 11.1433. Thus, we must reject the null hypothesis at α = 0.025.
4. A chi-squared test for independence with 6 degrees of freedom results in a test statistic of 13.25.
Using the chi-squared table, the most accurate statement that can be made about the p-value for
this test is that p-value is greater than 0.025 but smaller than 0.05.
5. The chi-squared test of a contingency table is used to determine if there is enough evidence to
infer that two nominal variables are related, and to infer that differences exist among two or more
populations of nominal variables.

Activity 3.3
Question 1
A statistics professor posted the following grade distribution guidelines for his elementary statistics
class:
8% A, 35% B, 40% C, 12% D, and 5% F.

A sample of 100 elementary statistics grades at the end of last semester showed
12 A’s, 30 B’s, 35 C’s, 15 D’s, and 8 F’s.

Suppose that you test at the 5% significance level to determine whether the actual grades deviate
significantly from the posted grade distribution guidelines. Compare your calculations with the step
by step calculations given below. Indicate in which step the first error was made.

1. H0 : p1 = 0.08, p2 = 0.35, p3 = 0.40, p4 = 0.12, p5 = 0.05.

H1 : At least two proportions differ from their specified values.
2. Rejection region: χ2.050,4 = 9.49
3. Test statistic: 5.889
4. Conclusion: Reject the null hypothesis.
5. The actual grades do not deviate significantly from the posted grade distribution guidelines.

Question 2
Which of the following tests is appropriate for nominal data if the problem objective is to compare two
or more populations and the number of categories is at least 2?

1. The z -test for one proportion, p, or difference of two proportions

2. The chi-squared goodness-of-fit test
3. The chi-squared test of a contingency table
4. All of the above
5. Not one of the above

Feedback

Activity 3.1
Question 1
H0 : p1 = 0.3, p2 = 0.1, p3 = 0.2, p4 = 0.2, p5 = 0.2
H1 : At least one pi is not equal to its specified value.

Cell i   fi    ei              (fi − ei)   (fi − ei)²/ei
1        84    300(.3) = 90    −6          0.40
2        24    300(.1) = 30    −6          1.20
3        56    300(.2) = 60    −4          0.27
4        64    300(.2) = 60    4           0.27
5        72    300(.2) = 60    12          2.40
Total    300   300                         χ2 = 4.54

Rejection region: χ2 > χ2α,k−1 = χ2.01,4 = 13.2767

χ2 = 4.54, p-value = 0.3386.
There is not enough evidence to infer that absenteeism is higher on some days of the week.
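
The table above can be checked with a short Python sketch. The variable names are our own; the critical value is simply the table value quoted in the rejection region above and is typed in, not computed.

```python
# Observed absences, Monday to Friday, and the probabilities specified in H0
observed = [84, 24, 56, 64, 72]
null_probs = [0.3, 0.1, 0.2, 0.2, 0.2]

n = sum(observed)                              # sample size, 300
expected = [n * p for p in null_probs]         # e_i = n * p_i

# Contribution of each cell: (f_i - e_i)^2 / e_i
contributions = [(f - e) ** 2 / e for f, e in zip(observed, expected)]
chi_squared = sum(contributions)

df = len(observed) - 1                         # k - 1 = 4
critical_value = 13.2767                       # chi-squared table value for alpha = 0.01, 4 df

reject_h0 = chi_squared > critical_value
print(round(chi_squared, 4), df, reject_h0)
```

Without intermediate rounding the statistic is about 4.53; the value 4.54 in the table arises because each contribution was rounded to two decimals before adding. Either way the statistic is far below the critical value, so the null hypothesis is not rejected.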

Question 2
Answer: 4
The chi-squared goodness-of-fit test involves the difference between the expected and observed
frequencies. In this question there is never a difference between the two, with the result that the null
hypothesis will never be rejected.

Question 3
Answer: 2
From the χ2 − table in Keller, find the cell where the column under χ2.050 in the first row meets the row
with 6 in the first column. The value written there is 12.6.

Question 4
Answer: 2
If you are not sure, look at the little picture at the top of the page listing the χ2 − table (Keller) and
you will see that the shaded area lies on the right-hand side.

Question 5
Answer: 3

1. False. The table value is correct, but the value 16.857 does not fall in the rejection region and
therefore the null hypothesis will not be rejected.
2. False. The remedy is to combine cells should any expected value in a cell be less than 5.
3. True. The test statistic 25.1687 is greater than the table value 21.0 and the null hypothesis would be rejected.
4. False. Only nominal data may be used in applications of the test.
5. False. These probabilities have to remain constant for each trial of a multinomial experiment.

Activity 3.2
Question 1
It is sometimes convenient to distinguish between employees doing more physical work ("blue collar"
workers) and those who are doing desk work ("white collar" workers). In this problem they wanted to
find out if the job description of an employee has an influence on their choice of opinion.

H0 : The variables are independent.

H1 : The variables are dependent.

Grand total = 200

P(blue collar) = 130/200, P(white collar) = 50/200, P(manager) = 20/200.

Expected frequencies:

Responses          Blue collar             White collar           Managers             Totals
For revision       130 × 110/200 = 71.5    50 × 110/200 = 27.5    20 × 110/200 = 11    110
Against revision   130 × 90/200 = 58.5     50 × 90/200 = 22.5     20 × 90/200 = 9      90
Totals             130                     50                     20                   200

Responses Blue collar White collar Managers

For revision 67(71.5) 32(27.5) 11(11)
Against revision 63(58.5) 18(22.5) 9(9)

χ2 = (67 − 71.5)²/71.5 + (32 − 27.5)²/27.5 + (63 − 58.5)²/58.5 + (18 − 22.5)²/22.5 + 0 + 0

= 0.2832 + 0.7364 + 0.3462 + 0.9

= 2.2658

Degrees of freedom: (3 − 1) (2 − 1) = 2

From the χ2 −table, for 2 degrees of freedom and significance level α = 0.050 the χ2 −value is 5.99.
This is more than the value of the test statistic and therefore the null hypothesis cannot be rejected.
There is not enough evidence that the response to the proposed revision plan depends on the group
(according to job description in the company) of the employee.

Question 2
Answer: 4
(5 − 1)(7 − 1) = 4 × 6 = 24

Question 3
Answer: 2
The number of degrees of freedom was 6, as can be seen from the χ2 table if you find the cell under
χ2.025 with the value 14.4 written in it. Furthermore, because 14.4 is larger than the calculated 12.678,
the null hypothesis cannot be rejected. For option 5, if you look at the table and you decrease the
significance level to χ2.010 the critical value is 16.8 and the null hypothesis would still not be rejected
because 12.678 < 16.8.

Question 4
Answer: 1
Option 1 is false in the number of degrees of freedom. It is not 3 × 4 but 2 × 3 = 6.
Option 2 is true. Exact p-values can only be determined with computer software, but the χ2-table
gives bounds. With 10 degrees of freedom, 17.894 lies between the table values 16.0 and 18.3,
which correspond respectively with significance levels of 0.100 and 0.050. Therefore the
comment about the range of the p-value (0.05 < p-value < 0.10) is true.
Option 3 is true because the test statistic’s value 15.652 is more than the table value 11.1433, which
places it in the rejection region at level α = 0.025.
Option 4 is true for the same reasons as option 2 is true.
Option 5 is true.
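
If you want to check the p-value ranges in options 2 and 4 without special software, the sketch below may help. It relies on a known closed form for the upper-tail area of the chi-squared distribution that holds only when the number of degrees of freedom is even (ν = 2m): P(χ2 > x) = e^(−x/2) Σ (x/2)^j / j!, summed over j = 0, ..., m − 1. The function name is our own (hypothetical) choice.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2
    number_of_terms = df // 2
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(number_of_terms)
    )


p_option_2 = chi2_upper_tail_even_df(17.894, 10)   # option 2: 10 degrees of freedom
p_option_4 = chi2_upper_tail_even_df(13.25, 6)     # option 4: 6 degrees of freedom
print(round(p_option_2, 3), round(p_option_4, 3))
```

The two p-values come out at roughly 0.057 and 0.039, which agree with the ranges read from the table: between 0.05 and 0.10 for option 2, and between 0.025 and 0.05 for option 4.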

Activity 3.3
Question 1
Answer: 4
H0 : p1 = 0.08, p2 = 0.35, p3 = 0.40, p4 = 0.12, p5 = 0.05.
H1 : At least one pi is not equal to its specified value.

Cell i   fi    ei                (fi − ei)   (fi − ei)²/ei
1        12    100(0.08) = 8     4           2.0
2        30    100(0.35) = 35    −5          0.714
3        35    100(0.40) = 40    −5          0.625
4        15    100(0.12) = 12    3           0.75
5        8     100(0.05) = 5     3           1.80
Total    100   100                           χ2 = 5.889

Rejection region: χ2 > χ2α,k−1 = χ2.05,4 = 9.49.

The test statistic does not fall in the rejection region, therefore the null hypothesis cannot be rejected.
The first error lies in step 4, the interpretation of the calculated value. The comment in step 5 is
correct, because the null hypothesis is not rejected (as should have been the case).

Question 2
Answer: 3

STUDY UNIT 4
4.1 Simple linear regression and correlation
In this study unit the discussion is about the relationship between interval variables. In regression
analysis involving two variables, one of the variables is used to make predictions about the other
variable. Recall that interval data are real numbers, such as heights, weights, incomes and distance,
as was said in chapter 2 (or STA1501), where you were told that interval data can also be referred to
as quantitative or numerical data. In this unit the so-called probabilistic model for regression analysis
is described, with initial interest in the first-order linear model (also called the simple linear regression
model). In this model an error variable is introduced. Finding the equation of the regression line is
the first step, but this has to be followed by an assessment of the fit of the line to the data as well as
looking into the relationship between the dependent and independent variables. The importance of
the error variable and the conditions that apply to it form the basis of many of the discussions that
follow.

Please read through the discussion on section 16.4 to get the feeling of what regression analysis
entails. You will not be examined on all the sections of chapter 16. The topics covered in these
sections are very important and should you continue with statistics, you will surely learn about them
in a second-level module.

4.2 Estimating the coefficients

STUDY
Keller Chapter 16 Simple linear regression and correlation
16.1 Model
16.2 Estimating the coefficients
◦ Interesting facts about the coefficients b0 and b1

The model
The graph of a straight line and its equation is introduced in school mathematics in grade 8/9. This
means that, even if you did not choose mathematics as a grade 12 subject, the equation of a standard
form of a straight line should not be new to you!!! The notation in school and that given in Chapter
16 may be different, but the meaning of the variables and the constants is the same. The only new
concept in Keller’s line equation is that extra term epsilon (ε) , but that we will explain to you shortly.

First look at this comparison:

School:       y = m·x + c
Keller:       y = β1·x + β0 + ε
Explanation:  dependent variable = slope times independent variable + constants (numbers)

Do not allow the abstract form of these equations to mislead you. If we give the general form of the
equation as y = β 0 + β 1 x + ε, it is in symbolic terms. For the equation of each particular straight line
the β 0 and the β 1 will not be there, but there will be numbers in their places, e.g. y = 2 + 3x + ε. The
x and y will, however, always be there. They are the variables and in particular x is the independent
and y the dependent variable. The two go together as a pair (x, y).

Let us see what you know and what is new:

· The number with the x (in the two equations above it is m and β 1 ) indicates the slope of the line.

· The number without an x (but not the ε ) is often referred to as a constant and it indicates the
value on the vertical axis (or y-axis) where the line passes through it.

· The new symbol in Keller’s line equation is the epsilon ε, written there to accommodate the
possible error in the model, making it a probabilistic model instead of a deterministic model. An
easy way to remember that ε is the symbol for error is to think in terms of the first letter of the
word "e"rror.

· It is customary in regression to write the terms in the particular order of β 0 + β 1 x, which is the
opposite order to the school form mx + c, but you will get used to that as well.

Estimating the coefficients

The equation y = β0 + β1 x + ε is called the first-order linear model and the word coefficient
refers to β0 and β1. These coefficients are population parameters, and you know by now that
population parameters are impractical to determine because populations are too large. You also
know that population parameters are estimated using information obtained from data in a random
sample drawn from that population. Sample data are recorded in the form of pairs (xi, yi), which
are used to fit a straight line of the form ŷ = b0 + b1 x through these co-ordinates. This is not
just any line passing through the data; it is the least squares line. Let us explain:

· For every xi -value in the data set, there was a linked yi -value in the pair (xi , yi ). The line that must
pass through these data points does not go through all of the (xi , yi ) points. It may even be that
the ’best’ line does not pass through any of these observed points! How do we find this ’best’ line
and its equation?

· The name of this ’best’ line is the least squares line for a specific reason. Think abstract and
imagine that you have (in some way) determined the equation of a line passing through the sample
data. Take each (xi , yi ) pair and substitute each xi -value into the least squares line equation and
find a calculated yi -value for it. To distinguish between the observed yi and the calculated (or
estimated) yi , this last one is given a hat and it becomes ŷi . You have then, for each xi -value two
y-values: the one is the observed yi and the other is the estimated ŷi.

· The correct least squares line must be determined such that, for each observed pair (xi, yi) and
its calculated pair (xi, ŷi), the difference between yi and ŷi, namely (yi − ŷi), is squared to give
(yi − ŷi)², and the sum of all these squared differences, Σ (yi − ŷi)² for i = 1, ..., n, is as
small as possible!
Do you think that is an easy task? We do not! Mathematics has to be used to calculate the
equation of this least squares line.

· Many statisticians talk about ŷ = b0 + b1 x as the least squares regression line. You see, the least
squares criterion is applied in the calculation of what we call the regression line. Keller uses least
squares line or regression line. From now on we will call it the regression line.

· Once the equation of the regression line ŷ = b0 + b1 x is known, the y-intercept b0 and slope b1 are
used to estimate the values of the population parameters β0 and β1 in the first-order linear model
equation y = β0 + β1 x + ε.
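
To make the least squares idea concrete, here is a minimal Python sketch. It uses the shortcut formulae b1 = sxy / s²x and b0 = ȳ − b1 x̄. The four data points are hypothetical and chosen only to keep the arithmetic simple; the function names are our own.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) of the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance of x and y, and sample variance of x
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (y_i - y-hat_i)^2 for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

b0, b1 = least_squares_line(xs, ys)
sse_best = sum_squared_residuals(xs, ys, b0, b1)

# Any other line, for example y-hat = 1 + x, gives a larger sum of squares
sse_other = sum_squared_residuals(xs, ys, 1.0, 1.0)
print(round(b0, 3), round(b1, 3), round(sse_best, 3), round(sse_other, 3))
```

For these points the least squares line is ŷ = 1.5 + 0.8x with a sum of squared residuals of 1.8, while the line ŷ = 1 + x, which passes exactly through two of the four points, gives 2. Passing through data points is therefore not what makes a line the 'best' line.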

Keller’s example 16.1 states the aim, then illustrates how a data set consisting of pairs of interval
variables is used to find the equation of the regression line. Figures 16.1 and 16.2 show the data
points, the calculated regression line passing through them and then the little vertical lines, called
the residuals. Make sure that you understand that the equation ŷ = 0.934 + 2.114x was calculated
using the data set.
As an example, look at the residual y4 − ŷ4. The value of y4 is 5 (from the data set, the y with
x = 4). The value of ŷ4 we have to calculate: ŷ4 = 0.934 + 2.114(4) = 9.39. Therefore
y4 − ŷ4 = 5 − 9.39 = −4.39. Although the particular value of 9.39 was not indicated in Figure 16.2, it is
there on the line and possible to calculate. The reason why the residuals are squared (removing the
possible negativity of a residual) is because our interest lies in the distance between the data point
and the calculated y -value and not whether it lies above or below the line.

Interesting facts about the coefficients b0 and b1

The slope b1
This number with the x indicates the slope of the line. Remember the characteristic that it occurs with
the x and is independent of the position where it is written in the equation: ŷ = 0.934 + 2.114x and
ŷ = 2.114x + 0.934 is the same line and for both the slope is 2.114.

· The value of b1 can be either positive or negative – nothing wrong with that!

· If b1 is positive, the values of the two variables increase together and the line slopes upward
from left to right. If b1 = 2.114, it implies that for each additional year of service, the annual
bonus increases on average by 2.114 units. Some books say there is a direct relationship between
the variables if b1 is positive.

· If the value of b1 is negative, the one variable increases when the other decreases and the line
slopes downward from left to right.

The y-intercept b0
The number b0 indicates where the line passes through the y-axis, which is the value of y when x = 0.
In our example it should therefore indicate the amount of the bonus when a person starts working.
Does that make sense? Not really, because it is a ’service bonus’, which implies that it is only paid
out after a term of service! Maybe it would have been less misleading if the author did not draw the
intercept of the line on the y-axis, but ’started’ to draw the line from above the value of x = 1! You
must be careful in the interpretation of the y -intercept – it depends on the nature of the variables.
Keller also comments on this topic with reference to the example about the relationship between the
odometer reading and the selling price of a vehicle. (Have you noticed the error on p 625, where the
sentence reads "The slope coefficient b0 is −0.0669, .."? The slope coefficient is b1 .)

We hope you note that calculating these coefficients in the regression line involves a large amount
of arithmetic. Remember about the shortcut formulae and do not hesitate to use your scientific
calculator – that is if you do not have a computer handy! Remember once again about us testing
insight rather than your calculation skills in the examination.

Activity 4.1
Question 1
The regression line ŷ = 3 + 2x has been fitted to the data points (4, 8), (2, 5), and (1, 2). The sum of
the squared residuals will be

1. 7
2. 15
3. 8
4. 22
5. 7.5

Question 2
If an estimated regression line has a y -intercept of 10 and a slope of 4, then when x = 2 the actual
value of y is

1. 15
2. 24
3. 18
4. 14
5. unknown

Question 3
Given the least squares regression line ŷ = 5 − 2x, choose the correct statement:

1. The relationship between x and y is positive.

2. The relationship between x and y is negative.
3. As x increases, so does y.
4. As x decreases, so does y.
5. The formula gives the equation of the population regression line.

Question 4
A regression analysis between weight y (in kilogram) and height x (in centimetre) resulted in the
following least squares line: ŷ = 70 + 2x. This implies that if the height is increased by 1 centimetre,
the weight, on average, is expected to

1. increase by 1 kilogram
2. decrease by 2 kilogram
3. increase by 2 kilogram
4. decrease by an unknown amount
5. increase with an unknown amount.

_________________________________________________________________

4.3 Error variable: required conditions

STUDY
Keller Chapter 16 Simple linear regression and correlation
16.3 Error variable: required conditions

The residuals are considered as observations of the error variable. There are special requirements
for this error variable in order that the regression equation may be used for estimation or predictions.
These are explicitly given in Keller, but in short they stipulate that the error variable must be normally
distributed, with mean zero, constant variance and independence of all errors. You need only read
the paragraph in which observational and experimental data are compared.

Activity 4.2
Question 1
In regression analysis, the residuals represent the

1. difference between the actual y -values and their predicted values

2. difference between the actual x-values and their predicted values
3. square root of the slope of the regression line
4. change in y per unit change in x
5. sum of the squares for error, denoted by SSE

Question 2

In a simple linear regression problem, the following statistics are calculated from a sample of 10
observations: Σ(x − x̄)(y − ȳ) = 2250, sx = 10, Σx = 50, Σy = 75. The least squares estimates
of the slope and y-intercept are, respectively,

1. 2.2 and −3.5

2. 2.5 and 1.5
3. −5.5 and 2.5
4. 2.5 and −5.0
5. 25 and −117.5

__________________________________________________________________________

4.4 Assessing the model

STUDY
Keller Chapter 16 Simple linear regression and correlation
16.4 Assessing the model

Regression analysis looks at the relationship between two variables; usually to determine how the
independent variable relates to the dependent variable. It can also be applied simply to determine
whether two variables are related. An inferential method is used to go beyond the presentation of a
linear regression equation (based on sample data) to the estimation of the coefficients of the linear
regression model that fits the population.

It is logical that a relationship between two variables need not be linear. What about a quadratic
relationship? Then the graph representing the relationship is a parabola and not a straight line. The
statistician should determine the strength of the linear relationship before accepting it as correct.
This implies that the sum of the squares for error must be determined and used to determine the
standard error of estimate, the t-test of the slope and the coefficient of determination.
We would really like you to read the paragraphs "Developing an Understanding of Statistical
Concepts" and "Cause-and-Effect Relationship". The discussion is very informal and something
to note for future reference.

In STA1501 you learnt about the correlation coefficient for a sample or a population, which is a
numerical description (a value between −1 and +1) of the strength of the relationship between two
variables. Now a description is given of how it can also be used to test for a relationship between two
variables, as described in a short paragraph about the difference between the t-test of the population
correlation coefficient ρ and the t-test of the population slope β 1 .

The reason why you should read through all these sections is for future use. You might be confronted
with choices like these in your job situation and you will be surprised about the human brain and its
memory potential. Consider keeping your Keller prescribed book as it can be a very helpful reference
for basic practical statistics!

4.5 Using the regression equation

STUDY
Keller Chapter 16 Simple linear regression and correlation
16.5 Using the Regression Equation

Activity 4.3
A random sample of 11 statistics students produced the following data where x is the third test score,
out of 100, and y is the final exam score, out of 300. Can you predict the final exam score of a
random student if you know the third test score?

x third exam score 65 67 71 71 66 75 67 70 71 69 69

y final exam score 175 133 185 163 126 198 153 163 159 151 159

You can easily show by estimating the slope and intercept that the best-fit line for the third exam/final
exam example has the equation: ŷ = −173.51 + 4.83x.

What would be the expected final scores for students who obtained third exam scores of (i) 68, (ii)
78 and (iii) 94?

4.6 Regression diagnostics

Study
Keller Chapter 16 Simple linear regression and correlation
16.6 Regression diagnostics – I

In a diagnostic analysis the requirements for the error variable and the influence of very large or
small observations must be investigated. In 16.6 you need not be able to apply the different tests,
but you have to know about them and what they mean.

Concept | Meaning of the concept | Test
Normality | Bell-shaped symmetrical curve | Draw histogram of the residuals
Heteroscedasticity | The variance is not constant | Plot the residuals and interpret
Homoscedasticity | The variance is constant | Plot the residuals and interpret
Independence of error variables | Looking at the relationship among the residuals | Graph the residuals against the time periods - no pattern
Dependence of error variables | Looking at the relationship among the residuals | Graph the residuals against the time periods - pattern exists
Outliers | Error in recording of values, wrong sample data point, incorrectly recorded value | Clear from a scatter diagram
Influential observation | Looks like outlier, but has big influence on statistic | Scatter diagram inspection

Please read the procedure of regression diagnostics. You must understand the consecutive steps,
but need not memorize them!

Activity 4.4
Question 1
Do question 16.1 in Keller.

Question 2
Which value of the coefficient of correlation r indicates a stronger correlation than 0.65?

1. 0.55
2. −0.75
3. 0.60
4. 0.05
5. −0.65

Question 3
In a regression problem the following pairs of (x, y) are given: (3, 1), (3, −1), (3, 0), (3, −2) and (3, 2).
That indicates that the

1. correlation coefficient has no limits

2. correlation coefficient is 1
3. correlation coefficient is 0
4. correlation coefficient is −1
5. changes in y caused no change in the values of x

Feedback

Activity 4.1
Question 1
Answer: 4
Substitute the values x = 4, 2 and 1 into the equation and determine the corresponding values of ŷ.
Then determine the differences between the given y-values of 8, 5 and 2 and these calculated values
(these are the residuals). Finally square these answers and add them:

x = 4: ŷ = 3 + 2(4) = 11
x = 2: ŷ = 3 + 2(2) = 7
x = 1: ŷ = 3 + 2(1) = 5

(8 − 11)² + (5 − 7)² + (2 − 5)² = 9 + 4 + 9 = 22
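
The same calculation as a short Python sketch (the names points, predict, residuals and sse are our own):

```python
# Data points (x, y) from the question and the fitted line y-hat = 3 + 2x
points = [(4, 8), (2, 5), (1, 2)]


def predict(x):
    return 3 + 2 * x


# Residual = observed y minus calculated y-hat
residuals = [y - predict(x) for x, y in points]
sse = sum(r ** 2 for r in residuals)
print(residuals, sse)  # [-3, -2, -3] 22
```

All three residuals are negative, meaning every data point lies below the given line. Squaring removes the signs, so only the sizes of the residuals contribute to the total of 22.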

Question 2
Answer: 5
We can say nothing about the actual value of y. The estimated regression line only gives the
predicted value ŷ = 10 + 4(2) = 18; the actual value also depends on the error term, which is unknown.

Question 3
Answer: 2
In the least squares regression line ŷ = 5 − 2x the value of the slope is −2, which is negative;
therefore the relationship is negative (if the one increases, the other will decrease).

Question 4
Answer: 3
The relationship can be expressed based on the slope. From the equation ŷ = 70 + 2x we know the
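slope of the line is 2, which implies that the ratio rise/run is 2/1. For each move forward (x, the height)
the movement up (y, the weight) will be double that: for each additional centimetre of height, the
weight is expected to increase, on average, by 2 kilogram.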

Activity 4.2
Question 1
Answer: 1

Question 2
Answer: 4
sxy = Σ(x − x̄)(y − ȳ) / (n − 1) = 2250 / (10 − 1) = 250

s²x = 10² = 100

b1 = sxy / s²x = 250 / 100 = 2.5

b0 = ȳ − b1 x̄ = 75/10 − 2.5 · (50/10) = 7.5 − 12.5 = −5
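
The arithmetic can be verified with a few lines of Python. The variable names are our own; the numbers are the summary statistics given in the question.

```python
n = 10            # number of observations
sum_cross = 2250  # sum of (x - x_bar)(y - y_bar)
s_x = 10          # sample standard deviation of x
sum_x = 50        # sum of the x-values
sum_y = 75        # sum of the y-values

s_xy = sum_cross / (n - 1)           # sample covariance: 2250 / 9
b1 = s_xy / s_x ** 2                 # slope: s_xy divided by the variance of x
b0 = sum_y / n - b1 * (sum_x / n)    # intercept: y_bar - b1 * x_bar
print(s_xy, b1, b0)  # 250.0 2.5 -5.0
```

Note that the standard deviation sx = 10 must be squared to obtain the variance before dividing; dividing by 10 instead of 100 would give the slope 25 that appears in option 5.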

Activity 4.3
We are given the equation ŷ = −173.51 + 4.83x for this estimation. Thus, for those who obtained
third exam scores of (i) 68, (ii) 78 and (iii) 94 we would expect the following final exam scores:

(i) when x = 68, then ŷ = −173.51 + 4.83(68) = 154.93

This means that for a student who obtained 68 out of 100 in the third test, we expect him/her to
obtain about 155 out of 300 in the examination.
(ii) when x = 78, then ŷ = −173.51 + 4.83(78) = 203.23
For one who obtained 78 out of 100 in the third test we expect him/her to obtain about 203 out of
300 in the examination.
(iii) when x = 94, then ŷ = −173.51 + 4.83(94) = 280.51
Thus, one who obtained 94 out of 100 in the third test is expected to obtain about 281 out of 300 in
the examination.
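
As a check, the sketch below refits the line from the eleven data pairs and repeats the three predictions. The variable names are our own, and the predictions use the rounded equation quoted in the activity so that they match the hand calculations. The sketch also flags whether each x-value lies inside the range of the sample.

```python
# Third exam scores (x) and final exam scores (y) of the 11 students
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = s_xy / s_xx           # slope
b0 = y_bar - b1 * x_bar    # y-intercept
print(round(b0, 2), round(b1, 2))


def predict(score):
    """Predicted final exam score from the rounded equation in the activity."""
    return -173.51 + 4.83 * score


for score in (68, 78, 94):
    in_range = min(x) <= score <= max(x)
    note = "within sample range" if in_range else "outside sample range"
    print(score, round(predict(score), 2), note)
```

The fitted coefficients round to −173.51 and 4.83, which confirms the given equation. The third exam scores in the sample run from 65 to 75, so only the prediction for 68 lies inside the observed range; the predictions for 78 and 94 extend the line beyond the data and should be treated with more caution.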

Activity 4.4
Question 1
We advise you to read that paragraph of statistical history!

Compare the given equation of the regression line with the standard form of the regression line:

ŷ = b0 + b1 x
Son’s height = 33.73 + 0.516· Father’s height

This implies that the dependent variable y represents the son's height and the independent variable
x represents the father's height, both in inches. We assume that both father and son are measured
when they are fully grown.

Does anything in this equation bother you? You should be worried about these very TALL people!
Can they all be 33.73 metres plus about half of the father's height? Of course not! The prescribed book
as well as the scenario described in this question is from America and the Americans still measure
height in the imperial system of inches, feet or yards. The older people in South Africa know these
non-metric measures and will be able to tell you that an inch is a little more than 2.5 cm, a foot a
little more than 30 cm and a yard a little less than a metre (ask a granddad or grandmother if they
know about these measurements). How many metres will 33.73 inches be?

Let us answer the questions:

(a) The intercept b0 = 33.73 is where the regression line and the y-axis intersect and at that point
x = 0. As argued earlier, be careful in the interpretation of this. It does not mean that when the
father’s height is 0 (not born yet ??) the son’s height is 33.73 inches. You can see that makes no
sense - it is meaningless!
The slope coefficient b1 = 0.516 implies that for each additional inch of the father’s height the
son’s height increases on average by 0.516 inches.

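(b) The line predicts a son exactly as tall as his father when x = 33.73 + 0.516x, that is, when
x = 33.73/(1 − 0.516) ≈ 69.7 inches. Taking this as the cut-off value, ’tall’ fathers are taller than about
69.7 inches and ’short’ fathers are shorter than that. Therefore, if the father is tall, the son would on average be shorter
than his father.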

Question 2
Answer: 2
Remember that we said that the closer the value of r is to either +1 or −1, the stronger the
relationship between the variables. The fact that we compare positive and negative values is
irrelevant if the only issue is the strength of the relationship. A value of r close to zero indicates
a very weak relationship. (See 4.4 in Keller "Measures of linear relationship" under the heading
Coefficient of Correlation.)

Question 3
Answer: 3

STUDY UNIT 5
5.1 Non-parametric statistics
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for
statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the
probability distributions of the variables being assessed.

5.2 Wilcoxon Rank Sum Test

STUDY
Keller Chapter 19 Nonparametric statistics
19.1 Wilcoxon rank sum test
◦ procedure and test statistic
◦ understanding the required conditions

There are different nonparametric methods that can be used, but not at random. For each test there
are specified conditions about the nature of the data that must be satisfied. You are given a summary
of the different tests, their conditions and their parametric counterparts at the end of this study unit.
We do not expect you at first-year level to know all these tests, so we only discuss three of them: the
rank sum test, the sign test and the signed rank sum test.

The rank sum test for two independent samples of either ordinal or interval data is the nonparametric
counterpart of the two-sample pooled t-test. If there is doubt about the interval scale of data, the
normality of the sampled populations or equality of the variances, this rank sum test should be used.
The sizes of the two samples can be small and need not necessarily be equal. Furthermore, with
both sample sizes ≥ 10, there is a normal approximation of the Wilcoxon rank sum test which can be
used.
This test determines the differences between the placement (location) of two independent
populations, using the median as measure of location, and therefore it is preceded by a ranking
process for the data. The name of the test is telling, don’t you think? You rank the data and then you sum
the ranks! Once that is done, you have also calculated the test statistic - as easy as that!

The brain link you must make is rank + sum + independence.

Procedure and test statistic

· Make sure that the two populations are independent.

· For equal sized samples any one can be called sample 1, but if the sample sizes differ, the smaller
one should be called sample 1 with sample size n1. The other sample is called sample 2 of size
n2.
· Combine both data sets for the sake of ranking. Rank 1 is given to the smallest value and
rank (n1 + n2) to the largest value. If there are ties (equal values) these ranks vary a little as
the average rank is given to all numbers in a tie. For example, ranking the values 8, 5, 0, 2, 5, 0, 4, 5
would be as follows:

Given numbers      8     5     0     2     5     0     4     5
Ranked numbers     0     0     2     4     5     5     5     8
Rank allocations   1.5   1.5   3     4     6     6     6     8

Instead of allocating rank 1 to the smallest value (0), and rank 2 to the other smallest value (0), both
are given the rank 1.5. Two identical values cannot have different ranks. The average rank of 1.5 is
halfway between rank 1 and rank 2. With similar reasoning the three 5’s must have the same rank.
Instead of placing one 5 in rank 5, another 5 in rank 6 and the third 5 in rank 7, they are all given the
average rank of 6. Note that you have to "skip" rank 5 and rank 7 because they have already been
"used".

· Re-group the data and their ranks into the original samples and sum the ranks for the data in each
sample.
· The sample with the smallest total is then named "sample 1". Further calculations and
interpretations are based on "total sample 1", which is the observed value of the test statistic.
· Make sure that you can formulate the hypotheses and use Table 9 containing the critical values
for this rank sum test according to the formulation of the alternative hypothesis. Specification can
simply be that the locations are different which implies a two-tailed test, while specification of the
relative position of the two populations implies a one-tailed test.

Sampling distribution of the test statistic

Keller illustrates the sampling distribution of the test statistic in detail and then leads us to a table of
critical values for this Wilcoxon rank sum test. You must be able to use this table. Make sure that you
understand that n1 is the number of observations in the data set with the smallest rank-total (which
need not be the one given as "Sample 1" or "Course 1", or ...). Furthermore, make sure that you use
the right table for the right test. One table, Table 9(a), is used for either α = 0.025 one-tail or α = 0.05
two-tail. Do you still know why? The critical values read from Table 9(a) place 0.025 of the area in
each tail, so if you use both tails, you are considering 0.025 + 0.025 = 0.05 of the total probability
area as critical region. If you are only using one tail you only use 0.025 of the total probability area as
critical region.

The formula given for sample sizes larger than 10 is a normal approximation. It is calculated
without the tables (because they do not list sample sizes larger than 10!) and uses only the sizes of the two
independent samples and the test statistic.

Activity 5.1
Question 1
Consider the following data set: 14, 14, 15, 16, 18, 19, 19, 20, 21, 22, 23, 25, 25, 25, 25, and 28.
The rank assigned to the four observations of value 25 is

1. 12
2. 12.5
3. 13
4. 13.5
5. 14

Question 2
The Wilcoxon rank sum test statistic T is approximately normally distributed whenever the sample
sizes are

1. larger than 10
2. smaller than 10
3. between 5 and 15
4. larger than 20 but smaller than 30
5. smaller than 20

Question 3
A Wilcoxon rank sum test for comparing two populations involves two independent samples of sizes
5 and 7. The alternative hypothesis is stated as: The location of population 1 is different from the
location of population 2. The appropriate critical values at the 5% significance level are

1. 20 and 45
2. 22 and 43
3. 33 and 58
4. 35 and 56
5. 12 and 32

Question 4
Consider the following two independent samples:

Sample A: 16 17 19 22 47

Sample B: 27 31 34 37 40

The value of the test statistic for a left-tail Wilcoxon rank sum test is

1. 6
2. 20
3. 35
4. 55
5. 121

Question 5
Two observers are placed on two different observation points (randomly chosen) for a specified
period of time. They have to observe the drivers of the cars passing by and count the number of
them driving by while talking on a cell phone. Data given below was recorded at Point A for 6 days
and at Point B for 7 days. At the 0.10 level, can we conclude that the number of drivers talking on cell
phones at the two locations have the same median occurrence?

Point A 74 61 73 67 80 89
Point B 90 73 97 81 77 61 79

Question 6
A Wilcoxon rank sum test for comparing two populations involves two independent samples of sizes
15 and 20. The unstandardized test statistic (that is the rank sum) is T = 210. The value of the
standardized test statistic z is

1. 14.0
2. 10.5
3. 6.0
4. 0.7
5. −2.0

5.3 Sign test and Wilcoxon signed rank sum test

STUDY
Keller Chapter 19 Nonparametric statistics
19.2 Sign test and Wilcoxon signed rank sum test
◦ sign test
◦ Wilcoxon signed rank sum test

The sign test

The sign test is the nonparametric test to apply if you want to compare two samples forming matched
pairs of values, provided the data is ordinal and the populations are nonnormal. We say the two
samples are dependent. Typical of this is that one person is tested "before and after", or one person
is asked to make two different observations. Of course, this means that the size of the two dependent
samples will always be equal.
In ordinal data, numbers are often allocated to the different ranked categories, simply because it is
convenient. You were earlier told about a similar argument for nominal data where we could indicate
male ⇒ 1 and female ⇒ 0, because the ’0’ and ’1’ are easier to work with than the words ’female’
and ’male’. Keller explains how rating of a product (ordinal values) can be assigned any numbering
system. Please understand that if numbers are used for this purpose their placement on the number
line is not relevant. They are just symbols - maybe little pictures would have been
less confusing, but then less convenient!
The sign test, true to its name, considers only the sign (positive or negative) of the difference between
the pair of observations, and the size of the difference is of no significance. Think of the procedures
to follow in these nonparametric tests as the rules of a game.

For the sign test the rules are as follows:

· Name the one sample 1 and the other one 2.

· Determine the difference between the data value in sample 1 and the data value in sample 2 for
each pair.
· Count the number of positive and number of negative differences and ignore the zero differences.
· The number of positive differences is the value of the test statistic.
· The sample size is the number of pairs with either a positive or a negative difference. (Do not
count the zero differences.)
· If n < 10, use the binomial table with p = 0.5, x = total of positive differences and n = total of
nonzero differences.
· If n ≥ 10, use the normal approximation of the binomial.
· Null hypothesis: the two population locations are the same.
· Alternative hypothesis: the population locations are different (can be one- or two-sided).

Activity 5.2
Question 1
It is important to sponsors of television shows that viewers remember as much as possible about
the commercials. The advertising executive of a large company is trying to decide which of two
commercials to use on a weekly half-hour sit-com. To help make a decision she decides to have 12
individuals watch both commercials. After each viewing, each respondent is given a quiz consisting
of 10 questions. The number of favourable responses is recorded and listed below. Assume that the
quiz results are not normally distributed.
Quiz Scores
Respondent Commercial 1 Commercial 2
1 7 9
2 8 9
3 6 6
4 10 10
5 5 4
6 7 9
7 5 7
8 4 5
9 6 8
10 7 9
11 5 6
12 8 10
(a) Which test is appropriate for this situation?
(b) Do these data provide enough evidence at the 5% significance level to conclude that the two
commercials differ?

Question 2
In a normal approximation to the sign test, the standardized test statistic is calculated as z = -1.58.
To test the alternative hypothesis that the location of population 1 is to the left of the location of
population 2, the p-value of the test is

1. 0.1142
2. 0.2215
3. 0.0571
4. 0.2284
5. 0.4429

The Wilcoxon signed rank sum test

If the matched pairs of observations from the two dependent nonnormal populations are interval and
not ordinal, the signed rank sum test of Wilcoxon is the appropriate test to use. Think about this - the
requirements for the sign test and this signed rank sum test are the same except for the type of data.

For the
- sign test the data is ordinal
- signed rank sum test the data is interval

For the signed rank sum test the rules are as follows:

· Name the one sample 1 and the other one 2.

· Determine the difference between the data value in sample 1 and the data value in sample 2 for
each pair. Write these values in a column next to the relevant pair of values.
· ’Throw away’ (ignore) all the pairs where the observations from the two samples were the same
(difference was zero).
· Make another column and in this one you write down the absolute value of the differences. This
means that you ignore the fact that some differences were negative - make them positive.
· Rank this column of absolute values from 1 to n, where n is the number of nonzero differences.
· Now you need two more columns: in the one you rewrite the ranks of the differences that were
originally positive and in the next column you rewrite the ranks of the differences that were
originally negative.
· The value of the test statistic is the same as the total of the ranks of the original positive
differences.
· If n < 30, use Table 10 which lists a lower and upper cut-off value for one or two-tailed tests,
depending on four different significance levels and n = total of nonzero differences.
· If n ≥ 30, use the normal approximation as explained in Keller.
· Null hypothesis: the two population locations are the same.
· Alternative hypothesis: the population locations are different (can be one- or two-sided).

Study the manual computations in example 19.4 and you will find these ’rules of the game’ given
above easy to remember.

Activity 5.3
Question 1
Do question 19.22 in Keller.

Question 2
Do question 19.23 in Keller.

Question 3
In a Wilcoxon signed rank sum test, the test statistic is calculated as T = 91. There are 18 observation
pairs of which 3 have zero differences and a two-tail test is performed at the 5% significance level.
Choose the correct option below:

1. The critical cut-off values are T ≥ TU = 90 and T ≤ TL = 30.

2. The critical cut-off values are T ≥ TU = 131 and T ≤ TL = 40.
3. The null hypothesis is rejected.
4. The null hypothesis will not be rejected.
5. The test results are inconclusive.

Question 4
In a Wilcoxon signed rank sum test with n = 30, the rank sums of the positive and negative
differences are 198 and 165, respectively. The value of the standardized test statistic z is

1. 232.50
2. -0.7096
3. -2.8125
4. 48.6107
5. 0.6425

Feedback

Activity 5.1
Question 1
Answer: 4
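The data set is already sorted (we wanted to test something other than sorting):
14, 14, 15, 16, 18, 19, 19, 20, 21, 22, 23, 25, 25, 25, 25, and 28.
The four 25s occupy positions 12, 13, 14 and 15, so each is given the average rank (12 + 13 + 14 + 15)/4 = 13.5.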

Data 14 14 15 16 18 19 19 20 21 22 23 25 25 25 25 28
Ranks 1.5 1.5 3 4 5 6.5 6.5 8 9 10 11 13.5 13.5 13.5 13.5 16
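
The rank allocation with ties can also be sketched in Python. The helper below is a hypothetical illustration, not a routine from Keller: it gives every value the average of the positions that its tied copies occupy in the sorted list.

```python
def average_ranks(values):
    """Return the rank of each value (1 = smallest); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # first position occupied by v
        count = ordered.count(v)              # number of tied copies of v
        rank_of[v] = first + (count - 1) / 2  # average of positions first..first+count-1
    return [rank_of[v] for v in values]

data = [14, 14, 15, 16, 18, 19, 19, 20, 21, 22, 23, 25, 25, 25, 25, 28]
ranks = average_ranks(data)
print(ranks[11:15])   # ranks of the four 25s
```

Each of the four observations of value 25 receives rank 13.5, in agreement with the table above.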

Question 2
Answer: 1
In the discussion about the sampling distribution of the Wilcoxon rank sum test statistic it is stated
that T is approximately normally distributed whenever the sample sizes are larger than 10.

Question 3
Answer: 1
n1 = 5 and n2 = 7. The values for n1 are listed in the first row and those for n2 in the first column.
The statement in the alternative hypothesis about the location of the populations being different does
not imply that the location of population 1 lies to the left or the right of population 2. It is a two-tailed
statement. The appropriate critical values at the 5% (two-tailed) significance level are 20 and 45.

Question 4
Answer: 2

Ranked data 16 17 19 22 27 31 34 37 40 47
Ranks 1 2 3 4 5 6 7 8 9 10

Total ranks of Sample A: 1 + 2 + 3 + 4 + 10 = 20

Total ranks of Sample B: 5 + 6 + 7 + 8 + 9 = 35

Question 5
(This is not a multiple choice question.)
Point A 74 61 73 67 80 89
Point B 90 73 97 81 77 61 79

Ranked data 61 61 67 73 73 74 77 79 80 81 89 90 97
Ranks 1.5 1.5 3 4.5 4.5 6 7 8 9 10 11 12 13

Total ranks of Sample A: 6 + 1.5 + 4.5 + 3 + 9 + 11 = 35

Total ranks of Sample B: 12 + 4.5 + 13 + 10 + 7 + 1.5 + 8 = 56

Sample A has the smallest total, so the test statistic is equal to 35.
If we are only testing for a "difference" in the data from the two points, it is a two-sided test. From
Table 9(b) the limits for n1 = 6 and n2 = 7 are 30 and 54. The test statistic of 35 falls between
these limits, so the null hypothesis cannot be rejected at the 10% level. We conclude that the median
number of persons talking on their cell phones while driving could be the same at points A and B.
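
The ranking, the two rank totals and the comparison with the table limits can be sketched in Python as follows. The limits 30 and 54 are simply the Table 9(b) values quoted above; the helper function and variable names are illustrative assumptions.

```python
def average_ranks(values):
    """Rank each value (1 = smallest), giving tied values their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

point_a = [74, 61, 73, 67, 80, 89]
point_b = [90, 73, 97, 81, 77, 61, 79]

ranks = average_ranks(point_a + point_b)   # rank the combined data
total_a = sum(ranks[:len(point_a)])        # rank sum for Point A
total_b = sum(ranks[len(point_a):])        # rank sum for Point B

t_lower, t_upper = 30, 54                  # limits quoted above for n1 = 6, n2 = 7
reject = total_a <= t_lower or total_a >= t_upper
print(total_a, total_b, reject)            # 35.0 56.0 False
```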

Question 6
Answer: 5
This answer is simply substitution into formulae.
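$$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{15(15 + 20 + 1)}{2} = 270$$

$$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{15 \cdot 20 (15 + 20 + 1)}{12}} = 30$$

$$z = \frac{T - E(T)}{\sigma_T} = \frac{210 - 270}{30} = -2.0$$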
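
The same substitution can be written as a small Python function. This is a sketch of the normal approximation formulae used above; the function name is hypothetical.

```python
import math

def rank_sum_z(t, n1, n2):
    """Standardize a Wilcoxon rank sum T with the normal approximation."""
    expected = n1 * (n1 + n2 + 1) / 2                  # E(T)
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # standard deviation of T
    return (t - expected) / sigma

print(rank_sum_z(210, 15, 20))   # -2.0
```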

Activity 5.2
Question 1
Quiz Scores
Respondent Commercial 1 Commercial 2 Difference
1 7 9 -2
2 8 9 -1
3 6 6 0
4 10 10 0
5 5 4 1
6 7 9 -2
7 5 7 -2
8 4 5 -1
9 6 8 -2
10 7 9 -2
11 5 6 -1
12 8 10 -2

(a) The appropriate test for this situation is the sign test.
(b) Do these data provide enough evidence at the 5% significance level to conclude that the two
commercials differ?

ANSWER:

H0 : The two population locations are equal.

H1 : The two population locations are not equal.

Rejection region: |z| > z0.025 = 1.96 (two-sided test)

Test statistic:

$$z = \frac{x - 0.5n}{0.5\sqrt{n}} = \frac{1 - 0.5(10)}{0.5\sqrt{10}} = \frac{-4}{1.5811} = -2.53$$

Two cells have zeros and are not counted for the sample size. Therefore n = 10 and x = 1 (only one
plus).
Conclusion: Reject the null hypothesis. Yes, these data provide enough evidence at the 5%
significance level to conclude that the two commercials differ.
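
The counting of signs and the standardized test statistic can be reproduced with the following Python sketch, which uses the quiz scores from the table above. The variable names are illustrative only.

```python
import math

commercial_1 = [7, 8, 6, 10, 5, 7, 5, 4, 6, 7, 5, 8]
commercial_2 = [9, 9, 6, 10, 4, 9, 7, 5, 8, 9, 6, 10]

differences = [a - b for a, b in zip(commercial_1, commercial_2)]
x = sum(1 for d in differences if d > 0)     # number of positive differences
n = sum(1 for d in differences if d != 0)    # zero differences are not counted

z = (x - 0.5 * n) / (0.5 * math.sqrt(n))     # normal approximation of the binomial
print(x, n, round(z, 2))                     # 1 10 -2.53
```

Since |z| = 2.53 exceeds 1.96, the sketch leads to the same conclusion as the hand calculation.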

Question 2
The standardized test statistic is calculated as z = -1.58. The p-value should then be such that
p-value: P (z < −1.58) = P (z > 1.58) = 1 − 0.9429 = 0.0571
Answer: 3
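
If no normal table is at hand, the p-value can be checked with Python's standard library. The sketch below assumes the usual expression of the standard normal distribution function in terms of the error function; the helper name is made up for illustration.

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_value = standard_normal_cdf(-1.58)   # left-tail probability
print(round(p_value, 4))               # 0.0571
```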

Activity 5.3
Question 1
H0 : The two population locations are the same.
H1 : The location of population 1 is to the right of the location of population 2.

$$T = T^+ = 3457$$

$$E(T) = \frac{n(n + 1)}{4} = \frac{108 \cdot 109}{4} = 2943$$

$$\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{106438.5} = 326.25$$

Test statistic:

$$z = \frac{T - E(T)}{\sigma_T} = \frac{3457 - 2943}{326.25} = 1.5754$$

p-value = P(Z > 1.58) = 0.5 − 0.4429 = 0.0571.

There is not enough evidence to conclude that population 1 is located to the right of the location of
population 2.

Question 2
H0 : The two population locations are the same.
H1 : The location of population 1 is different from the location of population 2.

Rejection region: T ≥ TU = 19 or T ≤ TL = 2.
Pair     Sample 1   Sample 2   Difference   |Difference|   Rank (positive)   Rank (negative)
1        9          5          4            4              5.5
2        12         10         2            2              3.5
3        13         11         2            2              3.5
4        8          9          -1           1                                1.5
5        7          3          4            4              5.5
6        10         9          1            1              1.5
Totals                                                     T+ = 19.5         T− = 1.5
T = 19.5. There is enough evidence to infer that the population locations differ.
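
The columns of the table can be rebuilt step by step in Python. This is only a sketch of the rules listed earlier for the signed rank sum test; the helper `rank` is a hypothetical name.

```python
sample_1 = [9, 12, 13, 8, 7, 10]
sample_2 = [5, 10, 11, 9, 3, 9]

# Differences, ignoring pairs with a zero difference.
diffs = [a - b for a, b in zip(sample_1, sample_2) if a != b]
abs_sorted = sorted(abs(d) for d in diffs)

def rank(value):
    """Average rank of an absolute difference among all absolute differences."""
    return abs_sorted.index(value) + 1 + (abs_sorted.count(value) - 1) / 2

t_plus = sum(rank(abs(d)) for d in diffs if d > 0)    # ranks of positive differences
t_minus = sum(rank(abs(d)) for d in diffs if d < 0)   # ranks of negative differences
print(t_plus, t_minus)                                # 19.5 1.5
```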

Question 3
Answer: 4
There are 18 − 3 = 15 nonzero differences, so n = 15. The value of the test statistic is calculated as
T = 91, therefore the test statistic lies inside the ’safe’
region of [25, 95] for a two-tailed test at the 5% significance level. The null hypothesis is therefore
not rejected.

Question 4
Answer: 2

$$T = T^+ = 198$$

$$E(T) = \frac{n(n + 1)}{4} = \frac{30 \cdot 31}{4} = 232.5$$

$$\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{2363.75} = 48.6184$$

Test statistic:

$$z = \frac{T - E(T)}{\sigma_T} = \frac{198 - 232.5}{48.6184} = -0.7096$$
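
As a check, the normal approximation for the signed rank sum test can be coded in a few lines of Python. The function name is an assumption made for this sketch; the formulae are the ones substituted into above.

```python
import math

def signed_rank_z(t, n):
    """Standardize a Wilcoxon signed rank sum T with the normal approximation."""
    expected = n * (n + 1) / 4                           # E(T)
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)    # standard deviation of T
    return (t - expected) / sigma

print(round(signed_rank_z(198, 30), 4))   # -0.7096
```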

Summary of the different tests

Summary of tests on data from a normal or approximately normal distribution
Familiarize yourself with the

· flow chart of different techniques applied in inference as set out in Figure A13.1
· summary of the different statistical techniques for nominal data given in Table 15.1

Deciding which test to use is the task of the statistician in practice and it is our aim to supply you
with the tools to make such a decision. This is not always as straightforward as it may seem and
that is why Keller gave us the flow chart in Figure A13.1. The significance of data type is obvious,
but note how the study objective gives direction. Even now, while you are still studying, make a point
of looking at published statistical information and determine if it involves "lying with statistics" or not.
Two tables made from the information in the above-mentioned summaries are given below. Look at
them, but try to make your own. Making such a summary is a very valuable method of studying.

Data type | Parameter/Categories               | Problem objective                              | Descriptive measure | Statistical technique
----------+------------------------------------+------------------------------------------------+---------------------+-------------------------------
Nominal   | 2 categories, p                    | Describe a population                          | Proportion          | z-test
Nominal   | ≥ 2 categories                     | Describe a population                          |                     | χ² goodness-of-fit
Nominal   | 2 categories, p1 − p2              | Compare two populations                        | Proportions         | χ² contingency table, z-test
Nominal   | ≥ 2 categories                     | Analyze relationship between two variables     |                     | χ² contingency table
Interval  | µ (σ known)                        | Single normal population (or n ≥ 30)           | Central location    | z-test
Interval  | µ (σ unknown)                      | Describe a population (normal or n ≥ 30)       | Central location    | t-test
Interval  | σ²                                 | Describe a population                          | Variability         | χ²-test
Interval  | µ1 − µ2 (difference of means)      | Compare two populations (independent)          | Central location    | t-test
Interval  | µD (difference of pop. means)      | Compare two populations (matched)              | Central location    | t-test
Interval  | σ1²/σ2² (ratio of two variances)   | Compare two populations                        | Variability         | F-test

Parametric versus non-parametric tests

At this stage you should be quite familiar with both parametric and nonparametric tests. The table
below lists some obvious similarities and differences between the two types of tests.

Parametric testing                                                   | Nonparametric testing
---------------------------------------------------------------------+---------------------------------------------------------------
Basic principles of hypothesis testing apply                         | Basic principles of hypothesis testing apply
Population must be normally or approximately so distributed          | Population need not be normally distributed or approximately so
Sample size needs to be large                                        | Sample size can be very small
Calculations can become very tedious because of large sample sizes   | Calculations usually simpler because of small sample sizes
Data dependent on specific test                                      | Data dependent on specific test
One sample: t-test                                                   | One sample: Wilcoxon signed rank test
Two independent samples: t-test                                      | Two independent samples: Wilcoxon rank sum test
Two dependent samples: t-test for paired samples                     | Two dependent (paired) samples: Wilcoxon signed rank sum test

STUDY UNIT 6
6.1 Time series analysis and forecasting
In this study unit we will, as Keller explains it in clear language, "only scratch the surface of this topic".
Time series and forecasts based on time series are very relevant and significant in modern times. At
first-year level, fortunately, these concepts are simple and easy to explain. If you sit and think about
it, you can make a long list of events that you can observe at regular time intervals. If you drive to
work in a car or taxi or train, you can record the traffic every first day of the month, or every Friday,
or every day of the week, or...; if you have a favourite take-away food store you can record the length
of the queue at regular time intervals; an obvious example is to record the monthly rainfall at your
home. The list never ends, as government bodies, researchers, economists, etc. all record different
phenomena over short and long periods of time. These scores, collected at regular time intervals,
are known as time series.
The question is – what do we do with the time series? Do you record the data simply to look at it, is
it just for the sake of fun, or what? As statisticians we are going to teach you how to look at, interpret
and even ’smooth’ the time series data, but is that the end of the process? That would have been a
sad day if everything stopped just there! The point is that what we observe as a pattern in the past
could well be repeated in the future and therefore a technique has been developed where the data
of a time series and the characteristics of that particular phenomenon are used to predict what
can be expected in the future. Of course, statisticians are always very careful not to say that anything
is certain (think in terms of hypothesis testing!), so they use models in their predictions. We will only
look at three elementary models, but there are many other models, some of which are much more
complex.

6.2 Components of time series and smoothing possibilities

STUDY
Keller Chapter 20 Time series analysis and forecasting
20.1 Time series components

Keller clearly explains the characteristics of long term trend, cyclic, seasonal and random variation as
well as graphs to illustrate the first three components. Random variation (sometimes called ’noise’)
can camouflage the effects of other components in a time series to a great extent and it is important
to minimize its effect. Why? Even if random events can significantly influence the time series data (think of
a war or a hurricane) they are irregular happenings and their influence should be temporary. It
is therefore necessary that short term fluctuation be ’removed’ from the data using a technique of
smoothing.

Of course, one must make sure that it is really a random happening and we hope that analysts do
think! In a war-torn country or a region known for hurricane occurrences, such events cannot be
considered as irregular. They then form an inherent part of the time series ’pattern’.

6.3 Smoothing techniques

STUDY
Keller Chapter 20 Time series analysis and forecasting
20.2 Smoothing techniques
◦ Moving averages
◦ Centred moving averages
◦ Exponential smoothing

The first technique of smoothing is to determine moving averages. Remember that the data
points in a time series are consecutive values, i.e. they are ordered. The idea of an average
is nothing new and in this case you substitute the actual observations of a time series with a
list of averages. You can compute a three-period moving average, which is the average of three
consecutive observations or you can compute a four-period moving average, which is the average
of four consecutive observations, etc. Make sure that you understand how these three, or four, or
... moving averages are calculated. In a three-period moving average each observation (except the
first two and the last two values) is part of three averages.

Suppose we have real observations indicated as A, B, C, D, E, F and G, then the three, four and
five-period moving averages would be as follows:

Actual        Three-period      Four-period       Five-period
observation   moving average    moving average    moving average
A
B             (A+B+C)/3
                                (A+B+C+D)/4
C             (B+C+D)/3                           (A+B+C+D+E)/5
                                (B+C+D+E)/4
D             (C+D+E)/3                           (B+C+D+E+F)/5
                                (C+D+E+F)/4
E             (D+E+F)/3                           (C+D+E+F+G)/5
                                (D+E+F+G)/4
F             (E+F+G)/3
G

Note that for these 7 values A, B, ... you could calculate

- 5 three-period moving averages

- 4 four-period moving averages
- 3 five-period moving averages

See if common sense leads you to the following pointers:

· By smoothing observations information is lost.

· The more periods you include in the average, the smoother the graph becomes.
· However, the more periods you include in the average, the fewer observations you have left.
· Smoothing with the method of moving averages removes the random variation but must be
balanced against the importance of maintaining the real character of the time series.
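
The calculation of moving averages, and the loss of observations as more periods are included, can be illustrated with a short Python sketch. The seven numbers are hypothetical values standing in for A to G, and the function name is chosen for this example only.

```python
def moving_averages(series, k):
    """Return the k-period moving averages of an ordered series."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

observations = [10, 20, 30, 40, 50, 60, 70]   # hypothetical values for A, ..., G
for k in (3, 4, 5):
    print(k, moving_averages(observations, k))
```

The output lists 5 three-period, 4 four-period and 3 five-period averages, as counted above.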

Exponential smoothing
This method is mathematically more complex, but still a ’relatively crude method’ to remove random
variation. However, it addresses two of the concerns that arise when the method of moving
averages is used for smoothing out random variation:

· With every calculation all the observations up to that particular observation form part of the
calculation, in other words give weight to the answer.
· The smoothing process starts from the very first observation and continues up to the very last
observation.

The formula given may look a little complex, but with constant use it is manageable. Application of
the formula smooths values by calculating a weighted average of each observation in the series and
the previous smoothed value. The smoothing constant w is a number between 0 and
1 and seeing that w is multiplied by the actual observation yt (at time t), you should understand that
the closer w is to 1 the more influence the actual observation yt will have. That is the sort of decision
the statistician has to make. Choosing the value of w will therefore depend on the importance of the
actual observations.
Keep in mind that you will receive a list of formulas in the examination. You simply have to recognize
which formula to use where and to know the meaning of the different symbols.
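
To see how the smoothing constant w works, consider the following Python sketch. It assumes the usual form of the formula, S1 = y1 and St = w·yt + (1 − w)·St−1 for t ≥ 2; check this against the formula list you receive. The data and the function name are hypothetical.

```python
def exponential_smoothing(series, w):
    """Smooth a series with S_1 = y_1 and S_t = w*y_t + (1 - w)*S_(t-1)."""
    smoothed = [series[0]]                 # the first smoothed value is the first observation
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

observations = [10, 20, 30, 20]            # hypothetical data
print(exponential_smoothing(observations, 0.5))
```

With w = 0.5 the smoothed values are 10, 15, 22.5 and 21.25. With w = 1 the series is returned unchanged, which shows that the closer w is to 1, the more weight the actual observation carries.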

Activity 6.1
Question 1
Test your knowledge.
Link each of the descriptions below to one of the four time series components (long-term trend,
cyclic, seasonal or random variation):

1. The time series component that reflects a long-term, relatively smooth pattern or direction
exhibited by a time series over a long time period (more than one year)
2. The time series component that reflects variability over short repetitive time periods and has
duration of less than one year
3. The time series component that reflects the irregular changes in a time series that are not
caused by any other component, and tends to hide the existence of the other more predictable
components
4. The time series component that reflects a wave-like pattern describing a long-term trend that is
generally apparent over a number of years

Question 2
In exponentially smoothed time series, the smoothing constant w is chosen on the basis of how much
smoothing is required. In general, which of the following statements is true?

1. A small value of w such as w = 0.1 results in very little smoothing, while a large value such as
w = 0.8 results in too much smoothing.
2. A small value of w such as w = 0.1 results in too much smoothing, while a large value such as
w = 0.8 results in very little smoothing.
3. A small value of w such as w = 0.1 and a large value such as w = 0.8 may both result in very little
smoothing.
4. A small value of w such as w = 0.1 and a large value such as w = 0.8 may both result in too much
smoothing.
5. It is impossible to have too much or too little smoothing, regardless of the value of w.

Question 3
Monthly sales (in R1,000) of a computer store are shown below.

Month Jan Feb March April May June

Sales 73 65 72 82 86 90

Compute the three-month and five-month moving averages.

________________________________________________________________________

6.4 Trend and seasonal effects

STUDY
Keller Chapter 20 Time series analysis and forecasting
20.3 Trend and seasonal effects
◦ Trend analysis
◦ Seasonal analysis
◦ Deseasonalizing a time series

Once you can see that there is a trend in a time series, you have to determine what the ’nature’ of the
trend is. This we do using mathematics. Do you remember the following from school mathematics?

· A polynomial has many terms (from the prefix ’poly-’)

· A linear equation is of the first power. The regression equation ŷ = b0 + b1 x, is an example of a
linear relationship between x as independent variable and y as dependent variable. In time series
data x will always indicate time.
· A nonlinear equation is of a power greater than 1 and this is where the polynomial comes in. The
equation ŷ = b0 + b1 x + b2 x² is quadratic; ŷ = b0 + b1 x + b2 x² + b3 x³ is of the third power.

At this stage you should know enough about the possibility to fit a regression line through given data
and also about the principles involved in such a method. In time series analysis, such a fitted line
can assist you in seeing whether there is a trend in the data.
The ŷ then becomes the trend line estimate of the y of the regression model y = β0 + β1 t + ε. The
slope of the line indicates the trend. If the slope is positive, you know the trend is positive and the
larger the numerical value of the slope the larger the positive trend.
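
A linear trend line can be fitted with the same least squares formulae used for regression in Study Unit 4, with time t = 1, 2, ..., n as the independent variable. The Python sketch below uses a hypothetical series and a made-up function name.

```python
def fit_trend_line(y_values):
    """Least squares line y-hat = b0 + b1*t with t = 1, 2, ..., n."""
    n = len(y_values)
    t_values = range(1, n + 1)
    t_bar = sum(t_values) / n
    y_bar = sum(y_values) / n
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(t_values, y_values))
    s_tt = sum((t - t_bar) ** 2 for t in t_values)
    b1 = s_ty / s_tt              # slope: the estimated trend per period
    b0 = y_bar - b1 * t_bar       # intercept
    return b0, b1

b0, b1 = fit_trend_line([3, 5, 7, 9, 11])   # hypothetical time series
print(b0, b1)                               # 1.0 2.0
```

The positive slope of 2 indicates an upward trend of 2 units per period in this made-up series.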

These arguments about a graph helping us to find the trend in a time series also apply if the
relationship is nonlinear. Should a quadratic model be needed to fit the time series, the trend
equation relies on the multiple regression technique (not included in this module).

Seasonal analysis and deseasonalizing a time series

To detect seasonality in a time series, several 'seasons' must be observed. A seasonal index can be
calculated for each season and used to either inflate or deflate the trend in the series. Depending
on how it is used, the index either expresses the degree to which the seasons differ from one
another or it removes the seasonal variation from the series. The purpose of removing the
seasonality is that other changes in the series can then be detected. This has many benefits,
especially in forecasting.

Activity 6.2
Question 1
Do Question 20.24 in Keller.

Question 2
The Pyramid of Giza is one of the most visited monuments in Egypt. The number of visitors per
quarter has been recorded (in thousands) as shown in the accompanying table:

                 Year
Quarter   2000   2001   2002   2003
Winter     210    215    218    220
Spring     260    275    282    290
Summer     480    490    505    525
Autumn     250    255    265    270
(a) Plot the time series.
(b) Discuss your observations. Would exponential smoothing be recommended for this data?

6.5 Introduction to forecasting

STUDY
Keller Chapter 20 Time series analysis and forecasting
20.4 Introduction to forecasting

How accurate is my forecast? This is a question the statistician has to ask him/herself, as there is
a variety of forecasting models available. What can we do to evaluate the accuracy of a forecasting
procedure? We are going to consider the following two measures of accuracy:

· Mean Absolute Deviation (MAD). This is the average of the absolute differences between the actual
values and the forecasts. It is the measure to use when consistently moderate accuracy is what
matters: the interest is in the size of the error, not the direction, and one chooses the model
with the lowest MAD as the best-fit model.
· Sum of Squares for Forecast Error (SSE). This is the sum of the squared differences between the
actual values and the forecasts, so it shows how close the forecasts are to the actual values
while penalising large errors more heavily. This criterion chooses the model with the lowest sum
of squared errors (compare this to the least-squares criterion when you determine the regression
equation).

Formulas for both MAD and SSE are given in Keller. There is also a worked-out example where three
forecasting models are subjected to these measures. These criteria are very useful if you fit more
than one model to the same time series.

6.6 Forecasting models

STUDY
Keller Chapter 20 Time series analysis and forecasting
20.5 Forecasting models
◦ Forecasting with exponential smoothing
◦ Forecasting with seasonal indexes

The model selected for forecasting a time series is determined by the components present in the
recorded time series. Where more than one model is suitable, the choice is based on the measures of
accuracy. In general, the way a particular smoothing method works gives you an indication of how its
forecasts will behave. If you think about the method applied in exponential smoothing, you can
imagine that for a time series with a small positive trend, the forecast will tend to be too low and
if there is a small negative trend, the forecast will tend to be too high.
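
The lag described above can be illustrated with a minimal Python sketch. It assumes the usual exponential smoothing recursion S_t = w·y_t + (1 − w)·S_(t−1), with S_1 = y_1 and the forecast F_(t+1) = S_t. The two series and the value w = 0.3 are made up purely for illustration; they do not come from Keller or from this guide.

```python
def exponential_smoothing(series, w):
    """Return the smoothed values S_t = w*y_t + (1 - w)*S_(t-1), with S_1 = y_1."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


w = 0.3  # illustrative smoothing constant (an assumption)

# Illustrative series only: a steady rise and a steady fall of 2 units per period.
rising = [100 + 2 * t for t in range(10)]
falling = [100 - 2 * t for t in range(10)]

# The forecast for the next period is the last smoothed value.
forecast_rising = exponential_smoothing(rising, w)[-1]
forecast_falling = exponential_smoothing(falling, w)[-1]

# What the next observation would be if each trend simply continued.
next_rising = 100 + 2 * 10
next_falling = 100 - 2 * 10

print("Rising series:  forecast", round(forecast_rising, 2), "vs next value", next_rising)
print("Falling series: forecast", round(forecast_falling, 2), "vs next value", next_falling)
```

With a rising series the forecast stays below the next value, and with a falling series it stays above it, because each smoothed value is a weighted average of the current observation and older ones.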

A proper analysis of the given data must underlie the choice, and you have to realize that one should
not try to forecast too far into the future, as the accuracy decreases with each additional time
period added.

At first-year level we only introduce you to forecasting and expect you to understand three relatively
elementary forecasting models. The exponential smoothing and seasonal index models will be easy for
you to understand. Should you feel uncertain about the autoregressive model, which is based on
autocorrelation, it may be necessary for you to read the section on Nonindependence of the Error
Variable in Chapter 16 again.

A broad outline of the three models follows:

Forecasting model: Exponential smoothing
  Conditions:  No trend; no seasonal variation
  Forecasting: Preferably used for one time period forecast but can be more
  Action:      Smoothing constant; assume initial exponential smoothing forecast;
               substitute St with Ft+1

Forecasting model: Seasonal indexes
  Conditions:  Long-term trend; seasonal variation
  Forecasting: Preferably one season but can be more
  Action:      Regression equation is used as well as seasonal index for period t

Forecasting model: Autoregressive model
  Conditions:  Autocorrelation; no trend; no seasonality
  Forecasting: Can be complex if the time series values are themselves correlated
  Action:      Based on correlation of consecutive terms (first order autocorrelation)

Activity 6.3
Question 1
Do question 20.32 in the textbook.

Question 2
The following is the list of mean absolute deviation (MAD) statistics for each of the models you have
estimated from time-series data:

Model MAD
Linear trend 1.38
Quadratic trend 1.22
Exponential trend 1.39
Autoregressive 0.71

Based on the MAD criterion, the most appropriate model is

1. linear trend
2. quadratic trend
3. exponential trend
4. autoregressive
5. not possible to answer

Feedback

Activity 6.1
Question 1

1. long-term trend
2. seasonal variation
3. random variation
4. cyclical variation

Question 2
Answer: 2

Question 3
                     Moving averages
Month    Sales   Three-month   Five-month
Jan        73
Feb        65        70
March      72        73           75.6
April      82        80           79.0
May        86        86
June       90
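
The moving averages in this table can be checked with a short Python sketch. The function name and layout are only illustrative; the sketch assumes an odd window length so that each average is centred on the middle month, as in the table.

```python
def centered_moving_average(values, window):
    """Moving averages for an odd window, each centred on the middle period."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Monthly sales from Question 3 (Jan to June).
sales = [73, 65, 72, 82, 86, 90]

three_month = centered_moving_average(sales, 3)  # centred on Feb, March, April, May
five_month = centered_moving_average(sales, 5)   # centred on March, April

print("Three-month:", three_month)
print("Five-month: ", [round(v, 1) for v in five_month])
```

Note that a three-month average cannot be computed for the first and last month, and a five-month average cannot be computed for the first two and last two months, which is why those cells are empty.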

Activity 6.2
Question 1
Year   Quarter   Period t     y      ŷ      y/ŷ
2001      1          1       52    62.9    0.827
          2          2       67    64.1    1.046
          3          3       85    65.2    1.303
          4          4       54    66.4    0.813
2002      1          5       57    67.6    0.843
          2          6       75    68.8    1.090
          3          7       90    70.0    1.286
          4          8       61    71.1    0.857
2003      1          9       60    72.3    0.830
          2         10       77    73.5    1.048
          3         11       94    74.7    1.259
          4         12       63    75.9    0.830
2004      1         13       66    77.0    0.857
          2         14       82    78.2    1.048
          3         15       98    79.4    1.234
          4         16       67    80.6    0.831
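
As a rough check of the ŷ column, the following Python sketch fits a least-squares trend line ŷ = b0 + b1 t to the y values in the table and then forms the ratios y/ŷ. This is only a sketch: it assumes that the trend line is an ordinary least-squares fit on the period number t, and small differences from the table in the last decimal can be expected because of rounding.

```python
# Quarterly y values from the table above, periods t = 1, ..., 16.
y = [52, 67, 85, 54, 57, 75, 90, 61, 60, 77, 94, 63, 66, 82, 98, 67]
t = list(range(1, len(y) + 1))

t_mean = sum(t) / len(t)
y_mean = sum(y) / len(y)

# Least-squares slope and intercept for the trend line y-hat = b0 + b1*t.
s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
s_tt = sum((ti - t_mean) ** 2 for ti in t)
b1 = s_ty / s_tt
b0 = y_mean - b1 * t_mean

# Trend estimates and the ratio of each observation to its trend value.
trend = [b0 + b1 * ti for ti in t]
ratios = [yi / yh for yi, yh in zip(y, trend)]

print("b0 =", round(b0, 2), " b1 =", round(b1, 3))
for ti, yi, yh, r in zip(t, y, trend, ratios):
    print(ti, yi, round(yh, 1), round(r, 3))
```

A ratio above 1 means the observation lies above the trend line in that quarter, and a ratio below 1 means it lies below.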

                          Quarter
Year               1       2       3       4     Total
2001             0.827   1.046   1.303   0.813
2002             0.843   1.090   1.286   0.857
2003             0.830   1.048   1.259   0.830
2004             0.857   1.048   1.234   0.831
Average          0.839   1.058   1.271   0.833   4.001
Seasonal index   0.839   1.058   1.270   0.833   4.000
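
The last two rows of this table can be reproduced with a short Python sketch. It takes the y/ŷ ratios per quarter, averages them over the four years, and then rescales the averages so that they add up to 4 (the number of seasons). The variable names are only illustrative.

```python
# Ratios y / y-hat per year, in the order Quarter 1, 2, 3, 4 (from the table above).
ratios_by_year = {
    2001: [0.827, 1.046, 1.303, 0.813],
    2002: [0.843, 1.090, 1.286, 0.857],
    2003: [0.830, 1.048, 1.259, 0.830],
    2004: [0.857, 1.048, 1.234, 0.831],
}

n_seasons = 4
n_years = len(ratios_by_year)

# Average ratio for each quarter over all the years.
averages = [
    sum(year_ratios[q] for year_ratios in ratios_by_year.values()) / n_years
    for q in range(n_seasons)
]

# Rescale so that the seasonal indexes add up to the number of seasons.
total = sum(averages)
indexes = [a * n_seasons / total for a in averages]

for q in range(n_seasons):
    print("Quarter", q + 1, "average", round(averages[q], 4), "index", round(indexes[q], 3))
print("Sum of indexes:", round(sum(indexes), 3))
```

An index above 1 (Quarters 2 and 3 here) marks a season that lies above the trend, and an index below 1 marks a season that lies below it.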

Question 2
(a)

[Figure: time series plot "Pyramids of Egypt 2000-2003 Data". Horizontal axis: Year (2000 to 2003);
vertical axis: Number of Visitors (0 to 600).]

We note a distinct pattern of seasonal variation in the series. This could have been detected in
the data, but in the graph one can see it without even thinking!
(b) Exponential smoothing is a method that removes the random variation in a time series and so
makes it easier to detect the trend. In the further discussions you will see that exponential
smoothing is not an accurate forecasting method if the time series has clear seasonal effects.

Activity 6.3
Question 1
MAD = (|57 − 63| + |60 − 72| + |70 − 86| + |75 − 71| + |70 − 60|) / 5
    = (6 + 12 + 16 + 4 + 10) / 5
    = 48 / 5
    = 9.6

SSE = (57 − 63)² + (60 − 72)² + (70 − 86)² + (75 − 71)² + (70 − 60)²
    = 36 + 144 + 256 + 16 + 100
    = 552
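
The same two measures can be computed with a few lines of Python. In this sketch the first number in each pair of the calculation is treated as the actual value and the second as the forecast; that labelling is an assumption, but it does not affect the results because both measures depend only on the size of each difference.

```python
# Pairs taken from the calculation above (labelling as actual/forecast is assumed).
actual = [57, 60, 70, 75, 70]
forecast = [63, 72, 86, 71, 60]

errors = [a - f for a, f in zip(actual, forecast)]

# Mean absolute deviation: the average size of the forecast errors.
mad = sum(abs(e) for e in errors) / len(errors)

# Sum of squares for forecast error: large errors count more heavily.
sse = sum(e ** 2 for e in errors)

print("MAD =", mad)
print("SSE =", sse)
```

If several forecasting models are fitted to the same series, these two lines can be repeated for each model's forecasts and the values compared.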

Question 2
Answer: 4

Learning Outcomes
After you have completed this study unit, use the chapter summary as a checklist to evaluate whether
you have really acquired a good understanding of the work covered.

Can you

• list and understand principles involved in the general procedures when applying chi-squared
testing?
• apply your knowledge of the chi-squared test, for nominal scale variables, to describe a single
population and/or to determine the relationship between two populations?
• apply non-parametric statistical tests?
• employ the Wilcoxon rank sum test, the sign test and the Wilcoxon signed rank sum test to
compare two populations of ordinal data?
• analyse the relationship between two interval variables using simple linear regression?
• explain and decompose the components of a time series?
• explain how trend and seasonal variation are measured?
• describe exponential smoothing, seasonal indexes and the autoregressive model for forecasting
in time series?

References
Keller, Gerald et al. (2005) Instructor’s Suite CD for the Student Edition of Statistics for Management
and Economics, Belmont, CA USA: Duxbury, Thomson.
Weiers, Ronald M. (2005) Introduction to Business Statistics, Brooks/Cole, Duxbury, Thomson.
